Skip to content

Fold - Core Walkthrough

Download Open In Colab

Walkthrough Cover.png

Welcome 👋

In this notebook we'll demonstrate fold's powerful interface for creating, training, and cross-validating (or backtesting, if you prefer) simple and composite models/pipelines.

We will use the dataset from an Energy residual load forcasting challenge hosted on Kaggle.

By the end you will know how to: - Create a simple and ensemble model (composite model) - Train multiple models / pipelines over time - Analyze the model's simulated past performance

Let's start by installing: - fold - fold-wrappers: optional, this will be required later for third party models. Wraps eg. XGBoost or StatsForecast models to be used with fold. - krisi, optional. Dream Faster's Time-Series evaluation library to quickly get results.

Installing libraries

pip install --quiet fold-core fold-wrappers fold-models krisi matplotlib seaborn xgboost plotly prophet statsforecast statsmodels ray kaleido

Data Loading and Exploration

Let's load in the data and do minimal exploration of the structure of the data.

fold has a useful utility function that loads example data from our datasets GitHub repo.

  • We are forecasting residual_load‡.
  • We will shorten the dataset to 4000 rows so we have a speedier demonstration.

The difference between the load in the network and the P that the industrial complex is producing.

from fold.utils.dataset import get_preprocessed_dataset
from import plot_acf
from krisi import score, compare
from import plot_y_predictions
import as pio
pio.renderers.default = "png"

X, y = get_preprocessed_dataset(
no_of_observation_per_day = 24
no_of_observation_per_week = no_of_observation_per_day * 7

y.plot(figsize = (20,5), grid=True);


The data format may be very familiar - it looks like the standard scikit-learn data.

X represents exogenous variables in fold, where a single row corresponds to a single target value. That means we currently only support univariate time-series (with exogenous variables), but soon we're extending that.

It's important that the data should be sorted and its integrity (no missing values, no duplicate indicies) should be checked before passing data to fold.

P Gb(i) Gd(i) H_sun T2m WS10m load residual_load
2018-01-01 00:00:00 0.0 0.0 0.0 0.0 8.44 5.54 120.0 120.0
2018-01-01 01:00:00 0.0 0.0 0.0 0.0 7.56 5.43 115.5 115.5
2018-01-01 02:00:00 0.0 0.0 0.0 0.0 7.04 5.33 120.5 120.5
2018-01-01 03:00:00 0.0 0.0 0.0 0.0 6.48 5.67 123.5 123.5
2018-01-01 04:00:00 0.0 0.0 0.0 0.0 5.95 5.79 136.5 136.5

(We'll ignore the exogenous variables until a bit later)

2018-01-01 00:00:00    115.5
2018-01-01 01:00:00    120.5
2018-01-01 02:00:00    123.5
2018-01-01 03:00:00    136.5
2018-01-01 04:00:00    138.0
Freq: H, Name: residual_load, dtype: float64

You can see that y (our target) contains the next value of X's "residual_load" column.

Time Series Cross Validation with a univariate forecaster

1. Model Building

fold has three core type of building blocks which you can build arbitrary sophisticated pipelines from: - Transformations (classes that change, augment the data. eg: AddHolidayFeatures adds a column feature of holidays/weekends to your exogenous variables) - Models (eg.: Sklearn, Baseline Models, third-party adapters from fold-wrappers, like Statsmodels) - Composites (eg.: Ensemble - takes the mean of the output of arbitrary number of 'parallel' models or pipelines)

Let's use Facebook's popular Prophet library, and create in instance.

If fold-wrappers is installed, fold can take this instance without any additional wrapper class.

from prophet import Prophet
prophet = Prophet()

2. Creating a Splitter

A splitter allows us to do Time Series Cross-Validation with various strategies.

fold supports three types of Splitters: Splitter

from fold.splitters import ExpandingWindowSplitter

splitter = ExpandingWindowSplitter(
    initial_train_window=no_of_observation_per_week * 6,

Here, initial_train_window defines the first window size, step is the size of the window between folds.

We're gonna be using the first 6 weeks as our initial window, and re-train (or update, in another training mode) it every week after. We'll have 18 models, each predicting the next week's target variable.

You can also use percentages to define both, for example, 0.1 would be equivalent to 10% of the availabel data.

3. Training a (univariate) Model

We could use ray to parallelize the training of multiple folds, halving the time it takes for every CPU core we have available (or deploying it to a cluster, if needed).

We pass in None as X, to indicate that we want to train a univariate model, without any exogenous variables.

from fold import train_evaluate, BackendType
import ray

scorecard, predictions, trained_pipeline = train_evaluate(prophet, None, y, splitter, backend=BackendType.ray, krisi_args={"model_name":"prophet"})

4. Evaluating the results

                     Mean Absolute Error  9.507563e+01
          Mean Absolute Percentage Error  7.955506e+13
Symmetric Mean Absolute Percentage Error  5.092002e-01
                      Mean Squared Error  1.477914e+04
                 Root Mean Squared Error  1.215695e+02
                               R-squared  4.268419e-01
                   Mean of the Residuals  6.361641e+00
     Standard Deviation of the Residuals  1.214232e+02
plot_y_predictions(y[predictions.index], predictions, mode="overlap", y_name="residual_load")


Finally, let's save the scorecard into a list, so we can compare the results later.

results = [(scorecard, predictions)]

Using an Ensemble (Composite) model

Here we will build an Ensemble model that leverages the output of multiple models.

Ensembling Models.png

1. Model Building with fold-wrappers

We are going to define three different pipelines, each leveraging a different model and different features.

We can leverage the most popular modelling libraries, like StatsForecast, Sktime, XGBoost, etc. (the list can be found here).

Let's train a MSTL model that's implemented in StatsForecast, that can capture multiple seasonalities, with the WrapStatsForecast class from fold-wrappers. This is not strictly necessary, though, as the automatic wrapping also works for StatsForecast instaces as well.

from statsforecast.models import MSTL
from fold.models.wrappers import WrapStatsForecast, WrapStatsModels

mstl = WrapStatsForecast.from_model(MSTL([24, 168]))

2. Ensembling with fold

Finally, let's ensemble the two pipelines.

from fold.composites import Ensemble

univariate_ensemble = Ensemble([prophet, mstl])

3. Training all pipelines seperately and within an ensemble

We'll use the same ExpandingWindowSplitter we have defined above, to make performance comparable.

from fold import train_evaluate

for name, pipeline in [
    ("mstl", mstl),
    scorecard, predictions, pipeline_trained = train_evaluate(pipeline, None, y, splitter, krisi_args={"model_name":name})
    results.append((scorecard, predictions))
compare([scorecard for scorecard, predictions in results], ['rmse', 'mse'])
rmse mse
prophet 121.569502 14779.143772
mstl 118.703683 14090.564243
univariate_ensemble 105.962271 11228.002822

We see that our Ensemble model has beaten all individual models' performance - which is very usual in the time series context.

Using a single-step ahead forecaster (a baseline)

So far we've used models that were costly to update (or re-train) every day, therefore we were limited to training once for every week, then predicting the next week's target.

What if we could use a lightweight, "online" model, that can be updated on every timestamp?

And.. what if we just repeat the last value?

That'd be the Naive model you can load from fold.models.

from fold import train_evaluate
from fold.models import Naive

scorecard, predictions, trained_pipeline = train_evaluate(Naive(), None, y, splitter, krisi_args={"model_name":"naive"})
results.append((scorecard, predictions))
  0%|          | 0/18 [00:00<?, ?it/s]

  0%|          | 0/18 [00:00<?, ?it/s]
                     Mean Absolute Error  3.911054e+01
          Mean Absolute Percentage Error  2.122351e+14
Symmetric Mean Absolute Percentage Error  2.038015e-01
                      Mean Squared Error  4.012944e+03
                 Root Mean Squared Error  6.334780e+01
                               R-squared  8.443718e-01
                   Mean of the Residuals -4.387701e-02
     Standard Deviation of the Residuals  6.335837e+01

We call this Adaptive Backtesting.

It looks like having access to last value really makes a difference: the baseline model beats all long-term forecasting models by a large margin.

It's extremely important to define our forecasting task well: 1. We need to think about what time horizon can and should forecast 2. And how frequently can we update our models.

Long-horizon (in this case, a week ahead) forecasts can be very unreliable, on the other hand, frequent, short-term forecasts are where Machine Learning shines (as we'll see in the next section).

Using exogenous variables with Tabular Models

So far we have been training univariate models, and ignored all the additional, exogenous variables that come with our data.

Let's try whether using this data boost our model's performance!

Building Models separately

We'll be using scikit-learn's HistGradientBoostingRegressor, their competing implementation of Gradient Boosted Trees. You don't need to wrap scikit-learn models or transformations when using it in fold, just pass it in directly to any pipeline.

from sklearn.ensemble import HistGradientBoostingRegressor

tree_model = HistGradientBoostingRegressor(max_depth=10)

Let's add both holiday and date-time features to our previous ensemble pipeline.

The data was gathered in the Region of Hessen, Germany -- so we pass in DE (we can pass in multiple regions). This transformation adds another column for holidays to our exogenous (X) features.

We're also adding the current hour, and day of week as integers to our exogenous features. This is one of the ways for our tabular model to capture seasonality.

from fold.transformations import AddHolidayFeatures, AddDateTimeFeatures

datetime = AddDateTimeFeatures(['hour', 'day_of_week', 'day_of_year'])
holidays = AddHolidayFeatures(['DE'])

Let's add a couple of lagged, exogenous values for our model. AddLagsX receives a tuple of column name and integer or list of lags, for each of which it will create a column in X.

We can easily create transformations of existing features on a rolling window basis with AddWindowFeatures as well, in this case, the last day's average value for all of our exogenous features.

We can "tie in" two separate pipelines with Concat, which concatenates all columns from all sources.

from fold.transformations import AddWindowFeatures, AddLagsX
from fold.composites import Concat

tree = [
        AddWindowFeatures([("all", 24, "mean")]),

Let's see how this performs!

We can also use fold's train, backtest to decouple these functionalities.

from fold import train, backtest

trained_pipeline = train(tree, X, y, splitter)
predictions = backtest(trained_pipeline, X, y, splitter)
scorecard = score(y[predictions.index], predictions.squeeze(), model_name="tabular_tree")

results.append((scorecard, predictions))
  0%|          | 0/18 [00:00<?, ?it/s]

  0%|          | 0/18 [00:00<?, ?it/s]
                     Mean Absolute Error  2.965292e+01
          Mean Absolute Percentage Error  2.123209e+14
Symmetric Mean Absolute Percentage Error  1.591514e-01
                      Mean Squared Error  2.637592e+03
                 Root Mean Squared Error  5.135749e+01
                               R-squared  8.977101e-01
                   Mean of the Residuals -1.747140e+00
     Standard Deviation of the Residuals  5.133635e+01

Creating an Ensemble of Tabular models

First let's creat two more models: * an Sklearn LinearRegressor * and an XGBoostRegressor instance

We are also going to use the HistGradientBoostingRegressor pipeline that we defined prior.

from sklearn.linear_model import LinearRegression

lregression = [
from xgboost import XGBRegressor
from fold.models.wrappers.gbd import WrapXGB

xgboost = [
tabular_ensemble = Ensemble([lregression, xgboost, tree])
scorecard, predictions, pipeline_trained = train_evaluate(tabular_ensemble, X, y, splitter, krisi_args={"model_name":"tabular_ensemble"})
results.append((scorecard, predictions))
  0%|          | 0/18 [00:00<?, ?it/s]

  0%|          | 0/18 [00:00<?, ?it/s]

Comparing & Vizualising the results

compare([scorecard for scorecard, _ in results], sort_by="rmse")
rmse mae mape smape mse r_two residuals_mean residuals_std
prophet 121.569502 95.075627 7.955506e+13 0.509200 14779.143772 0.426842 6.361641 121.423231
mstl 118.703683 68.464305 1.846641e+14 0.279174 14090.564243 0.453546 13.258315 117.980649
univariate_ensemble 105.962271 73.283533 1.321096e+14 0.356740 11228.002822 0.564561 9.809978 105.524826
naive 63.347800 39.110541 2.122351e+14 0.203802 4012.943794 0.844372 -0.043877 63.358374
tabular_tree 51.357493 29.652915 2.123209e+14 0.159151 2637.592134 0.897710 -1.747140 51.336346
tabular_ensemble 48.228591 28.165175 2.113763e+14 0.153255 2325.997006 0.909794 -2.690336 48.161544

In this simplistic, unfair comparison, it looks like the tabular models (and the Naive baseline) that have access to the previous value (and the exogenous variables) outperform the univariate models that are only re-trained every week.

We can't really draw general conclusions from this work, though.

Unlike NLP and Computer vision, Time Series data is very heterogeneous, and a Machine Learning approach that works well for one series may be an inferior choice for your specific usecase.

But now we have an easy way to compare the different pipelines, with unprecedented speed, by using a unified interface, with fold.

all_predictions =[predictions.squeeze().rename(scorecard.metadata.model_name) for scorecard, predictions in results]

plot_y_predictions(y[predictions.index], all_predictions, y_name="residual_load", mode='seperate')


Want to know more? Visit fold's Examples page, and access all the necessary snippets you need for you to build a Time Series ML pipeline!