Fold - Core Walkthrough

Welcome 👋

In this notebook we'll demonstrate fold's powerful interface for creating, training, and cross-validating (or backtesting, if you prefer) simple and composite models/pipelines.

We will use the dataset from an Energy residual load forcasting challenge hosted on Kaggle.

By the end you will know how to: - Create a simple and ensemble model (composite model) - Train multiple models / pipelines over time - Analyze the model's simulated past performance

Let's start by installing: - fold - fold-wrappers: optional, this will be required later for third party models. Wraps eg. XGBoost or StatsForecast models to be used with fold. - krisi, optional. Dream Faster's Time-Series evaluation library to quickly get results.

Installing libraries

%%capture
pip install --quiet fold-core fold-wrappers fold-models krisi matplotlib seaborn xgboost plotly prophet statsforecast statsmodels ray kaleido

Data Loading and Exploration

Let's load in the data and do minimal exploration of the structure of the data.

fold has a useful utility function that loads example data from our datasets GitHub repo.

We are forecasting residual_load‡.
We will shorten the dataset to 4000 rows so we have a speedier demonstration.

‡ The difference between the load in the network and the P that the industrial complex is producing.

from fold.utils.dataset import get_preprocessed_dataset
from statsmodels.graphics.tsaplots import plot_acf
from krisi import score, compare
from krisi.report import plot_y_predictions
import plotly.io as pio
pio.renderers.default = "png"

X, y = get_preprocessed_dataset(
    "energy/industrial_pv_load",
    target_col="residual_load", 
    resample="H",
    deduplication_strategy="first",
    shorten=4000,
)
no_of_observation_per_day = 24
no_of_observation_per_week = no_of_observation_per_day * 7

y.plot(figsize = (20,5), grid=True);

The data format may be very familiar - it looks like the standard scikit-learn data.

X represents exogenous variables in fold, where a single row corresponds to a single target value. That means we currently only support univariate time-series (with exogenous variables), but soon we're extending that.

It's important that the data should be sorted and its integrity (no missing values, no duplicate indicies) should be checked before passing data to fold.

X.head()

	P	Gb(i)	Gd(i)	H_sun	T2m	WS10m	load	residual_load
datetime
2018-01-01 00:00:00	0.0	0.0	0.0	0.0	8.44	5.54	120.0	120.0
2018-01-01 01:00:00	0.0	0.0	0.0	0.0	7.56	5.43	115.5	115.5
2018-01-01 02:00:00	0.0	0.0	0.0	0.0	7.04	5.33	120.5	120.5
2018-01-01 03:00:00	0.0	0.0	0.0	0.0	6.48	5.67	123.5	123.5
2018-01-01 04:00:00	0.0	0.0	0.0	0.0	5.95	5.79	136.5	136.5

(We'll ignore the exogenous variables until a bit later)

y.head()

datetime
2018-01-01 00:00:00    115.5
2018-01-01 01:00:00    120.5
2018-01-01 02:00:00    123.5
2018-01-01 03:00:00    136.5
2018-01-01 04:00:00    138.0
Freq: H, Name: residual_load, dtype: float64

You can see that y (our target) contains the next value of X's "residual_load" column.

Time Series Cross Validation with a univariate forecaster

1. Model Building

fold has three core type of building blocks which you can build arbitrary sophisticated pipelines from: - Transformations (classes that change, augment the data. eg: AddHolidayFeatures adds a column feature of holidays/weekends to your exogenous variables) - Models (eg.: Sklearn, Baseline Models, third-party adapters from fold-wrappers, like Statsmodels) - Composites (eg.: Ensemble - takes the mean of the output of arbitrary number of 'parallel' models or pipelines)

Let's use Facebook's popular Prophet library, and create in instance.

If fold-wrappers is installed, fold can take this instance without any additional wrapper class.

from prophet import Prophet
prophet = Prophet()

2. Creating a Splitter

A splitter allows us to do Time Series Cross-Validation with various strategies.

fold supports three types of Splitters:

from fold.splitters import ExpandingWindowSplitter

splitter = ExpandingWindowSplitter(
    initial_train_window=no_of_observation_per_week * 6,
    step=no_of_observation_per_week
)

Here, initial_train_window defines the first window size, step is the size of the window between folds.

We're gonna be using the first 6 weeks as our initial window, and re-train (or update, in another training mode) it every week after. We'll have 18 models, each predicting the next week's target variable.

You can also use percentages to define both, for example, 0.1 would be equivalent to 10% of the availabel data.

3. Training a (univariate) Model

We could use ray to parallelize the training of multiple folds, halving the time it takes for every CPU core we have available (or deploying it to a cluster, if needed).

We pass in None as X, to indicate that we want to train a univariate model, without any exogenous variables.

from fold import train_evaluate, BackendType
import ray
ray.init(ignore_reinit_error=True)

scorecard, predictions, trained_pipeline = train_evaluate(prophet, None, y, splitter, backend=BackendType.ray, krisi_args={"model_name":"prophet"})

4. Evaluating the results

scorecard.print('minimal')

                                               prophet
                     Mean Absolute Error  9.507563e+01
          Mean Absolute Percentage Error  7.955506e+13
Symmetric Mean Absolute Percentage Error  5.092002e-01
                      Mean Squared Error  1.477914e+04
                 Root Mean Squared Error  1.215695e+02
                               R-squared  4.268419e-01
                   Mean of the Residuals  6.361641e+00
     Standard Deviation of the Residuals  1.214232e+02

plot_y_predictions(y[predictions.index], predictions, mode="overlap", y_name="residual_load")

Finally, let's save the scorecard into a list, so we can compare the results later.

results = [(scorecard, predictions)]

Using an Ensemble (Composite) model

Here we will build an Ensemble model that leverages the output of multiple models.

1. Model Building with `fold-wrappers`

We are going to define three different pipelines, each leveraging a different model and different features.

We can leverage the most popular modelling libraries, like StatsForecast, Sktime, XGBoost, etc. (the list can be found here).

Let's train a MSTL model that's implemented in StatsForecast, that can capture multiple seasonalities, with the WrapStatsForecast class from fold-wrappers. This is not strictly necessary, though, as the automatic wrapping also works for StatsForecast instaces as well.

from statsforecast.models import MSTL
from fold.models.wrappers import WrapStatsForecast, WrapStatsModels

mstl = WrapStatsForecast.from_model(MSTL([24, 168]))

2. Ensembling with `fold`

Finally, let's ensemble the two pipelines.

from fold.composites import Ensemble

univariate_ensemble = Ensemble([prophet, mstl])

3. Training all pipelines seperately and within an `ensemble`

We'll use the same ExpandingWindowSplitter we have defined above, to make performance comparable.

from fold import train_evaluate

for name, pipeline in [
    ("mstl", mstl),
    ("univariate_ensemble",univariate_ensemble)
]:
    scorecard, predictions, pipeline_trained = train_evaluate(pipeline, None, y, splitter, krisi_args={"model_name":name})
    results.append((scorecard, predictions))

compare([scorecard for scorecard, predictions in results], ['rmse', 'mse'])

	rmse	mse
prophet	121.569502	14779.143772
mstl	118.703683	14090.564243
univariate_ensemble	105.962271	11228.002822

We see that our Ensemble model has beaten all individual models' performance - which is very usual in the time series context.

Using a single-step ahead forecaster (a baseline)

So far we've used models that were costly to update (or re-train) every day, therefore we were limited to training once for every week, then predicting the next week's target.

What if we could use a lightweight, "online" model, that can be updated on every timestamp?

And.. what if we just repeat the last value?

That'd be the Naive model you can load from fold.models.

from fold import train_evaluate
from fold.models import Naive

scorecard, predictions, trained_pipeline = train_evaluate(Naive(), None, y, splitter, krisi_args={"model_name":"naive"})
results.append((scorecard, predictions))
scorecard.print("minimal")

  0%|          | 0/18 [00:00<?, ?it/s]



  0%|          | 0/18 [00:00<?, ?it/s]

                                                 naive
                     Mean Absolute Error  3.911054e+01
          Mean Absolute Percentage Error  2.122351e+14
Symmetric Mean Absolute Percentage Error  2.038015e-01
                      Mean Squared Error  4.012944e+03
                 Root Mean Squared Error  6.334780e+01
                               R-squared  8.443718e-01
                   Mean of the Residuals -4.387701e-02
     Standard Deviation of the Residuals  6.335837e+01

We call this Adaptive Backtesting.

It looks like having access to last value really makes a difference: the baseline model beats all long-term forecasting models by a large margin.

It's extremely important to define our forecasting task well: 1. We need to think about what time horizon can and should forecast 2. And how frequently can we update our models.

Long-horizon (in this case, a week ahead) forecasts can be very unreliable, on the other hand, frequent, short-term forecasts are where Machine Learning shines (as we'll see in the next section).

Using exogenous variables with Tabular Models

So far we have been training univariate models, and ignored all the additional, exogenous variables that come with our data.

Let's try whether using this data boost our model's performance!

Building Models separately

We'll be using scikit-learn's HistGradientBoostingRegressor, their competing implementation of Gradient Boosted Trees. You don't need to wrap scikit-learn models or transformations when using it in fold, just pass it in directly to any pipeline.

from sklearn.ensemble import HistGradientBoostingRegressor

tree_model = HistGradientBoostingRegressor(max_depth=10)

Let's add both holiday and date-time features to our previous ensemble pipeline.

The data was gathered in the Region of Hessen, Germany -- so we pass in DE (we can pass in multiple regions). This transformation adds another column for holidays to our exogenous (X) features.

We're also adding the current hour, and day of week as integers to our exogenous features. This is one of the ways for our tabular model to capture seasonality.

from fold.transformations import AddHolidayFeatures, AddDateTimeFeatures

datetime = AddDateTimeFeatures(['hour', 'day_of_week', 'day_of_year'])
holidays = AddHolidayFeatures(['DE'])

Let's add a couple of lagged, exogenous values for our model. AddLagsX receives a tuple of column name and integer or list of lags, for each of which it will create a column in X.

We can easily create transformations of existing features on a rolling window basis with AddWindowFeatures as well, in this case, the last day's average value for all of our exogenous features.

We can "tie in" two separate pipelines with Concat, which concatenates all columns from all sources.

from fold.transformations import AddWindowFeatures, AddLagsX
from fold.composites import Concat

tree = [
    Concat([
        AddLagsX(("all",range(1,3))),
        AddWindowFeatures([("all", 24, "mean")]),
    ]),
    datetime,
    holidays,
    tree_model
]

Let's see how this performs!

We can also use fold's train, backtest to decouple these functionalities.

from fold import train, backtest

trained_pipeline = train(tree, X, y, splitter)
predictions = backtest(trained_pipeline, X, y, splitter)
scorecard = score(y[predictions.index], predictions.squeeze(), model_name="tabular_tree")

results.append((scorecard, predictions))
scorecard.print("minimal")

  0%|          | 0/18 [00:00<?, ?it/s]



  0%|          | 0/18 [00:00<?, ?it/s]

                                          tabular_tree
                     Mean Absolute Error  2.965292e+01
          Mean Absolute Percentage Error  2.123209e+14
Symmetric Mean Absolute Percentage Error  1.591514e-01
                      Mean Squared Error  2.637592e+03
                 Root Mean Squared Error  5.135749e+01
                               R-squared  8.977101e-01
                   Mean of the Residuals -1.747140e+00
     Standard Deviation of the Residuals  5.133635e+01

Creating an Ensemble of Tabular models

First let's creat two more models: * an Sklearn LinearRegressor * and an XGBoostRegressor instance

We are also going to use the HistGradientBoostingRegressor pipeline that we defined prior.

from sklearn.linear_model import LinearRegression

lregression = [
    AddLagsX(('all',range(1,3))),
    datetime,
    LinearRegression()
]

from xgboost import XGBRegressor
from fold.models.wrappers.gbd import WrapXGB

xgboost = [
    AddLagsX(('all',range(1,3))),
    datetime,
    WrapXGB.from_model(XGBRegressor())
]

tabular_ensemble = Ensemble([lregression, xgboost, tree])

scorecard, predictions, pipeline_trained = train_evaluate(tabular_ensemble, X, y, splitter, krisi_args={"model_name":"tabular_ensemble"})
results.append((scorecard, predictions))

  0%|          | 0/18 [00:00<?, ?it/s]



  0%|          | 0/18 [00:00<?, ?it/s]

Comparing & Vizualising the results

compare([scorecard for scorecard, _ in results], sort_by="rmse")

	rmse	mae	mape	smape	mse	r_two	residuals_mean	residuals_std
prophet	121.569502	95.075627	7.955506e+13	0.509200	14779.143772	0.426842	6.361641	121.423231
mstl	118.703683	68.464305	1.846641e+14	0.279174	14090.564243	0.453546	13.258315	117.980649
univariate_ensemble	105.962271	73.283533	1.321096e+14	0.356740	11228.002822	0.564561	9.809978	105.524826
naive	63.347800	39.110541	2.122351e+14	0.203802	4012.943794	0.844372	-0.043877	63.358374
tabular_tree	51.357493	29.652915	2.123209e+14	0.159151	2637.592134	0.897710	-1.747140	51.336346
tabular_ensemble	48.228591	28.165175	2.113763e+14	0.153255	2325.997006	0.909794	-2.690336	48.161544

In this simplistic, unfair comparison, it looks like the tabular models (and the Naive baseline) that have access to the previous value (and the exogenous variables) outperform the univariate models that are only re-trained every week.

We can't really draw general conclusions from this work, though.

Unlike NLP and Computer vision, Time Series data is very heterogeneous, and a Machine Learning approach that works well for one series may be an inferior choice for your specific usecase.

But now we have an easy way to compare the different pipelines, with unprecedented speed, by using a unified interface, with fold.

all_predictions =[predictions.squeeze().rename(scorecard.metadata.model_name) for scorecard, predictions in results]

plot_y_predictions(y[predictions.index], all_predictions, y_name="residual_load", mode='seperate')

Want to know more? Visit fold's Examples page, and access all the necessary snippets you need for you to build a Time Series ML pipeline!