Models
Model types (Time series / Tabular)
Fold fundamentally supports both:
"Time series" models
The likes of ARIMA, RNNs, Exponential Smoothing, etc.
Their univariate variations only have access to `y`, and ignore all data in `X`.
They're usually designed to be effective without additional feature engineering.
Examples:
... provided in fold-wrappers.
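To illustrate what "only has access to `y`" means in practice, here is a minimal sketch that fits a univariate ARIMA directly through statsmodels (not through fold-wrappers); the synthetic data and the model order are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# A univariate model only sees the target series y; no exogenous X is involved.
index = pd.date_range("2021-01-01", periods=100, freq="D")
y = pd.Series(np.random.default_rng(0).normal(size=100), index=index)

model = ARIMA(y, order=(1, 0, 0)).fit()   # illustrative order, not a recommendation
print(model.forecast(steps=3))            # predictions for the next 3 timestamps
```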
Tabular ML models
The likes of Random Forests, Gradient Boosted Trees, Linear Regression, etc.
They depend on having `X` populated, and do not work as "univariate" models. Each row in `X` corresponds to a single value of the dependent variable in `y`.
Usually, you'll want to add lagged values of `y` with the [AddLagsY][fold.transformations.lags.AddLagsY] class, or create other features for tabular models with:
- AddLagsX: if you have exogenous data already.
- AddWindowFeatures: if you have exogenous data already, and you want to aggregate it across different windows (a conceptual sketch of these features follows below).
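To make the idea concrete, here is a minimal, library-agnostic sketch of what these lagged and window features look like, written in plain pandas rather than with fold's transformations; the column names, lags and window size are illustrative assumptions, not fold's actual output.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=100, freq="D")
y = pd.Series(rng.normal(size=100), index=index, name="y")             # the target
X = pd.DataFrame({"temperature": rng.normal(size=100)}, index=index)   # exogenous data

features = pd.DataFrame(index=index)
# AddLagsY-style features: lagged values of the target.
features["y_lag_1"] = y.shift(1)
features["y_lag_2"] = y.shift(2)
# AddLagsX-style features: lagged values of exogenous columns.
features["temperature_lag_1"] = X["temperature"].shift(1)
# AddWindowFeatures-style features: aggregations over a rolling window.
features["temperature_mean_7"] = X["temperature"].rolling(7).mean()

# Each row of `features` now corresponds to a single value in `y`,
# which is exactly the shape tabular ML models expect.
print(features.dropna().head())
```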
Examples:
... provided in fold-wrappers.
Check out the Examples gallery to see how easy it is to engineer features with fold.
Online and Mini-batch Learning Modes
A mini-batch model is retrained for every split the Splitter returns.
It cannot update its state within a test window, but it may depend on lagged values of `X` or `y`.
An online model, on the other hand, is updated after inference at each timestamp, except for "in sample" predictions, which are done in a batch manner with `predict_in_sample()`.
We also give our "online" models a way to access the latest values while skipping the step that would update their parameters. This enables an efficient "quasi-online" behaviour, where the model is only re-trained (or updated) once per fold, but can still "follow" the time series data, which usually comes with a significant increase in accuracy.
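The difference between the two modes can be sketched roughly as follows. This is not fold's internal code: the model interface (fit(), predict(), and the observe() hook for passing new data without a parameter update) and the split format are assumptions made purely for illustration.

```python
import pandas as pd

def run_mini_batch(model, X: pd.DataFrame, y: pd.Series, splits):
    """Mini-batch: the model is re-trained once per split and its state is
    frozen within the test window."""
    predictions = []
    for train_idx, test_idx in splits:
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        predictions.append(model.predict(X.iloc[test_idx]))
    return predictions

def run_quasi_online(model, X: pd.DataFrame, y: pd.Series, splits):
    """Quasi-online: the model is still only re-trained once per fold, but it
    gets to see the latest values after every prediction, without running its
    (potentially expensive) parameter update."""
    predictions = []
    for train_idx, test_idx in splits:
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        for t in test_idx:
            predictions.append(model.predict(X.iloc[[t]]))
            model.observe(X.iloc[[t]], y.iloc[[t]])  # hypothetical hook: new data in, no re-fit
    return predictions
```

The point of the sketch is only to show why quasi-online updates tend to improve accuracy while staying cheap: the model follows the series within the test window without paying for a full re-training at every timestamp.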
Baselines
As time series forecasting is a fundamentally hard problem, it's also important to use strong baselines. We provide our own, fast implementations of these in fold-models.
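As an illustration of what even a very simple baseline looks like (this is plain pandas, not the implementation shipped in fold-models), consider a naive "last value" forecaster:

```python
import numpy as np
import pandas as pd

# Naive "last value" baseline: predict that the next value equals the current one.
index = pd.date_range("2021-01-01", periods=50, freq="D")
y = pd.Series(np.cumsum(np.random.default_rng(0).normal(size=50)), index=index)

naive_forecast = y.shift(1)               # y[t-1] is used as the prediction for y[t]
mae = (y - naive_forecast).abs().mean()   # NaN at the first point is skipped
print(f"Naive baseline MAE: {mae:.3f}")
```

If a model cannot clearly beat a baseline like this under continuous validation, it's usually not worth deploying.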