Composites
columns
EnsembleEachColumn
Bases: Composite
Train a pipeline for each column in the data, then ensemble their results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | Pipeline | Pipeline that is applied to every column independently; the results are then averaged. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import EnsembleEachColumn
>>> from sklearn.ensemble import RandomForestRegressor
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = EnsembleEachColumn(RandomForestRegressor())
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
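The idea of training one model per column and averaging the results can be sketched in plain Python. This is only an illustration under stated assumptions, not fold's implementation: `OffsetModel` and `ensemble_each_column` are hypothetical names, and the data is made up.

```python
from statistics import fmean


class OffsetModel:
    """Hypothetical one-feature model: learns a constant offset between x and y."""

    def fit(self, x, y):
        self.offset = fmean(y) - fmean(x)
        return self

    def predict(self, x):
        return [value + self.offset for value in x]


def ensemble_each_column(columns, y, make_model):
    """Fit a fresh model on each column separately, then average row by row."""
    per_column = [
        make_model().fit(values, y).predict(values) for values in columns.values()
    ]
    return [fmean(row) for row in zip(*per_column)]


columns = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]}
y = [2.0, 4.0, 6.0]
averaged = ensemble_each_column(columns, y, OffsetModel)
print(averaged)  # [2.5, 4.0, 5.5]
```

Column `a` alone would predict `[3.0, 4.0, 5.0]` and column `b` alone `[2.0, 4.0, 6.0]`; the ensemble returns their row-wise mean.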
SkipNA
Bases: Composite
Skips rows with NaN values in the input data. In the output, rows with NaNs are returned as is, all other rows transformed.
Warning: This seriously challenges the continuity of the data, which is very important for traditional time series models. Use with caution, and only with tabular ML models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | Pipeline | Pipeline to run without NA values. | required |
Examples:
>>> from fold.loop import train_backtest
>>> import numpy as np
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import SkipNA
>>> from sklearn.ensemble import RandomForestClassifier
>>> from imblearn.under_sampling import RandomUnderSampler
>>> from fold.utils.tests import generate_zeros_and_ones
>>> X, y = generate_zeros_and_ones()
>>> X[1:100] = np.nan
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = SkipNA(
... pipeline=RandomForestClassifier(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
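The pass-through behaviour described above (rows with NaNs returned as is, all other rows transformed) can be sketched without fold. The helper `skip_na` below is a hypothetical name for illustration and works on plain lists rather than DataFrames.

```python
import math


def skip_na(rows, transform):
    """Apply `transform` only to rows without NaN; return NaN rows unchanged."""

    def has_nan(row):
        return any(isinstance(value, float) and math.isnan(value) for value in row)

    return [row if has_nan(row) else transform(row) for row in rows]


rows = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0]]
result = skip_na(rows, lambda row: [value * 10 for value in row])
# First and third rows are scaled; the second row is passed through untouched.
```

Note that the transformed rows are no longer contiguous in time, which is why the warning above applies.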
TransformEachColumn
Bases: Composite
Apply a single pipeline to each column, separately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | Pipeline | Pipeline that gets applied to each column. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import TransformEachColumn
>>> from sklearn.ensemble import RandomForestRegressor
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> X["sine_plus_1"] = X["sine"] + 1.0
>>> X.head()
sine sine_plus_1
2021-12-31 07:20:00 0.0000 1.0000
2021-12-31 07:21:00 0.0126 1.0126
2021-12-31 07:22:00 0.0251 1.0251
2021-12-31 07:23:00 0.0377 1.0377
2021-12-31 07:24:00 0.0502 1.0502
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = TransformEachColumn(lambda x: x + 1.0)
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine sine_plus_1
2021-12-31 15:40:00 1.0000 2.0000
2021-12-31 15:41:00 1.0126 2.0126
2021-12-31 15:42:00 1.0251 2.0251
2021-12-31 15:43:00 1.0377 2.0377
2021-12-31 15:44:00 1.0502 2.0502
concat
Concat
Bases: Composite
Concatenates the results of multiple pipelines.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipelines | Pipelines | A list of pipelines to be applied to the data, independently of each other. | required |
if_duplicate_keep | ResolutionStrategy \| str \| None | How to handle duplicate columns, by default ResolutionStrategy.first | first |
custom_merge_logic | Callable[[list[DataFrame]], DataFrame] \| None | A custom function that takes a list of dataframes and returns a single dataframe. If present, it's used instead of ResolutionStrategy. | None |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import Concat
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Concat([
... lambda X: X.assign(sine_plus_1=X["sine"] + 1),
... lambda X: X.assign(sine_plus_2=X["sine"] + 2),
... ])
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine_plus_1 sine_plus_2 sine
2021-12-31 15:40:00 1.0000 2.0000 -0.0000
2021-12-31 15:41:00 1.0126 2.0126 0.0126
2021-12-31 15:42:00 1.0251 2.0251 0.0251
2021-12-31 15:43:00 1.0377 2.0377 0.0377
2021-12-31 15:44:00 1.0502 2.0502 0.0502
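In the example above both lambdas return the original `sine` column, so it appears once in the output. A minimal sketch of that kind of duplicate handling, in plain Python with columns as dict entries, might look as follows. `concat_columns` is a hypothetical helper, and the `"last"` option is an assumption added for contrast; only `first` is documented here.

```python
def concat_columns(frames, keep="first"):
    """Merge column dicts side by side; on a duplicate name keep the first or last seen."""
    merged = {}
    for frame in frames:
        for name, values in frame.items():
            if name in merged and keep == "first":
                continue  # an earlier pipeline already provided this column
            merged[name] = values
    return merged


left = {"sine": [0.0, 0.5], "sine_plus_1": [1.0, 1.5]}
right = {"sine": [9.0, 9.5], "sine_plus_2": [2.0, 2.5]}

kept_first = concat_columns([left, right], keep="first")
kept_last = concat_columns([left, right], keep="last")  # assumed alternative strategy
```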
Sequence
Bases: Composite
An optional wrapper that is equivalent to using a single list of transformations. It executes the transformations sequentially, in the order they are provided.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | Pipelines | A list of transformations or models to be applied to the data. | required |
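Sequential execution means the output of each step becomes the input of the next. The following is a conceptual sketch in plain Python, not fold's implementation; `run_sequence` is a hypothetical name.

```python
from functools import reduce


def run_sequence(steps, data):
    """Feed the output of each step into the next, in the order given."""
    return reduce(lambda current, step: step(current), steps, data)


steps = [
    lambda values: [v + 1.0 for v in values],  # first: shift
    lambda values: [v * 2.0 for v in values],  # then: scale
]
result = run_sequence(steps, [0.0, 1.0, 2.0])
print(result)  # [2.0, 4.0, 6.0]
```

Order matters: reversing the two steps would give `[1.0, 3.0, 5.0]` instead.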
ensemble
Ensemble
Bases: Composite
Ensemble (average) the results of multiple pipelines.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipelines | Pipelines | A list of pipelines to be applied to the data, independently of each other. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import Ensemble
>>> from fold.models import DummyRegressor
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Ensemble([
... DummyRegressor(0.1),
... DummyRegressor(0.9),
... ])
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.squeeze().head()
2021-12-31 15:40:00 0.5
2021-12-31 15:41:00 0.5
2021-12-31 15:42:00 0.5
2021-12-31 15:43:00 0.5
2021-12-31 15:44:00 0.5
Freq: T, Name: predictions_Ensemble-DummyRegressor-0.1-DummyRegressor-0.9, dtype: float64
metalabeling
MetaLabeling
Bases: Composite
MetaLabeling takes a primary pipeline and a meta pipeline. The primary pipeline is used to predict the target variable. The meta pipeline is used to predict whether the primary model's predictions are correct (a binary classification problem). It multiplies the probabilities from the meta pipeline with the predictions of the primary pipeline.
It's only applicable to binary classification problems, where the labels are either 1 and -1, or one of them is zero.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary | Pipeline | A pipeline to be applied to the data. Target ( | required |
meta | Pipeline | A pipeline to be applied to predict whether the primary pipeline's predictions are correct. Target ( | required |
primary_output_included | bool | Whether the primary pipeline's output is included in the meta pipeline's input, by default False. | False |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SingleWindowSplitter
>>> from fold.composites import MetaLabeling
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from fold.utils.tests import generate_zeros_and_ones
>>> X, y = generate_zeros_and_ones()
>>> splitter = SingleWindowSplitter(train_window=0.5)
>>> pipeline = MetaLabeling(
... primary=LogisticRegression(),
... meta=RandomForestClassifier(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
Outputs
A prediction is a float between -1 (or 0) and 1.
It does not output probabilities, as the prediction already includes that information.
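The multiplication step can be illustrated with a few hand-picked numbers. This is a sketch under stated assumptions: the values below are made up, and `metalabel` is a hypothetical helper rather than part of fold.

```python
def metalabel(primary_predictions, meta_probabilities):
    """Scale each primary prediction by the meta model's confidence that it is correct."""
    return [p * q for p, q in zip(primary_predictions, meta_probabilities)]


primary_predictions = [1, -1, 1, -1]          # hypothetical primary outputs
meta_probabilities = [0.5, 0.25, 1.0, 0.75]   # hypothetical P(primary is correct)

sized = metalabel(primary_predictions, meta_probabilities)
print(sized)  # [0.5, -0.25, 1.0, -0.75]
```

The sign still carries the primary pipeline's prediction, while the magnitude carries the meta pipeline's confidence, which is why no separate probability output is needed.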
metalabeling_strategy
residual
ModelResiduals
Bases: Composite
This is a composite that combines two pipelines:
* The primary pipeline is used to predict the target variable.
* The meta pipeline is used to predict the primary pipeline's residual (or, error).
It adds together the primary pipeline's output with the predicted residual.
Also known as:
- Residual chasing
- Residual boosting
- Hybrid approach
- "Moving Average" in ARIMA
It's only applicable for regression tasks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary | Pipeline | A pipeline to be applied to the data. The target ( | required |
meta | Pipeline | A pipeline to predict the primary pipeline's residual. The target ( | required |
primary_output_included | bool | Whether the primary pipeline's output is included in the meta pipeline's input, by default False. | False |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import ModelResiduals
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import LinearRegression
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = ModelResiduals(
... primary=LinearRegression(),
... meta=RandomForestRegressor(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
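The arithmetic behind residual modelling is easy to trace by hand. The sketch below uses two deliberately simple, hypothetical stand-in models (a mean predictor as primary, and a meta model that recovers half of each residual) purely to show how the two outputs are combined; it does not reflect how fold trains either pipeline.

```python
from statistics import fmean

y = [1.0, 3.0, 2.0, 4.0]

# Hypothetical primary model: always predicts the mean of the target.
primary = [fmean(y)] * len(y)

# The meta model's target is what the primary model got wrong.
residuals = [actual - predicted for actual, predicted in zip(y, primary)]

# Hypothetical meta model: recovers only half of each residual.
meta = [0.5 * r for r in residuals]

# Final prediction: primary output plus predicted residual.
final = [p + m for p, m in zip(primary, meta)]
print(final)  # [1.75, 2.75, 2.25, 3.25]
```

Every final prediction lies closer to the true value than the primary prediction of 2.5 did.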
References
- https://www.kaggle.com/code/ryanholbrook/hybrid-models
- https://www.uber.com/en-DE/blog/m4-forecasting-competition/
sample
Sample
Bases: Sampler
Sample data with an imbalanced-learn sampler instance during training. No sampling is done during inference or backtesting.
Warning: This seriously challenges the continuity of the data, which is very important for traditional time series models. Use with caution, and only with tabular ML models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sampler | Any | An imbalanced-learn sampler instance (subclass of | required |
pipeline | Pipeline | A pipeline to be applied to the sampled data. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import Sample
>>> from sklearn.ensemble import RandomForestClassifier
>>> from imblearn.under_sampling import RandomUnderSampler
>>> from fold.utils.tests import generate_zeros_and_ones_skewed
>>> X, y = generate_zeros_and_ones_skewed()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Sample(
... sampler=RandomUnderSampler(),
... pipeline=RandomForestClassifier(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
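The split between training (sampled) and inference (not sampled) can be sketched in plain Python. The `undersample` function below is a hypothetical stand-in for an imbalanced-learn sampler, written only to illustrate the idea of random undersampling; it is not the library's algorithm.

```python
import random
from collections import Counter


def undersample(X, y, seed=0):
    """Randomly drop rows until every class is as rare as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(y):
        by_class.setdefault(label, []).append(index)
    target = min(len(indices) for indices in by_class.values())
    # Sort the kept indices so the surviving rows stay in their original order.
    kept = sorted(
        i for indices in by_class.values() for i in rng.sample(indices, target)
    )
    return [X[i] for i in kept], [y[i] for i in kept]


X = [[float(i)] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Training: the model is fitted on the balanced subset.
X_train, y_train = undersample(X, y)
# Inference: the data is used as is, with no sampling.
X_infer = X
print(Counter(y_train))
```

The balanced training set has gaps in time, which is the continuity problem the warning above refers to.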
selectbest
target
TransformTarget
Bases: Composite
Transforms the target within the context of the wrapped Pipeline.
wrapped_pipeline will be applied to the input data, where the target (y) is already transformed.
y_pipeline will be applied to the target column.
The inverse of y_pipeline will be applied to the predictions of the wrapped pipeline.
E.g.: a log or difference transformation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wrapped_pipeline | Pipeline | Pipeline, which will be applied to the input data, where the target ( | required |
y_pipeline | list[InvertibleTransformation] \| InvertibleTransformation | InvertibleTransformations, which will be applied to the target ( | required |
invert_wrapped_output | bool | Apply the inverse transformation of | True |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import TransformTarget
>>> from sklearn.linear_model import LinearRegression
>>> from fold.transformations import Difference
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = TransformTarget(
... wrapped_pipeline=LinearRegression(),
... y_pipeline=Difference(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
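The round trip of transforming the target and then inverting the predictions can be traced with a first difference. The sketch below is illustrative only: `difference` and `inverse_difference` are hypothetical helpers, not fold's `Difference` transformation, and the "predictions" are simply the true differences.

```python
from itertools import accumulate


def difference(y):
    """First difference; the first value is kept so the transform can be inverted."""
    return y[0], [b - a for a, b in zip(y, y[1:])]


def inverse_difference(first, diffs):
    """Cumulative sum starting from the stored first value."""
    return list(accumulate(diffs, initial=first))


y = [10.0, 12.0, 15.0, 19.0]

first, y_diff = difference(y)   # the wrapped pipeline would be trained on y_diff
predicted_diff = y_diff         # hypothetical perfect predictions of the differences
restored = inverse_difference(first, predicted_diff)
print(restored)  # [10.0, 12.0, 15.0, 19.0]
```

Applying the inverse brings the predictions back to the original scale of the target, so they are directly comparable with `y`.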