Composites

columns

EnsembleEachColumn

Bases: Composite

Train a pipeline for each column in the data, then ensemble their results.

Parameters:

Name Type Description Default
pipeline Pipeline

Pipeline that gets applied to every column independently; the per-column results are then averaged.

required

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import EnsembleEachColumn
>>> from sklearn.ensemble import RandomForestRegressor
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y  = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = EnsembleEachColumn(RandomForestRegressor())
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
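
The per-column averaging can be pictured with a minimal sketch in plain pandas and NumPy. This is not fold's implementation: the function name ensemble_each_column and the least-squares line that stands in for the wrapped pipeline are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def ensemble_each_column(X_train, y_train, X_test):
    """Fit one least-squares line per column, then average the predictions.

    Hypothetical helper: the line fit stands in for the wrapped pipeline.
    """
    per_column_preds = []
    for column in X_train.columns:
        # Each model only ever sees a single column of the input.
        slope, intercept = np.polyfit(
            X_train[column].to_numpy(), y_train.to_numpy(), deg=1
        )
        per_column_preds.append(slope * X_test[column] + intercept)
    # One prediction column per input column, averaged row by row.
    return pd.concat(per_column_preds, axis=1).mean(axis=1)


X_train = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [0.0, 2.0, 4.0, 6.0]})
y_train = pd.Series([1.0, 3.0, 5.0, 7.0])
X_test = pd.DataFrame({"a": [4.0], "b": [8.0]})
preds = ensemble_each_column(X_train, y_train, X_test)
```

Both columns explain the target perfectly here, so the two per-column predictions agree and their average is the same value.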

SkipNA

Bases: Composite

Skips rows with NaN values in the input data. In the output, rows with NaNs are returned as is; all other rows are transformed.

Warning: This seriously challenges the continuity of the data, which is very important for traditional time series models. Use with caution, and only with tabular ML models.

Parameters:

Name Type Description Default
pipeline Pipeline

Pipeline to run without NA values.

required

Examples:

>>> from fold.loop import train_backtest
>>> import numpy as np
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import SkipNA
>>> from sklearn.ensemble import RandomForestClassifier
>>> from fold.utils.tests import generate_zeros_and_ones
>>> X, y  = generate_zeros_and_ones()
>>> X[1:100] = np.nan
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = SkipNA(
...     pipeline=RandomForestClassifier(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
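
The row-skipping behaviour described above can be sketched with pandas alone. The helper skip_na and the doubling function are hypothetical stand-ins for the composite and its wrapped pipeline, not the library's code.

```python
import numpy as np
import pandas as pd


def skip_na(X, transform):
    """Apply `transform` to complete rows only; rows with NaNs pass through.

    Hypothetical helper, assuming `transform` keeps the index and columns.
    """
    has_nan = X.isna().any(axis=1)
    out = X.copy()
    # Only the NaN-free rows are handed to the wrapped function.
    out.loc[~has_nan] = transform(X.loc[~has_nan])
    return out


X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, 30.0]})
result = skip_na(X, lambda df: df * 2)
```

The middle row contains a NaN, so it comes back unchanged while the other two rows are doubled. The wrapped function never sees the gap, which is why the continuity warning above applies.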

TransformEachColumn

Bases: Composite

Apply a single pipeline to each column, separately.

Parameters:

Name Type Description Default
pipeline Pipeline

Pipeline that gets applied to each column

required

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import TransformEachColumn
>>> from sklearn.ensemble import RandomForestRegressor
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y  = generate_sine_wave_data()
>>> X["sine_plus_1"] = X["sine"] + 1.0
>>> X.head()
                       sine  sine_plus_1
2021-12-31 07:20:00  0.0000       1.0000
2021-12-31 07:21:00  0.0126       1.0126
2021-12-31 07:22:00  0.0251       1.0251
2021-12-31 07:23:00  0.0377       1.0377
2021-12-31 07:24:00  0.0502       1.0502
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = TransformEachColumn(lambda x: x + 1.0)
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
                       sine  sine_plus_1
2021-12-31 15:40:00  1.0000       2.0000
2021-12-31 15:41:00  1.0126       2.0126
2021-12-31 15:42:00  1.0251       2.0251
2021-12-31 15:43:00  1.0377       2.0377
2021-12-31 15:44:00  1.0502       2.0502

concat

Concat

Bases: Composite

Concatenates the results of multiple pipelines.

Parameters:

Name Type Description Default
pipelines Pipelines

A list of pipelines to be applied to the data, independently of each other.

required
if_duplicate_keep ResolutionStrategy | str | None

How to handle duplicate columns, by default ResolutionStrategy.first

first
custom_merge_logic Callable[[list[DataFrame]], None] | DataFrame | None

A custom function that takes a list of dataframes and returns a single dataframe. If present, it's used instead of ResolutionStrategy.

None

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import Concat
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y  = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Concat([
...     lambda X: X.assign(sine_plus_1=X["sine"] + 1),
...     lambda X: X.assign(sine_plus_2=X["sine"] + 2),
... ])
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
                     sine_plus_1  sine_plus_2    sine
2021-12-31 15:40:00       1.0000       2.0000 -0.0000
2021-12-31 15:41:00       1.0126       2.0126  0.0126
2021-12-31 15:42:00       1.0251       2.0251  0.0251
2021-12-31 15:43:00       1.0377       2.0377  0.0377
2021-12-31 15:44:00       1.0502       2.0502  0.0502
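
Both lambdas in the example return the original sine column, so the concatenated result would contain it twice. A rough sketch of a "keep the first duplicate" resolution in plain pandas follows. The helper concat_keep_first is hypothetical, and the column order it produces may differ from the library's output shown above.

```python
import pandas as pd


def concat_keep_first(frames):
    """Concatenate column-wise; on duplicate column names keep the first one."""
    merged = pd.concat(frames, axis="columns")
    # `duplicated` flags every repeat of a column name after its first occurrence.
    return merged.loc[:, ~merged.columns.duplicated(keep="first")]


X = pd.DataFrame({"sine": [0.0, 0.5, 1.0]})
frames = [
    X.assign(sine_plus_1=X["sine"] + 1),
    X.assign(sine_plus_2=X["sine"] + 2),
]
merged = concat_keep_first(frames)
```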

Sequence

Bases: Composite

An optional wrapper that is equivalent to passing the transformations as a single list. It executes the transformations sequentially, in the order they are provided.

Parameters:

Name Type Description Default
pipeline Pipelines

A list of transformations or models to be applied to the data.

required

TransformColumn

TransformColumn(columns: list[str] | str, pipeline: Pipeline, name: str | None = None) -> Composite

Transforms a single or multiple columns using the given pipeline.
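
As a rough illustration of the idea, assuming the untouched columns pass through unchanged (the source does not state this), the selection logic might look like the following in plain pandas. The helper transform_columns is hypothetical and does not use fold.

```python
import pandas as pd


def transform_columns(X, columns, func):
    """Apply `func` to the selected columns and leave the others untouched."""
    # Accept a single column name or a list of names, as in the signature above.
    selected = [columns] if isinstance(columns, str) else list(columns)
    out = X.copy()
    out[selected] = func(X[selected])
    return out


X = pd.DataFrame({"sine": [0.0, 0.5], "other": [1.0, 2.0]})
result = transform_columns(X, "sine", lambda df: df + 1.0)
```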

ensemble

Ensemble

Bases: Composite

Ensemble (average) the results of multiple pipelines.

Parameters:

Name Type Description Default
pipelines Pipelines

A list of pipelines to be applied to the data, independently of each other.

required

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import Ensemble
>>> from fold.models import DummyRegressor
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y  = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Ensemble([
...     DummyRegressor(0.1),
...     DummyRegressor(0.9),
... ])
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.squeeze().head()
2021-12-31 15:40:00    0.5
2021-12-31 15:41:00    0.5
2021-12-31 15:42:00    0.5
2021-12-31 15:43:00    0.5
2021-12-31 15:44:00    0.5
Freq: T, Name: predictions_Ensemble-DummyRegressor-0.1-DummyRegressor-0.9, dtype: float64

metalabeling

MetaLabeling

Bases: Composite

MetaLabeling takes a primary pipeline and a meta pipeline. The primary pipeline is used to predict the target variable. The meta pipeline is used to predict whether the primary model's predictions are correct (a binary classification problem). It multiplies the probabilities from the meta pipeline with the predictions of the primary pipeline.

It is only applicable to binary classification problems where the labels are either 1 and -1, or where one of the two labels is zero.

Parameters:

Name Type Description Default
primary Pipeline

A pipeline to be applied to the data. Target (y) is unchanged.

required
meta Pipeline

A pipeline to be applied to predict whether the primary pipeline's predictions are correct. Target (y) is preds == y.

required
primary_output_included bool

Whether the primary pipeline's output is included in the meta pipeline's input, by default False.

False

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SingleWindowSplitter
>>> from fold.composites import MetaLabeling
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from fold.utils.tests import generate_zeros_and_ones
>>> X, y  = generate_zeros_and_ones()
>>> splitter = SingleWindowSplitter(train_window=0.5)
>>> pipeline = MetaLabeling(
...     primary=LogisticRegression(),
...     meta=RandomForestClassifier(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)

Outputs:

A prediction is a float between -1 (or 0, depending on the label encoding) and 1. It does not output probabilities, as the prediction already includes that information.
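
The arithmetic behind this output can be sketched with NumPy. The arrays below are made-up values chosen for illustration, not results from fold.

```python
import numpy as np

# Made-up primary predictions and true labels, encoded as 1 / -1.
y = np.array([1.0, -1.0, -1.0, 1.0])
primary_preds = np.array([1.0, -1.0, 1.0, -1.0])

# The meta pipeline's target: was the primary prediction correct?
meta_target = (primary_preds == y).astype(int)

# Made-up meta probabilities that the primary prediction is correct.
meta_proba = np.array([0.9, 0.8, 0.5, 0.1])

# The final output scales each primary prediction by the meta probability.
combined = primary_preds * meta_proba
```

The sign still carries the primary pipeline's decision, and the magnitude carries the meta pipeline's confidence in it.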

References:

  • Meta Labeling (A Toy Example)
  • Meta-Labeling: Theory and Framework

metalabeling_strategy

residual

ModelResiduals

Bases: Composite

This is a composite that combines two pipelines:

  • The primary pipeline is used to predict the target variable.
  • The meta pipeline is used to predict the primary pipeline's residual (or, error).

It adds together the primary pipeline's output with the predicted residual.

Also known as:

  • Residual chasing
  • Residual boosting
  • Hybrid approach
  • "Moving Average" in ARIMA

It's only applicable for regression tasks.

Parameters:

Name Type Description Default
primary Pipeline

A pipeline to be applied to the data. The target (y) is unchanged.

required
meta Pipeline

A pipeline to predict the primary pipeline's residual. The target (y) is the primary pipeline's residual (or, error).

required
primary_output_included bool

Whether the primary pipeline's output is included in the meta pipeline's input, by default False.

False

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import ModelResiduals
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import LinearRegression
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y  = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = ModelResiduals(
...     primary=LinearRegression(),
...     meta=RandomForestRegressor(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
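
The two-stage idea can be shown on toy data with NumPy. In this sketch a least-squares line plays the primary pipeline and a per-group mean plays the meta pipeline. Both are stand-ins chosen for brevity, not the models used in the example above.

```python
import numpy as np

# Toy target: a linear trend plus an alternating offset a line cannot capture.
x = np.arange(10, dtype=float)
is_even = x % 2 == 0
y = 2.0 * x + 1.0 + np.where(is_even, 0.5, -0.5)

# Primary stand-in: a least-squares line fitted on the unchanged target.
slope, intercept = np.polyfit(x, y, deg=1)
primary_pred = slope * x + intercept

# The meta stand-in is trained on the primary model's residual.
residual = y - primary_pred
meta_pred = np.where(is_even, residual[is_even].mean(), residual[~is_even].mean())

# Final prediction: primary output plus the predicted residual.
final_pred = primary_pred + meta_pred

primary_mse = np.mean((y - primary_pred) ** 2)
final_mse = np.mean((y - final_pred) ** 2)
```

On this in-sample toy data the combined prediction has a lower squared error than the line alone, because the residual still contains structure the primary model missed.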

References:

  • https://www.kaggle.com/code/ryanholbrook/hybrid-models
  • https://www.uber.com/en-DE/blog/m4-forecasting-competition/

sample

Sample

Bases: Sampler

Sample data with an imbalanced-learn sampler instance during training. No sampling is done during inference or backtesting.

Warning: This seriously challenges the continuity of the data, which is very important for traditional time series models. Use with caution, and only with tabular ML models.

Parameters:

Name Type Description Default
sampler Any

An imbalanced-learn sampler instance (subclass of BaseSampler).

required
pipeline Pipeline

A pipeline to be applied to the sampled data.

required

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import Sample
>>> from sklearn.ensemble import RandomForestClassifier
>>> from imblearn.under_sampling import RandomUnderSampler
>>> from fold.utils.tests import generate_zeros_and_ones_skewed
>>> X, y  = generate_zeros_and_ones_skewed()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Sample(
...     sampler=RandomUnderSampler(),
...     pipeline=RandomForestClassifier(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
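
To make concrete what happens to the training data, here is a hand-rolled random under-sampler in pandas and NumPy. It only approximates what an imbalanced-learn sampler does: the function undersample is hypothetical and serves as a sketch of the training-time step alone.

```python
import numpy as np
import pandas as pd


def undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest one."""
    rng = np.random.default_rng(seed)
    n_min = y.value_counts().min()
    kept = np.concatenate([
        rng.choice(y.index[(y == label).to_numpy()].to_numpy(), size=n_min, replace=False)
        for label in y.unique()
    ])
    kept.sort()  # restore the original row order
    return X.loc[kept], y.loc[kept]


X = pd.DataFrame({"feature": np.arange(10.0)})
y = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])

# Applied to the training window only; inference data is left as it is.
X_train, y_train = undersample(X, y)
```

The surviving rows are no longer evenly spaced in time, which is the continuity problem the warning above refers to.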

References:

  • imbalanced-learn

selectbest

target

TransformTarget

Bases: Composite

Transforms the target within the context of the wrapped Pipeline. wrapped_pipeline will be applied to the input data, where the target (y) is already transformed. y_pipeline will be applied to the target column.

The inverse of y_pipeline is applied to the predictions of wrapped_pipeline (unless invert_wrapped_output is False).

E.g.: Log or Difference transformation.

Parameters:

Name Type Description Default
wrapped_pipeline Pipeline

Pipeline, which will be applied to the input data, where the target (y) is already transformed.

required
y_pipeline list[InvertibleTransformation] | InvertibleTransformation

InvertibleTransformations, which will be applied to the target (y)

required
invert_wrapped_output bool

Apply the inverse transformation of y_pipeline to the output of wrapped_pipeline. default is True.

True

Examples:

>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.composites import TransformTarget
>>> from sklearn.linear_model import LinearRegression
>>> from fold.transformations import Difference
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y  = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = TransformTarget(
...     wrapped_pipeline=LinearRegression(),
...     y_pipeline=Difference(),
... )
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
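
The transform, predict, invert cycle can be traced by hand for a first-difference transformation. The NumPy sketch below uses a straight-line fit as a stand-in for the wrapped pipeline and a made-up series; it does not call fold.

```python
import numpy as np

y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# Forward transformation: the model sees differences instead of levels.
y_diff = np.diff(y)

# Stand-in for the wrapped pipeline: a line fitted to the differenced target.
steps = np.arange(len(y_diff))
slope, intercept = np.polyfit(steps, y_diff, deg=1)
next_diff = slope * len(y_diff) + intercept

# Inverse transformation: add the predicted difference back onto the last level.
next_y = y[-1] + next_diff
```

The model predicts a difference of 6, and undoing the differencing turns it into a level of 30 on the original scale.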

utils