Transformations
columns
AddColumnSuffix
Bases: Transformation
Add suffix to column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
suffix | str | Suffix to add to all column names. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import AddColumnSuffix
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = AddColumnSuffix("_2")
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine_2
2021-12-31 15:40:00 -0.0000
2021-12-31 15:41:00 0.0126
2021-12-31 15:42:00 0.0251
2021-12-31 15:43:00 0.0377
2021-12-31 15:44:00 0.0502
DropColumns
Bases: Transformation, Tunable
Drops a single or multiple columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | (list[str], str) | The column or columns to drop. | required |
OnlyPredictions
Bases: Transformation
Drops all columns except the output model(s)' predictions.
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import OnlyPredictions
>>> from fold.models.dummy import DummyClassifier
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = [DummyClassifier(1, [0, 1], [0.5, 0.5]), OnlyPredictions()]
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
predictions_DummyClassifier
2021-12-31 15:40:00 1
2021-12-31 15:41:00 1
2021-12-31 15:42:00 1
2021-12-31 15:43:00 1
2021-12-31 15:44:00 1
OnlyProbabilities
Bases: Transformation
Drops all columns except the output model(s)' probabilities.
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import OnlyProbabilities
>>> from fold.models.dummy import DummyClassifier
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = [DummyClassifier(1, [0, 1], [0.5, 0.5]), OnlyProbabilities()]
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
probabilities_DummyClassifier_0 probabilities_DummyClassifier_1
2021-12-31 15:40:00 0.5 0.5
2021-12-31 15:41:00 0.5 0.5
2021-12-31 15:42:00 0.5 0.5
2021-12-31 15:43:00 0.5 0.5
2021-12-31 15:44:00 0.5 0.5
RenameColumns
Bases: Transformation
Renames columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns_mapper | dict | A dictionary containing the old column names as keys and the new column names as values. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import RenameColumns
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = RenameColumns({"sine": "sine_renamed"})
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine_renamed
2021-12-31 15:40:00 -0.0000
2021-12-31 15:41:00 0.0126
2021-12-31 15:42:00 0.0251
2021-12-31 15:43:00 0.0377
2021-12-31 15:44:00 0.0502
SelectColumns
Bases: Transformation, Tunable
Selects a single or multiple columns, drops the rest.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Union[list[str], str] | The column or columns to select (dropping the rest). | required |
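Neither SelectColumns nor DropColumns comes with an example here. As a rough mental model (an assumption, not taken from fold's source), their effect on a DataFrame resembles plain pandas column selection and `DataFrame.drop`, with a single column name treated as a one-element list. The helpers `as_list`, `select_columns` and `drop_columns` below are hypothetical and exist only for illustration:

```python
import pandas as pd


def as_list(columns):
    """Normalize a single column name or a list of names to a list."""
    return [columns] if isinstance(columns, str) else list(columns)


def select_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Keep only the named columns (hypothetical stand-in for SelectColumns)."""
    return df[as_list(columns)]


def drop_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Remove the named columns (hypothetical stand-in for DropColumns)."""
    return df.drop(columns=as_list(columns))


df = pd.DataFrame({"sine": [0.0, 0.5], "cosine": [1.0, 0.9], "noise": [0.1, 0.2]})
selected = select_columns(df, "sine")            # a single name is accepted
dropped = drop_columns(df, ["cosine", "noise"])  # so is a list of names
print(list(selected.columns), list(dropped.columns))
```

Both calls leave only the `sine` column, which shows that the two transformations are complementary ways of expressing the same column filter.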
dev
Breakpoint
difference
Difference
Bases: Transformation, Tunable
Takes the returns (percentage change between the current and a prior element).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
log_returns | bool | If True, computes the log returns instead of the simple returns. | False |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import Difference
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data(freq="min")
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = Difference()
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> X["sine"].loc[preds.index].head()
2021-12-31 15:40:00 -0.0000
2021-12-31 15:41:00 0.0126
2021-12-31 15:42:00 0.0251
2021-12-31 15:43:00 0.0377
2021-12-31 15:44:00 0.0502
Freq: T, Name: sine, dtype: float64
>>> preds["sine"].head()
2021-12-31 15:40:00 -1.000000
2021-12-31 15:41:00 -inf
2021-12-31 15:42:00 0.992063
2021-12-31 15:43:00 0.501992
2021-12-31 15:44:00 0.331565
Freq: T, Name: sine, dtype: float64
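To make the two modes concrete, here is a minimal sketch of simple versus log returns in plain Python, using the standard definitions (current / prior - 1, and the natural log of current / prior). The helper names `simple_returns` and `log_returns` are illustrative and not part of fold:

```python
import math


def simple_returns(values):
    """Percentage change between each element and the one before it."""
    return [curr / prev - 1 for prev, curr in zip(values, values[1:])]


def log_returns(values):
    """Natural log of the ratio between each element and the one before it."""
    return [math.log(curr / prev) for prev, curr in zip(values, values[1:])]


prices = [100.0, 110.0, 99.0]
simple = simple_returns(prices)  # roughly +10% followed by -10%
logs = log_returns(prices)

# Log returns are additive: their sum equals the log of the total growth.
total_growth = math.log(prices[-1] / prices[0])
print(simple, logs, total_growth)
```

The sketch assumes strictly positive inputs. A prior value of zero, as in the sine example above, produces an infinite return in pandas and would raise ZeroDivisionError in this plain-Python version.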
features
AddFeatures
Bases: Transformation, Tunable
Applies a function to one or more columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_func | ColumnFunction \| list[ColumnFunction] | A tuple of a column or list of columns and a function to apply to them. | required |
fillna | bool | Fill NaNs in the resulting DataFrame. | False |
name | str \| None | Name of the transformation. | None |
params_to_try | dict \| None | Dictionary of parameters to try when tuning. | None |
Returns:
Type | Description |
---|---|
tuple[pd.DataFrame, Artifact \| None] | The transformed DataFrame, concatenated with the original DataFrame. |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import AddFeatures
>>> from fold.models.dummy import DummyClassifier
>>> from fold.utils.tests import generate_sine_wave_data
>>> import numpy as np
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = AddFeatures([("sine", np.square)])
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine sine~square
2021-12-31 15:40:00 -0.0000 0.000000
2021-12-31 15:41:00 0.0126 0.000159
2021-12-31 15:42:00 0.0251 0.000630
2021-12-31 15:43:00 0.0377 0.001421
2021-12-31 15:44:00 0.0502 0.002520
AddWindowFeatures
Bases: Transformation, Tunable
Creates rolling window features on the specified columns.
Equivalent to adding a new column by running: df[column].rolling(window).function().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_window_func | (ColumnWindowFunction, list[ColumnWindowFunction]) | A list of tuples, where each tuple contains the column name, the window size and the function to apply. The function can be a predefined function (see PredefinedFunction) or a Callable (with a single parameter). | required |
fillna | bool | Fill NaNs in the resulting DataFrame. | False |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import AddWindowFeatures
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = AddWindowFeatures(("sine", 10, "mean"))
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine sine~mean_10
2021-12-31 15:40:00 -0.0000 -0.05649
2021-12-31 15:41:00 0.0126 -0.04394
2021-12-31 15:42:00 0.0251 -0.03139
2021-12-31 15:43:00 0.0377 -0.01883
2021-12-31 15:44:00 0.0502 -0.00628
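The pandas equivalence stated in the description can be tried directly on a small frame. This is a sketch in plain pandas, not fold code; the column name follows the `sine~mean_10` pattern seen in the output above, and the toy data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"sine": [0.0, 1.0, 2.0, 3.0, 4.0]})
window = 3

# Same shape as df[column].rolling(window).function(), with function = mean.
df[f"sine~mean_{window}"] = df["sine"].rolling(window).mean()
print(df)
```

The first `window - 1` rows of the new column are NaN because the window is not yet full, which is the situation the `fillna` parameter addresses. How fold fills those values is not stated here.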
function
ApplyFunction
holidays
AddExchangeHolidayFeatures
Bases: Transformation, Tunable
Adds holiday features for given exchange(s) as new column(s). It uses the pattern "holiday_{exchange}" for naming the columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exchange_codes | list[str] \| str | List of exchange codes (eg.: | required |
labeling | str \| LabelingMethod | | weekday_weekend_holiday |
AddHolidayFeatures
Bases: Transformation, Tunable
Adds holiday features for given region(s) as new column(s). It uses the pattern "holiday_{country_code}" for naming the columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
country_codes | list[str] \| str | List of country codes (eg.: | required |
labeling | str \| LabelingMethod | | weekday_weekend_holiday |
LabelingMethod
Bases: ParsableEnum
Parameters:
Name | Type | Description | Default |
---|---|---|---|
holiday_binary | | | required |
weekday_weekend_holiday | | | required |
weekday_weekend_uniqueholiday | | | required |
weekday_weekend_uniqueholiday_string | | | required |
lags
AddLagsX
Bases: Transformation, Tunable
Adds past values of X for the desired column(s).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns_and_lags | (list[ColumnAndLag], ColumnAndLag) | A tuple (or a list of tuples) of the column name and a single or a list of lags to add as features. | required |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import AddLagsX
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = AddLagsX([("sine", 1), ("sine", [2,3])])
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds.head()
sine sine~lag_1 sine~lag_2 sine~lag_3
2021-12-31 15:40:00 -0.0000 -0.0126 -0.0251 -0.0377
2021-12-31 15:41:00 0.0126 -0.0000 -0.0126 -0.0251
2021-12-31 15:42:00 0.0251 0.0126 -0.0000 -0.0126
2021-12-31 15:43:00 0.0377 0.0251 0.0126 -0.0000
2021-12-31 15:44:00 0.0502 0.0377 0.0251 0.0126
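The output above suggests that each `sine~lag_k` column holds the value of `sine` from k rows earlier. A minimal pandas sketch of that idea, assuming lags correspond to `Series.shift` (this is an illustration, not fold's implementation):

```python
import pandas as pd

df = pd.DataFrame({"sine": [0.0, 0.1, 0.2, 0.3]})

for lag in (1, 2):
    # shift(lag) moves values down, so row t holds the value from row t - lag.
    df[f"sine~lag_{lag}"] = df["sine"].shift(lag)

print(df)
```

The first `lag` rows of each new column are NaN because no earlier value exists for them in this toy frame.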
math
MultiplyBy
TakeLog
Bases: InvertibleTransformation, Tunable
Takes the logarithm of the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
base | (int, str) | The base of the logarithm, by default "e". Valid values are "e", np.e, "10", 10, "2", 2. | 'e' |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import TakeLog
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data(freq="min")
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = TakeLog()
>>> X["sine"].head()
2021-12-31 07:20:00 0.0000
2021-12-31 07:21:00 0.0126
2021-12-31 07:22:00 0.0251
2021-12-31 07:23:00 0.0377
2021-12-31 07:24:00 0.0502
Freq: T, Name: sine, dtype: float64
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds["sine"].head()
2021-12-31 15:40:00 -inf
2021-12-31 15:41:00 -4.374058
2021-12-31 15:42:00 -3.684887
2021-12-31 15:43:00 -3.278095
2021-12-31 15:44:00 -2.991740
Freq: T, Name: sine, dtype: float64
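Because TakeLog is an InvertibleTransformation, the logarithm can be undone by exponentiating with the same base. The sketch below illustrates that round trip in plain Python for the base spellings listed in the parameter table. `take_log` and `invert_log` are hypothetical helpers, and how fold performs the inversion internally is an assumption:

```python
import math


def take_log(values, base="e"):
    """Element-wise logarithm for the bases listed in the parameter table."""
    if base in ("e", math.e):
        return [math.log(v) for v in values]
    if base in ("10", 10):
        return [math.log10(v) for v in values]
    if base in ("2", 2):
        return [math.log2(v) for v in values]
    raise ValueError(f"unsupported base: {base!r}")


def invert_log(values, base="e"):
    """Undo take_log by exponentiating with the same base."""
    if base in ("e", math.e):
        return [math.exp(v) for v in values]
    if base in ("10", 10):
        return [10.0 ** v for v in values]
    if base in ("2", 2):
        return [2.0 ** v for v in values]
    raise ValueError(f"unsupported base: {base!r}")


data = [1.0, 10.0, 100.0]
logged = take_log(data, base=10)
restored = invert_log(logged, base=10)
print(logged, restored)
```

The sketch assumes strictly positive input. A zero value, as in the first row of the example above, gives -inf in pandas, while `math.log10(0)` raises ValueError.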
TurnPositive
Bases: InvertibleTransformation
Adds a constant to the data, varying by column, so that all values are positive. It identifies the constant during training, and applies it during inference (and backtesting). Therefore there's no guarantee that the data will be positive during inference (and backtesting).
It cannot be updated after the initial training, as that would change the underlying distribution of the data.
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import TurnPositive
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data(freq="min")
>>> X, y = X - 1, y - 1
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = TurnPositive()
>>> X["sine"].head()
2021-12-31 07:20:00 -1.0000
2021-12-31 07:21:00 -0.9874
2021-12-31 07:22:00 -0.9749
2021-12-31 07:23:00 -0.9623
2021-12-31 07:24:00 -0.9498
Freq: T, Name: sine, dtype: float64
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds["sine"].head()
2021-12-31 15:40:00 2.0000
2021-12-31 15:41:00 2.0126
2021-12-31 15:42:00 2.0251
2021-12-31 15:43:00 2.0377
2021-12-31 15:44:00 2.0502
Freq: T, Name: sine, dtype: float64
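The caveat about inference can be illustrated with a small sketch. It assumes the constant is chosen so that the training minimum maps to 1, which appears consistent with the output above (-1.0 becomes 2.0 when the training minimum is -2) but is not confirmed by this page. `fit_constant` is a hypothetical helper:

```python
def fit_constant(train):
    """Constant that shifts the training minimum to 1 (an assumed rule)."""
    lowest = min(train)
    return 0.0 if lowest > 0 else abs(lowest) + 1.0


train = [-2.0, -1.0, 0.0]
constant = fit_constant(train)                  # learned once, during training
shifted_train = [v + constant for v in train]   # all positive by construction

# The same constant is reused later; values below the training minimum
# can therefore stay non-positive.
unseen = [-1.0, -3.5]
shifted_unseen = [v + constant for v in unseen]
print(constant, shifted_train, shifted_unseen)
```

Here the second unseen value remains negative after the shift, which is the "no guarantee" case described above.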
scaling
MinMaxScaler
Bases: WrapInvertibleSKLearnTransformation
Transform features by scaling each feature to a given range.
A wrapper around SKLearn's MinMaxScaler. Capable of further updates after the initial fit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_range | tuple(min, max) | Desired range of transformed data. | (0, 1) |
clip | bool | Set to True to clip transformed values of held-out data to the provided feature_range. | False |
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import MinMaxScaler
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = MinMaxScaler()
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> X["sine"].loc[preds.index].head()
2021-12-31 15:40:00 -0.0000
2021-12-31 15:41:00 0.0126
2021-12-31 15:42:00 0.0251
2021-12-31 15:43:00 0.0377
2021-12-31 15:44:00 0.0502
Freq: T, Name: sine, dtype: float64
>>> preds["sine"].head()
2021-12-31 15:40:00 0.50000
2021-12-31 15:41:00 0.50630
2021-12-31 15:42:00 0.51255
2021-12-31 15:43:00 0.51885
2021-12-31 15:44:00 0.52510
Freq: T, Name: sine, dtype: float64
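The scaling itself follows the usual min-max formula: each value is mapped linearly so that the training minimum lands on the lower bound of `feature_range` and the training maximum on the upper bound. A minimal plain-Python sketch of that formula and of the `clip` option (the helper `min_max_scale` is illustrative, not fold or sklearn code):

```python
def min_max_scale(values, data_min, data_max, feature_range=(0.0, 1.0), clip=False):
    """Linearly map [data_min, data_max] onto feature_range."""
    low, high = feature_range
    scale = (high - low) / (data_max - data_min)
    scaled = [low + (v - data_min) * scale for v in values]
    if clip:
        # Held-out values outside the training range are cut to the bounds.
        scaled = [min(max(v, low), high) for v in scaled]
    return scaled


train = [-1.0, 0.0, 1.0]
data_min, data_max = min(train), max(train)  # statistics come from training data

in_range = min_max_scale([0.0, 0.5], data_min, data_max)
held_out = min_max_scale([2.0], data_min, data_max)
clipped = min_max_scale([2.0], data_min, data_max, clip=True)
print(in_range, held_out, clipped)
```

With a training range of -1 to 1, a value of 0 maps to 0.5, matching the first row of the example above. A held-out value of 2.0 maps to 1.5 unless `clip=True`, in which case it is cut to 1.0.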
StandardScaler
Bases: WrapInvertibleSKLearnTransformation
Standardize features by removing the mean and scaling to unit variance.
A wrapper around SKLearn's StandardScaler. Capable of further updates after the initial fit.
Examples:
>>> from fold.loop import train_backtest
>>> from fold.splitters import SlidingWindowSplitter
>>> from fold.transformations import StandardScaler
>>> from fold.utils.tests import generate_sine_wave_data
>>> X, y = generate_sine_wave_data()
>>> splitter = SlidingWindowSplitter(train_window=0.5, step=0.2)
>>> pipeline = StandardScaler()
>>> X["sine"].head()
2021-12-31 07:20:00 0.0000
2021-12-31 07:21:00 0.0126
2021-12-31 07:22:00 0.0251
2021-12-31 07:23:00 0.0377
2021-12-31 07:24:00 0.0502
Freq: T, Name: sine, dtype: float64
>>> preds, trained_pipeline, _, _ = train_backtest(pipeline, X, y, splitter)
>>> preds["sine"].head()
2021-12-31 15:40:00 -0.000000
2021-12-31 15:41:00 0.017819
2021-12-31 15:42:00 0.035497
2021-12-31 15:43:00 0.053316
2021-12-31 15:44:00 0.070994
Freq: T, Name: sine, dtype: float64
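Standardization subtracts the training mean and divides by the training standard deviation. A minimal sketch in plain Python, using the population standard deviation (ddof=0), which is what sklearn's StandardScaler uses; the toy data is made up for illustration:

```python
import statistics

train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(train)   # 5.0
std = statistics.pstdev(train)   # 2.0, population standard deviation

# z-score: zero mean and unit variance on the training data.
standardized = [(v - mean) / std for v in train]
print(mean, std, standardized)
```

Values seen later are transformed with the same `mean` and `std`, so they are not guaranteed to have zero mean or unit variance themselves.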
sklearn
WrapSKLearnFeatureSelector
Bases: FeatureSelector, Tunable
Wraps an SKLearn feature selector class and stores the selected columns in the selected_features property.
There's no need to use it directly; fold automatically wraps all sklearn feature selectors into this class.
WrapSKLearnTransformation
Bases: Transformation, Tunable
Wraps an SKLearn Transformation.
There's no need to use it directly; fold automatically wraps all sklearn transformations into this class.