
Scorecard & Metric

krisi.evaluate.scorecard

ScoreCard

ScoreCard Object.

Krisi's main object for holding and evaluating metrics. It stores, evaluates, and generates visualizations of predefined and custom metrics for regression and classification.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `y` | `Targets` | The true targets that the metrics are evaluated against. | *required* |
| `predictions` | `Predictions` | The single-point predictions that the metrics are evaluated against. | *required* |
| `model_name` | `Optional[str]` | The name of the model that generated the predictions. Used for identifying scorecards. | `None` |
| `model_description` | `str` | A description of the model that generated the predictions. Used for reporting. | `''` |
| `dataset_name` | `Optional[str]` | The name of the dataset from which the `y` (targets) originate. Used for reporting. | `None` |
| `dataset_description` | `str` | A description of the dataset from which the `y` (targets) originate. Used for reporting. | `''` |
| `project_name` | `Optional[str]` | The name of the project. Used for reporting and for saving to a directory (e.g. multiple scorecards). | `None` |
| `project_description` | `str` | A description of the project. Used for reporting. | `''` |
| `dataset_type` | `Optional[Union[DatasetType, str]]` | Whether the task was binary/multi-label classification or regression. If set to `None` it is inferred from the targets. | `None` |
| `sample_type` | `Union[str, SampleTypes]` | Whether to evaluate in-sample or out-of-sample: `SampleTypes.outofsample` or `SampleTypes.insample`. | `outofsample` |
| `default_metrics` | `Optional[Union[List[Metric], Metric]]` | Default metrics that get evaluated. See library. | `None` |
| `custom_metrics` | `Optional[Union[List[Metric], Metric]]` | Custom metrics that get evaluated. If specified, these are evaluated after `default_metrics`. See library. | `None` |
| `rolling_args` | `Dict[str, Any]` | Arguments passed on to `pd.DataFrame.rolling`: the window size of the rolling metric evaluation (if `None`, evaluation over time is done on an expanding-window basis; by default `None`) and the step size of the rolling metric evaluation (by default `1`). | `None` |

Examples:

>>> import numpy as np
... from krisi import ScoreCard
... y_pred, y_true = np.array([0, 2, 1, 3]), np.array([0, 1, 2, 3])
... sc = ScoreCard(y_true, y_pred)
... sc.evaluate(defaults=True) # Calculate predefined metrics
... sc["own_metric"] = (y_pred - y_true).mean() # Add a metric result directly
... sc.print()
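
A slightly fuller sketch based on the constructor parameters documented above; the model, dataset and project names and the `rolling_args` values are illustrative assumptions, not library defaults.

>>> import numpy as np
... from krisi import ScoreCard
... y_true = np.random.rand(100)
... y_pred = y_true + np.random.normal(0.0, 0.1, 100)
... sc = ScoreCard(
...     y_true,
...     y_pred,
...     model_name="baseline_regressor",   # hypothetical name, used to identify the scorecard
...     dataset_name="synthetic_series",   # hypothetical name, used for reporting
...     project_name="docs_example",       # hypothetical name, used for reporting and saving
...     rolling_args={"window": 20, "step": 5},  # passed on to pd.DataFrame.rolling
... )
... sc.evaluate()
... sc.print()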

__setattr__

__setattr__(key: str, item: Any) -> None

Defines dictionary-like behaviour and ensures that a Metric can be added as:

- a `Metric` object,
- a dictionary,
- a direct result (`float`, `int`, or a list of `float` or `int`), which gets wrapped in a `Metric` object.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `key` | `str` | The key to which the object will be assigned. | *required* |
| `item` | `Dict`, `Metric`, `float`, `int`, `List` of `float` or `int`, or `pd.Series` | The result that gets stored under the key. The behaviour depends on the type of the object: if `Metric` or `Dict`, the object is stored on `ScoreCard[key]`; if `float`, `int`, `List` of `float`/`int`, or `pd.Series`, it checks whether a `Metric` already exists on `key`. If so, the item is assigned to the `result` field of the existing `Metric`; if not, the item is first wrapped in a new `Metric` and then assigned to `ScoreCard[key]`. | *required* |

Examples:

>>> import numpy as np
... from krisi import ScoreCard
... y_pred, y_true = np.array([0, 2, 1, 3]), np.array([0, 1, 2, 3])
... sc = ScoreCard(y_true, y_pred)
... sc['metric_result'] = 0.53 # Direct result assignment, dictionary-style
Metric(result=0.53, key='metric_result', category=None, parameters=None, info="", ...)
>>> sc.another_metric_result = 1 # Direct result assignment, attribute-style
Metric(result=1, key='another_metric_result', category=None, parameters=None, info="", ...)
>>> from krisi.evaluate.metric import Metric
... from krisi.evaluate.type import MetricCategories
... sc.full_metric = Metric("My own metric", category=MetricCategories.class_err, info="A fictitious metric with metadata", func=lambda y, y_hat: (y - y_hat) / 2)
Metric("My own metric", key="my_own_metric", category=MetricCategories.class_err, info="A fictitious metric with metadata", ...)
>>> sc.metric_as_dictionary = {"name": "My other metric", "info": "A Metric created with a dictionary", "func": lambda y, y_hat: y - y_hat}
Metric("My other metric", key="my_other_metric", info="A Metric created with a dictionary", ...)

evaluate

evaluate(defaults: bool = True) -> None

Evaluates Metrics present on the ScoreCard

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `defaults` | `bool` | Whether the default Metrics should be evaluated or not. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `None` | |

evaluate_benchmark

evaluate_benchmark(benchmark_models: List[Model], num_benchmark_iter: int, defaults: bool = True) -> None

Evaluates the Metrics on the ScoreCard against benchmark models.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `benchmark_models` | `List[Model]` | The benchmark models to evaluate the predictions against. | *required* |
| `num_benchmark_iter` | `int` | The number of iterations to run the benchmark models for. | *required* |
| `defaults` | `bool` | Whether the default Metrics should be evaluated or not. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `None` | |

evaluate_over_time

evaluate_over_time(defaults: bool = True) -> None

Evaluates the Metrics present on the ScoreCard over time, either on an expanding or a fixed-size window. Assigns the list of results to `results_over_time`.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `y` | | The true labels to compare the predictions to. | *required* |
| `predictions` | | The predicted values: integers or whole floats for classification, otherwise floats. | *required* |
| `defaults` | `bool` | Whether the default Metrics should be evaluated or not. | `True` |
| `window` | | Size of the window. If a number is provided, evaluation happens on a fixed window size; otherwise it is evaluated on an expanding-window basis. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `None` | |
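
A minimal sketch, assuming the constructor's `rolling_args` parameter documented above; the window and step sizes are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... y_true = np.random.rand(100)
... y_pred = y_true + np.random.normal(0.0, 0.1, 100)
... sc = ScoreCard(y_true, y_pred, rolling_args={"window": 20, "step": 5})
... sc.evaluate_over_time(defaults=True)  # assigns a list of per-window results to results_over_time
... sc.print()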

get_all_metrics

get_all_metrics(defaults: bool = True, only_evaluated: bool = False, spread_comparisons: bool = False) -> List[Metric]

Helper function that returns both default_metrics and custom_metrics.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `only_evaluated` | `bool` | Only return Metrics that have already been evaluated. | `False` |
| `defaults` | `bool` | Whether the `default_metrics` should be returned or not. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `List[Metric]` | List of Metrics |

get_custom_metrics

get_custom_metrics() -> List[Metric]

Returns a List of Custom Metrics defined by the user on initialization of the ScoreCard.

Returns:

| Type | Description |
|------|-------------|
| `List[Metric]` | List of Metrics |

get_default_metrics

get_default_metrics() -> List[Metric]

Returns a List of Predefined Metrics according to task type: regression, classification, multi-label classification.

Returns:

| Type | Description |
|------|-------------|
| `List[Metric]` | List of Metrics |

get_ds

get_ds(name_as_index: bool = False) -> pd.Series

Returns a `pd.Series` where each index is the name of a Metric and the value is the corresponding result.

Returns:

| Type | Description |
|------|-------------|
| `pd.Series` | |
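
A minimal sketch of pulling the evaluated results into a `pandas` workflow; the toy targets and predictions are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... sc = ScoreCard(np.array([0, 1, 2, 3]), np.array([0, 2, 1, 3]))
... sc.evaluate()
... results = sc.get_ds(name_as_index=True)
... results.head()  # pd.Series mapping metric names to their results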

print

print(mode: Union[str, PrintMode, List[PrintMode], List[str]] = PrintMode.extended, with_info: bool = False, with_parameters: bool = True, with_diagnostics: bool = False, input_analysis: bool = True, title: Optional[str] = None, frame_or_series: bool = True) -> None

Prints the ScoreCard to the console.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `mode` | `Union[str, PrintMode, List[PrintMode], List[str]]` | `PrintMode.extended` or `'extended'` prints the full ScoreCard, with targets, predictions, residuals, etc.; `PrintMode.minimal` or `'minimal'` prints the name and the matching result of each metric in the ScoreCard, without fancy formatting; `PrintMode.minimal_table` or `'minimal_table'` creates a table of just the metric name and the accompanying result. | `extended` |
| `with_info` | `bool` | Whether descriptions of each metric should be printed or not. | `False` |
| `input_analysis` | `bool` | Whether it should print an analysis of the raw targets and predictions. | `True` |
| `title` | `Optional[str]` | Title of the table when `mode = 'minimal_table'`. | `None` |
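
A short sketch of the `mode` options documented above; the toy data and the title string are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... sc = ScoreCard(np.array([0, 1, 2, 3]), np.array([0, 2, 1, 3]))
... sc.evaluate()
... sc.print("minimal")                            # metric names and results only
... sc.print("minimal_table", title="Toy results") # compact table with a custom title
... sc.print(with_info=True)                       # extended report, including each metric's description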

krisi.evaluate.type

krisi.evaluate.metric

Metric dataclass

Bases: Generic[MetricResult]

Class representing a metric.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `name` | `str` | The name of the metric. | *required* |
| `key` | `str` | The key used to reference the metric. | `''` |
| `category` | `Optional[MetricCategories]` | The category of the metric. | `None` |
| `result` | `Optional[Union[Exception, MetricResult, List[MetricResult]]]` | The result of the evaluated Metric. | `None` |
| `result_rolling` | `Optional[Union[Exception, MetricResult, List[MetricResult]]]` | The result of the evaluated Metric over time. | `None` |
| `parameters` | `dict` | The parameters that are passed into the evaluation function (`func`). | `field(default_factory=dict)` |
| `func` | `Optional[MetricFunction]` | The function used to compute the metric. | `None` |
| `plot_funcs` | `Optional[Union[List[PlotDefinition], PlotDefinition]]` | List of functions used to plot the metric. | `None` |
| `plot_funcs_rolling` | `Optional[Union[List[PlotDefinition], PlotDefinition]]` | Function used to plot the rolling metric value. | `None` |
| `info` | `str` | Additional information about the metric. | `''` |
| `restrict_to_sample` | `Optional[SampleTypes]` | Whether the Metric should only be evaluated on in-sample or out-of-sample data. | `None` |
| `comp_complexity` | `Optional[ComputationalComplexity]` | How resource-intensive the calculation is. | `None` |
| `accepts_probabilities` | `bool` | Whether the metric accepts probabilities as input. | `False` |
| `supports_multiclass` | `bool` | Whether the metric supports multiclass classification. | `False` |
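
A minimal sketch of defining a custom `Metric` and attaching it to a `ScoreCard` via `custom_metrics`, based on the fields above; the name, key, lambda and info text are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... from krisi.evaluate.metric import Metric
... mean_error = Metric(
...     name="Mean Error",                         # display name of the metric
...     key="mean_error",                          # key used to reference it, e.g. sc["mean_error"]
...     func=lambda y, y_hat: np.mean(y - y_hat),  # evaluation function receiving targets and predictions
...     info="Average signed difference between targets and predictions.",
... )
... y_true = np.random.rand(50)
... y_pred = y_true + np.random.normal(0.0, 0.1, 50)
... sc = ScoreCard(y_true, y_pred, custom_metrics=[mean_error])
... sc.evaluate()
... sc.print("minimal")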