
Scorecard & Metric

krisi.evaluate.scorecard

ScoreCard

ScoreCard Object.

Krisi's main object for holding and evaluating metrics. It stores, evaluates, and generates visualizations of predefined and custom metrics for regression and classification.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `y` | `Targets` | The true targets that the metrics are evaluated against. | *required* |
| `predictions` | `Predictions` | The single-point predictions that the metrics are evaluated against. | *required* |
| `model_name` | `Optional[str]` | The name of the model that generated the predictions. Used for identifying scorecards. | `None` |
| `model_description` | `str` | A description of the model that generated the predictions. Used for reporting. | `''` |
| `dataset_name` | `Optional[str]` | The name of the dataset from which the `y` (targets) originate. Used for reporting. | `None` |
| `dataset_description` | `str` | A description of the dataset from which the `y` (targets) originate. Used for reporting. | `''` |
| `project_name` | `Optional[str]` | The name of the project. Used for reporting and for saving to a directory (e.g. multiple scorecards). | `None` |
| `project_description` | `str` | A description of the project. Used for reporting. | `''` |
| `dataset_type` | `Optional[Union[DatasetType, str]]` | Whether the task was binary/multi-label classification or regression. If set to `None` it is inferred from the targets. | `None` |
| `sample_type` | `Union[str, SampleTypes]` | Whether to evaluate in-sample or out-of-sample: `SampleTypes.outofsample` or `SampleTypes.insample`. | `outofsample` |
| `default_metrics` | `Optional[Union[List[Metric], Metric]]` | Default metrics that get evaluated. See library. | `None` |
| `custom_metrics` | `Optional[Union[List[Metric], Metric]]` | Custom metrics that get evaluated. If specified, these are evaluated after `default_metrics`. See library. | `None` |
| `rolling_args` | `Dict[str, Any]` | Arguments passed on to `pd.DataFrame.rolling`: the window size of the rolling metric evaluation (if `None`, evaluation over time is done on an expanding-window basis; by default `None`) and the step size of the rolling metric evaluation (by default `1`). | `None` |

Examples:

>>> import numpy as np
... from krisi import ScoreCard
... y_pred, y_true = np.array([0, 2, 1, 3]), np.array([0, 1, 2, 3])
... sc = ScoreCard(y_true, y_pred)
... sc.evaluate(defaults=True) # Calculate predefined metrics
... sc["own_metric"] = (y_pred - y_true).mean() # Add a metric result directly
... sc.print()
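
A slightly fuller sketch based on the constructor parameters documented above; the model, dataset and project names and the `rolling_args` values are illustrative assumptions, not library defaults.

>>> import numpy as np
... from krisi import ScoreCard
... y_true = np.random.rand(100)
... y_pred = y_true + np.random.normal(0.0, 0.1, 100)
... sc = ScoreCard(
...     y_true,
...     y_pred,
...     model_name="baseline_regressor",   # hypothetical name, used to identify the scorecard
...     dataset_name="synthetic_series",   # hypothetical name, used for reporting
...     project_name="docs_example",       # hypothetical name, used for reporting and saving
...     rolling_args={"window": 20, "step": 5},  # passed on to pd.DataFrame.rolling
... )
... sc.evaluate()
... sc.print()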

__setattr__

__setattr__(key: str, item: Any) -> None

Defines dictionary-like behaviour and ensures that a Metric can be added as:

- a `Metric` object,
- a dictionary,
- a direct result (`float`, `int`, or a list of `float` or `int`), which gets wrapped in a `Metric` object.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `key` | `str` | The key to which the object will be assigned. | *required* |
| `item` | `Dict`, `Metric`, `float`, `int`, `List` of `float` or `int`, or `pd.Series` | The result that gets stored under the key. The behaviour depends on the type of the object: if `Metric` or `Dict`, the object is stored on `ScoreCard[key]`; if `float`, `int`, `List` of `float`/`int`, or `pd.Series`, it checks whether a `Metric` already exists on `key`. If so, the item is assigned to the `result` field of the existing `Metric`; if not, the item is first wrapped in a new `Metric` and then assigned to `ScoreCard[key]`. | *required* |

Examples:

>>> import numpy as np
... from krisi import ScoreCard
... y_pred, y_true = np.array([0, 2, 1, 3]), np.array([0, 1, 2, 3])
... sc = ScoreCard(y_true, y_pred)
... sc['metric_result'] = 0.53 # Direct result assignment, dictionary-style
Metric(result=0.53, key='metric_result', category=None, parameters=None, info="", ...)
>>> sc.another_metric_result = 1 # Direct result assignment, attribute-style
Metric(result=1, key='another_metric_result', category=None, parameters=None, info="", ...)
>>> from krisi.evaluate.metric import Metric
... from krisi.evaluate.type import MetricCategories
... sc.full_metric = Metric("My own metric", category=MetricCategories.class_err, info="A fictitious metric with metadata", func=lambda y, y_hat: (y - y_hat) / 2)
Metric("My own metric", key="my_own_metric", category=MetricCategories.class_err, info="A fictitious metric with metadata", ...)
>>> sc.metric_as_dictionary = {"name": "My other metric", "info": "A Metric created with a dictionary", "func": lambda y, y_hat: y - y_hat}
Metric("My other metric", key="my_other_metric", info="A Metric created with a dictionary", ...)

evaluate

evaluate(defaults: bool = True) -> None

Evaluates Metrics present on the ScoreCard

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `defaults` | `bool` | Whether the default Metrics should be evaluated or not. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `None` | |

evaluate_benchmark

evaluate_benchmark(benchmark_models: List[Model], num_benchmark_iter: int, defaults: bool = True) -> None

Evaluates the Metrics on the ScoreCard against benchmark models.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `benchmark_models` | `List[Model]` | The benchmark models to evaluate the predictions against. | *required* |
| `num_benchmark_iter` | `int` | The number of iterations to run the benchmark models for. | *required* |
| `defaults` | `bool` | Whether the default Metrics should be evaluated or not. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `None` | |

evaluate_over_time

evaluate_over_time(defaults: bool = True) -> None

Evaluates the Metrics present on the ScoreCard over time, either on an expanding or a fixed-size window. Assigns the list of results to `results_over_time`.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `y` | | The true labels to compare the predictions to. | *required* |
| `predictions` | | The predicted values: integers or whole floats for classification, otherwise floats. | *required* |
| `defaults` | `bool` | Whether the default Metrics should be evaluated or not. | `True` |
| `window` | | Size of the window. If a number is provided, evaluation happens on a fixed window size; otherwise it is evaluated on an expanding-window basis. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `None` | |
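
A minimal sketch, assuming the constructor's `rolling_args` parameter documented above; the window and step sizes are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... y_true = np.random.rand(100)
... y_pred = y_true + np.random.normal(0.0, 0.1, 100)
... sc = ScoreCard(y_true, y_pred, rolling_args={"window": 20, "step": 5})
... sc.evaluate_over_time(defaults=True)  # assigns a list of per-window results to results_over_time
... sc.print()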

get_all_metrics

get_all_metrics(defaults: bool = True, only_evaluated: bool = False, spread_comparisons: bool = False) -> List[Metric]

Helper function that returns both default_metrics and custom_metrics.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `only_evaluated` | `bool` | Only return Metrics that have already been evaluated. | `False` |
| `defaults` | `bool` | Whether the `default_metrics` should be returned or not. | `True` |

Returns:

| Type | Description |
|------|-------------|
| `List[Metric]` | List of Metrics |

get_custom_metrics

get_custom_metrics() -> List[Metric]

Returns a List of Custom Metrics defined by the user on initialization of the ScoreCard.

Returns:

| Type | Description |
|------|-------------|
| `List[Metric]` | List of Metrics |

get_default_metrics

get_default_metrics() -> List[Metric]

Returns a List of Predefined Metrics according to task type: regression, classification, multi-label classification.

Returns:

| Type | Description |
|------|-------------|
| `List[Metric]` | List of Metrics |

get_ds

get_ds(name_as_index: bool = False) -> pd.Series

Returns a `pd.Series` where each index is the name of a Metric and the value is the corresponding result.

Returns:

| Type | Description |
|------|-------------|
| `pd.Series` | |
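
A minimal sketch of pulling the evaluated results into a `pandas` workflow; the toy targets and predictions are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... sc = ScoreCard(np.array([0, 1, 2, 3]), np.array([0, 2, 1, 3]))
... sc.evaluate()
... results = sc.get_ds(name_as_index=True)
... results.head()  # pd.Series mapping metric names to their results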

print

print(mode: Union[str, PrintMode, List[PrintMode], List[str]] = PrintMode.extended, with_info: bool = False, with_parameters: bool = True, with_diagnostics: bool = False, input_analysis: bool = True, title: Optional[str] = None, frame_or_series: bool = True) -> None

Prints the ScoreCard to the console.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `mode` | `Union[str, PrintMode, List[PrintMode], List[str]]` | `PrintMode.extended` or `'extended'` prints the full ScoreCard, with targets, predictions, residuals, etc.; `PrintMode.minimal` or `'minimal'` prints the name and the matching result of each metric in the ScoreCard, without fancy formatting; `PrintMode.minimal_table` or `'minimal_table'` creates a table of just the metric name and the accompanying result. | `extended` |
| `with_info` | `bool` | Whether descriptions of each metric should be printed or not. | `False` |
| `input_analysis` | `bool` | Whether it should print an analysis of the raw targets and predictions. | `True` |
| `title` | `Optional[str]` | Title of the table when `mode = 'minimal_table'`. | `None` |
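
A short sketch of the `mode` options documented above; the toy data and the title string are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... sc = ScoreCard(np.array([0, 1, 2, 3]), np.array([0, 2, 1, 3]))
... sc.evaluate()
... sc.print("minimal")                            # metric names and results only
... sc.print("minimal_table", title="Toy results") # compact table with a custom title
... sc.print(with_info=True)                       # extended report, including each metric's description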

krisi.evaluate.type

krisi.evaluate.metric

Metric dataclass

Bases: Generic[MetricResult]

Class representing a metric.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `name` | `str` | The name of the metric. | *required* |
| `key` | `str` | The key used to reference the metric. | `''` |
| `category` | `Optional[MetricCategories]` | The category of the metric. | `None` |
| `result` | `Optional[Union[Exception, MetricResult, List[MetricResult]]]` | The result of the evaluated Metric. | `None` |
| `result_rolling` | `Optional[Union[Exception, MetricResult, List[MetricResult]]]` | The result of the evaluated Metric over time. | `None` |
| `parameters` | `dict` | The parameters that are passed into the evaluation function (`func`). | `field(default_factory=dict)` |
| `func` | `Optional[MetricFunction]` | The function used to compute the metric. | `None` |
| `plot_funcs` | `Optional[Union[List[PlotDefinition], PlotDefinition]]` | List of functions used to plot the metric. | `None` |
| `plot_funcs_rolling` | `Optional[Union[List[PlotDefinition], PlotDefinition]]` | Function used to plot the rolling metric value. | `None` |
| `info` | `str` | Additional information about the metric. | `''` |
| `restrict_to_sample` | `Optional[SampleTypes]` | Whether the Metric should only be evaluated on in-sample or out-of-sample data. | `None` |
| `comp_complexity` | `Optional[ComputationalComplexity]` | How resource-intensive the calculation is. | `None` |
| `accepts_probabilities` | `bool` | Whether the metric accepts probabilities as input. | `False` |
| `supports_multiclass` | `bool` | Whether the metric supports multiclass classification. | `False` |
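
A minimal sketch of defining a custom `Metric` and attaching it to a `ScoreCard` via `custom_metrics`, based on the fields above; the name, key, lambda and info text are illustrative.

>>> import numpy as np
... from krisi import ScoreCard
... from krisi.evaluate.metric import Metric
... mean_error = Metric(
...     name="Mean Error",                         # display name of the metric
...     key="mean_error",                          # key used to reference it, e.g. sc["mean_error"]
...     func=lambda y, y_hat: np.mean(y - y_hat),  # evaluation function receiving targets and predictions
...     info="Average signed difference between targets and predictions.",
... )
... y_true = np.random.rand(50)
... y_pred = y_true + np.random.normal(0.0, 0.1, 50)
... sc = ScoreCard(y_true, y_pred, custom_metrics=[mean_error])
... sc.evaluate()
... sc.print("minimal")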