Scorecard & Metric
krisi.evaluate.scorecard
ScoreCard
ScoreCard Object.
Krisi's main object for holding and evaluating metrics. It stores, evaluates and generates visualizations of predefined and custom metrics for regression and classification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | Targets | The true targets against which the metrics are evaluated. | required |
predictions | Predictions | The single-point predictions that the metrics are evaluated on. | required |
model_name | Optional[str] | The name of the model that generated the predictions. Used for identifying scorecards. | None |
model_description | str | A description of the model that generated the predictions. Used for reporting. | '' |
dataset_name | Optional[str] | The name of the dataset from which the targets and predictions originate. | None |
dataset_description | str | A description of the dataset from which the targets and predictions originate. Used for reporting. | '' |
project_name | Optional[str] | The name of the project. Used for reporting and for saving to a directory (e.g. multiple scorecards). | None |
project_description | str | A description of the project. Used for reporting. | '' |
dataset_type | Optional[Union[DatasetType, str]] | Whether the task is binary/multi-label classification or regression. If left as None, the type is inferred from the targets. | None |
sample_type | Union[str, SampleTypes] | Whether the metrics should be evaluated insample or out of sample. | outofsample |
default_metrics | Optional[Union[List[Metric], Metric]] | The default metrics that get evaluated. | None |
custom_metrics | Optional[Union[List[Metric], Metric]] | Custom metrics that get evaluated. If specified, these are evaluated after the default metrics. | None |
rolling_args | Dict[str, Any] | Arguments to be passed on to the rolling (over-time) evaluation. | None |
Examples:
>>> import numpy as np
... from krisi import ScoreCard
... y_pred, y_true = np.array([0, 2, 1, 3]), np.array([0, 1, 2, 3])
... sc = ScoreCard(y_true, y_pred)
... sc.evaluate(defaults=True)  # Calculate the predefined metrics
... sc["own_metric"] = (y_pred - y_true).mean()  # Add a metric result directly
... sc.print()
__setattr__
Defines dictionary-like behaviour and ensures that a Metric can be added as a:
- `Metric` object,
- Dictionary,
- Direct result (float, int or a List of float or int). Gets wrapped in a `Metric` object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | string | The key to which the object will be assigned. | required |
item | Dictionary, Metric, Float, Int or List of Float or Int, or pd.Series | The result that gets stored under the key. Depending on its type, it is handled differently (see the list above). | required |
Examples:
>>> import numpy as np
... from krisi import ScoreCard
... y_pred, y_true = np.array([0, 2, 1, 3]), np.array([0, 1, 2, 3])
... sc = ScoreCard(y_true, y_pred)
... sc['metric_result'] = 0.53  # Dictionary-like assignment of a direct result
Metric(result=0.53, key='metric_result', category=None, parameters=None, info="", ...)
>>> sc.another_metric_result = 1  # Attribute-style assignment of a direct result
Metric(result=1, key='another_metric_result', category=None, parameters=None, info="", ...)
>>> from krisi.evaluate.metric import Metric
... from krisi.evaluate.type import MetricCategories
... sc.full_metric = Metric("My own metric", category=MetricCategories.class_err, info="A fictitious metric with metadata", func=lambda y, y_hat: (y - y_hat) / 2)
Metric("My own metric", key="my_own_metric", category=MetricCategories.class_err, info="A fictitious metric with metadata", ...)
evaluate
Evaluates the `Metric`s present on the ScoreCard.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
defaults | bool | Whether the default metrics should be evaluated. | True |

Returns:

Type | Description |
---|---|
None | |
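For illustration, a minimal sketch reusing the `sc` from the class-level example above; passing `defaults=False` is assumed to skip the predefined metrics so that only `custom_metrics` run:

>>> sc.evaluate(defaults=False)  # Assumption: only the custom metrics are evaluated
... sc.print()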
evaluate_benchmark
evaluate_benchmark(benchmark_models: List[Model], num_benchmark_iter: int, defaults: bool = True) -> None
Evaluates the `Metric`s on the ScoreCard against a benchmark.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
benchmark_models | List[Model] | The benchmark models to compare the metrics against. | required |
num_benchmark_iter | int | The number of benchmark iterations to run. | required |
defaults | bool | Whether the default metrics should be evaluated. | True |

Returns:

Type | Description |
---|---|
None | |
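For illustration only, a sketch of the call following the signature above; `my_benchmark_model` is a hypothetical stand-in for whatever satisfies krisi's `Model` type, and the iteration count is arbitrary:

>>> sc = ScoreCard(y_true, y_pred)
... sc.evaluate_benchmark(benchmark_models=[my_benchmark_model], num_benchmark_iter=100)  # my_benchmark_model is a placeholder
... sc.print()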
evaluate_over_time
Evaluates the `Metric`s present on the ScoreCard over time, either with an expanding or a fixed-size window. Assigns the list of results to `results_over_time`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | Targets | The true labels to compare the predictions to. | required |
predictions | Predictions | The predicted values. Integers or whole floats for classification, otherwise floats. | required |
defaults | bool | Whether the default metrics should be evaluated. | True |
window | | Size of the window. If a number is provided, evaluation happens on a fixed window size; otherwise it is evaluated on an expanding window basis. | required |

Returns:

Type | Description |
---|---|
None | |
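For illustration, a sketch of a rolling evaluation over a fixed window of 30 observations, assuming `evaluate_over_time` accepts the parameters listed above:

>>> import numpy as np
... from krisi import ScoreCard
... y_true, y_pred = np.random.rand(100), np.random.rand(100)
... sc = ScoreCard(y_true, y_pred)
... sc.evaluate_over_time(y_true, y_pred, window=30)  # Fixed-size window; omitting window would evaluate on an expanding basis
... sc.print()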
get_all_metrics
get_all_metrics(defaults: bool = True, only_evaluated: bool = False, spread_comparisons: bool = False) -> List[Metric]
Helper function that returns both `default_metrics` and `custom_metrics`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
only_evaluated | bool | Only return Metrics that have already been evaluated. | False |
defaults | bool | Whether the default metrics should be included. | True |

Returns:

Type | Description |
---|---|
List of Metrics | The combined default and custom metrics. |
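For illustration, a sketch that collects the keys of every metric that already has a result, assuming an already evaluated ScoreCard `sc`:

>>> evaluated = sc.get_all_metrics(defaults=True, only_evaluated=True)
... [metric.key for metric in evaluated]  # Metric objects expose their key attribute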
get_custom_metrics
Returns a list of the custom `Metric`s defined by the user on initialization of the ScoreCard.

Returns:

Type | Description |
---|---|
List of Metrics | The custom metrics defined on the ScoreCard. |
get_default_metrics
Returns a List of Predefined Metrics according to task type: regression, classification, multi-label classification.
Returns:
Type | Description |
---|---|
List of Metrics | The predefined metrics for the task type. |
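For illustration, a sketch contrasting the two getters on an existing ScoreCard `sc`; both return plain lists of `Metric` objects:

>>> len(sc.get_default_metrics())  # Predefined metrics for the task type
... len(sc.get_custom_metrics())   # Metrics passed in via custom_metrics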
get_ds
Returns a `pd.Series` where each index is the name of a `Metric` and the value is the corresponding result.
Returns:
Type | Description |
---|---|
pd.Series | Metric names as index, results as values. |
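For illustration, a sketch that exports the results with standard pandas tooling (the file name is arbitrary):

>>> results = sc.get_ds()
... results.to_csv("scorecard_results.csv")  # One row per Metric, indexed by metric name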
print
print(mode: Union[str, PrintMode, List[PrintMode], List[str]] = PrintMode.extended, with_info: bool = False, with_parameters: bool = True, with_diagnostics: bool = False, input_analysis: bool = True, title: Optional[str] = None, frame_or_series: bool = True) -> None
Prints the ScoreCard to the console.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | Union[str, PrintMode, List[PrintMode], List[str]] | The print mode (or modes) to use. | extended |
with_info | bool | Whether descriptions of each metric should be printed or not. | False |
input_analysis | bool | Whether an analysis of the raw targets and predictions should be printed. | True |
title | Optional[str] | Title of the printed table. | None |
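For illustration, a sketch using the documented arguments plus a custom title (the title text is arbitrary; the default extended mode is kept):

>>> sc.print(with_info=True, title="My model, out-of-sample")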
krisi.evaluate.type
krisi.evaluate.metric
Metric
dataclass
Bases: Generic[MetricResult]
Class representing a metric.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the metric. | required |
key | str | The key used to reference the metric. | '' |
category | Optional[MetricCategories] | The category of the metric. | None |
result | Optional[Union[Exception, MetricResult, List[MetricResult]]] | The result of the evaluated metric function. | None |
result_rolling | Optional[Union[Exception, MetricResult, List[MetricResult]]] | The rolling (over-time) result of the evaluated metric function. | None |
parameters | dict | The parameters that are passed into the evaluation function. | field(default_factory=dict) |
func | Optional[MetricFunction] | The function used to compute the metric. | None |
plot_funcs | Optional[Union[List[PlotDefinition], PlotDefinition]] | List of functions used to plot the metric. | None |
plot_funcs_rolling | Optional[Union[List[PlotDefinition], PlotDefinition]] | Function used to plot the rolling metric value. | None |
info | str | Additional information about the metric. | '' |
restrict_to_sample | Optional[SampleTypes] | Whether the metric should only be evaluated on a specific sample type (insample or out of sample). | None |
comp_complexity | Optional[ComputationalComplexity] | How resource-intensive the calculation is. | None |
accepts_probabilities | bool | Whether the metric accepts probabilities as input. | False |
supports_multiclass | bool | Whether the metric supports multiclass classification. | False |
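For illustration, a sketch of a fully custom `Metric` wired into a ScoreCard through `custom_metrics`; the metric's name, key and function are made up for this example:

>>> import numpy as np
... from krisi import ScoreCard
... from krisi.evaluate.metric import Metric
... mean_error = Metric(
...     "Mean Error",  # name, the only required argument
...     key="mean_error",
...     func=lambda y, pred: np.mean(np.asarray(y) - np.asarray(pred)),
...     info="Average signed difference between targets and predictions",
... )
... y_true, y_pred = np.random.rand(100), np.random.rand(100)
... sc = ScoreCard(y_true, y_pred, custom_metrics=[mean_error])
... sc.evaluate()  # Custom metrics are evaluated after the defaults
... sc.print()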