Evaluate

krisi.evaluate.compare

compare

compare(scorecards: List[ScoreCard], metric_keys: Optional[List[str]] = None, sort_by: Optional[str] = None, dataframe: bool = True) -> Union[pd.DataFrame, str]

Creates a comparison table where each column is a metric and each row is a ScoreCard with its corresponding results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `scorecards` | `List[ScoreCard]` | ScoreCards to compare. | *required* |
| `metric_keys` | `Optional[List[str]]` | List of metrics to display. If not set, all metrics evaluated on the first ScoreCard are returned. Results are sorted by the first element of this list if `sort_by` is not specified. | `None` |
| `sort_by` | `Optional[str]` | Metric to sort the results by. The selected metric is displayed in the first row. If not specified, results are sorted by the first element of `metric_keys`; if `metric_keys` is also not specified, it defaults to the first metric found on the first ScoreCard. | `None` |
| `dataframe` | `bool` | Whether to return a `pd.DataFrame` or a `str`. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `Union[pd.DataFrame, str]` | A comparison table, either as a `pd.DataFrame` or as a string. |
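Example: a minimal usage sketch of `compare`, applied to two ScoreCards produced by `score` (documented below). The import paths follow the module names shown on this page, and the metric keys `"mse"` and `"mae"` are assumptions; substitute the keys of the metrics actually evaluated on your ScoreCards.

```python
import numpy as np

from krisi.evaluate.compare import compare
from krisi.evaluate.score import score

# Build two ScoreCards on synthetic regression data with different noise levels.
y = np.random.normal(0, 1, 1000)
scorecards = [
    score(
        y=y,
        predictions=y + np.random.normal(0, noise, 1000),
        model_name=f"model_noise_{noise}",  # illustrative model names
    )
    for noise in (0.1, 0.5)
]

# Compare the ScoreCards on two (assumed) metric keys, sorted by "mse".
comparison = compare(scorecards, metric_keys=["mse", "mae"], sort_by="mse")
print(comparison)  # pd.DataFrame; pass dataframe=False to get a string table instead
```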

krisi.evaluate.score

score

score(y: Targets, predictions: Predictions, probabilities: Optional[Probabilities] = None, sample_weight: Optional[Weights] = None, model_name: Optional[str] = None, dataset_name: Optional[str] = None, project_name: Optional[str] = None, default_metrics: Optional[Union[List[Metric], Metric]] = None, custom_metrics: Optional[Union[List[Metric], Metric]] = None, dataset_type: Optional[Union[DatasetType, str]] = None, sample_type: Union[str, SampleTypes] = SampleTypes.outofsample, calculation: Union[Calculation, str] = Calculation.single, rolling_args: Optional[Dict[str, Any]] = None, raise_exceptions: bool = False, benchmark_models: Optional[Union[Model, List[Model]]] = None, num_benchmark_iter: int = 100, **kwargs) -> ScoreCard

Creates a ScoreCard based on the passed-in arguments, evaluates it, and returns the ScoreCard.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `Targets` | True targets against which the metrics are evaluated. | *required* |
| `predictions` | `Predictions` | The single-point predictions against which the metrics are evaluated. | *required* |
| `model_name` | `Optional[str]` | The name of the model that generated the predictions. Used for identifying ScoreCards. | `None` |
| `dataset_name` | `Optional[str]` | The name of the dataset from which the targets (`y`) originate. Used for reporting. | `None` |
| `project_name` | `Optional[str]` | The name of the project. Used for reporting and for saving to a directory (e.g. multiple ScoreCards). | `None` |
| `default_metrics` | `Optional[Union[List[Metric], Metric]]` | Default metrics that get evaluated. See library. | `None` |
| `custom_metrics` | `Optional[Union[List[Metric], Metric]]` | Custom metrics that get evaluated. If specified, they are evaluated after `default_metrics`. See library. | `None` |
| `dataset_type` | `Optional[Union[DatasetType, str]]` | Whether the task was binary/multi-label classification or regression. If `None`, it is inferred from the targets. | `None` |
| `sample_type` | `Union[str, SampleTypes]` | Whether to evaluate in-sample or out-of-sample predictions. One of `SampleTypes.outofsample` or `SampleTypes.insample`. | `outofsample` |
| `calculation` | `Union[Calculation, str]` | Whether metrics are evaluated on the whole prediction series, on a rolling basis, or both. One of `Calculation.single`, `Calculation.rolling` or `Calculation.both`. | `single` |
| `rolling_args` | `Optional[Dict[str, Any]]` | Arguments passed on to `pd.DataFrame.rolling`: the window size of the rolling metric evaluation (if `None`, evaluation over time is on an expanding-window basis; defaults to `len(dataset)//100`) and the step size of the rolling metric evaluation (defaults to `len(dataset)//100`). | `None` |

Returns:

| Type | Description |
| --- | --- |
| `ScoreCard` | The evaluated ScoreCard. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the `Calculation` type is incorrectly specified. |
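Example: a minimal usage sketch of `score` on synthetic regression data, evaluating metrics both over the whole prediction series and on a rolling basis. The import path follows the heading above; the model name, the `rolling_args` values, and the `scorecard.print()` call at the end are illustrative assumptions.

```python
import numpy as np

from krisi.evaluate.score import score

# Synthetic targets and noisy predictions.
y = np.random.normal(0, 1, 1000)
predictions = y + np.random.normal(0, 0.2, 1000)

scorecard = score(
    y=y,
    predictions=predictions,
    model_name="noisy_baseline",                # illustrative name
    calculation="both",                         # string assumed to map to Calculation.both
    rolling_args={"window": 100, "step": 100},  # forwarded to pd.DataFrame.rolling
)

scorecard.print()  # assumes the ScoreCard's console reporting helper
```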