Overall process - MetricsReloaded.processes.overall_process

This module provides a class to perform the overall evaluation process.

class MetricsReloaded.processes.overall_process.ProcessEvaluation(data, category, measures_pcc=[], measures_mcc=[], measures_boundary=[], measures_overlap=[], measures_mt=[], measures_detseg=[], measures_cal=[], localization='mask_iou', assignment='greedy_matching', pixdim=[], flag_map=False, file=[], thresh_ass=0.5, case=True, flag_fp_in=True, ignore_missing=False)[source]

Performs the evaluation of the data stored in a pickled file according to the chosen measures, task category and processing options.

Parameters:
  • data – dictionary containing all the data to be used for the comparison; possible keys include “pred_loc”, “ref_loc”, “pred_prob”,

  • category – task to be considered; one of ImLC, ObD, SemS, InS

  • measures_pcc – list of per class counting measures (these need to be adequate for the chosen task category)

  • measures_mcc – list of multi class counting measures

  • measures_boundary – list of measures to assess boundary quality

  • measures_overlap – list of measures to assess overlap quality

  • measures_mt – list of multi-threshold measures

  • measures_detseg – list of measures assessing jointly detection and segmentation performance

  • measures_cal – list of calibration measures (only available for the image-level classification category, ImLC)

  • localization – choice for localization strategy (used in Instance segmentation and Object detection tasks)

  • assignment – choice for the assignment strategy (used in Instance segmentation and Object detection tasks)

  • pixdim – pixel dimensions as list

  • flag_map – indicates whether NIfTI images showing the true positive elements for the reference, the prediction and the errors should be created (done only for instance segmentation)

  • file – name of files

  • thresh_ass – threshold chosen for the assignment (default 0.5)

  • case – indication of the handling of cases separately (True) or jointly (False)

  • flag_fp_in – indicates whether false positives should be accounted for

  • ignore_missing – indicates whether the missing predictions should be considered in the overall assessment (True) or not (False)
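
As an illustration of how these parameters fit together, the sketch below assembles keyword arguments for an instance segmentation run. It only builds a plain dictionary and does not import MetricsReloaded. The contents of `data` are empty placeholders (the full set of expected keys and their structure is not given here), and the selection of measures is an arbitrary pick from the InS lists documented under process_data.

```python
# Placeholder data dictionary: only the three keys named in the parameter
# description are used, and the empty lists are an assumption for illustration.
data = {
    "pred_loc": [],
    "ref_loc": [],
    "pred_prob": [],
}

# Keyword arguments using only parameter names from the class signature.
kwargs = dict(
    data=data,
    category="InS",
    measures_pcc=["fbeta", "numb_ref", "numb_pred"],
    measures_overlap=["dsc", "iou"],
    measures_boundary=["nsd", "masd"],
    measures_mt=["ap"],
    measures_detseg=["pq"],
    localization="mask_iou",       # default from the signature
    assignment="greedy_matching",  # default from the signature
    thresh_ass=0.5,
    case=True,
    flag_fp_in=True,
)

# With MetricsReloaded installed, the call would presumably be:
#   from MetricsReloaded.processes.overall_process import ProcessEvaluation
#   pe = ProcessEvaluation(**kwargs)
print(sorted(kwargs))
```

Parameters left out of `kwargs` keep the defaults shown in the signature.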

check_valid_measures_cat()[source]

Function checking whether the category and the combination of measures suggested are suitable for continuing the process

Returns:

flag_valid

process_data()[source]

Performs the processing of the data according to the settings given when the process was set up. It populates one dataframe per type of measure:

  • resdet - detection results

  • resseg - segmentation results

  • resmt - multi-threshold results

  • resmcc - multi class counting results

  • rescal - calibration results

All these dataframes are initialised as None and replaced according to the chosen task. The tasks should yield the following outputs:

  • ImLC:

    • resdet

    • rescal

    • resmt

    • resmcc

  • SemS:

    • resseg

  • ObD:

    • resdet

    • resmt

    • resmcc

  • InS:

    • resdet

    • resseg

    • resmt

    • resmcc
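
The mapping above can be written down as a small lookup table, which makes it easy to see which result dataframes stay None for a given task. This is a transcription of the list for illustration, not code taken from the library; the names `EXPECTED_OUTPUTS`, `ALL_OUTPUTS` and `outputs_left_as_none` are hypothetical.

```python
# Expected result dataframes per task category, transcribed from the list above.
EXPECTED_OUTPUTS = {
    "ImLC": ["resdet", "rescal", "resmt", "resmcc"],
    "SemS": ["resseg"],
    "ObD": ["resdet", "resmt", "resmcc"],
    "InS": ["resdet", "resseg", "resmt", "resmcc"],
}

# All result names mentioned in the description of process_data.
ALL_OUTPUTS = ["resdet", "resseg", "resmt", "resmcc", "rescal"]


def outputs_left_as_none(category):
    """Return the result names that are not expected for this category."""
    expected = EXPECTED_OUTPUTS[category]
    return [name for name in ALL_OUTPUTS if name not in expected]


print(outputs_left_as_none("SemS"))  # ['resdet', 'resmt', 'resmcc', 'rescal']
print(outputs_left_as_none("InS"))   # ['rescal']
```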

The different categories of task considered are:

  • ImLC - Image Level Classification

  • SemS - Semantic Segmentation

  • ObD - Object detection

  • InS - Instance segmentation

For each of these tasks only certain metrics are available and suitable. Error messages will be given and the processing interrupted if the chosen task and the chosen evaluation measures are not compatible. Evaluation measures are classified into the following categories:

  • Per class counting measures - measures_pcc

  • Multi class counting measures - measures_mcc

  • Overlap measures - measures_overlap

  • Boundary measures - measures_boundary

  • Multi threshold measures - measures_mt

  • Calibration measures - measures_cal

  • Combined detection and segmentation metrics - measures_detseg

The available measures per task are:

  • ImLC:

    • multi threshold measures:

      • auroc - Area under the Receiver Operating Characteristic curve

      • ap - Average Precision

      • sens@spec - Sensitivity at Specificity

      • spec@sens - Specificity at Sensitivity

      • ppv@sens - Positive Predictive value at sensitivity

    • per class counting measures:

      • fbeta - FBeta score

      • lr+ - positive likelihood ratio

      • accuracy

      • ba - balanced accuracy

      • ec - expected cost

      • nb - net benefit

      • numb_ref - number in reference

      • numb_pred - number in prediction

      • numb_tp - number of true positives

      • numb_fp - number of false positives

      • numb_fn - number of false negatives

      • cohens_kappa

    • multi class counting measures:

      • mcc - matthews correlation coefficient

      • wck - weighted cohen’s kappa

      • ec - expected cost

    • calibration measures:

      • ls - logarithmic score

      • bs - Brier Score

      • cwece - Class-wise expected calibration error

      • nll - Negative log-likelihood

      • rbs - Root Brier Score

      • ece_kde - Expected Calibration Error with kernel density estimation

      • kce - Kernel Calibration error

      • ece - Expected Calibration Error

  • Object Detection - ObD:

    • per class counting measures:

      • fbeta - FBeta score

      • numb_pred - number of predicted elements

      • numb_tp - number of true positives

      • numb_fp - number of false positives

      • numb_fn - number of false negatives

      • numb_ref - number of reference elements

      • sensitivity - sensitivity

    • multi-threshold measures:

      • sens@spec - sensitivity at specificity

      • spec@sens - specificity at sensitivity

      • sens@ppv - sensitivity at positive predictive value

      • ppv@sens - positive predictive value at sensitivity

      • sens@fppi - sensitivity at false positive per image

      • fppi@sens - false positive per image at sensitivity

      • ap - average precision

      • froc - free-response receiver operating characteristic curve

  • Semantic segmentation - SemS:

    • per class measures of overlap:

      • dsc - dice similarity coefficient

      • fbeta - FBeta score

      • cldice - centreline dice

      • iou - intersection over union

    • measures of boundary quality:

      • assd - average symmetric surface distance

      • masd - mean average surface distance

      • hd - hausdorff distance

      • hd_perc - percentile of hausdorff distance

      • nsd - normalised surface dice

      • boundary_iou - boundary intersection over union

    • per class counting measures:

      • numb_ref - number of reference elements

      • numb_pred - number of predicted elements

      • numb_tp - number of true positives

      • numb_fp - number of false positives

      • numb_fn - number of false negatives

  • Instance segmentation - InS:

    • combined measures of detection and segmentation:

      • pq - panoptic quality

    • per class counting measures:

      • fbeta - FBeta score

      • numb_ref - number of reference instances

      • numb_pred - number of prediction instances

      • numb_tp - number of true positives

      • numb_fp - number of false positives

      • numb_fn - number of false negatives

    • multi-threshold measures:

      • sens@spec - sensitivity at specificity

      • spec@sens - specificity at sensitivity

      • sens@ppv - sensitivity at positive predictive value

      • ppv@sens - positive predictive value at sensitivity

      • fppi@sens - false positive per image at sensitivity

      • sens@fppi - sensitivity at false positive per image

      • ap - average precision

      • froc - free-response receiver operating characteristic curve

    • measures of overlap:

      • dsc - dice similarity coefficient

      • fbeta - fbeta score

      • cldice - centreline dice similarity coefficient

      • iou - intersection over union

    • measures of boundary quality:

      • hd - hausdorff distance

      • boundary_iou - boundary intersection over union

      • masd - mean average surface distance

      • assd - average symmetric surface distance

      • nsd - normalised surface dice

      • hd_perc - percentile of hausdorff distance
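
The lists above can be encoded as a lookup table to check a configuration before running the evaluation. The sketch below is a standalone illustration of that idea and is not the implementation of check_valid_measures_cat; `AVAILABLE` and `find_invalid` are hypothetical names, and the table is transcribed from the lists above.

```python
# Multi-threshold measures shared by ObD and InS in the lists above.
_DET_MT = {"sens@spec", "spec@sens", "sens@ppv", "ppv@sens",
           "sens@fppi", "fppi@sens", "ap", "froc"}
_COUNTS = {"numb_ref", "numb_pred", "numb_tp", "numb_fp", "numb_fn"}
_OVERLAP = {"dsc", "fbeta", "cldice", "iou"}
_BOUNDARY = {"assd", "masd", "hd", "hd_perc", "nsd", "boundary_iou"}

# Available measures per task category and constructor parameter.
AVAILABLE = {
    "ImLC": {
        "measures_mt": {"auroc", "ap", "sens@spec", "spec@sens", "ppv@sens"},
        "measures_pcc": _COUNTS | {"fbeta", "lr+", "accuracy", "ba", "ec",
                                   "nb", "cohens_kappa"},
        "measures_mcc": {"mcc", "wck", "ec"},
        "measures_cal": {"ls", "bs", "cwece", "nll", "rbs", "ece_kde",
                         "kce", "ece"},
    },
    "ObD": {
        "measures_pcc": _COUNTS | {"fbeta", "sensitivity"},
        "measures_mt": _DET_MT,
    },
    "SemS": {
        "measures_overlap": _OVERLAP,
        "measures_boundary": _BOUNDARY,
        "measures_pcc": _COUNTS,
    },
    "InS": {
        "measures_detseg": {"pq"},
        "measures_pcc": _COUNTS | {"fbeta"},
        "measures_mt": _DET_MT,
        "measures_overlap": _OVERLAP,
        "measures_boundary": _BOUNDARY,
    },
}


def find_invalid(category, **measures):
    """Return {parameter: [unsupported names]}; an empty dict means all valid."""
    allowed = AVAILABLE[category]
    invalid = {}
    for param, names in measures.items():
        bad = [name for name in names if name not in allowed.get(param, set())]
        if bad:
            invalid[param] = bad
    return invalid


# Calibration measures are only listed for ImLC, so this is flagged for SemS.
print(find_invalid("SemS", measures_overlap=["dsc", "iou"], measures_cal=["ece"]))
# {'measures_cal': ['ece']}
```

Returning the offending names rather than a single boolean makes it easier to report which measure is incompatible with the chosen task.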