Overall process - MetricsReloaded.processes.overall_process¶
This module provides class to perform the overall evaluation process.
- class MetricsReloaded.processes.overall_process.ProcessEvaluation(data, category, measures_pcc=[], measures_mcc=[], measures_boundary=[], measures_overlap=[], measures_mt=[], measures_detseg=[], measures_cal=[], localization='mask_iou', assignment='greedy_matching', pixdim=[], flag_map=False, file=[], thresh_ass=0.5, case=True, flag_fp_in=True, ignore_missing=False)[source]¶
Performs the evaluation of the data stored in a pickled file according to all the measures, categories and choices of processing
- Parameters:
data – dictionary containing all the data to be used for the comparison; possible keys include “pred_loc”, “ref_loc”, “pred_prob”,
category – task to be considered choice among ImLC, ObD, SemS, InS
measures_pcc – list of per class counting measures (these need to be adequate for the chosen task category)
measures_mcc – list of multi class counting measures
measures_boundary – list of measures to assess boundary quality
measures_overlap – list of measures to assess overlap quality
measures_mt – list of multi-threshold measures
measures_detseg – list of measures assessing jointly detection and segmentation performance
measures_cal – list of calibration measures (only available for image level classification class)
localization – choice for localization strategy (used in Instance segmentation and Object detection tasks)
assignment – choice for the assignment strategy (used in Instance segmentation and Object detection tasks)
pixdim – pixel dimensions as list
flag_map – indication whether nifti images indicating true positive elements for the reference, the prediction and errors should be created (done only for instance segmentation)
file – name of files
thresh_ass – threshold chosen for the assignment (default 0.5)
case – indication of the handling of cases separately (True) or jointly (False)
flag_fp_in – indicates that false positive should be accounted for
ignore_missing – indicates whether the missing predictions should be considered in the overall assessment (True) or not (False)
- check_valid_measures_cat()[source]¶
Function checking whether the category and the combination of measures suggested are suitable for continuing the process
- Returns:
flag_valid
- process_data()[source]¶
Performs the processing of the data according to the details given in the setting up of the process Contributes to the attribution of one dataframe per type of measures :
resdet - detection results
resseg - segmentation results
resmt - multi-threshold results
resmcc - multi class counting results
rescal - calibration results
All these dataframes are initialised as None and replaced according to the chosen task. The tasks should yield the following outputs:
ImLC:
resdet
rescal
resmt
resmcc
SemS:
resseg
ObD:
resdet
resmt
resmcc
InS:
resdet
resseg
resmt
resmcc
The different categories of task considered are:
ImLC - Image Level Classification
SemS - Semantic Segmentation
ObD - Object detection
InS - Instance segmentation
For each of these tasks only certain metrics are available and suitable. Error messages will be given and the processing interrupted if the chosen task and the chosen evaluation measures are not compatible. Evaluation measures are classified into the following categories:
Per class counting measures - measures_pcc
Multi class counting measures - measures_mcc
Overlap measures - measures_overlap
Boundary measures - measures_boundary
Multi threshold measures - measures_mt
Calibration measures - measures_cal
Combined detection and segmentation metrics - measures_detseg
The available measures per task are:
ImLC:
multi threshold measures:
per class counting measures:
fbeta - FBeta score
lr+ - positive likelihood ratio
accuracy
ba - balance accuracy
ec - expected cost
nb - net benefit
numb_ref - number in reference
numb_pred - number in prediction
numb_tp - number of true positives
numb_fp - number of false positives
numb_fn - number of false negatives
cohens_kappa
multi class counting measures:
mcc - matthews correlation coefficient
wck - weighted cohen’s kappa
ec - expected cost
calibration measures:
ls - logarithmic score
bs - Brier Score
cwece - Class-wise expectation calibration error
nll - Negative log-likelihood
rbs - Root Brier Score
ece_kde - Expectation Calibration Error with Kernel density estimation
kce - Kernel Calibration error
ece - Expectation Calibration Error
Object Detection - ObD:
per class counting measures:
fbeta - FBeta score
numb_pred - number of predicted elements
numb_tp - number of true positives
numb_fp - number of false positives
numb_fn - number of false negatives
numb_ref - number of reference elements
sensitivity - sensitivity
multi-threshold measures:
sens@spec - sensitivity at specificity
spec@sens - specificity at sensitivity
sens@ppv - sensitivity at positive predictive value
ppv@sens - positive predictive value at sensitivity
sens@fppi - sensitivity at false positive per image
fppi@sens - false positive per image at sensitivity
ap - average precision
froc - free receiver operator curve
Semantic segmentation - SemS:
per class measures of overlap:
dsc - dice similarity coefficient
fbeta - FBeta score
cldice - centreline dice
iou - intersection over union
measures of boundary quality:
assd - average symmetric surface distance
masd - mean average surface distance
hd - hausdorff distance
hd_perc - percentile of hausdorff distance
nsd - normalised surface dice
boundary_iou - boundary intersection over union
per class counting :
numb_ref - number of reference elements
numb_pred - number of predicted elements
numb_tp - number of true positives
numb_fp - number of false positives
numb_fn - number of false negatives
Instance segmentation - InS:
combined measures of detection and segmentation
pq - panoptic quality
per class counting measures:
fbeta - FBeta score
numb_ref - number of reference instances
numb_pred - number of prediction instances
numb_tp - number of true positives
numb_fp - number of false positives
numb_fn - number of false negatives
multi-threshold measures:
sens@spec - sensitivity at specificity
spec@sens - specificity at sensitivity
sens@ppv - sensitivity at positive predictive value
ppv@sens - positive predictive value at sensitivity
fppi@sens - false positive per image at sensitivity
sens@fppi - sensitivity at false positive per image
ap - average precision
froc - free receiver operator curve
measures of overlap:
dsc - dice similarity coefficient
fbeta - fbeta score
cldice - centreline dice similarity coefficient
iou - intersection over union
measures of boundary quality:
hd - hausdorff distance
boundary_iou - boundary intersection over union
masd - mean average surface distance
assd - average symmetric surface distance
nsd - normalised surface dice
hd_perc - percentile of hausdorff distance