Model Performance Indicators

Use the Model Performance Indicators tool to evaluate the performance of your predictive model. The tool is available in Excel through the XLSTAT software.

How to measure the performance of a model?

When trying to predict the values of a quantitative Y variable, we talk about regression, whereas we talk about classification when the Y variable we need to predict is qualitative. XLSTAT offers several regression and classification learning models.

Suppose we have a variable of interest to predict. The closer the algorithm's predictions are to the observed values of this target variable, the better the model performs.

Evaluating the performance of a model is important both to measure the risks of relying on it and to compare several algorithms and/or models.

The Performance Indicators tool was developed to help us answer this question: how much can I trust a model to predict future events?

Model Performance Indicators in XLSTAT

XLSTAT provides a large number of indicators to evaluate the performance of a model:

Indicators for classification models

Notations: TP (True Positives), TN (True Negatives), FP (False Positives) and FN (False Negatives).
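As an illustration of these notations, the four counts can be obtained from a list of observed classes and a list of predicted classes. This is a minimal sketch and not XLSTAT's implementation; the function name confusion_counts, the positive argument and the sample data are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem.

    `positive` is the label treated as the event of interest
    (an assumption of this sketch).
    """
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Made-up observed and predicted classes, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```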

Accuracy: The accuracy is the ratio (TP+TN)/(TP+TN+FP+FN), that is, the proportion of observations that are correctly classified.
The closer it is to 1, the better the model.
Precision: Precision is the ratio TP/(TP + FP).
It corresponds to the proportion of positive predictions that are actually correct. In other words, when a model has a precision of 0.8, 80% of the observations it predicts as positive really are positive.
Balanced accuracy (binary case only): Balanced accuracy is an indicator used to evaluate the quality of a binary classifier. It is especially useful when the classes are unbalanced, i.e. one of the two classes appears more often than the other. It is calculated as follows: (Sensitivity + Specificity) / 2, where Sensitivity = TP/(TP + FN) and Specificity = TN/(TN + FP).
False Positive Rate (binary case only) : Proportion of negative cases that the test detects as positive (FPR = 1-Specificity).
False Negative Rate (binary case only): Proportion of positive cases that the test detects as negative (FNR = 1-Sensitivity).
Correct classification : number of well-classified observations.
Misclassification : number of misclassified observations.
Prevalence: Relative frequency of the event of interest in the total sample, (TP+FN)/N, where N = TP+TN+FP+FN is the total number of observations.
F-measure: The F-measure, also called F-score or F1-score, is the harmonic mean of precision and sensitivity (also called recall). Its value is between 0 and 1. It is defined by: F-measure = 2 * (Precision * Sensitivity) / (Precision + Sensitivity).
NER (null error rate) : it corresponds to the percentage of error that would be observed if the model always predicted the majority class.
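Cohen's Kappa: It is useful when we want to study the agreement between the response variable and the predictions, corrected for the agreement expected by chance. The value of Kappa is at most 1. A value of 1 means perfect agreement between the two variables (perfect classification), a value of 0 means agreement no better than chance, and negative values mean agreement worse than chance.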
Cramer's V: Cramer's V measures the strength of the association between the two variables studied. It ranges from 0 to 1: the closer V is to 0, the less dependent the variables are, and V equals 1 when the two variables are completely dependent. In the binary case (2x2 confusion matrix), V is the absolute value of the phi coefficient, whose signed value lies between -1 and 1. Thus, the closer V is to 1, the stronger the link between the two variables.
MCC (Matthews correlation coefficient): The Matthews correlation coefficient (MCC), also known as the phi coefficient, was introduced by biochemist Brian W. Matthews in 1975 and is used in machine learning as a measure of the quality of binary (two-class) classifications. It is defined identically to Pearson's phi coefficient and ranges from -1 to 1.
ROC curve: The ROC (Receiver Operating Characteristic) curve displays the performance of a model and allows it to be compared with other models. The terms used come from signal detection theory. The ROC curve is made of the points (1-specificity, sensitivity) obtained as the classification threshold varies.
AUC: The area under the curve (AUC) is a synthetic index calculated for ROC curves. It can be interpreted as the probability that the model gives a higher score to a randomly chosen positive event than to a randomly chosen negative one. For an ideal model we have AUC = 1 (above in blue), whereas for a random model we have AUC = 0.5 (above in red). A model is usually considered good when its AUC is higher than 0.7. A model that performs well should have an AUC between 0.87 and 0.9. A model with an AUC above 0.9 is excellent.
Lift curve : The Lift curve is the curve that represents the Lift value as a function of the percentage of the population. Lift is the ratio between the proportion of true positives and the proportion of positive predictions. A Lift of 1 means that there is no gain over an algorithm that makes random predictions. Usually, the higher the Lift, the better the model.
Cumulative gain curve : The gain curve represents the sensitivity, or recall, as a function of the percentage of the total population. It allows us to see which portion of the data concentrates the maximum number of positive events.
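To make the definitions above concrete, here is a minimal Python sketch that computes several of these indicators from the four confusion-matrix counts, and the AUC from raw scores. It is not XLSTAT's implementation: the function names, dictionary keys and sample numbers are hypothetical, and the code assumes that no denominator is zero (both classes are present and both are predicted at least once).

```python
import math


def classification_indicators(tp, tn, fp, fn):
    """Indicators for a binary classifier, from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "prevalence": (tp + fn) / n,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        # Error rate obtained by always predicting the majority class.
        "null_error_rate": min(tp + fn, tn + fp) / n,
        "cohen_kappa": (accuracy - p_chance) / (1 - p_chance),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


def auc_by_pairs(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count for one half. This pairwise count equals the area under
    the ROC curve; it is quadratic in the sample size, so it is only
    meant for small illustrations.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up counts and scores, for illustration only.
indicators = classification_indicators(tp=40, tn=45, fp=5, fn=10)
auc = auc_by_pairs([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2])
for name, value in indicators.items():
    print(f"{name}: {value:.4f}")
print(f"auc: {auc:.4f}")
```

With these sample counts the two classes are balanced, so the accuracy and the balanced accuracy coincide; with unbalanced classes they would generally differ.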

Indicators for regression models

Notations: W is the sum of the weights and p is the number of variables included in the model.

MSE: The mean squared error (MSE) measures the average squared difference between the observed and predicted values.
RMSE: The root mean square of the errors (RMSE) is the square root of the MSE.
MSLE (Mean Squared Log Error): The mean squared error computed on a logarithmic scale.
RMSLE (Root Mean Squared Log Error): The root mean square of the log errors (RMSLE) is the square root of the MSLE.
MAPE (Mean Absolute Percentage Error): Also called MAPD (Mean Absolute Percentage Deviation), the MAPE is the mean of the absolute errors expressed as a percentage of the observed values.
Adjusted R²: The adjusted determination coefficient for the model.
Willmott index: Used mainly in hydrological models, this is the refined index of agreement (Willmott et al., 2012).
Mielke and Berry index: The index is affected by the mean absolute error (MAE) and can be used for seasonal cases.
Legates and McCabe's index : Used mostly in hydrological models, the Legates and McCabe index is recommended when there is seasonality or a difference in mean per period.
AIC : Akaike's Information Criterion. It is a model selection criterion which penalizes models for which adding new explanatory variables does not supply sufficient information to the model, the information being measured through the MSE. The aim is to minimize the AIC criterion.
AICc: The corrected Akaike information criterion reduces the probability of choosing a model with too many explanatory variables. This criterion is generally preferable to the AIC when the data set is small and/or has a large number of variables.
SBC : Schwarz’s Bayesian Criterion. This criterion, proposed by Schwarz (1978) is similar to the AIC, and the aim is to minimize it.
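The error-based indicators above can be sketched in a few lines of Python. This is an unweighted illustration under stated assumptions, not XLSTAT's implementation: every observation has a weight of 1, sums are divided by the number of observations, the log errors use log(1 + y) (a common convention that requires values greater than -1), and the MAPE requires non-zero observed values. XLSTAT's own formulas, which involve W and p, may differ. The function name and the sample data are hypothetical.

```python
import math


def regression_indicators(y_true, y_pred):
    """Unweighted MSE, RMSE, MSLE, RMSLE and MAPE (in percent)."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    mse = sum((y - p) ** 2 for y, p in pairs) / n
    # Assumption: log errors are computed on log(1 + value).
    msle = sum((math.log1p(y) - math.log1p(p)) ** 2 for y, p in pairs) / n
    # Assumption: no observed value is zero.
    mape = 100 * sum(abs((y - p) / y) for y, p in pairs) / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "msle": msle,
        "rmsle": math.sqrt(msle),
        "mape": mape,
    }


# Made-up observed and predicted values, for illustration only.
observed = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
results = regression_indicators(observed, predicted)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Because the MSE and RMSE are expressed in the units of Y (squared or not), they are mostly useful for comparing models on the same data set, whereas the MAPE is a relative measure.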