Discriminant Analysis (DA)

Discriminant Analysis (DA) is part of:
  • Pro Core statistical software

  • System configuration

    • Windows:
      • Versions: 9x/Me/NT/2000/XP/Vista/Win 7/Win 8
      • Excel: 97 and later
      • Processor: 32 or 64 bits
      • Hard disk: 150 Mb
    • Mac OS X:
      • OS: OS X
      • Excel: X, 2004 and 2011
      • Hard disk: 150Mb.

Benefits

  • Easy and user-friendly
    Easy and user-friendly XLSTAT is flawlessly integrated with Microsoft Excel which is the most popular spreadsheet worldwide. This integration makes it one of the simplest available tools to work with as it utilizes the same philosophy as Microsoft Excel. The program is accessible in a dedicated XLSTAT tab. The analyses are grouped into functional menus. The dialog boxes are user-friendly and setting up an analysis is straightforward.
  • Data and results shared seamlessly
    Data and results shared seamlessly One of the greatest advantages of XLSTAT is the way you can share data and results seamlessly. As the results are stored in Microsoft Excel, anyone can access them. There is no need for the receiver to have an XLSTAT license or any additional viewer which makes your team-work easier and more affordable. In addition, results are easily integrable into other Microsoft Office software such as PowerPoint, so that you can create striking presentation in minutes.
  • Modular
    Modular XLSTAT is a modular product. XLSTAT-Pro is a core statistical module of XLSTAT which includes all the mainstream functionalities in statistics and multivariate analysis. More advanced features contained in add-on modules can be added for specific applications. This way you can adapt the software to your needs making the software more cost-efficient.
  • Didactic
    Didactic The results of XLSTAT are organized by analysis and are easy to navigate. Moreover useful information is provided along with the results to assist you in your interpretation.
  • Affordable
    Affordable XLSTAT is a complete and modular analytical solution that can suit any analytical business needs. It is very reasonably priced so that the return of your investment is almost immediate. Any XLSTAT license comes with top level support and assistance.
  • Accessible - Available in many languages
    Accessible - Available in many languages We have ensured XLSTAT is accessible to everyone by making the program available in many languages, including Chinese, English, French, German, Italian, Japanese, Polish, Portuguese and Spanish.
  • Automatable and customizable
    Automatable and customizable Most of the statistical functions available in XLSTAT can be called directly from the Visual Basic window of Microsoft Excel. They can be modified and integrated to more code to fit to the specificity of your domain. Adding tables and plots as well as modifying existing outputs becomes easy. Furthermore, XLSTAT includes some special tools on the dialog boxes to generate automatically the VBA code in order to reproduce your analysis using the VBA editor or to simply load pre-set settings. This effortless automation of routine analysis will be a huge time saver on your part.

What is Discriminant Analysis?

Discriminant Analysis (DA) is an old method (Fisher, 1936) which in its classic form has changed little in the past twenty years. This method, which is both explanatory and predictive, can be used to:

is Discriminant Analysis may be used in numerous applications, for example in ecology and the prediction of financial risks (credit scoring).

Model for Discriminant Analysis: Linear or quadratic model

Two models of Discriminant Analysis are used depending on a basic assumption: if the covariance matrices are assumed to be identical, linear discriminant analysis is used. If, on the contrary, it is assumed that the covariance matrices differ in at least two groups, then the quadratic model is used. The Box test is used to test this hypothesis (the Bartlett approximation enables a Chi2 distribution to be used for the test). We start with linear analysis then, depending on the results from the Box test, carry out quadratic analysis if required.

Discriminant Analysis and Multicolinearity issues

With linear and still more with quadratic models, we can face problems of variables with a null variance or multicolinearity between variables. XLSTAT has been programmed so as to avoid these problems. The variables responsible for these problems are automatically ignored either for all calculations or, in the case of a quadratic model, for the groups in which the problems arise. Multicolinearity statistics are optionally displayed so that you can identify the variables which are causing problems.

Discriminant Analysis and variable selection

As for linear and logistic regression, efficient stepwise methods have been proposed. They can, however, only be used when quantitative variables are selected as the input and output tests on the variables assume them to be normally distributed. The stepwise method gives a powerful model which avoids variables which contribute only little to the model.

Discriminant Analysis results: Classification table, ROC curve and cross-validation

Among the numerous results provided, XLSTAT can display the classification table (also called confusion matrix) used to calculate the percentage of well-classified observations. When only two classes (or categories or modalities) are present in the dependent variable, the ROC curve may also be displayed. The ROC curve (Receiver Operating Characteristics) displays the performance of a model and enables a comparison to be made with other models. The terms used come from signal detection theory.

The proportion of well-classified positive events is called the sensitivity. The specificity is the proportion of well-classified negative events. If you vary the threshold probability from which an event is to be considered positive, the sensitivity and specificity will also vary. The curve of points (1-specificity, sensitivity) is the ROC curve. Let's consider a binary dependent variable which indicates, for example, if a customer has responded favorably to a mail shot. In the diagram below, the blue curve corresponds to an ideal case where the n% of people responding favorably corresponds to the n% highest probabilities. The green curve corresponds to a well-discriminating model. The red curve (first bisector) corresponds to what is obtained with a random Bernoulli model with a response probability equal to that observed in the sample studied. A model close to the red curve is therefore inefficient since it is no better than random generation. A model below this curve would be disastrous since it would be less even than random.

The area under the curve (or AUC) is a synthetic index calculated for ROC curves. The AUC corresponds to the probability such that a positive event has a higher probability given to it by the model than a negative event. For an ideal model, AUC=1 and for a random model, AUC = 0.5. A model is usually considered good when the AUC value is greater than 0.7. A well-discriminating model must have an AUC of between 0.87 and 0.9. A model with an AUC greater than 0.9 is excellent.

The results of the model as regards forecasting may be too optimistic: we are effectively trying to check if an observation is well-classified while the observation itself is being used in calculating the model. For this reason, cross-validation was developed: to determine the probability that an observation will belong to the various groups, it is removed from the learning sample, then the model and the forecast are calculated. This operation is repeated for all the observations in the learning sample. The results thus obtained will be more representative of the quality of the model. XLSTAT gives the option of calculating the various statistics associated with each of the observations in cross-validation mode together with the classification table and the ROC curve if there are only two classes.

Lastly, you are advised to validate the model on a validation sample wherever possible. XLSTAT has several options for generating a validation sample automatically.

Discriminant analysis and logistic regression

Where there are only two classes to predict for the dependent variable, discriminant analysis is very much like logistic regression. Discriminant analysis is useful for studying the covariance structures in detail and for providing a graphic representation. Logistic regression has the advantage of having several possible model templates, and enabling the use of stepwise selection methods including for qualitative explanatory variables. The user will be able to compare the performances of both methods by using the ROC curves.

Tutorials

Screenshots