ANOVA (Analysis of variance)

Use this model to carry out ANOVA (ANalysis Of VAriance) of one or more balanced or unbalanced factors. Available in Excel with the XLSTAT software.

Principles of the Analysis of Variance

Analysis of variance (ANOVA) is a tool used to partition the observed variance in a particular variable into components attributable to different sources of variation.

Analysis of variance (ANOVA) uses the same conceptual framework as linear regression. The main difference comes from the nature of the explanatory variables: instead of quantitative, here they are qualitative. In ANOVA, explanatory variables are often called factors.

ANOVA model

If p is the number of factors, the ANOVA model is written as follows:

yi = β0 + ∑j=1...q βk(i,j),j + εi

where yi is the value observed for the dependent variable for observation i, k(i,j) is the index of the category (or level) of factor j for observation i and εi is the error of the model.

The chart below shows data that could be analyzed using a 1-factor ANOVA. The factor has three categories. Data are orange points. The dashed green line is the grand mean and the short green lines are category averages. Note that we use arbitrarily the sum(ai)=0 constraint, which means that βcorresponds to the grand mean. 

The hypotheses used in ANOVA are identical to those used in linear regression: the errors εi follow the same normal distribution N(0,s) and are independent. It is recommended to check retrospectively that the underlying hypotheses have been correctly verified. The normality of the residues can be checked by analyzing certain charts or by using a normality test. The independence of the residues can be checked by analyzing certain charts or by using the Durbin Watson test.  

XLSTAT allows to correct for heteroscedasticity and autocorrelation that can arise using several methods such as the estimator suggested by Newey and West (1987).

Options for ANOVA in XLSTAT

  • XLSTAT enables you to perform one and multiple way ANOVA. Interactions up to order 4 can be included in the model as well as nested and random effects.
  • XLSTAT can handle both balanced and unbalanced ANOVA.
  • XLSTAT has an automatic device to find nested factors and one nested factor can be included in the model.
  • Random factors can be included in an ANOVA. When some factors are supposed to be random, XLSTAT displays the expected mean squares table.
  • Four methods are proposed for model selection: Best model, Stepwise, Forward, Backward
  • Test assumptions: a Shapiro-Wilk test is performed on the residuals. A Levene's test is available to run a test on the homogeneity of variances. The test is run to compare for each factor, the variance of the different categories.

Results for the analysis of variance in XLSTAT

The results given are a residuals analysis, parameters of the models, the model equation, the standardized coefficients, Type I SS, Type III SS, and predictions are displayed.

In addition several multiple comparison methods can optionally be performed: Bonferroni's and Dunn-Sidak corrected t test, Tukey's HSD test, Fisher's LSD test, Duncan's test, Newman-Keuls' (SNK) method and the REGWQ method. Also the Dunnett's test is available to allow users to perform multiple comparisons with control (MCC) and Multiple comparison with the best (MCB).

Moreover, influence diagnostics are displayed for each observation including the residual, the standardized residual (divided by the RMSE), the studentized residual, the deleted residual, the studentized deleted residual, thee leverage, the Mahalanobis distance, the Cook's D, the CovRatio, the DFFit, the standardized DFFit, the DFBetas (one per model coefficient) and the standardized DFBetas.

Among the proposed charts in the analysis of variance in XLSTAT

  • Standardized coefficients chart: The barchart displays the values and confidence intervals of the standardized coefficients.

  • Regression chart: The chart shows the observed values, the regression line and both types of confidence interval around the predictions.
  • Standardized residuals as a function of model predictions: In principle, the residuals should be distributed randomly around the X-axis. If there is a trend or a shape, this shows a problem with the model.

  • The distance between the predictions and the observations: For an ideal model, the points would all be on the bisector.
  • The standardized residuals on a bar chart: The last chart quickly shows if an abnormal number of values are outside the interval]-2, 2[ given that the latter, assuming that the sample is normally distributed, should contain about 95% of the data.

  • Summary charts: If several dependent variables have been selected and if the multiple comparisons option has been activated, a chart allows to visualize estimated means with multiple comparisons grouping letters.