Running a one-way ANOVA followed by multiple comparisons tests with XLSTAT

Dataset for ANOVA (Analysis of variance) XLS57.5 KB

Tutorial video
ANOVA (Analysis of variance) is part of: Download Trial version More details See users' feedback
  • Pro Core statistical software

  • System configuration

    • Windows:
      • Versions: 9x/Me/NT/2000/XP/Vista/Win 7
      • Excel: 97 and later
      • Processor: 32 or 64 bits
      • Hard disk: 150 Mb
    • Mac OS X:
      • OS: OS X
      • Excel: X, 2004 and 2011
      • Hard disk: 150Mb.

Benefits

  • Easy and user-friendly
    Easy and user-friendly XLSTAT is flawlessly integrated with Microsoft Excel which is the most popular spreadsheet worldwide. This integration makes it one of the simplest available tools to work with as it utilizes the same philosophy as Microsoft Excel. The program is accessible in a dedicated XLSTAT tab. The analyses are grouped into functional menus. The dialog boxes are user-friendly and setting up an analysis is straightforward.
  • Data and results shared seamlessly
    Data and results shared seamlessly One of the greatest advantages of XLSTAT is the way you can share data and results seamlessly. As the results are stored in Microsoft Excel, anyone can access them. There is no need for the receiver to have an XLSTAT license or any additional viewer which makes your team-work easier and more affordable. In addition, results are easily integrable into other Microsoft Office software such as PowerPoint, so that you can create striking presentation in minutes.
  • Modular
    Modular XLSTAT is a modular product. XLSTAT-Pro is a core statistical module of XLSTAT which includes all the mainstream functionalities in statistics and multivariate analysis. More advanced features contained in add-on modules can be added for specific applications. This way you can adapt the software to your needs making the software more cost-efficient.
  • Didactic
    Didactic The results of XLSTAT are organized by analysis and are easy to navigate. Moreover useful information is provided along with the results to assist you in your interpretation.
  • Affordable
    Affordable XLSTAT is a complete and modular analytical solution that can suit any analytical business needs. It is very reasonably priced so that the return of your investment is almost immediate. Any XLSTAT license comes with top level support and assistance.
  • Accessible - Available in many languages
    Accessible - Available in many languages We have ensured XLSTAT is accessible to everyone by making the program available in many languages, including Chinese, English, French, German, Italian, Japanese, Polish, Portuguese and Spanish.
  • Automatable and customizable
    Automatable and customizable Most of the statistical functions available in XLSTAT can be called directly from the Visual Basic window of Microsoft Excel. They can be modified and integrated to more code to fit to the specificity of your domain. Adding tables and plots as well as modifying existing outputs becomes easy. Furthermore, XLSTAT includes some special tools on the dialog boxes to generate automatically the VBA code in order to reproduce your analysis using the VBA editor or to simply load pre-set settings. This effortless automation of routine analysis will be a huge time saver on your part.

Dataset for running a one-way ANOVA

An Excel sheet with both the data and the results can be downloaded by clicking here.

The data correspond to an experiment where 4 new toothpaste formulas were each tested on 6 different patients in order to measure their effect on the whiteness of teeth. All patients had previously used the same toothpaste.

Goal of this tutorial

Using the ANOVA function of XLSTAT we want to find out if the results differ according to the formula used and, if so, which formula is the most effective. The case is a one-way balanced ANOVA because there is only one factor - the formula - and the number of repetitions is the same for each formula.

Setting up the one-way ANOVA

Once XLSTAT is open, select the XLSTAT / Modeling data / ANOVA command, or click on the corresponding button of the Modeling Data toolbar (see below).

barano1.gif

Once you have clicked on the button, the ANOVA dialog box appears.

Select the data on the Excel sheet. To the Dependent variable corresponds here "Whiteness" which variability we want to explain by the effect of the "Toothpaste" formula, the latter being the Qualitative explanatory variable.

As we selected the column title for both variables, we also clicked on the option Variable labels.

In this example we want to display the results on the same sheet where the data are stored, so we chose the Range option and selected the cell that corresponds to the top left corner of the results report to be displayed.

anova-one-factor-1.gif

In the Options tab, we left the constraint option at a1=0, meaning that we want the model to be built using the assumption that the T1 toothpaste has the basic effect on whiteness: we know the average for T1 is the lowest and this guarantees that the other effects will be positive.

Applying a constraint to the ANOVA model is necessary for theoretical reasons, but it has no effect on the results (goodness of fit, predictions). The only difference it makes is in the way you write the model.

anova-one-factor-2.gif

In the Outputs tab we activated the Pariwise comparisons option to be able to run a Tukey's test and a REGWQ test, we activated the Comparisons with a control option to run two sided Dunnett's test.

To understand more about the relatively complex subject of multiple comparison tests, we recommend the work by Jason C. Hsu.

anova-one-factor-3.gif

The computations begin once you have clicked on the OK button, then stop to ask you which group is the control group for the Dunnett test. We selected T1 as the control group.

anova-one-factor-4.gif

Once the user has clicked on the OK button, the computations resume and the results are displayed.

Interpreting the one-way ANOVA results

The first results displayed by XLSTAT are the goodness of fit coefficients, including the R² (coefficient of determination), the adjusted R² and several other statistis.

anova-one-factor-5.gif

The coefficient of determination (here 0.56) gives a fair idea of how much of the variability of the modeled variable (here the whiteness) is being explained by the explanatory variables (here the type of toothpaste); in our case we have 56% of the variability explained. The other 44% are hidden in other variables which are not available, and which the model hides in "random errors".

The analysis of variance table is a very important result to look at (see below). This is where we determine whether the explanatory variable (the toothpaste formula) brings significant information (null hypothesis H0) to the model or not. In other words, it is way of asking yourself whether it is valid to take the mean to describe the whole population, or if the information provided by the categories (here the toothpaste type) is of value or not.

anova-one-factor-6.gif

The test used here is the Fisher's F test. Given that the probability corresponding to the F value in this case is 0.001, it means that we would take a 0.1% risk to conclude that the null hypothesis (no effect of the toothpaste formulas) is wrong.

So we can conclude with confidence that there is an effect of the toothpaste formulas on the whiteness of the patients' teeth. Note that the R² is not very good (0.56), meaning that some of the information offering a complementary explanation of the variations of the whiteness is missing, which is no real surprise.

The following table gives details on the model. This table is helpful when predictions are needed. In this particular case it is not very useful. We can already notice that the toothpaste T2 has an effect which 95% confidence range includes 0, indicating that there is no evidence that T2 is very different from T1.

anova-one-factor-7.gif

The bar chart of the standardized coefficients allow to visually compare the relative impact of the categories, and to see if the confidence intervals include 0 or not.

anova-one-factor-8.gif

The next table shows the residuals. We can look at the reduced residuals (standardized residuals) more specifically, residuals which, given the assumptions of the ANOVA model, should be normally distributed. This means, among other things, that 95% of the residuals should be in the interval [-1.96, 1.96].

All values outside this interval are potential outliers, or might suggest that the normality assumption is wrong. It seems here that there is one strong outlier (13th observation) with a residual equal to -2.8279.

To explain the difference, one should first verify that the right toothpaste was given to the 13th patient, and secondly, one should try to understand why the response to the formula wasn't the same as for the other patients.

The histogram of the residuals allows to quickly visualize the residuals that are out of the expected range.

anova software

 

anova package

Now we obtain the answer to our initial question: is there a significant difference between the treatments, and how should this difference be classified?

As shown on the next table, the Tukey's HSD (Honestly Significantly Different) test is applied to all pairwise differences between means. The risk of 5% we have chosen is used to determine the critical value q, which is compared to the standardized difference between the means.

Only two pairs appear to be significantly different (T1, T3) and (T2,T3). The means and the categories are then classified based on this analysis. We can see here that there is no transitivity (> means not significantly different, and <> means significantly different): T4 > T3 T4 > T2 but T2 <> T3

anova-one-factor-11.gif

The REQWQ procedure gives different results (see below), which shows that one needs to be very cautious when using comparison methods.

Three pair of categories are different in this case (T1 and T4 appear to be significantly different with this method). The groupings give now three superimposed groups of categories.

anova-one-factor-12.gif

Next, we performed a Dunnett's test to compare each category with the control category T1. The Dunnett's test agrees with the REQWQ procedure that the T1 and T4 categories are significantly different.

anova-one-factor-13.gif

Conclusion for this one-way ANOVA

The conclusion is that the 4 toothpaste formulas show significantly different effects on whiteness. As the T1 toothpaste is already on the market, it is the toothpastes T3 or T4, which show a significant increase in whiteness, which should be selected as newcomers to the market.