Running a Canonical Correspondence Analysis (CCA) with XLSTAT-ADA

Dataset for Canonical Correspondence Analysis (CCA and partial CCA): XLS, 72.5 KB

Canonical Correspondence Analysis (CCA and partial CCA) is part of:
  • ADA Advanced Data Analysis on Multiple tables software

  • System configuration

    • Windows:
      • Versions: 9x/Me/NT/2000/XP/Vista/Win 7
      • Excel: 97 and later
      • Processor: 32 or 64 bits
      • Hard disk: 150 MB
    • Mac OS X:
      • OS: OS X
      • Excel: X, 2004 and 2011
      • Hard disk: 150 MB

Benefits

  • Easy and user-friendly
    Easy and user-friendly XLSTAT is flawlessly integrated with Microsoft Excel which is the most popular spreadsheet worldwide. This integration makes it one of the simplest available tools to work with as it utilizes the same philosophy as Microsoft Excel. The program is accessible in a dedicated XLSTAT tab. The analyses are grouped into functional menus. The dialog boxes are user-friendly and setting up an analysis is straightforward.
  • Data and results shared seamlessly
    One of the greatest advantages of XLSTAT is the way you can share data and results. As the results are stored in Microsoft Excel, anyone can access them: the recipient needs neither an XLSTAT license nor any additional viewer, which makes teamwork easier and more affordable. In addition, results are easily integrated into other Microsoft Office software such as PowerPoint, so you can create striking presentations in minutes.
  • Modular
    XLSTAT is a modular product. XLSTAT-Pro is the core statistical module of XLSTAT and includes all the mainstream functionalities in statistics and multivariate analysis. More advanced features contained in add-on modules can be added for specific applications. This way you can adapt the software to your needs, which makes it more cost-efficient.
  • Didactic
    The results of XLSTAT are organized by analysis and are easy to navigate. Moreover, useful information is provided along with the results to assist you in your interpretation.
  • Affordable
    XLSTAT is a complete and modular analytical solution that can suit any business analytics need. It is very reasonably priced, so the return on your investment is almost immediate. Any XLSTAT license comes with top-level support and assistance.
  • Accessible - Available in many languages
    We have ensured XLSTAT is accessible to everyone by making the program available in many languages, including Chinese, English, French, German, Italian, Japanese, Polish, Portuguese and Spanish.
  • Automatable and customizable
    Most of the statistical functions available in XLSTAT can be called directly from the Visual Basic window of Microsoft Excel. They can be modified and integrated into more code to fit the specifics of your domain. Adding tables and plots, as well as modifying existing outputs, becomes easy. Furthermore, the XLSTAT dialog boxes include special tools that automatically generate the VBA code needed to reproduce your analysis in the VBA editor, or simply to load pre-set settings. This automation of routine analyses can save a lot of time.

Canonical Correspondence Analysis

Canonical Correspondence Analysis (CCA) has been developed to allow ecologists to relate the abundance of species to environmental variables (Ter Braak, 1986). However, this method can be used in other domains. Geomarketing and demographic analyses should be able to take advantage of it.

In order to use Canonical Correspondence Analysis, one needs:

  • A contingency table X that contains the frequencies of a series of objects (in ecology, species) on the sites where they are counted,
  • A table Y of descriptive variables that are measured on the same sites,
  • Optionally, a third table Z that contains descriptive information whose effect we want to remove before trying to explain the variability within X using Y. In this case, the method is called partial Canonical Correspondence Analysis.

The goal is to produce a map where the objects, the sites, and the variables are represented.
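The requirements on the tables can be checked before running any analysis. The following is a minimal sketch in Python with NumPy, not part of XLSTAT: the function name `check_cca_tables` and the small demo tables are hypothetical, and it assumes that every table has one row per site.

```python
import numpy as np

def check_cca_tables(X, Y, Z=None):
    """Validate the tables of a (partial) CCA; rows are sites in every table."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim != 2 or Y.ndim != 2:
        raise ValueError("X and Y must be two-dimensional tables")
    if (X < 0).any():
        raise ValueError("X must contain non-negative frequencies")
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must describe the same sites")
    if (X.sum(axis=1) == 0).any() or (X.sum(axis=0) == 0).any():
        raise ValueError("every site and every object needs a non-zero total")
    if Z is not None:
        Z = np.asarray(Z, dtype=float)
        if Z.ndim != 2 or Z.shape[0] != X.shape[0]:
            raise ValueError("Z must have one row per site")
    # Number of sites, objects (species) and descriptive variables
    return X.shape[0], X.shape[1], Y.shape[1]

# Made-up tables for illustration only: 4 sites, 3 objects, 2 variables
X_demo = np.array([[3, 0, 1], [0, 2, 5], [4, 1, 0], [1, 1, 1]])
Y_demo = np.array([[120.0, 0.6], [300.0, 0.4], [80.0, 0.9], [210.0, 0.5]])
n_sites, n_objects, n_variables = check_cca_tables(X_demo, Y_demo)
```

Sites or objects with a zero total are rejected because their profiles (counts divided by the marginal total) would be undefined.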

Dataset for running a Canonical Correspondence Analysis

An Excel sheet with both the data and the results can be downloaded by clicking here.

The data correspond to the counts of 10 species of insects on 12 different sites in a tropical region. A second table (displayed in red) includes 3 quantitative variables that describe the 12 sites (altitude, humidity, and distance to the lake).

Goal of this Canonical Correspondence Analysis

Our goal is to determine if the three descriptive variables can help us to explain the frequencies of the species of insects.

Setting up a Canonical Correspondence Analysis

To activate the Canonical Correspondence Analysis dialog box, start XLSTAT, then select the XLSTAT-ADA / Canonical Correspondence Analysis command in the XLSTAT menu, or click on the corresponding button of the XLSTAT-ADA toolbars (see below).

barcca.gif

Once you have clicked on the button, the dialog box appears. Select the data that correspond to the sites/species data (here objects are species), and then the sites/variables data (displayed in red color on the Excel sheet).

We also select the sites labels, and make sure the Column labels option is checked so that XLSTAT knows that we included the headers in the selection.

cca1.gif

We activate the Options tab, and check the Permutation test option in order to test if the effect of the three variables on the observed frequencies of the insects is significant or not. We decide to run 1000 random permutations.
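To illustrate the idea behind such a permutation test, here is a minimal sketch in Python with NumPy. It is not XLSTAT's implementation: it assumes the usual weighted-regression formulation of CCA, uses the share of inertia explained by the variables as the test statistic, and shuffles the rows (sites) of the variables table. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def constrained_share(X, Y):
    """Share of the total inertia of X explained by Y (rows are sites)."""
    P = X / X.sum()
    r = P.sum(axis=1)                        # site masses
    c = P.sum(axis=0)                        # species masses
    expected = np.outer(r, c)
    Q = (P - expected) / np.sqrt(expected)   # chi-square standardised residuals
    Yc = Y - r @ Y                           # centre with site masses as weights
    Yw = np.sqrt(r)[:, None] * Yc            # weighted explanatory table
    coef, *_ = np.linalg.lstsq(Yw, Q, rcond=None)
    fitted = Yw @ coef                       # projection of Q onto the variables
    return (fitted ** 2).sum() / (Q ** 2).sum()

def permutation_test(X, Y, n_permutations=1000, seed=0):
    """Permutation p-value for the link between X and Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = constrained_share(X, Y)
    hits = 0
    for _ in range(n_permutations):
        shuffled = Y[rng.permutation(Y.shape[0])]  # break the site pairing
        if constrained_share(X, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return observed, (hits + 1) / (n_permutations + 1)

# Synthetic data for illustration only (12 sites, 10 species, 3 variables)
demo_rng = np.random.default_rng(42)
Y_demo = demo_rng.normal(size=(12, 3))
X_demo = demo_rng.poisson(5.0, size=(12, 10)) + 1
share, p_value = permutation_test(X_demo, Y_demo, n_permutations=200)
```

Because the numbers of sites and variables are fixed, a pseudo-F statistic built from the same explained share would rank the permutations identically, so the simpler share is used here.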

On the two images below, you can see which options have been selected for Outputs and Charts.

cca2.gif

cca3.gif

After you have clicked on the OK button, the computations start and the results are displayed on a new Excel sheet.

Interpreting the results of a Canonical Correspondence Analysis

The first set of results corresponds to the descriptive statistics of the various variables. The row and column profiles of the contingency table are displayed. The contingency table corresponds here to the frequencies of insects at each site. The "weighted averages" correspond to the means of the variables of the second table, weighted by the marginal sums of the rows of the first table.
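The profiles and the weighted averages described above can be reproduced with a few lines of Python and NumPy. This is a minimal sketch, assuming that sites are in rows in both tables; the function name and the tiny example tables are hypothetical.

```python
import numpy as np

def profiles_and_weighted_means(X, Y):
    """Row profiles, column profiles and weighted averages of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    row_sums = X.sum(axis=1)                 # marginal sums of the rows (sites)
    col_sums = X.sum(axis=0)                 # marginal sums of the columns
    row_profiles = X / row_sums[:, None]     # each row sums to 1
    col_profiles = X / col_sums[None, :]     # each column sums to 1
    # Means of the variables, weighted by the row sums of the first table
    weighted_means = row_sums @ Y / row_sums.sum()
    return row_profiles, col_profiles, weighted_means

# Made-up example: 2 sites, 2 species, 1 variable
row_p, col_p, w_means = profiles_and_weighted_means(
    [[1, 1], [2, 4]],
    [[100.0], [200.0]],
)
```

In this made-up example the second site holds 6 of the 8 counted individuals, so the weighted average (175) sits closer to that site's value than the plain mean (150) does.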

Next we see the results of the permutation test.

cca5.gif

cca6.gif

At the 5% significance level, the test does not reject the hypothesis that the sites/species data are not linearly related to the sites/variables data. Looking closer, we see that the p-value (0.089) is not far above the threshold we had chosen (0.05), so the conclusion is not clear-cut. Furthermore, we are interested in checking whether this is true for all variables, or whether some variables seem to explain the results better than others.

The next table displays how the inertia is spread between the constrained Canonical Correspondence Analysis (the analysis that uses the explanatory variables) and the unconstrained Canonical Correspondence Analysis (the unconstrained Canonical Correspondence Analysis is a Correspondence Analysis of the residuals of the constrained Canonical Correspondence Analysis).

cca7.gif

We see here that the constrained Canonical Correspondence Analysis accounts for only 40% of the inertia. A look at the results of the unconstrained Canonical Correspondence Analysis would therefore make sense, and the relation between the sites and the species should not be analyzed in too much depth here. However, to keep the tutorial short, we focus only on the constrained Canonical Correspondence Analysis results (named simply Canonical Correspondence Analysis results in the report).

Within the eigenvalue analysis of the Canonical Correspondence Analysis, we see that most of the inertia is carried by the first axis. Together, the first two axes carry 92.5% of the inertia. This means that the two-dimensional Canonical Correspondence Analysis map is sufficient to analyze the relationships between the sites, the species and the variables.
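The split of the inertia and the share carried by each axis can be sketched in Python with NumPy as follows. This is an illustration under the usual weighted-regression formulation of CCA, not XLSTAT's code; the function name `cca_inertia`, the dictionary keys and the synthetic data are hypothetical, and the numbers it produces are unrelated to the tutorial's results.

```python
import numpy as np

def cca_inertia(X, Y):
    """Inertia decomposition and per-axis eigenvalues (rows are sites)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    P = X / X.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)      # site and species masses
    expected = np.outer(r, c)
    Q = (P - expected) / np.sqrt(expected)   # chi-square standardised residuals
    Yw = np.sqrt(r)[:, None] * (Y - r @ Y)   # weighted, centred variables
    coef, *_ = np.linalg.lstsq(Yw, Q, rcond=None)
    fitted = Yw @ coef                       # constrained part of Q
    residual = Q - fitted                    # unconstrained part of Q
    # Squared singular values of the fitted table are the axis eigenvalues;
    # at most as many axes as explanatory variables carry inertia.
    eigenvalues = np.linalg.svd(fitted, compute_uv=False)[: Y.shape[1]] ** 2
    constrained = (fitted ** 2).sum()
    return {
        "total": (Q ** 2).sum(),
        "constrained": constrained,
        "unconstrained": (residual ** 2).sum(),
        "eigenvalues": eigenvalues,
        "cumulative_share": np.cumsum(eigenvalues) / constrained,
    }

# Synthetic data for illustration only (12 sites, 10 species, 3 variables)
demo_rng = np.random.default_rng(7)
Y_demo = demo_rng.normal(size=(12, 3))
X_demo = demo_rng.poisson(5.0, size=(12, 10)) + 1
result = cca_inertia(X_demo, Y_demo)
```

The ratio `result["constrained"] / result["total"]` plays the role of the constrained share of the inertia, and `result["cumulative_share"][1]` is the share of the constrained inertia shown by a two-dimensional map.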

cca8.gif

The Canonical Correspondence Analysis map (see below) allows us to visualize simultaneously the objects (here the insect species), the sites, and the variables.

cca9.gif

We see on the map that the frequencies of the species Insect4 and Insect5 are associated with high humidity and low altitude. Insect7 seems to be more sensitive to the distance to the lake. Insect9 seems to prefer a higher altitude or, more likely, a lower humidity.

Note: if you want to change "Objects" to "Species" on the Canonical Correspondence Analysis map, all you need to do is click on one of the points of the corresponding series, and then change "Objects" to "Species" in the Excel formula bar.

cca10.gif