Running a Correspondence Analysis (CA) from a raw data table with XLSTAT and plotting a 3D representation with XLSTAT-3DPlot

Dataset for running a Correspondence Analysis from a raw data table

An Excel sheet with both the data and the results can be downloaded by clicking here. The data list the foreign soccer players in the Premier League along with their nationality. We want to study how the foreign players are distributed across the English clubs.

Setting up a Correspondence Analysis from a raw data table

Once XLSTAT is open, select the Analyzing data / Correspondence analysis command, or click on the corresponding button of the Analyzing Data toolbar (see below).

barca1.gif barca2.gif

Once you have clicked on the button, the Correspondence analysis dialog box appears.

In the field Observations/variables table, select the columns Club and Region on the Excel sheet.

Because the data are in Observations/variables format, tick the corresponding option.

As the names of the columns are included, the Variable labels option should be selected as well.

Choose the Sheet option for the output.

ca2_1.png
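To see what the Observations/variables format implies, here is a minimal Python sketch of the cross-tabulation that turns a raw table (one row per player) into a contingency table. It is only an illustration, not XLSTAT's own code: the records and the helper name contingency_table are made up for the example.

```python
from collections import Counter

# Hypothetical raw records: one (club, region) pair per foreign player.
raw_rows = [
    ("Arsenal", "Western Europe"),
    ("Arsenal", "Africa"),
    ("Arsenal", "Western Europe"),
    ("Burnley", "Northern Europe"),
    ("Stoke City", "North America"),
    ("Burnley", "Northern Europe"),
]

def contingency_table(records):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(records)
    row_labels = sorted({r for r, _ in records})
    col_labels = sorted({c for _, c in records})
    # Counter returns 0 for pairs that never occur, so empty cells are filled.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

clubs, regions, table = contingency_table(raw_rows)
print("Club".ljust(12), *regions, sep=" | ")
for club, row in zip(clubs, table):
    print(club.ljust(12), *row, sep=" | ")
```

Each cell of the resulting table counts the players of one club coming from one region, which is the table the Correspondence Analysis works on.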

On the Options tab, tick the Test of independence option and leave the significance level at 5 (%).

ca2_2.png

In the Outputs section, select the following options:

  • Contingency table
  • Eigenvalues
  • Principal coordinates
  • Standard coordinates
  • Contributions
  • Squared cosines

ca2_3.png
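The outputs selected above are linked by a few standard relations, which the following Python sketch illustrates. It is a hedged example on made-up numbers, not XLSTAT output: the masses and principal coordinates are hypothetical, and all axes of this tiny example are kept so that the squared cosines of each point sum to 1.

```python
import math

# Hypothetical masses (row weights) and principal coordinates on two axes.
masses = [0.25, 0.25, 0.5]
principal = [
    [0.6, 0.0],
    [-0.2, 0.4],
    [-0.2, -0.2],
]
n_axes = len(principal[0])

# Eigenvalue of an axis = weighted sum of squared principal coordinates.
eigenvalues = [
    sum(m * p[k] ** 2 for m, p in zip(masses, principal))
    for k in range(n_axes)
]

# Standard coordinates = principal coordinates / sqrt(eigenvalue).
standard = [
    [p[k] / math.sqrt(eigenvalues[k]) for k in range(n_axes)]
    for p in principal
]

# Contribution of a point to an axis: its share of the axis inertia.
contributions = [
    [m * p[k] ** 2 / eigenvalues[k] for k in range(n_axes)]
    for m, p in zip(masses, principal)
]

# Squared cosine: share of a point's squared distance to the origin
# that is carried by each axis (quality of representation).
squared_cosines = [
    [p[k] ** 2 / sum(c ** 2 for c in p) for k in range(n_axes)]
    for p in principal
]

for name, values in [("Eigenvalues", [eigenvalues]),
                     ("Standard coordinates", standard),
                     ("Contributions", contributions),
                     ("Squared cosines", squared_cosines)]:
    print(name)
    for row in values:
        print("  ", [round(v, 3) for v in row])
```

Contributions tell you which points build an axis, while squared cosines tell you how well a given point is displayed on it.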

Go to the last tab, Charts, and enable the following options:

  • Symmetric plots
  • Asymmetric plots
  • Labels

ca2_4.png

Click on OK.

As the model needs more than two factors, first click on Select to select the plot F1-F2. Then change the Abscissa to F2, which changes the Ordinates to F3, and click on Select again. This way we will have two plots: F1-F2 and F2-F3. Click on Done.

ca2_5.png

Interpreting the results of a Correspondence Analysis

The first result is the contingency table, followed by the test of independence between the rows and columns.

The p-value of 0.008 is lower than the 5% significance level, so the null hypothesis of independence between rows and columns should be rejected. This means that the players' regions of origin are not distributed independently of the English clubs.

ca2_6.png
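For readers who want to see what lies behind such a test, here is a minimal Python sketch of a chi-square test of independence on a contingency table. It is an illustration only: the counts are made up (they are not the tutorial's data), the helper names are hypothetical, and the p-value uses a simple series expansion that is adequate for moderate values of the statistic.

```python
import math

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    Assumes no empty row or column, so every expected count is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf(x, dof, max_terms=10000):
    """Upper-tail probability of a chi-square distribution (series expansion)."""
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= half_x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Hypothetical counts: 3 clubs (rows) by 3 regions (columns).
example = [[10, 2, 3], [4, 8, 3], [2, 3, 9]]
stat, dof = chi_square_independence(example)
p_value = chi2_sf(stat, dof)
print(f"chi-square = {stat:.3f}, dof = {dof}, p-value = {p_value:.4f}")
print("reject independence at 5%" if p_value < 0.05 else "cannot reject independence at 5%")
```

A p-value below the chosen significance level leads to the same conclusion as in the text: rows and columns are not independent.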

Then you have the symmetric plots. From the first plot you can see that clubs such as Aston Villa and Stoke City have more North American players than the rest of the teams. In the same way, Burnley has many Northern European players.

ca2_7.png

Creating a 3-D plot for the Correspondence Analysis results

We will now draw a plot in 3 dimensions to get a better representation of the points.

First we will make a table containing, for the clubs and the geographic areas, both the first 3 principal coordinates and the sum of the squared cosines for those 3 factors.

The sum of the squared cosines over the 3 factors indicates how well each point is represented in the 3-D space: the closer it is to 1, the better the representation.

Add a last column that tells whether each point comes from the rows or from the columns. The rows are the clubs and the columns are the regions. Make a categorical variable with the values R and C to describe each point.

ca2_7b.png
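The table described above can also be assembled programmatically. The following Python sketch shows one possible way, under stated assumptions: the coordinates and squared cosines are hypothetical placeholders for the values copied from the XLSTAT results, and build_3d_table is a made-up helper name.

```python
def build_3d_table(labels, coords, cos2, kind):
    """Build one table line per point: label, F1-F3, summed cos2, and R/C type."""
    lines = []
    for label, f, c in zip(labels, coords, cos2):
        lines.append({
            "Label": label,
            "F1": f[0], "F2": f[1], "F3": f[2],
            # Quality of representation in the 3-D space.
            "Sum cos2": round(sum(c[:3]), 3),
            "Type": kind,  # "R" for rows (clubs), "C" for columns (regions)
        })
    return lines

# Hypothetical values standing in for the XLSTAT output tables.
club_labels = ["Arsenal", "Burnley"]
club_coords = [[0.12, -0.30, 0.05], [-0.45, 0.80, 0.22]]
club_cos2 = [[0.41, 0.22, 0.15], [0.10, 0.55, 0.20]]

region_labels = ["Northern Europe"]
region_coords = [[-0.20, 0.65, 0.10]]
region_cos2 = [[0.05, 0.70, 0.08]]

table_3d = (build_3d_table(club_labels, club_coords, club_cos2, "R")
            + build_3d_table(region_labels, region_coords, region_cos2, "C"))

for line in table_3d:
    print(line)
```

Keeping rows and columns in one table with an R/C column lets a single 3-D plot display both sets of points and still tell them apart.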

Select the full table, go to the Visualizing data menu, and select the XLSTAT-3DPlot option.

When prompted, select Table as the format of your data.

ca2_8.png

You will need to specify the axes. To do so, right-click and select the appropriate variable in the drop-down list. For the 3 axes we use F1 and F2 horizontally and F3 vertically. You also need to set the range of each axis so that the plot is orthonormal. For example, use -1.5 and 1.5 as the limits for all the axes.

ca2_9.png
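If you are unsure which limits to use, one simple approach is to take the largest absolute coordinate over the 3 factors and round it up, then apply the same symmetric range to every axis. The Python sketch below illustrates this idea; the coordinates are hypothetical and symmetric_axis_limit is a made-up helper name.

```python
import math

def symmetric_axis_limit(points, step=0.5):
    """Smallest multiple of `step` that covers every coordinate in absolute value."""
    largest = max(abs(value) for point in points for value in point)
    return math.ceil(largest / step) * step

# Hypothetical (F1, F2, F3) coordinates of a few points.
points = [(1.23, -0.40, 0.10), (-0.90, 0.70, -1.31)]

limit = symmetric_axis_limit(points)
print(f"Use {-limit} and {limit} as the limits for all three axes")
```

Using the same range on all three axes keeps one unit the same length in every direction, so distances between points are not distorted.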

For the color and size of the dots you can use the sum of the squared cosines. Go to the Objects tab and modify the color and size sections.

ca2_10.png

Finally, we can add the labels by going to the Annotations tab and selecting "Column1" as the label.

ca2_11.png

Here is your 3-D representation.

ca2_12.png