Customizing a PCA chart with XLSTAT to make it easier to interpret

Dataset for Plot management XLS209 KB

Tutorial video
  • Pro Core statistical software

  • System configuration

    • Windows:
      • Versions: 9x/Me/NT/2000/XP/Vista/Win 7
      • Excel: 97 and later
      • Processor: 32 or 64 bits
      • Hard disk: 150 Mb
    • Mac OS X:
      • OS: OS X
      • Excel: X, 2004 and 2011
      • Hard disk: 150Mb.

Benefits

  • Easy and user-friendly
    Easy and user-friendly XLSTAT is flawlessly integrated with Microsoft Excel which is the most popular spreadsheet worldwide. This integration makes it one of the simplest available tools to work with as it utilizes the same philosophy as Microsoft Excel. The program is accessible in a dedicated XLSTAT tab. The analyses are grouped into functional menus. The dialog boxes are user-friendly and setting up an analysis is straightforward.
  • Data and results shared seamlessly
    Data and results shared seamlessly One of the greatest advantages of XLSTAT is the way you can share data and results seamlessly. As the results are stored in Microsoft Excel, anyone can access them. There is no need for the receiver to have an XLSTAT license or any additional viewer which makes your team-work easier and more affordable. In addition, results are easily integrable into other Microsoft Office software such as PowerPoint, so that you can create striking presentation in minutes.
  • Modular
    Modular XLSTAT is a modular product. XLSTAT-Pro is a core statistical module of XLSTAT which includes all the mainstream functionalities in statistics and multivariate analysis. More advanced features contained in add-on modules can be added for specific applications. This way you can adapt the software to your needs making the software more cost-efficient.
  • Didactic
    Didactic The results of XLSTAT are organized by analysis and are easy to navigate. Moreover useful information is provided along with the results to assist you in your interpretation.
  • Affordable
    Affordable XLSTAT is a complete and modular analytical solution that can suit any analytical business needs. It is very reasonably priced so that the return of your investment is almost immediate. Any XLSTAT license comes with top level support and assistance.
  • Accessible - Available in many languages
    Accessible - Available in many languages We have ensured XLSTAT is accessible to everyone by making the program available in many languages, including Chinese, English, French, German, Italian, Japanese, Polish, Portuguese and Spanish.
  • Automatable and customizable
    Automatable and customizable Most of the statistical functions available in XLSTAT can be called directly from the Visual Basic window of Microsoft Excel. They can be modified and integrated to more code to fit to the specificity of your domain. Adding tables and plots as well as modifying existing outputs becomes easy. Furthermore, XLSTAT includes some special tools on the dialog boxes to generate automatically the VBA code in order to reproduce your analysis using the VBA editor or to simply load pre-set settings. This effortless automation of routine analysis will be a huge time saver on your part.

Dataset for customizing the plot

An Excel sheet with both the data and results used in this tutorial can be downloaded by clicking here. This tutorial is based on results obtained in the tutorial on PCA (Principal Components Analysis). Our goal is to improve the readability of the graphical representation on axes F1 and F2.

Customizing a plot

We first make a copy of the representation, and then enlarge it. It can be observed that during the expansion of the chart, some labels are moved away from the point to which they correspond. To remedy this, we select the graph and then we use the tool Reposition labels of the Visualizing data toolbar, and choose the following options :

easy1.gif

We then create, to the right of the table of the factor scores, a column that contains the sum of the squared cosines on the first two axes for each observation.

As a reminder, for a given axis and a given observation, the cosine is the cosine of the angle between the axis and the vector going from the origin to the point. Thus, the greater the cosine, the closer the point is to the axis in the multidimensional space resulting from the PCA. The sum of the cosines on the first two factorial axes F1 and F2 for any given observation, gives an idea of the accuracy of the plane defined by F1 and F2, for this observation. For a given observation, the sum of the squared cosines over all axes is 1. So, for a given point, the closer the sum to 1, the greater the interpretability of the representation.

In order to indicate the level of interpretability of the two-dimensional representation for the various points, we want to increase the point sizes according to the value of the sum of the squared cosines. This will allow us to know which points can be interpreted without error.

Furthermore, to differentiate the five groups of States determined by the Census Bureau (North East, South, Midwest, West and Pacific), we will use different shapes.

To modify the shapes, we need to use the codes as defined by XLSTAT, the later respecting the order of shapes proposed by Excel (see the dialog box below): 1 corresponds to a square, 2 to a diamond, 3 to a triangle, 4 to an x, 5 to a star, 6 to a point, 7 to a -, 8 to a + and 9 to a circle. As only four shapes are effectively usable, the States of Hawaï and Alaska that belong to the Pacific zone will be represented with a circle as the western States, but with a black contour.

easy3.gif

We then create a column that contains the codes corresponding to each State.

To increase the readability of the chart, we are going to color in red the points for which the sum of squared cosines is greater than 0.8. To change the color of the points, we must apply the colors to the cells where the shapes are defined. We first color the entire column of cosines in blue. Then we use the DataFlagger tool available in the "Tools" toolbar to color in red the cells with a sum greater than or equal to 0.8.

To surround with black the points corresponding to Hawaii and Alaska, a black bottom border has been added to the cells. The format of the cell is then copied and pasted into the column with the shapes, and we clear the formats in the column with the squared cosines (Excel / Edit / Clear formats).

easy2.gif

We then select the graphic, and then launch the EasyPoints tool that is available in the "Visualizing data" toolbar. The following options were chosen:

easy4.gif

As a result, we obtain the following chart:

easy5.gif

Easier to interpret, this chart allows us to identify the states which can be interpreted in terms of proximity. For example, one can conclude that West Virginia and Pennsylvania are close, while Pennsylvania and Alaska are very different. Furthermore, we note that in the top right and bottom right of the representation, we mostly find Western States.