XLSTAT Zoom: How to become a PCA guru, with XLSTAT
Did you know that Principal Component Analysis (PCA) is one of the most widely used XLSTAT features?
It’s not surprising, as this data mining method is extensively used in marketing, biostatistics, sociology, and many other fields. XLSTAT generates more than 15 output tables and graphs automatically, and more than 20 options are available to customize your PCA. But don’t worry, it’s very easy to master!
Let’s start with an easy example!
We have some data from the US Census Bureau that describe the changes in the population of 51 states between 2019 and 2020. Using PCA, we can analyze the correlations between the variables and find out whether the population changes in some states differ markedly from those in others.
Ready for some quick theory?
Datasets often hold a lot of interesting information, and this information is likely to be spread fairly evenly across the columns. PCA builds an artificial dataset with the same number of columns as the original one, constructed so that as much of the information as possible is concentrated in its first few columns.
In PCA jargon, columns created by PCA are called dimensions, or factors, or axes. Information is called inertia or variability.
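To make the idea concrete, here is a minimal NumPy sketch of what happens behind the scenes. It is not XLSTAT's implementation: the toy data and all variable names are made up for illustration, and the computation is a plain eigendecomposition of the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 100 observations, 4 variables,
# where columns 0-1 and columns 2-3 are strongly correlated pairs.
latent = rng.normal(size=(100, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 0] + 0.1 * rng.normal(size=100),
    latent[:, 1],
    latent[:, 1] + 0.1 * rng.normal(size=100),
])

# Standardize each column, then diagonalize the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; PCA lists them in descending order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# The "artificial dataset": same shape as X, one column per factor.
scores = Z @ eigenvectors

# Share of the total variability (inertia) carried by each factor.
explained = eigenvalues / eigenvalues.sum()
print(scores.shape)
print(np.round(explained, 3))
```

In this toy case the first two factors carry almost all of the variability, which is exactly the concentration of information described above.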
For more theory, check out our video:
Among the main purposes of PCA are:
- Investigating the relationships between variables (e.g. products)
- Checking how individuals are described by variables (e.g. consumers)
- Examining proximities among individuals
- Obtaining a new dataset suitable for modeling methods such as linear regression
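The last purpose can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are simulated, the variable names are hypothetical, and the regression is an ordinary least-squares fit on the first factor, not a procedure taken from XLSTAT.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 3 strongly correlated predictors and a response y.
base = rng.normal(size=200)
X = np.column_stack([base + 0.05 * rng.normal(size=200) for _ in range(3)])
y = 2.0 * base + 0.1 * rng.normal(size=200)

# PCA on standardized predictors (correlation-matrix PCA).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
scores = Z @ eigenvectors[:, order]

# Factor scores are mutually uncorrelated, unlike the original columns.
score_corr = np.corrcoef(scores, rowvar=False)

# Regress y on the first factor only (plus an intercept).
design = np.column_stack([np.ones(len(y)), scores[:, 0]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r_squared, 3))
```

Because the factor scores are uncorrelated, they avoid the multicollinearity that the original, highly correlated predictors would bring into a linear regression.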
How to build a PCA in XLSTAT?
Five simple steps:
- Import your data into Excel
- Launch the PCA dialog box under the Analyzing data menu
- Select your data
- Choose the options/output of your interest
- Click OK to run the PCA
An easy example is available on our Help Center.
Which PCA options can you find in XLSTAT?
- Input data: Observations/variables table, correlation matrix, and covariance matrix are the three data formats you can choose from. For example, if your data form a table of q observations described by p quantitative variables, select the Observations/variables option.
- PCA type: You can choose between Correlation matrix (standardized or normalized PCA), Covariance matrix (unstandardized or non-normalized PCA) and Spearman correlation matrix.
- Standardization: If the format of your data is "observations/variables", you can choose how correlation (or covariance) will be computed: with denominator (n) or (n – 1).
- Filtering factors: Reduce the number of factors for which results are displayed by choosing either the Minimum% option (minimum percentage of the total variability that the chosen factors must represent) or Maximum number of factors option.
- Rotation: Apply a rotation (Varimax, Promax, etc.) to the factor coordinate matrix.
- Supplementary data: Add supplementary variables and observations. These observations/variables are not taken into account when computing the correlation matrix or in the subsequent calculations (they are called passive observations, as opposed to active observations).
- Pairwise deletion: Remove observations only when the variables involved in the calculations have missing data.
- By group analysis: Perform one PCA per group, per selected group or on merged groups.
- Hierarchical Ascending Classification: You can automatically open the pre-filled dialog box of HAC and perform a classification of the observations on the factor scores tables.
- Results: Descriptive statistics, correlation/covariance matrix, Bartlett's sphericity test, KMO measure, eigenvalues, contributions, squared cosines, factor scores.
- Graphs: Correlation chart, biplot (correlation, distance, asymmetric), observations chart, bootstrap charts.
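To see what a few of these options mean numerically, here is a small Python sketch. The `pca` function and its parameters (`kind`, `ddof`, `min_percent`) are hypothetical names invented for this illustration; they loosely mirror the PCA type, Standardization, and Filtering factors options and are not XLSTAT's API or algorithm.

```python
import numpy as np

def pca(X, kind="correlation", ddof=1, min_percent=None):
    """Hypothetical helper mirroring some of the options listed above."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if kind == "correlation":
        # Standardized PCA: every variable gets unit variance.
        data = centered / X.std(axis=0, ddof=ddof)
    elif kind == "covariance":
        # Unstandardized PCA: variables keep their original scale.
        data = centered
    else:
        raise ValueError("kind must be 'correlation' or 'covariance'")

    n = X.shape[0]
    # ddof=0 divides by n, ddof=1 divides by n - 1.
    matrix = data.T @ data / (n - ddof)
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    if min_percent is not None:
        # Keep the fewest factors whose cumulative variability reaches min_percent.
        cumulative = 100 * np.cumsum(eigenvalues) / eigenvalues.sum()
        keep = int(np.searchsorted(cumulative, min_percent) + 1)
        keep = min(keep, len(eigenvalues))
        eigenvalues, eigenvectors = eigenvalues[:keep], eigenvectors[:, :keep]

    scores = data @ eigenvectors
    return eigenvalues, eigenvectors, scores

rng = np.random.default_rng(2)
# Made-up data whose columns have very different scales.
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])

eig_corr, _, _ = pca(X, kind="correlation")
eig_cov, _, _ = pca(X, kind="covariance")
eig_n, _, _ = pca(X, kind="covariance", ddof=0)
_, _, kept = pca(X, kind="covariance", min_percent=80)
print(np.round(eig_corr, 3), np.round(eig_cov / eig_cov.sum(), 3), kept.shape)
```

With the covariance matrix, the large-scale column dominates the first factor, whereas the correlation matrix puts all variables on an equal footing. Switching the denominator from n - 1 to n only rescales the covariance eigenvalues by (n - 1)/n.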