CATA data analysis

CATA (check-all-that-apply) data analysis makes it easy to analyze simple consumer surveys. It is available in Excel using the XLSTAT statistical software.

What is CATA (check-all-that-apply) analysis?

CATA (check-all-that-apply) surveys have become increasingly popular for sensory product characterization since 2007, when the method was presented by Adams et al. CATA surveys make it possible to work with consumers, who are more representative of the market than trained assessors. They are easy to set up and easy for participants to answer. The principle is that each assessor receives a questionnaire listing attributes or descriptors and decides, for each product, whether each attribute applies. If it does, the assessor simply checks the attribute; otherwise nothing needs to be done. Other questions on different scales may be added to relate the attributes to preferences and liking scores. If participants are asked to give an overall rating to each product of the study, then further analyses and preference modelling are possible. Ares et al. (2014) recommend randomizing the order of the CATA questions for each assessor to improve reproducibility.

Options of CATA analysis in XLSTAT

The CATA data analysis tool of XLSTAT has been developed to automate the analysis of CATA data. Let us consider that you have surveyed N assessors for P products (one of the products can be a virtual, often ideal, product) on K attributes. The CATA data for the K attributes are assumed to be recorded in a binary format (1 for checked, 0 for not checked). Two formats are currently accepted by XLSTAT:

For the first format, XLSTAT expects an Excel table with P rows and N groups of K columns placed next to each other. You then only need to specify the value of N, from which XLSTAT deduces K. If you asked each assessor to give a preference score, you can add that column within each group of K columns, at a position that you then indicate to XLSTAT. In that case each group has K+1 columns. If one of the products is an ideal product, you can specify its position.
For the second format, XLSTAT expects an Excel table with P x N rows and K columns. You then need to select that table. In two additional fields, you select the product identifier and the assessor identifier. If you asked each assessor to rate the products, you also select the column that contains the preference data. If one of the products is an ideal product, you can specify its name so that XLSTAT identifies it.
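The relationship between the two layouts can be sketched in a few lines of Python. This is only an illustration of the data shapes, not XLSTAT's own code: the function name `wide_to_long` and the sample values are hypothetical, and the sketch assumes there is no preference column inside the groups.

```python
def wide_to_long(wide_rows, n_assessors, products=None):
    """Convert the first layout (P rows, N groups of K columns)
    into the second layout (P x N rows, K columns).

    Assumption: no preference column is embedded in the groups.
    """
    n_cols = len(wide_rows[0])
    if n_cols % n_assessors != 0:
        raise ValueError("column count is not a multiple of N")
    k = n_cols // n_assessors  # K is deduced from N, as described above
    long_rows = []
    for p, row in enumerate(wide_rows):
        product = products[p] if products else p + 1
        for a in range(n_assessors):
            checks = list(row[a * k:(a + 1) * k])
            # (product identifier, assessor identifier, K binary values)
            long_rows.append((product, a + 1, checks))
    return long_rows


# Made-up data: 2 products, 2 assessors, 3 attributes
wide = [
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0],
]
long_table = wide_to_long(wide, n_assessors=2, products=["A", "B"])
for record in long_table:
    print(record)
```

Each printed record holds a product identifier, an assessor identifier and the K checks, which mirrors the three selections required by the second format.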

The analyses performed by XLSTAT on CATA data are based on the article by Meyners et al. (2013) who investigated in depth the possibilities offered by CATA data.

Results of CATA analysis in XLSTAT

The first set of results corresponds to Cochran’s Q tests run on the Assessors x Products table, independently for each attribute. The Cochran’s Q test compares paired binary samples, which is what we have here with raw CATA data. It is used to compare the different products. Pairwise comparisons based on the Marascuilo approach are then performed. They can be used to identify the products responsible for a rejection of the null hypothesis that there is no difference between products. The Cochran’s Q test is equivalent to a McNemar test if there are only two products.
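As an illustration of the statistic behind this step, here is a minimal Python sketch of the textbook Cochran's Q formula for one attribute. It is not XLSTAT's implementation; the function name `cochran_q` and the data are made up, and the p-value (obtained from a chi-square distribution with P-1 degrees of freedom) and the Marascuilo pairwise comparisons are left out.

```python
def cochran_q(table):
    """Cochran's Q statistic for one attribute.

    table: one row per assessor, one 0/1 value per product.
    Returns (Q, degrees_of_freedom).
    """
    k = len(table[0])  # number of products
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    denom = k * total - sum(r * r for r in row_totals)
    if denom == 0:
        # Every assessor checked the attribute for all products or for none
        raise ValueError("Q is undefined: no assessor discriminates between products")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2) / denom
    return q, k - 1


# Two products only: Q should match the uncorrected McNemar statistic
two_products = [[1, 0], [1, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
q2, df2 = cochran_q(two_products)

b = sum(1 for x, y in two_products if (x, y) == (1, 0))  # discordant pairs 1/0
c = sum(1 for x, y in two_products if (x, y) == (0, 1))  # discordant pairs 0/1
mcnemar = (b - c) ** 2 / (b + c)
print(q2, df2, mcnemar)

# Three products
q3, df3 = cochran_q([[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]])
print(q3, df3)
```

With two products the Q value and the McNemar statistic (without continuity correction) coincide, which is the equivalence mentioned above.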

The second step of the CATA analysis corresponds to a correspondence analysis on the sum of the N individual CATA tables (the maximum value for each cell is N). The goal of that analysis is to position the products, including the ideal product if available, on a map and to see how they are positioned relative to one another. The analysis can be based on the chi-square distance or the Hellinger distance (also known as the Bhattacharya distance, which is how it is referred to in the similarity/dissimilarity tool of XLSTAT). The analysis based on the Hellinger distance might be used when the dataset includes terms with low frequency (Meyners et al., 2013). If the marginal sums of some attributes are zero, those attributes are removed from the correspondence analysis.
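The preparation of the table and the two distances can be sketched as follows. This is a hedged illustration in plain Python, not the correspondence analysis itself and not XLSTAT's code: all function names and data are hypothetical, the chi-square distance is taken between product row profiles weighted by the attribute column masses, and the Hellinger distance is written without the 1/sqrt(2) factor that some definitions include. The sketch also assumes every product has at least one check.

```python
import math


def sum_tables(individual_tables):
    """Add N individual P x K binary tables cell by cell."""
    n_rows = len(individual_tables[0])
    n_cols = len(individual_tables[0][0])
    return [[sum(t[i][j] for t in individual_tables) for j in range(n_cols)]
            for i in range(n_rows)]


def drop_empty_attributes(counts):
    """Remove attributes (columns) whose marginal sum is zero."""
    keep = [j for j in range(len(counts[0])) if sum(row[j] for row in counts) > 0]
    return [[row[j] for j in keep] for row in counts], keep


def row_profiles(counts):
    """Divide each row by its total (assumes no row total is zero)."""
    return [[c / sum(row) for c in row] for row in counts]


def chi_square_distance(counts, a, b):
    total = sum(sum(row) for row in counts)
    n_cols = len(counts[0])
    col_mass = [sum(row[j] for row in counts) / total for j in range(n_cols)]
    prof = row_profiles(counts)
    return math.sqrt(sum((prof[a][j] - prof[b][j]) ** 2 / col_mass[j]
                         for j in range(n_cols)))


def hellinger_distance(counts, a, b):
    prof = row_profiles(counts)
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2
                         for x, y in zip(prof[a], prof[b])))


# Made-up data: 3 assessors, 2 products, 3 attributes (the third is never checked)
tables = [
    [[1, 0, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [1, 1, 0]],
]
summed = sum_tables(tables)
counts, kept = drop_empty_attributes(summed)
d_chi2 = chi_square_distance(counts, 0, 1)
d_hel = hellinger_distance(counts, 0, 1)
print(summed, counts, kept, d_chi2, d_hel)
```

No cell of the summed table can exceed the number of assessors, and the never-checked attribute disappears before any distance is computed.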

The next analysis uses the vertical data set, where you have one row per combination of product and assessor, one column per attribute, and where the ideal products, if available, have been removed. If available, a column of preference data can be included. XLSTAT computes on this data set the correlations between the attributes, using the tetrachoric correlation (well suited for binary data), and if the preference data are available, the biserial correlation coefficient between each attribute and the preference data. The biserial correlation coefficient has been developed to measure the correlation between a binary and a quantitative variable (see the Describing data / Biserial correlation tool for more information). To allow an optimal visualization of the different dimensions, XLSTAT computes a Principal Coordinate analysis, using the Lingoes correction if necessary. This method is preferred to the MDS because the coordinates are automatically rotated so that most information is carried by the first axis.

If preference data are available, the next step consists of penalty analyses. The penalty analyses aim to identify whether checking an attribute leads to a lower preference, has no effect, or leads to a higher preference.

The first results correspond to a set of K (one for each attribute) 2x2 tables, where on the left, you have the values recorded for the ideal product and at the top, the values obtained for the surveyed products. In the cells of the tables, you have the average preference (averaged over the assessors and the products) and the % of all records that correspond to this combination of 0s and/or 1s.

Ideal product\Products	0	1
0	6.2 (12%)	7.4 (8%)
1	5.1 (39%)	7.2 (41%)
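A table of this kind can be rebuilt from raw records with a short Python sketch. The function name `penalty_table`, the record layout and the sample liking scores are hypothetical; the sketch simply groups the records of one attribute by the (ideal product, surveyed product) combination and reports the mean preference and the share of records in each cell.

```python
from collections import defaultdict


def penalty_table(records):
    """Build the 2x2 table for one attribute.

    records: (ideal_checked, product_checked, liking) tuples,
             one per assessor x product pair.
    Returns {(ideal, product): (mean_liking, percent_of_records)}.
    """
    groups = defaultdict(list)
    for ideal, product, liking in records:
        groups[(ideal, product)].append(liking)
    n = sum(len(v) for v in groups.values())
    return {cell: (sum(v) / len(v), 100.0 * len(v) / n)
            for cell, v in groups.items()}


# Made-up records for a single attribute
records = [
    (1, 1, 8), (1, 1, 7), (1, 1, 6),
    (1, 0, 5), (1, 0, 4), (1, 0, 6),
    (0, 0, 6),
    (0, 1, 7),
]
table = penalty_table(records)
for cell in sorted(table):
    mean, share = table[cell]
    print(cell, round(mean, 1), f"({share:.1f}%)")
```

Cells with no records are simply absent from the returned dictionary.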

From the results contained in these tables we can already make assumptions regarding how the attribute is considered by the assessors. For a given attribute, if it has been checked for the ideal product (second line of the table) and if the preference for the checked products (cell [1,1]) is significantly higher than the preference for the unchecked products (cell [1,0]), then the attribute is a must have.
On the other hand, if the attribute is not checked for the ideal product (first line of the table) and if the preference for the unchecked products (cell [0,0]) is significantly higher than the preference for the checked products (cell [0,1]), then the attribute is a must not have. If the preference in cell [0,1] is significantly higher than in cell [0,0], then the attribute is nice to have. If the attribute is unchecked for the ideal product (first line of the table), is neither a must not have nor a nice to have, and the preference for the checked products (cell [0,1]) is comparable to the preference for the unchecked products (cell [0,0]), then the attribute does not harm.

XLSTAT considers two preference values to be comparable if the absolute value of their difference is below 1.
Finally, if the attribute is not a must have and the preference for the checked products (cell [1,1]) is comparable to the preference for the unchecked products (cell [1,0]), the attribute does not influence. Some tables can match several of these situations. XLSTAT will try to associate each 2x2 table with a single situation by linking it to one of the above rules, respecting this order. Please note that before making a decision regarding an attribute, XLSTAT checks that the chosen threshold for population size (Options 2 tab of the dialog box) is respected.
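The ordering of the rules can be sketched in Python as below. This is a simplified illustration under several assumptions, not XLSTAT's procedure: the significance checks are replaced by the fixed threshold of 1 (a difference of at least 1 counts as higher, anything smaller as comparable), the population size check is approximated by a hypothetical `min_share` parameter applied to each cell of the compared pair, and the `"undecided"` label is invented for tables where no rule can be evaluated.

```python
def classify_attribute(table, threshold=1.0, min_share=0.0):
    """Apply the rules in the order given in the text.

    table: {(ideal, product): (mean_liking, percent_of_records)}
    threshold: stand-in for the significance tests (assumption)
    min_share: minimum percentage of records per cell (assumption)
    """
    def usable(*cells):
        return all(c in table and table[c][1] >= min_share for c in cells)

    def mean(cell):
        return table[cell][0]

    row1 = usable((1, 1), (1, 0))  # ideal product has the attribute checked
    row0 = usable((0, 0), (0, 1))  # ideal product has it unchecked

    if row1 and mean((1, 1)) - mean((1, 0)) >= threshold:
        return "must have"
    if row0 and mean((0, 0)) - mean((0, 1)) >= threshold:
        return "must not have"
    if row0 and mean((0, 1)) - mean((0, 0)) >= threshold:
        return "nice to have"
    if row0:
        return "does not harm"       # first-row means are comparable
    if row1:
        return "does not influence"  # second-row means are comparable
    return "undecided"               # hypothetical label, not an XLSTAT output


# Values of the example table shown earlier on this page
example = {(0, 0): (6.2, 12), (0, 1): (7.4, 8), (1, 0): (5.1, 39), (1, 1): (7.2, 41)}
label_example = classify_attribute(example)

# Variant with a small gap on the second row (made-up values)
variant = {(0, 0): (6.2, 12), (0, 1): (7.4, 8), (1, 0): (6.9, 39), (1, 1): (7.2, 41)}
label_variant = classify_attribute(variant)
label_variant_strict = classify_attribute(variant, min_share=10)
print(label_example, label_variant, label_variant_strict)
```

Because the rules are tested in sequence, a table that satisfies several of them receives the label of the first matching rule, and raising `min_share` can change the outcome by excluding sparsely populated cells.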

Lastly, XLSTAT computes penalty tables and runs two penalty analyses to determine, from a statistical point of view, whether some attributes are clearly must have or must not have attributes. These analyses make it possible to:

Test whether the differences between cells [1,1] and [1,0] are significant (“nice to have”) or not.
Test whether the differences between [0,0] and [0,1] are significant (“negative” if [0,0] > [0,1]) or not (“does not harm”).
