CATA data analysis
CATA (check-all-that-apply) data analysis lets you analyze simple surveys conducted with consumers. It is available in Excel using the XLSTAT statistical software.
What is CATA (check-all-that-apply) analysis?
CATA (check-all-that-apply) surveys have become more and more popular for sensory product characterization since 2007, when the method was presented by Adams et al. CATA surveys make it possible to work with consumers, who are more representative of the market, instead of trained assessors. They are easy to set up and easy for participants to answer. The principle is that each assessor receives a questionnaire listing attributes or descriptors that may or may not, in the respondent's opinion, apply to one or more products. If an attribute applies, the respondent simply checks it; otherwise, they do not need to do anything. Questions on other scales may be added to relate the attributes to preferences and liking scores. If participants are asked to give an overall rating to each product of the study, further analyses and preference modelling become possible. Ares et al. (2014) recommend randomizing the order of the CATA questions for each assessor to improve reproducibility.
CATA analysis in XLSTAT
The CATA data analysis tool of XLSTAT has been developed to automate the analysis of CATA data. Suppose that you have surveyed N assessors for P products (one of the products can be a virtual, often ideal, product) on K attributes. The CATA data for the K attributes are assumed to be recorded in a binary format (1 for checked, 0 for not checked). Two formats are currently accepted by XLSTAT:
- For the first format, XLSTAT expects an Excel table with P rows and N groups of K columns placed next to each other. You then only need to specify the value N, from which XLSTAT infers K. If you asked each assessor to give their preferences, you can add that column within each group of K columns, at a position that you indicate to XLSTAT; each group then has K+1 columns. If one of the products is an ideal product, you can specify its position.
- For the second format, XLSTAT expects an Excel table with P x N rows and K columns, which you select. In two additional fields, you select the product identifier and the assessor identifier. If you asked each assessor to rate the products, you also select the column that contains the preference data. If one of the products is an ideal product, you can specify its name so that XLSTAT identifies it.
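The relation between the two layouts can be illustrated with a short sketch that turns a first-format table into a second-format one. It assumes NumPy, 0/1 values, and a first-format table without any preference column; `wide_to_long` is a hypothetical helper, not part of XLSTAT.

```python
import numpy as np


def wide_to_long(wide, n_assessors):
    """Hypothetical helper: reshape a first-format CATA table to the second format.

    wide: P x (N*K) array of 0/1 values, one block of K attribute columns per
          assessor, blocks side by side, no preference column (assumption).
    Returns a (P*N) x K array plus the product and assessor index of each row.
    """
    wide = np.asarray(wide)
    n_products, n_cols = wide.shape
    if n_cols % n_assessors != 0:
        raise ValueError("number of columns is not a multiple of N")
    k = n_cols // n_assessors                          # K inferred from N
    cube = wide.reshape(n_products, n_assessors, k)    # product, assessor, attribute
    long = cube.reshape(n_products * n_assessors, k)   # row = product * N + assessor
    product_id = np.repeat(np.arange(n_products), n_assessors)
    assessor_id = np.tile(np.arange(n_assessors), n_products)
    return long, product_id, assessor_id


# Two products, two assessors, two attributes.
long, prod, assr = wide_to_long([[1, 0, 0, 1], [1, 1, 0, 0]], n_assessors=2)
print(long.tolist(), prod.tolist(), assr.tolist())
# [[1, 0], [0, 1], [1, 1], [0, 0]] [0, 0, 1, 1] [0, 1, 0, 1]
```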
The analyses performed by XLSTAT on CATA data are based on the article by Meyners et al. (2013), who investigated in depth the possibilities offered by CATA data.
CATA analysis in XLSTAT - Results
The first set of results corresponds to Cochran’s Q tests run on the Assessors x Products table, independently for each attribute. Cochran’s Q test compares paired binary samples, which is what raw CATA data provide, and is used here to compare the products. Pairwise comparisons based on the Marascuilo approach are also performed; they can be used to identify the products responsible for rejecting the null hypothesis that there is no difference between products. When there are only two products, Cochran’s Q test is equivalent to a McNemar test.
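As an illustration, for one attribute with column (product) totals C_j, row (assessor) totals R_i and grand total T, the statistic is Q = (P - 1)(P * sum C_j^2 - T^2) / (P * T - sum R_i^2), compared with a chi-square distribution with P - 1 degrees of freedom. The sketch below, assuming NumPy and SciPy, computes it together with a Marascuilo-type critical range for pairwise differences in checking proportions. The function names are hypothetical, and the critical range uses the textbook null variance of a difference of proportions, which may not match XLSTAT's implementation in every detail.

```python
import numpy as np
from scipy.stats import chi2


def cochran_q(x):
    """Cochran's Q test for one attribute.

    x: N x P array of 0/1 values (rows = assessors, columns = products).
    Returns (Q, p_value, degrees_of_freedom).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    col = x.sum(axis=0)   # C_j: checks per product
    row = x.sum(axis=1)   # R_i: checks per assessor
    total = x.sum()       # T: grand total
    denom = p * total - np.sum(row ** 2)
    if denom == 0:
        # Every assessor checked all products or none: the test is undefined.
        return np.nan, np.nan, p - 1
    q = (p - 1) * (p * np.sum(col ** 2) - total ** 2) / denom
    return q, chi2.sf(q, p - 1), p - 1   # Q ~ chi-square(P - 1) under H0


def marascuilo_pairs(x, alpha=0.05):
    """Marascuilo-type pairwise comparison of checking proportions (sketch).

    A pair is flagged when its difference in proportions exceeds
    sqrt(chi2_{1-alpha, P-1}) * SE, with SE from the Cochran null variance.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    props = x.mean(axis=0)
    row = x.sum(axis=1)
    var_diff = 2 * (p * x.sum() - np.sum(row ** 2)) / (n ** 2 * p * (p - 1))
    crit = np.sqrt(chi2.ppf(1 - alpha, p - 1) * var_diff)
    pairs = []
    for j in range(p):
        for k in range(j + 1, p):
            diff = props[j] - props[k]
            pairs.append((j, k, diff, bool(abs(diff) > crit)))
    return crit, pairs
```

With two products, Q reduces to McNemar's (b - c)^2 / (b + c), where b and c count the assessors who checked only the first or only the second product.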
The second step of the CATA analysis is a correspondence analysis on the sum of the N individual CATA tables (the maximum value of each cell is N). The goal of this analysis is to position the products, including the ideal product if available, on a map that shows how the products are positioned relative to each other. The analysis can be based on the chi-square distance or on the Hellinger distance (also known as the Bhattacharya distance, which is how it is referred to in the similarity/dissimilarity tool of XLSTAT). The analysis based on the Hellinger distance may be preferred when the dataset includes terms with low frequencies (Meyners et al., 2013). If the marginal sum of an attribute is zero, that attribute is removed from the correspondence analysis.
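The chi-square variant of this step can be sketched as a standard correspondence analysis computed from the singular value decomposition of the standardized residuals of the summed table. The sketch assumes NumPy, an input array of shape (N, P, K) holding the individual binary tables, and products with at least one checked attribute. The Hellinger-based variant is not shown, and `ca_on_cata` is a hypothetical name.

```python
import numpy as np


def ca_on_cata(tables):
    """Correspondence analysis (chi-square metric) of summed CATA tables.

    tables: array of shape (N, P, K), one 0/1 P x K table per assessor.
    Returns product coordinates, attribute coordinates, inertia per axis,
    and the indices of the attributes that were kept.
    """
    counts = np.asarray(tables, dtype=float).sum(axis=0)  # P x K, cells in 0..N
    keep = counts.sum(axis=0) > 0             # drop attributes never checked
    counts = counts[:, keep]
    f = counts / counts.sum()
    r = f.sum(axis=1)                         # product masses (assumed > 0)
    c = f.sum(axis=0)                         # attribute masses
    expected = np.outer(r, c)
    s = (f - expected) / np.sqrt(expected)    # standardized residuals
    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    rows = (u * sv) / np.sqrt(r)[:, None]     # principal coordinates of products
    cols = (vt.T * sv) / np.sqrt(c)[:, None]  # principal coordinates of attributes
    return rows, cols, sv ** 2, np.flatnonzero(keep)
```

The sum of the returned inertias equals the chi-square statistic of the summed table divided by its grand total, which is a quick sanity check on the decomposition.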
The next analysis uses the vertical data set, with one row per combination of product and assessor and one column per attribute, from which the ideal product, if any, has been removed. A column of preference data can be included if available. On this data set, XLSTAT computes the correlations between the attributes using the tetrachoric correlation (well suited for binary data) and, if preference data are available, the biserial correlation coefficient between each attribute and the preference data. The biserial correlation coefficient was developed to measure the correlation between a binary and a quantitative variable (see the Describing data / Biserial correlation tool for more information). To visualize the different dimensions optimally, XLSTAT runs a Principal Coordinate Analysis, using the Lingoes correction if necessary. This method is preferred to MDS because the coordinates are automatically rotated so that most of the information is carried by the first axis.
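The biserial coefficient and the principal coordinate step can be sketched as follows, assuming NumPy and SciPy. `biserial` uses the classical formula r_b = (M1 - M0) p q / (s_y h), where M1 and M0 are the mean preferences when the attribute is checked or not, p and q the proportions of 1s and 0s, s_y the population standard deviation of the preferences, and h the standard normal density at the quantile that splits p and q. `pcoa` performs a Gower-centred eigendecomposition and, if negative eigenvalues appear, applies the Lingoes correction (adding 2c to every off-diagonal squared dissimilarity, with c the absolute value of the most negative eigenvalue). Both names are hypothetical, the tetrachoric correlation is not shown, and how XLSTAT turns correlations into dissimilarities before the PCoA is not stated here, so that conversion is left to the caller.

```python
import numpy as np
from scipy.stats import norm


def biserial(checked, y):
    """Biserial correlation between a 0/1 attribute and preference scores."""
    checked = np.asarray(checked, dtype=bool)
    y = np.asarray(y, dtype=float)
    p = checked.mean()                 # proportion of 1s (must be in (0, 1))
    q = 1.0 - p
    m1, m0 = y[checked].mean(), y[~checked].mean()
    s = y.std()                        # population SD (ddof=0)
    h = norm.pdf(norm.ppf(p))          # normal ordinate at the p / q split
    return (m1 - m0) * p * q / (s * h)


def pcoa(d, lingoes=True, tol=1e-10):
    """Principal coordinate analysis of a symmetric dissimilarity matrix d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix

    def gower(dist):
        return j @ (-0.5 * dist ** 2) @ j        # Gower-centred matrix B

    vals, vecs = np.linalg.eigh(gower(d))
    if lingoes and vals.min() < -tol:
        c = -vals.min()                          # |most negative eigenvalue|
        off = ~np.eye(n, dtype=bool)
        d = np.where(off, np.sqrt(d ** 2 + 2 * c), 0.0)  # Lingoes correction
        vals, vecs = np.linalg.eigh(gower(d))
    order = np.argsort(vals)[::-1]               # first axis carries most inertia
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > tol
    return vecs[:, keep] * np.sqrt(vals[keep]), vals[keep]
```

The Lingoes shift raises every non-trivial eigenvalue of the centred matrix by c, so the most negative one becomes zero and the corrected dissimilarities can be embedded exactly in a Euclidean space.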
If preference data are available, the next step consists of penalty analyses. The penalty analyses aim to identify whether checking an attribute leads to a lower preference, has no effect, or leads to a higher preference.
The first results are a set of K 2x2 tables (one per attribute), where the rows correspond to the value (0 or 1) recorded for the ideal product and the columns to the value recorded for the surveyed products; cell [i,j] thus refers to value i for the ideal product and value j for the surveyed product. Each cell contains the average preference (averaged over the assessors and the products) and the percentage of all records that correspond to this combination of 0s and/or 1s.
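A minimal sketch of how such a 2x2 table can be computed for one attribute, assuming NumPy, numeric preference scores, and that each record pairs an assessor's answer for the ideal product with the same assessor's answer for one surveyed product. `penalty_table` is a hypothetical helper, not an XLSTAT function.

```python
import numpy as np


def penalty_table(ideal, product, liking):
    """2x2 summary for one attribute (hypothetical helper).

    ideal[r], product[r]: 0/1 answers for the ideal and the surveyed product
    in record r; liking[r]: preference score of that record.
    Returns (means, pct): cell [i, j] holds the mean preference and the % of
    all records with ideal value i and product value j (NaN mean if empty).
    """
    ideal = np.asarray(ideal, dtype=int)
    product = np.asarray(product, dtype=int)
    liking = np.asarray(liking, dtype=float)
    means = np.full((2, 2), np.nan)
    pct = np.zeros((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            mask = (ideal == i) & (product == j)
            pct[i, j] = 100.0 * mask.mean()
            if mask.any():
                means[i, j] = liking[mask].mean()
    return means, pct


means, pct = penalty_table([1, 1, 0, 0], [1, 0, 0, 1], [8, 5, 6, 7])
print(means.tolist())  # [[6.0, 7.0], [5.0, 8.0]]
```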
From the results contained in these tables, we can already make assumptions about how the assessors consider each attribute. For a given attribute, if the attribute is checked for the ideal product (second row) and the preference for the products where it is checked (cell [1,1]) is higher than when it is not checked (cell [1,0]), then the attribute is a “must have”. Symmetrically, if the attribute is not checked for the ideal product (first row) and the preference for the products where it is not checked (cell [0,0]) is higher than when it is checked (cell [0,1]), then the attribute is a “must not have”. If the attribute is not checked for the ideal product (first row) and the preference for the products where it is checked (cell [0,1]) is about the same as when it is not checked (cell [0,0]), which XLSTAT defines as an absolute difference of less than one, then the attribute “does not harm”. Finally, if cell [0,1] > cell [0,0], then the attribute is “nice to have”. Some tables could match up to three of these cases. XLSTAT associates each table with one case only, testing the rules in the order defined above, but you may want to check the results yourself.
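The rules above can be written as a small function applied to the matrix of cell means, read literally: the rules are tested in the order given, the first match wins, and the threshold for “does not harm” is an absolute difference of one. This is a hypothetical sketch of that reading, assuming NumPy, not XLSTAT's code; empty cells (NaN) make the comparisons fail and fall through to an “unclassified” label, which is an assumption of the sketch.

```python
import numpy as np


def classify_attribute(means, threshold=1.0):
    """Assign one case to a 2x2 table of mean preferences (hypothetical helper).

    means[i, j]: mean preference when the ideal value is i and the product
    value is j. Rules are tried in the order stated in the text (assumption:
    first match wins). NaN cells make comparisons False.
    """
    m = np.asarray(means, dtype=float)
    if m[1, 1] > m[1, 0]:
        return "must have"
    if m[0, 0] > m[0, 1]:
        return "must not have"
    if abs(m[0, 1] - m[0, 0]) < threshold:
        return "does not harm"
    if m[0, 1] > m[0, 0]:
        return "nice to have"
    return "unclassified"      # only reachable with empty (NaN) cells


print(classify_attribute([[6, 7], [5, 8]]))  # must have
```

Because the rules overlap, the order matters: with this reading, a table where [0,0] exceeds [0,1] by less than one is labelled “must not have”, not “does not harm”.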
Finally, XLSTAT computes penalty tables and runs two penalty analyses to determine whether some attributes are, from a statistical point of view, clearly “must have” or “must not have” attributes. These analyses make it possible to:
- Test whether the difference between cells [1,1] and [1,0] is significant (“must have”) or not (“nice to have”).
- Test whether the difference between cells [0,0] and [0,1] is significant (“negative” if [0,0] > [0,1]) or not (“does not harm”).
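The text does not say which statistical test XLSTAT uses for these two comparisons. Purely as an illustration, the sketch below swaps in a Welch two-sample t-test (SciPy's `ttest_ind` with `equal_var=False`) on the preference values of the records falling in each pair of cells. It reports the mean difference, the p-value, and whether the p-value is below an assumed alpha of 0.05. `penalty_tests` is a hypothetical name, and each cell needs at least two records.

```python
import numpy as np
from scipy.stats import ttest_ind


def penalty_tests(ideal, product, liking, alpha=0.05):
    """Illustrative significance tests for the two penalty comparisons.

    Uses a Welch t-test as a stand-in (assumption: not necessarily XLSTAT's
    test). Returns {label: (mean difference, p-value, significant?)}.
    """
    ideal = np.asarray(ideal, dtype=int)
    product = np.asarray(product, dtype=int)
    liking = np.asarray(liking, dtype=float)

    def cell(i, j):
        return liking[(ideal == i) & (product == j)]

    out = {}
    for label, (a, b) in {
        "[1,1] vs [1,0]": (cell(1, 1), cell(1, 0)),  # ideal checked
        "[0,0] vs [0,1]": (cell(0, 0), cell(0, 1)),  # ideal not checked
    }.items():
        res = ttest_ind(a, b, equal_var=False)
        out[label] = (a.mean() - b.mean(), res.pvalue, bool(res.pvalue < alpha))
    return out


ideal = [1] * 8 + [0] * 8
product = [1, 1, 1, 1, 0, 0, 0, 0] * 2
liking = [8, 9, 8, 9, 3, 4, 3, 4, 6, 7, 6, 7, 6, 7, 7, 6]
for label, result in penalty_tests(ideal, product, liking).items():
    print(label, result)
```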