# CLUSCATA

Use CLUSCATA to build homogeneous classes of judges based on their perceptions of products. Available in Excel with the XLSTAT software.

CATA (check-all-that-apply) tasks are now widely used. However, product perceptions often differ among assessors, so a cluster analysis of the assessors may be necessary. CLUSCATA is designed for this situation. It also makes it possible to set aside the assessors who do not fit any of the classes. CLUSCATA can be seen as an adaptation of CLUSTATIS to CATA data.

### Principle of CLUSCATA

The objective of CLUSCATA is to build classes of assessors that are as homogeneous as possible, each class being represented by a latent table (called the consensus) determined by CATATIS. Each class is therefore analyzed by CATATIS at the end, in order to determine the differences between the classes obtained. CLUSCATA consists of a hierarchical algorithm that can be "consolidated" by a partitioning algorithm (*i.e.* the partitioning algorithm is initialized by cutting the dendrogram). An optional additional class, called "K+1", can be created to set aside assessors who do not conform to any class. An assessor is placed in this class if the similarities (Ochiai coefficients) between this assessor and the consensus of each class are all considered weak.

## Options of CLUSCATA in XLSTAT

### Structure of the data

Two formats are supported:

- Horizontal format: all the data are merged horizontally.

- Vertical format: all the data are merged vertically.

For data entry, XLSTAT asks you to select all the data and to specify the format. In the case of the vertical format, product and assessor labels are mandatory.
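To make the difference between the two formats more concrete, here is a minimal Python sketch. It rests on an assumption that is not spelled out above: the vertical format is taken to have one row per assessor/product pair with one 0/1 value per attribute, and the horizontal format one row per product with the attribute blocks of the assessors placed side by side. The function name `vertical_to_horizontal` and the sample labels are hypothetical.

```python
def vertical_to_horizontal(rows, n_attributes):
    """Turn (assessor, product, checks) rows into one merged row per product.

    Assumed layouts (illustrative only, not taken from the XLSTAT help):
    - vertical: one row per (assessor, product) pair, with 0/1 checks
    - horizontal: one row per product, assessor blocks placed side by side
    """
    assessors, products, table = [], [], {}
    for assessor, product, checks in rows:
        if len(checks) != n_attributes:
            raise ValueError("each row needs one 0/1 value per attribute")
        if assessor not in assessors:
            assessors.append(assessor)
        if product not in products:
            products.append(product)
        table[(assessor, product)] = list(checks)

    horizontal = {}
    for product in products:
        merged = []
        for assessor in assessors:
            # Every assessor is expected to have evaluated every product.
            merged.extend(table[(assessor, product)])
        horizontal[product] = merged
    return assessors, horizontal


# Made-up data: 2 assessors, 2 products, 3 attributes.
vertical = [
    ("A1", "P1", [1, 0, 1]),
    ("A1", "P2", [0, 1, 0]),
    ("A2", "P1", [1, 1, 0]),
    ("A2", "P2", [0, 0, 1]),
]
assessors, horizontal = vertical_to_horizontal(vertical, n_attributes=3)
```

In this sketch the product and assessor labels are what allow the rows to be regrouped, which is consistent with those labels being mandatory in the vertical format.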

### Interpreting the results

The representation of the products and attributes in the space of *k* factors makes it possible to interpret visually the proximities between the products and attributes, provided some precautions are taken.

The projection of a product or an attribute on a plane can be considered reliable if it is far from the center of the graph.

### Number of factors

Two methods are commonly used to determine how many factors must be retained for the interpretation of the results:

- Examine the decreasing curve of the eigenvalues (scree plot). The number of factors to keep corresponds to the first turning point (elbow) found on the curve.

- Use the cumulative variability percentage represented by the factor axes and keep only as many factors as needed to reach a chosen percentage.
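The second method can be sketched in a few lines of Python. This is only an illustration: the function name `factors_for_threshold`, the 80% threshold and the eigenvalues are all made up for the example.

```python
def factors_for_threshold(eigenvalues, threshold=80.0):
    """Return (k, cumulative) where k is the smallest number of factors
    whose cumulative variability percentage reaches `threshold`."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative, running = [], 0.0
    for value in eigenvalues:
        running += value
        cumulative.append(100.0 * running / total)
    for k, percentage in enumerate(cumulative, start=1):
        if percentage >= threshold:
            return k, cumulative
    return len(eigenvalues), cumulative


# Made-up eigenvalues, sorted in decreasing order.
eigenvalues = [0.5, 0.25, 0.125, 0.125]
k, cumulative = factors_for_threshold(eigenvalues, threshold=80.0)
```

With these values the first two factors carry 75% of the variability, so a third factor is needed to pass the 80% threshold.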

### Graphic representations

The graphical representations of the objects in each class are only reliable if the sum of the variability percentages associated with the axes of the representation space is sufficiently high. If this sum is high (for example 80%), the representation can be considered reliable. If it is low, it is recommended to produce representations on several pairs of axes in order to validate the interpretation made on the first two factor axes.

### Quality of a cluster analysis

To assess the quality of a hierarchical clustering, one can use the increase in within-class variance (the CLUSCATA criterion error) caused by merging two classes. This increase is equal to the height in the dendrogram at which the two classes of assessors are merged.

The homogeneity of each class and the global homogeneity are also important indices for judging the quality of the cluster analysis. Each lies between *1/m* and 1, *m* being the number of assessors. Note that consolidation and the addition of a "K+1" class can increase the homogeneities.
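The range of the homogeneity index can be illustrated with a small Python sketch. The formula used here is an assumption: the text above only gives the range, and the sketch takes the STATIS-style definition, that is, the largest eigenvalue of the assessor similarity matrix divided by its trace (which equals *m* when the diagonal is 1). The function name `homogeneity` is hypothetical and the eigenvalue is obtained by a simple power iteration.

```python
def homogeneity(similarity, iterations=500):
    """Largest eigenvalue of a similarity matrix divided by its trace.

    Assumption: STATIS-style definition, not confirmed by the text above,
    which only states that the index lies between 1/m and 1.
    """
    m = len(similarity)
    vector = [1.0] * m
    for _ in range(iterations):
        # Power iteration: apply the matrix, then renormalise the vector.
        product = [sum(row[j] * vector[j] for j in range(m)) for row in similarity]
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
    # The Rayleigh quotient of the unit vector estimates the eigenvalue.
    applied = [sum(row[j] * vector[j] for j in range(m)) for row in similarity]
    largest = sum(v * a for v, a in zip(vector, applied))
    trace = sum(similarity[i][i] for i in range(m))
    return largest / trace


# Two extreme, made-up cases with m = 3 assessors.
identical = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
unrelated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h_max = homogeneity(identical)
h_min = homogeneity(unrelated)
```

Under this assumption, assessors who all agree give the upper bound 1, while assessors with no similarity at all give the lower bound *1/m*.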

## Results of CLUSCATA in XLSTAT

**Descriptive statistics**: The number of checks by assessor is displayed.

**Similarity matrix (S)**: The matrix of similarity indices between all pairs of assessors is displayed. The similarity index, which is the Ochiai coefficient, lies between 0 and 1. The closer it is to 1, the stronger the similarity.
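As an illustration, the Ochiai coefficient between two binary (0/1) tables can be computed as the number of checks they have in common divided by the square root of the product of their numbers of checks. The Python sketch below applies this standard definition to whole CATA tables. The function name `ochiai`, the sample tables and the value returned when an assessor checked nothing are choices made for this example.

```python
import math


def ochiai(table_a, table_b):
    """Ochiai coefficient between two 0/1 tables of the same shape."""
    flat_a = [value for row in table_a for value in row]
    flat_b = [value for row in table_b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("both tables must have the same shape")
    common = sum(a * b for a, b in zip(flat_a, flat_b))
    count_a, count_b = sum(flat_a), sum(flat_b)
    if count_a == 0 or count_b == 0:
        return 0.0  # convention chosen here for an assessor with no checks
    return common / math.sqrt(count_a * count_b)


# Two made-up assessors, each with 2 products (rows) x 3 attributes (columns).
assessor_1 = [[1, 0, 1], [0, 1, 0]]
assessor_2 = [[1, 1, 0], [0, 1, 0]]
tables = [assessor_1, assessor_2]

# Similarity matrix between all assessors (1 on the diagonal).
S = [[ochiai(a, b) for b in tables] for a in tables]
```

Here each assessor made three checks and two of them coincide, so the off-diagonal similarity is 2/3.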

**Node statistics**: This table shows the data for the successive nodes in the dendrogram. The first node has an index which is the number of assessors increased by 1. Thus, it is easy to see at any time if an assessor or group of assessors is clustered with another group of assessors in the dendrogram.

**Levels bar chart**: This chart displays the statistic for each dendrogram node, that is, the increase in the CLUSCATA minimization criterion (equivalent to the increase in within-class variance) when two classes are merged.

**Dendrograms**: The full dendrogram displays the progressive clustering of assessors. If truncation has been requested, a broken line marks the level at which the truncation has been carried out. The truncated dendrogram shows the classes after truncation.

#### Composition of classes:

**Results by assessor**: This table shows the class to which each assessor is assigned, in the initial order of the assessors. If a consolidation is requested, the results are given before and after the consolidation. If you have checked "class K+1", some assessors may have a missing value after consolidation. This means that they are not placed in any of the main classes (they are placed in class "K+1").

**Results by class**: The results are given by class. Thus, a list of assessors is displayed for each class.

**Number of assessors per class**: The number of assessors in each class is indicated.

**Rho parameter computed**: Result displayed only if you have chosen to add the class "K+1". The rho parameter represents the minimum similarity that an assessor must have with the consensus of a class in order to belong to it. If this condition is not met for any of the classes, the assessor is placed in class "K+1". This parameter is calculated according to the proximity of each assessor to its class as well as to the neighboring class.
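The role of rho can be sketched as follows in Python. The sketch takes rho and the similarities between each assessor and each class consensus as given, since the way rho is computed is not detailed here. Assigning an assessor to the class with the highest similarity is an assumption, and the name `assign_with_extra_class` and the numbers are hypothetical.

```python
def assign_with_extra_class(similarities, rho):
    """Assign each assessor to a class, or to None for class "K+1".

    similarities: dict mapping an assessor label to the list of its
    similarities with the consensus of classes 1..K.
    Assumption: an assessor joins the class with the highest similarity,
    provided that similarity reaches rho.
    """
    assignment = {}
    for assessor, values in similarities.items():
        best = max(range(len(values)), key=lambda index: values[index])
        if values[best] >= rho:
            assignment[assessor] = best + 1  # classes are numbered from 1
        else:
            assignment[assessor] = None  # shown as a missing value: "K+1"
    return assignment


# Made-up similarities for 3 assessors and K = 2 classes.
similarities = {
    "A1": [0.62, 0.20],
    "A2": [0.15, 0.55],
    "A3": [0.18, 0.22],
}
classes = assign_with_extra_class(similarities, rho=0.3)
```

In this example the third assessor has no similarity reaching rho, so it receives no main class, which mirrors the missing value reported in the results by assessor.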

#### Analysis of the class k:

In this section, the analysis of each of the classes by the CATATIS method is displayed.

**Eigenvalues of CA**: The eigenvalues of CA and the corresponding chart (*scree plot*) are displayed.

**Product coordinates**: The coordinates of the products of the consensus in the factor space are displayed, with the corresponding charts (depending on the number of factors chosen).

**Attribute coordinates**: The coordinates of the attributes of the consensus in the factor space are displayed, with the corresponding charts (depending on the number of factors chosen).

**Consensus configuration**: The consensus configuration is displayed. It corresponds to the weighted average of the assessors' data.
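A weighted average of this kind can be sketched in Python as below. This is a simplified illustration, not the CATATIS computation itself: the weights are taken as given and rescaled to sum to 1, and any scaling of the individual tables that CATATIS may apply beforehand is left out. The function name `weighted_consensus` and the data are hypothetical.

```python
def weighted_consensus(tables, weights):
    """Weighted average of same-shaped tables (weights rescaled to sum to 1).

    Assumption: the weights are supplied by the caller; how CATATIS derives
    them, and any prior scaling of the tables, is outside this sketch.
    """
    if len(tables) != len(weights):
        raise ValueError("one weight is needed per table")
    total = sum(weights)
    n_rows, n_cols = len(tables[0]), len(tables[0][0])
    consensus = [[0.0] * n_cols for _ in range(n_rows)]
    for table, weight in zip(tables, weights):
        for i in range(n_rows):
            for j in range(n_cols):
                consensus[i][j] += weight * table[i][j] / total
    return consensus


# Two made-up assessors (2 products x 2 attributes), the first weighted 3 to 1.
tables = [
    [[1, 0], [0, 1]],
    [[1, 1], [0, 0]],
]
consensus = weighted_consensus(tables, weights=[3, 1])
```

A cell checked by both assessors stays at 1, while a cell checked only by the lower-weighted assessor contributes a quarter of a check to the consensus.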

**Similarity assessors/consensus**: The similarity indices between the assessors and the consensus are displayed, with the associated bar chart. Like the CATATIS weights, these coefficients make it possible to detect atypical assessors. Their advantage is that they lie between 0 and 1, which makes them easier to interpret than the weights.

**Weights**: The weights calculated by CATATIS are displayed, with the associated bar chart. The greater the weight, the more the assessor contributed to the consensus. Since CATATIS gives more weight to the assessors who are closest to the overall point of view, a weight much lower than the others means that the assessor is atypical.

#### Indices:

**Homogeneities**: The homogeneity of each class is displayed. It is a value between *1/m* (*m* being the number of assessors in the class) and 1, which increases with the homogeneity of the assessors. The global homogeneity, which is a weighted average of the class homogeneities, is then displayed.

**Global Error/Within-class Variance**: The error of the CLUSCATA criterion is displayed. It corresponds to the within-class variance.
