Residuals in Crosstabs

When a crosstab analysis is created in the 'ANALYSE' – 'Statistics' – 'Crosstabs' tab, the value of the Chi-square is displayed and the cells within the table are coloured, based on residuals.

Residuals make it extremely easy and efficient to analyse what is happening in the table. Unlike the Chi-square, which gives only a general diagnosis of the relationship in the table, residuals show exactly where the correlation is happening. In fact, a Chi-square may be statistically significant only because of a correlation in a single cell, but it does not tell us where that is.

Residual is a term from the analysis of nominal variables. A residual is simply the difference between the actual (observed) frequency in a given cell and the theoretical (expected) frequency that the cell would have if the two variables of the table were unrelated (the null hypothesis). The theoretical frequency is calculated as the product of the cell's row total and column total, divided by the total size of the table.
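The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the 2x2 table (gender by opinion) are hypothetical and are not taken from 1KA.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def raw_residuals(table):
    """Raw residual of each cell: observed count minus expected count."""
    expected = expected_frequencies(table)
    return [
        [obs - exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


# Hypothetical data: rows = men, women; columns = FOR, AGAINST.
observed = [[30, 10], [20, 40]]
expected = expected_frequencies(observed)
residuals = raw_residuals(observed)

print(expected)   # [[20.0, 20.0], [30.0, 30.0]]
print(residuals)  # [[10.0, -10.0], [-10.0, 10.0]]
```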

If the cell frequencies, which follow a Poisson distribution under the usual assumption, are standardised (subtract the expected value and divide by the standard deviation), we obtain standardised residuals, which are asymptotically normally distributed. They can therefore be interpreted in the usual hypothesis-testing sense and compared with the usual critical values, e.g. 1.65 at 10% risk or 1.96 at 5% risk.
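The link between critical values and risk levels can be checked numerically. The sketch below assumes the standard normal approximation and a two-sided test; the function name `two_sided_p_value` is ours and is not part of 1KA.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a standard normal statistic z."""
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


for critical_value in (1.65, 1.96, 3.0):
    print(critical_value, round(two_sided_p_value(critical_value), 4))
```

A residual of 1.65 corresponds to a risk of roughly 10%, and 1.96 to roughly 5%.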

Adjusted residuals additionally correct for unequal margin sizes. Some researchers have shown that they are more appropriate than the usual standardised residuals, and we share that recommendation, so 1KA uses adjusted residuals for the colouring of cells.
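As an illustration, the sketch below computes adjusted residuals using the commonly cited definition, in which the raw residual is divided by the square root of E * (1 - row share) * (1 - column share). That this is exactly the formula 1KA applies is an assumption, and the function name and the 2x3 table are hypothetical.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals for a two-way table of counts (assumed standard definition)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Correct the variance for the share of the row and column in the total.
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        result.append(out_row)
    return result


# Hypothetical data: rows = men, women; columns = FOR, NEUTRAL, AGAINST.
observed = [[30, 20, 10], [20, 20, 20]]
adjusted = adjusted_residuals(observed)
for row in adjusted:
    print([round(value, 2) for value in row])
```

Because both correction factors are smaller than 1, an adjusted residual defined this way is never smaller in absolute value than the corresponding standardised residual.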

The 1KA application colours cells using the thresholds 1.0, 2.0 and 3.0 for the absolute value of the adjusted residual. These thresholds roughly indicate the strength of the association in a given cell, i.e. the strength of the deviation from the null hypothesis. Meaning of the values of the adjusted residuals:

  • above 1.0 implies a certain increase that deserves attention,
  • above 2.0 (a simplification of 1.96) implies a statistically significant difference (p < 0.05), i.e. the residual differs from zero with a relatively small risk,
  • above 3.0 already implies a strong deviation (p < 0.01), which means that the residual is almost certainly different from zero and therefore something is "happening" in the cell.

Cells colored blue mean that there are fewer units in the cell than expected, and cells colored red mean that there are more units in the cell than expected.
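The colouring rule can be summarised as a small function. This is a sketch of the rule as described here, not 1KA's actual implementation: the function name, the return format and the treatment of values exactly on a threshold are assumptions.

```python
def colour_level(adjusted_residual):
    """Map an adjusted residual to a (colour, level) pair using the 1.0 / 2.0 / 3.0 thresholds."""
    magnitude = abs(adjusted_residual)
    if magnitude > 3.0:
        level = 3
    elif magnitude > 2.0:
        level = 2
    elif magnitude > 1.0:
        level = 1
    else:
        # Small deviations are left uncoloured.
        return ("none", 0)
    # Red: more units than expected; blue: fewer units than expected.
    colour = "red" if adjusted_residual > 0 else "blue"
    return (colour, level)


print(colour_level(2.24))  # ('red', 2)
print(colour_level(-3.5))  # ('blue', 3)
print(colour_level(0.4))   # ('none', 0)
```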

For example, if there are 30 units in a cell and the expected value is 20, the basic residual is 30 - 20 = 10. If we divide this residual by the standard deviation (the square root of the expected value 20, which is about 4.47, since a Poisson distribution has a variance equal to its expected value), we get the standardised residual, which in this case is greater than 2: (30 - 20)/4.47 ≈ 2.24 > 2.0. If, for example, gender and agreement/opinion are considered, we therefore say that e.g. men are significantly more FOR than we would expect if gender had no effect.
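The arithmetic of this example can be reproduced directly. The variable names are illustrative only.

```python
import math

observed = 30
expected = 20

raw_residual = observed - expected        # 30 - 20 = 10
# Poisson assumption: the variance equals the expected value.
standard_deviation = math.sqrt(expected)
standardised = raw_residual / standard_deviation

print(round(standard_deviation, 2))  # 4.47
print(round(standardised, 2))        # 2.24
```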

If we correct this on the basis of the formulae in the appendices below, we obtain an adjusted residual. Its absolute value is never smaller than that of the standardised residual, and the two are close when no single row or column (YES:NO, male:female) holds a large share of the table total. In any case, we can conclude that there are statistically significant deviations in this cell, and on this basis we can also proceed to a substantive interpretation (e.g. reasons why men are more FOR).

The colouring of cells in 1KA is indicative, simplified and intended purely as a screening (exploratory) analysis. In a formal interpretation, the exact standardised residual or, better still, the adjusted residual is reported and interpreted in the usual sense, as the examples below indicate.

The exact residuals are obtained in 1KA by selecting the checkbox option to calculate them (next to the independent and dependent variable dropdowns). 

Of course, the whole table and its Chi-square can also be interpreted. But, as mentioned before, the residuals are more informative than the overall Chi-square because they pinpoint the individual cells where the deviations occur. Further insight is gained by analysing the difference in shares based on a t-test.
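The relationship between the overall Chi-square and the individual cells can be made concrete: the Pearson Chi-square is the sum of the squared standardised residuals, so each cell's contribution can be inspected separately. The sketch below uses a hypothetical 3x3 table and a function name of our own choosing.

```python
def chi_square_contributions(table):
    """Per-cell contributions (O - E)^2 / E, i.e. squared standardised residuals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(obs - r * c / n) ** 2 / (r * c / n) for obs, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


# Hypothetical counts for two nominal variables with three categories each.
observed = [[40, 20, 20], [20, 20, 20], [20, 20, 20]]
contributions = chi_square_contributions(observed)

# The table-level statistic is the sum over all cells.
chi_square = sum(sum(row) for row in contributions)

# Locate the cell that contributes most to the statistic.
cells = [(i, j) for i in range(len(observed)) for j in range(len(observed[0]))]
largest = max(cells, key=lambda cell: contributions[cell[0]][cell[1]])

print(round(chi_square, 3), largest)
```

The Chi-square alone reports only the total, whereas the per-cell contributions show which cell drives it.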

Of course, all of this applies only to nominal variables. If one of the variables is "well" ordinally ordered, and even more so if it has a definite interval or ratio scale, we would prefer to use a t-test or analysis of variance.
