**Chi-square test for equality of distributions
(Chi-square test of independence)
{From the Institute of Phonetic Sciences (IFA):
http://www.fon.hum.uva.nl/}**

*Characteristics:*

This is the most widely used test on nominal data. Although the observations
(i.e., the counts) follow a binomial or multinomial distribution, it is
impractical to calculate the levels of significance directly. A binomial
distribution can be approximated by a normal distribution if the expected number
of observations is large enough. This is used to calculate the "variance" of the
observed distribution. Under *H0* this "variance" has, by approximation, a
Chi-square distribution.

*H0:*

All samples have the same frequency distribution.

*Assumptions:*

None really, except that the observations must be independent.

*Scale:*

Nominal

*Procedure:*

Calculate the expected number of observations, *Eij*, under *H0*:
*Eij = Ni × Oj / N*, in which *Oj* is the total number of observations
in category *j* (j from 1 to J, i.e., the column totals), *Ni* is the size
of sample *i* (i from 1 to I, i.e., the row totals), and *N* is the total
number of observations.

The test parameter is *X^2 = Sum over all cells ( Oij - Eij )^2 / Eij*,
which approximately follows a Chi-square distribution with *(J-1) × (I-1)*
Degrees of Freedom.
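
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original page: the function names and the 2 x 3 table of counts are made up for the example, with rows as samples and columns as categories.

```python
def expected_counts(observed):
    """Return Eij = Ni * Oj / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]            # Ni
    col_totals = [sum(col) for col in zip(*observed)]      # Oj
    grand_total = sum(row_totals)                          # N
    return [[ni * oj / grand_total for oj in col_totals] for ni in row_totals]


def chi_square_statistic(observed):
    """Return (X^2, degrees of freedom) without continuity correction."""
    expected = expected_counts(observed)
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)     # (I-1) * (J-1)
    return x2, dof


# Hypothetical counts: two samples (rows), three categories (columns).
table = [[10, 20, 30],
         [20, 20, 20]]
x2, dof = chi_square_statistic(table)
print(expected_counts(table), x2, dof)
```

For this made-up table the expected counts are 15, 20 and 25 in both rows, and *X^2* is about 5.33 with 2 Degrees of Freedom.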

Although the above procedure is the one generally found in textbooks, it is not
the best one. It omits the continuity correction that is needed because a
discrete (multinomial) distribution is approximated with a continuous (X^2) one.
A *better* test parameter is:

*X^2 = Sum over all cells ( |Oij - Eij| - 0.5 )^2 / Eij*

(|a-b| indicates the *absolute value* of the difference). This is the
approach actually used to calculate the *X^2* value in this example.
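
As a sketch, the corrected test parameter can be computed by applying the formula above literally to every cell. The function name and the table of counts are assumptions made for this example only.

```python
def corrected_chi_square(observed):
    """Return X^2 = Sum over all cells (|Oij - Eij| - 0.5)^2 / Eij."""
    row_totals = [sum(row) for row in observed]            # Ni
    col_totals = [sum(col) for col in zip(*observed)]      # Oj
    grand_total = sum(row_totals)                          # N
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # Eij
            x2 += (abs(o - e) - 0.5) ** 2 / e
    return x2


# Hypothetical counts: two samples (rows), three categories (columns).
table = [[10, 20, 30],
         [20, 20, 20]]
print(corrected_chi_square(table))
```

For these made-up counts the corrected value is about 4.35, compared with about 5.33 for the uncorrected formula.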

*Level of Significance:*

Use a table to look up the level of significance associated with
*X^2* and the *Degrees of Freedom*.
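
When no table is at hand, the level of significance can also be computed numerically. The sketch below is one possible way to do so, using a series expansion of the regularized incomplete gamma function and only the Python standard library. The function name is an assumption, and the series loses relative precision for very small tail probabilities, so treat it as an illustration rather than a reference implementation.

```python
import math


def chi_square_sf(x2, dof):
    """Upper-tail probability P(X >= x2) for a Chi-square variable."""
    if x2 <= 0:
        return 1.0
    a = dof / 2.0
    x = x2 / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, x) = x^a * e^-x / Gamma(a + 1) * Sum_n x^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    lower = math.exp(log_prefactor) * total
    return max(0.0, 1.0 - lower)


# The familiar 5% critical value for 1 Degree of Freedom.
print(chi_square_sf(3.841458820694124, 1))
```

The printed value should be very close to 0.05, which matches the usual table entry for 1 Degree of Freedom.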

*Approximation:*

If the *Degrees of Freedom* > 30, the distribution of

*z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))*

can be approximated by a Standard Normal Distribution.
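
The approximation above (commonly known as the Wilson-Hilferty transformation) is easy to evaluate directly. The following sketch uses assumed function names and converts *z* to a one-tailed level of significance with the complementary error function from the standard library.

```python
import math


def chi_square_to_z(x2, dof):
    """z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))} / SQRT(2/(9*DoF))."""
    spread = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)


def normal_upper_tail(z):
    """P(Z >= z) for a Standard Normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical example: X^2 = 70 with 50 Degrees of Freedom.
z = chi_square_to_z(70.0, 50)
print(z, normal_upper_tail(z))
```

Large values of *X^2* give large positive *z*, so the upper tail of the Standard Normal Distribution supplies the level of significance.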

*Remarks:*

This approach is an approximation, even with the continuity correction. The
Chi-square distribution can only be used if all expected values, i.e., all
*Eij*, are larger than **five**. If this does not hold, combine the rarer
categories with larger ones.
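
A small sketch of this check: list the cells whose expected value is not larger than five, then merge a rare category into another one and check again. The helper names and the counts are hypothetical, and which categories to combine remains a judgment about the data, not something the code decides.

```python
def expected_counts(observed):
    """Return Eij = Ni * Oj / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[ni * oj / grand_total for oj in col_totals] for ni in row_totals]


def small_expected_cells(observed, threshold=5.0):
    """Return (row, column) indices of cells with Eij <= threshold."""
    return [
        (i, j)
        for i, exp_row in enumerate(expected_counts(observed))
        for j, e in enumerate(exp_row)
        if e <= threshold
    ]


def merge_columns(observed, j_from, j_into):
    """Return a new table with category j_from added to category j_into."""
    merged = []
    for row in observed:
        new_row = list(row)
        new_row[j_into] += new_row[j_from]
        del new_row[j_from]
        merged.append(new_row)
    return merged


# Hypothetical counts in which the middle category is rare.
table = [[12, 3, 15],
         [18, 2, 10]]
print(small_expected_cells(table))          # cells that violate the rule
combined = merge_columns(table, 1, 2)       # fold category 1 into category 2
print(combined, small_expected_cells(combined))
```

In this made-up table both cells of the middle category have an expected value of 2.5. After merging it into the last category, every expected value is 15 and the rule is satisfied.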
