*Characteristics:*

This test checks whether an observed frequency distribution differs from an
expected distribution. Although the observations (i.e., the numbers on the first
row) follow a binomial or multinomial distribution, it is impractical to
calculate the levels of significance directly. A binomial distribution can be
approximated by a normal distribution if the expected number of observations is
large enough. This is used to calculate the "variance" of the observed
distribution. Under *H0* this "variance" approximately follows a Chi-square
distribution.

*H0:*

The sample has the expected frequency distribution.

*Assumptions:*

None really, except that the observations must be independent.

*Scale:*

Nominal

*Procedure:*

Calculate the test parameter *X^2 = Sum over all columns ( Oi - Ei )^2 / Ei*,
where *Oi* is the observed and *Ei* the expected number of observations in
column *i*. By approximation, *X^2* follows a Chi-square distribution with
*(I-1)* Degrees of Freedom (with *I* the number of columns).

Although the above procedure is the one generally found in textbooks, it is not
the best one. It omits the continuity correction that is needed because a
discrete (multinomial) distribution is approximated with a continuous (X^2) one.
A *better* test parameter is:

*X^2 = Sum over all columns ( |Oi - Ei| - 0.5 )^2 / Ei*

(|a-b| indicates the *absolute value* of the difference). This is the
approach actually used to calculate the *X^2* value in this example.
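
As an illustration, both versions of the test parameter can be computed with a
few lines of Python. This is only a minimal sketch: the function name
`chi_square_statistic`, its `continuity` switch, and the example counts are
made up for this illustration and are not taken from the example on this page.

```python
def chi_square_statistic(observed, expected, continuity=True):
    """Return (X^2, Degrees of Freedom) for one row of observed counts.

    observed, expected: sequences of equal length (one entry per column).
    continuity: if True, subtract 0.5 from every |Oi - Ei| before squaring.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    correction = 0.5 if continuity else 0.0
    x2 = sum((abs(o - e) - correction) ** 2 / e
             for o, e in zip(observed, expected))
    dof = len(observed) - 1  # (I - 1), with I the number of columns
    return x2, dof


# Hypothetical counts: 60 observations over 3 equally likely categories.
observed = [28, 22, 10]
expected = [20, 20, 20]

x2_plain, dof = chi_square_statistic(observed, expected, continuity=False)
x2_corrected, _ = chi_square_statistic(observed, expected, continuity=True)
print(x2_plain, x2_corrected, dof)  # approximately 8.4 and 7.4375, with 2 DoF
```

The corrected value is the smaller of the two here, so the continuity
correction makes the test somewhat more conservative.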

*Level of Significance:*

Use a table to look up the level of significance associated with
*X^2* and the *Degrees of Freedom*.
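
Where no table is at hand, the level of significance (the upper tail
probability of the Chi-square distribution) can also be computed. The sketch
below is one possible way to do this with the Python standard library only; the
function name `chi_square_sf` is an assumption of this illustration. It starts
from the known closed forms for 1 and 2 Degrees of Freedom and adds two Degrees
of Freedom per step.

```python
import math


def chi_square_sf(x2, dof):
    """Upper tail probability P(Chi-square with `dof` DoF >= x2)."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x2 <= 0:
        return 1.0
    y = x2 / 2.0
    if dof % 2 == 0:
        a = 1.0
        q = math.exp(-y)               # exact tail for 2 DoF
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))    # exact tail for 1 DoF
    # Each step adds 2 DoF: Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1).
    # The term is evaluated in log space to avoid overflow.
    while a < dof / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# The familiar 5% critical values for 1 and 2 DoF are about 3.841 and 5.991.
print(chi_square_sf(3.841, 1))
print(chi_square_sf(5.991, 2))
```

Both calls should print values close to 0.05, which matches what a printed
Chi-square table gives for these critical values.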

*Approximation:*

If the *Degrees of Freedom* > 30, the distribution of

*z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))*

can be approximated by a Standard Normal Distribution.
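
This cube-root transformation is commonly known as the Wilson-Hilferty
approximation. A minimal Python sketch of the formula above follows; the
function names `chi_square_to_z` and `normal_upper_tail` are chosen for this
illustration only.

```python
import math


def chi_square_to_z(x2, dof):
    """Transform X^2 with `dof` Degrees of Freedom to an approximate z value."""
    term = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - term)) / math.sqrt(term)


def normal_upper_tail(z):
    """Upper tail probability of the Standard Normal Distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# For 40 DoF, tables list 55.758 as the 5% critical value of X^2.
z = chi_square_to_z(55.758, 40)
print(z, normal_upper_tail(z))
```

The resulting z should be close to 1.645, the one-sided 5% point of the
Standard Normal Distribution, which shows how well the approximation works at
this number of Degrees of Freedom.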

*Remarks:*

This approach is an approximation, even with the continuity correction. The
Chi-square distribution can only be used if all expected values, i.e., all *Ei*,
are larger than **five**. If this does not hold, combine the rarer categories
with larger ones.
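
How categories are combined is a judgment call that should follow the meaning
of the categories. Purely as an illustration, the sketch below uses one simple
rule that is an assumption of this example, not a prescription: all categories
whose *Ei* is not larger than five are pooled into a single combined category,
and further categories are added (smallest first) until the pool itself
exceeds five. The function name `pool_rare_categories` is hypothetical.

```python
def pool_rare_categories(observed, expected, minimum=5):
    """Pool categories with expected count <= minimum into one category.

    Returns (observed, expected) with the pooled category last. The column
    order is not preserved, which is harmless for nominal data.
    """
    pairs = sorted(zip(expected, observed))  # smallest expected count first
    kept_o, kept_e = [], []
    pool_o, pool_e = 0, 0
    pooling = False
    for e, o in pairs:
        # Absorb a category if it is too rare itself, or if the pool
        # collected so far is still too small to stand on its own.
        if e <= minimum or (pooling and pool_e <= minimum):
            pool_o += o
            pool_e += e
            pooling = True
        else:
            kept_o.append(o)
            kept_e.append(e)
    if pooling:
        kept_o.append(pool_o)
        kept_e.append(pool_e)
    return kept_o, kept_e


# Hypothetical counts: the last three categories are too rare.
observed = [38, 33, 19, 6, 2, 2]
expected = [40, 30, 20, 4, 3, 3]
print(pool_rare_categories(observed, expected))
# ([19, 33, 38, 10], [20, 30, 40, 10])
```

After pooling, the number of columns, and therefore the Degrees of Freedom,
is reduced accordingly (here from 5 to 3).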
