Characteristics:
This test checks whether an observed distribution differs from an expected
distribution. Although the observations (i.e., the numbers on the first row) are
bi- or multi-nomial distributed, it is impractical to calculate the levels of
significance directly. A binomial distribution can be approximated by a normal
distribution if the expected number of observations is large enough. This is
used to calculate the "variance" of the observed distribution. Under H0
this "variance" has a Chi-square distribution.
H0:
The sample has the expected frequency distribution.
Assumptions:
None really, except that the observations must be independent.
Scale:
Nominal
Procedure:
Calculate the test statistic X^2 = Sum over all columns ( Oi - Ei )^2 / Ei,
where Oi is the observed and Ei the expected number of observations in column i.
Under H0, X^2 follows a Chi-square distribution by approximation with (I-1)
Degrees of Freedom (with I the number of columns).
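The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the die-roll counts are made up for this example and are not part of the original test description.

```python
def chi_square_statistic(observed, expected):
    """Return (X^2, degrees of freedom) for two equally long lists of counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum over all columns of (Oi - Ei)^2 / Ei
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of Freedom = number of columns - 1
    return x2, len(observed) - 1


# Hypothetical data: 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6
x2, dof = chi_square_statistic(observed, expected)
print(x2, dof)
```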
Although the above procedure is the one generally found in textbooks, it is not
the best one. It omits the continuity correction that is needed because a
discrete (multinomial) distribution is approximated with a continuous (X^2) one.
A better test statistic is:
X^2 = Sum over all columns ( |Oi - Ei| - 0.5 )^2 / Ei
(|a-b| indicates the absolute value of the difference). This is the
approach actually used to calculate the X^2 value in this example.
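A minimal sketch of the continuity-corrected sum, written to follow the formula exactly as stated here. The function name and the counts are hypothetical and serve only to show the effect of the 0.5 correction.

```python
def corrected_chi_square(observed, expected):
    """Return (X^2, degrees of freedom) using ( |Oi - Ei| - 0.5 )^2 / Ei."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    x2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    return x2, len(observed) - 1


# Hypothetical data: 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6
x2_corrected, dof = corrected_chi_square(observed, expected)
print(x2_corrected, dof)
```

For these counts the corrected value is smaller than the uncorrected sum of ( Oi - Ei )^2 / Ei, so the corrected test is the more conservative of the two.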
Level of Significance:
Use a table to look up the level of significance associated with
X^2 and the Degrees of Freedom.
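Where no table is at hand, the upper-tail probability can also be computed. The sketch below is one possible way, not the method used by this page: it relies on the closed-form series for the Chi-square tail that exists for whole-number Degrees of Freedom and uses only the Python standard library. The function name is an assumption, and the code is meant for moderate values of X^2 and DoF.

```python
import math


def chi_square_sf(x2, dof):
    """Upper-tail probability P(Chi-square with `dof` DoF >= x2), integer dof >= 1."""
    if dof < 1 or x2 < 0:
        raise ValueError("need dof >= 1 and x2 >= 0")
    half = x2 / 2.0
    if dof % 2 == 0:
        # Even DoF: exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
        term = 1.0
        total = 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd DoF: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(dof-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    term = math.sqrt(half) / math.gamma(1.5)  # the j = 1 term
    total = 0.0
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)  # step from term j to term j + 1
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# 3.8415 is the familiar 5% critical value for 1 Degree of Freedom.
p_value = chi_square_sf(3.8415, 1)
print(p_value)
```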
Approximation:
If the Degrees of Freedom > 30, the distribution of
z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))}/SQRT(2/(9*DoF))
can be approximated by a Standard Normal Distribution.
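The transformation above (commonly known as the Wilson-Hilferty approximation) can be sketched as follows. The function names and the example numbers are illustrative assumptions; the one-sided normal tail is computed with the standard library's complementary error function.

```python
import math


def wilson_hilferty_z(x2, dof):
    """z = {(X^2/DoF)^(1/3) - (1 - 2/(9*DoF))} / SQRT(2/(9*DoF))."""
    c = 2.0 / (9.0 * dof)
    return ((x2 / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)


def normal_upper_tail(z):
    """P(Z >= z) for a Standard Normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical example: X^2 = 67.5 with 50 Degrees of Freedom.
z = wilson_hilferty_z(67.5, 50)
p_value = normal_upper_tail(z)
print(z, p_value)
```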
Remarks:
This approach is an approximation, even with the continuity correction. The
Chi-square distribution can only be used if all expected values, i.e., all Ei,
are larger than five. If this does not hold, combine the rarer categories
with larger ones.
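One way to automate the combining step is sketched below. The pooling rule (repeatedly fold the category with the smallest expected value into the next smallest until every Ei exceeds five) is an assumption chosen for simplicity; in practice the categories to merge should also make sense together. The function name and the counts are hypothetical.

```python
def pool_rare_categories(observed, expected, minimum=5.0):
    """Merge categories until every expected value is larger than `minimum`.

    Returns new (observed, expected) lists, ordered by expected value.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Pair each expected value with its observed count, smallest expected first.
    pairs = sorted(zip(expected, observed))
    while len(pairs) > 1 and pairs[0][0] <= minimum:
        (e1, o1), (e2, o2) = pairs[0], pairs[1]
        # Fold the two smallest categories into one and re-sort.
        pairs = sorted([(e1 + e2, o1 + o2)] + pairs[2:])
    new_expected = [e for e, _ in pairs]
    new_observed = [o for _, o in pairs]
    return new_observed, new_expected


# Hypothetical data: the first two categories are too rare on their own.
observed = [1, 3, 20, 26]
expected = [2.0, 4.0, 18.0, 26.0]
pooled_observed, pooled_expected = pool_rare_categories(observed, expected)
print(pooled_observed, pooled_expected)
```

After pooling, the number of columns I is smaller, so the Degrees of Freedom (I-1) must be recomputed from the pooled lists.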