McNemar's Test
{Adapted from the Institute of Phonetic Sciences (IFA):
http://www.fon.hum.uva.nl/}
Characteristics: This non-parametric test uses matched pairs of labels (A, B). It determines whether the proportion of A and B labels is equal for both members of the pairs. It is a very good test when only nominal data are available, e.g., correct versus incorrect identification of stimuli. Essentially, McNemar's Test is a Sign-Test in disguise. All (A, A) and (B, B) pairs are ignored, and what is tested is whether (A, B) is as likely as (B, A), by labeling the one as + and the other as - and performing a Sign-Test on the number of + and - labels.
McNemar's Test is generally used when the data consist of paired observations of
labels. An example is an identification experiment in which each subject has to
identify two different "versions" of each stimulus. The labels are correct
and error. What is tested is whether a correct identification of
the first version and an error in the identification of the second
version is more or less likely than the reverse. These data
cannot be analyzed with a test on
binomial proportions because the two samples are not independent.
H0: AB pairs are as likely as BA pairs.
Assumptions: Only that the pairs are matched.
Scale: Nominal
Procedure: Ignore the pairs with identical labels, count the pairs AB (n+) and the pairs BA (n-).
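The counting step can be sketched in Python as follows. This is only an illustration: the function name count_discordant and the sample data are hypothetical and do not come from the original page.

```python
from collections import Counter

def count_discordant(pairs, a="A", b="B"):
    """Return (n_plus, n_minus): the number of (a, b) and (b, a) pairs.

    Pairs with identical labels, (a, a) and (b, b), are simply not counted.
    """
    counts = Counter(pairs)
    return counts[(a, b)], counts[(b, a)]

# Hypothetical matched pairs, one tuple per stimulus (version 1, version 2).
pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
n_plus, n_minus = count_discordant(pairs)
print(n_plus, n_minus)
```

Only n+ and n- enter the test; the concordant pairs carry no information about the direction of the difference.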
Level of significance: n+ and n- are
binomially distributed with p = q = 1/2 and N = (n+) +
(n-).
If k is the smaller of (n+) and (n-) then:
p <= 2 * Sum (i=0 to k) {N!/(i!*(N-i)!)}/2^N
(where k! = k*(k-1)*(k-2)*...*1 is the factorial of k, and 0! = 1)
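A minimal Python sketch of the exact two-sided probability is given here. It assumes that each term N!/(i!*(N-i)!) is weighted by (1/2)^N, which follows from p = q = 1/2. The function name mcnemar_exact_p is hypothetical, and capping the result at 1 is an added convention, not part of the original description.

```python
from math import comb  # binomial coefficient N!/(i!*(N-i)!), Python 3.8+

def mcnemar_exact_p(n_plus, n_minus):
    """Two-sided exact p-value from the binomial tail with p = q = 1/2."""
    n = n_plus + n_minus
    if n == 0:
        return 1.0  # assumption: no discordant pairs means no evidence against H0
    k = min(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k + 1))  # Sum (i=0 to k) of C(N, i)
    # Doubling the tail can exceed 1 when n+ and n- are nearly equal; cap it.
    return min(1.0, 2 * tail / 2 ** n)

# Hypothetical counts: n+ = 8, n- = 2, so N = 10 and k = 2.
# The tail is 1 + 10 + 45 = 56, giving 2 * 56 / 1024.
print(mcnemar_exact_p(8, 2))  # 0.109375
```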
Approximation: If (n+) + (n-) = N > 25, then the distribution of Z = (| n+ - n- | - 1)/sqrt( N ) can be approximated by a Standard Normal distribution (the -1 is a continuity correction). In our example, we calculate the exact probabilities up to N = 100. For N > 30, the Student t test can be used.
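The Normal approximation can be sketched in Python using only the standard library. The function name mcnemar_normal_p is hypothetical, the p-value is assumed to be two-sided (matching the factor 2 in the exact formula), and clamping the numerator at zero when n+ and n- differ by less than 1 is an added safeguard rather than part of the formula above.

```python
from math import erfc, sqrt

def mcnemar_normal_p(n_plus, n_minus):
    """Return (Z, two-sided p) for Z = (|n+ - n-| - 1) / sqrt(N)."""
    n = n_plus + n_minus
    if n == 0:
        raise ValueError("no discordant pairs: Z is undefined")
    # Assumption: clamp at 0 so that n+ == n- gives Z = 0 instead of a negative Z.
    z = max(0.0, abs(n_plus - n_minus) - 1) / sqrt(n)
    # For a Standard Normal variable, P(|X| >= z) = erfc(z / sqrt(2)).
    p = erfc(z / sqrt(2))
    return z, p

# Hypothetical counts with N = 40 > 25, so the approximation applies.
z, p = mcnemar_normal_p(30, 10)
print(z, p)
```

For small N the exact binomial probability remains the safer choice.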
Remarks: For McNemar's Test, the same remarks hold as for the Sign-Test. In many cases, it is the only test that can be applied without making many unlikely assumptions. This is especially so because, e.g., error rates in identification experiments tend to be small. As a result, there often are too few relevant observations to use parametric tests.