The Rank Correlation coefficient
{From the Institute of Phonetic Sciences (IFA):
http://www.fon.hum.uva.nl/}
Characteristics:
The Rank Correlation test is a distribution-free test that determines whether
there is a monotonic relation between two variables (x, y). A
monotonic relation exists when an increase in one variable is invariably
associated with an increase in the other variable, or invariably with a
decrease. In equation form, for any two pairs (X1, Y1) and (X2, Y2):
If X2 > X1 then Y2 >= Y1 for a monotonic increase
If X2 > X1 then Y2 <= Y1 for a monotonic decrease
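The two conditions can be checked directly on a small data set. The following is a minimal sketch, not part of the original page; the function name monotonic_directions is hypothetical, and pairs with equal x values are skipped on the assumption that the definition only constrains pairs with X2 > X1.

```python
from itertools import combinations


def monotonic_directions(pairs):
    """Return (is_increase, is_decrease) for a list of (x, y) pairs."""
    is_increase = is_decrease = True
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        if x1 == x2:
            continue  # assumption: only pairs with X2 > X1 are constrained
        if x1 > x2:
            # Relabel so that the second pair has the larger x.
            (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
        if y2 < y1:
            is_increase = False  # violates Y2 >= Y1
        if y2 > y1:
            is_decrease = False  # violates Y2 <= Y1
    return is_increase, is_decrease


print(monotonic_directions([(1, 2), (2, 2), (3, 5)]))  # (True, False)
print(monotonic_directions([(1, 9), (2, 4), (3, 1)]))  # (False, True)
print(monotonic_directions([(1, 2), (2, 5), (3, 3)]))  # (False, False)
```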
The monotonic relation is expressed using rank-order numbers instead of the
values themselves. This is also what makes the Rank Correlation test
distribution free. Although the Rank Correlation coefficient can be interpreted as indicating
the "strength" of the monotonic association, quantifying this strength is so
complex that for all practical purposes this is a non-parametric test.
H0:
There is no monotonic relation between the variables.
Assumptions:
None really
Scale:
Ordinal
Procedure:
Rank order all x and y values separately. Determine the
differences between the ranks of both variables, V = Rank(x) -
Rank(y). Sum the squares of these rank differences (i.e.,
Sum( V**2 ) ).
The Spearman Rank Correlation Coefficient is:
Rs = 1 - 6 * Sum( V**2 ) / ( N * ( N**2 - 1 ))
where N is the number of (x, y) pairs.
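The procedure and formula can be sketched in a few lines of Python. This is an illustrative sketch only, not the code behind this page: the helper names ranks and spearman_rs are hypothetical, and the ranking assumes that all values within a variable are distinct (no ties).

```python
def ranks(values):
    """Return 1-based rank-order numbers; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Rs = 1 - 6 * Sum(V**2) / (N * (N**2 - 1)), with V = Rank(x) - Rank(y)."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_v2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_v2 / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(spearman_rs(x, y))
```

For these made-up data the ranks of y are 1, 3, 2, 5, 4, so Sum( V**2 ) = 4 and Rs = 1 - 24 / 120 = 0.8.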
Level of Significance:
Look up the values of Rs and N in a table. The level of
significance is determined by checking all permutations of ranks in the sample
and counting the fraction for which the permuted coefficient Rs' is more
extreme than the Rs found. As the number of permutations grows as N! (the
factorial of N), this is not practical for large values of N.
For N > 10 this example uses only an approximation (i.e., only a random
subset of the permutations is actually checked).
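The permutation approach can be sketched as follows. This is a hedged illustration rather than the implementation used on this page: the function names, the default of 10000 random permutations, the fixed seed, the two-sided comparison on |Rs|, and counting permutations that are "at least as extreme" (so that the observed ordering itself is included) are all assumptions.

```python
import itertools
import random


def rs_from_ranks(rank_x, rank_y):
    """Spearman Rs computed from two lists of rank-order numbers."""
    n = len(rank_x)
    sum_v2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_v2 / (n * (n ** 2 - 1))


def permutation_p_value(rank_x, rank_y, max_exact_n=10, n_random=10000, seed=0):
    """Fraction of permutations of rank_y with |Rs'| >= |Rs| (two-sided)."""
    n = len(rank_x)
    observed = abs(rs_from_ranks(rank_x, rank_y))
    if n <= max_exact_n:
        # Exact: enumerate all N! orderings of the y ranks.
        perms = itertools.permutations(rank_y)
    else:
        # Approximate: draw a random subset of the orderings.
        rng = random.Random(seed)
        perms = (rng.sample(rank_y, n) for _ in range(n_random))
    count = total = 0
    for perm in perms:
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(rs_from_ranks(rank_x, perm)) >= observed - 1e-12:
            count += 1
    return count / total


# Perfectly matching ranks for N = 5: only the identical and the fully
# reversed ordering reach |Rs'| = 1, so p = 2 / 120.
print(permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
```

Note that the exact branch at N = 10 already visits 3,628,800 permutations, which illustrates why a random subset is used beyond that point.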
Approximation:
If N > 30, the distribution of Z = Rs * sqrt( N - 1 ) can
be approximated by a Standard Normal Distribution.
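As a minimal sketch of this approximation, Z can be converted into a p-value with the complementary error function from the Python standard library. The function name normal_approx_p_value is hypothetical, and the choice of a two-sided p-value is an assumption, as the text does not say which tail is used.

```python
import math


def normal_approx_p_value(rs, n):
    """Return (Z, two-sided p) using Z = Rs * sqrt(N - 1)."""
    z = rs * math.sqrt(n - 1)
    # For a standard normal variable, P(|Z| >= z) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Illustrative values only: Rs = 0.5 with N = 37 gives Z = 0.5 * 6 = 3.
z, p = normal_approx_p_value(0.5, 37)
print(z, p)
```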
Remarks:
This example uses the Standard Normal approximation for N > 30. For N < 11
the exact value is calculated. For all other values of N (10 < N < 31), p is
calculated from a random subset of the possible permutations. This latter value
is not very exact.
As a statistical test to check whether a relation between two variables exists,
this test is better suited than the standard correlation coefficient, because
the latter will only work when there is a linear relation between the
variables. In practical situations, assuming a linear relation will very often
be unrealistic.
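The difference can be illustrated with made-up data that are perfectly monotonic but not linear, here y = exp(x). The sketch below is illustrative only; the helper names are hypothetical and the ranking assumes distinct values.

```python
import math


def ranks(values):
    """Return 1-based rank-order numbers; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Rank correlation: Rs = 1 - 6 * Sum(V**2) / (N * (N**2 - 1))."""
    n = len(x)
    sum_v2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_v2 / (n * (n ** 2 - 1))


def pearson_r(x, y):
    """Standard (product-moment) correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # monotonic increase, but far from linear
print("Rs =", spearman_rs(x, y))
print("r  =", pearson_r(x, y))
```

Because the ranks of x and y coincide, Rs equals 1, whereas the standard correlation coefficient stays below 1 for the same data.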
This test is also useful to check whether matched pairs are really matched. If
they are, their rank correlation should be statistically significant.