
Correlation


Overview

Key Concepts and Terms


Assumptions

  1. Interval-level data (for Pearsonian correlation).

  2. Linear relationships. It is assumed that the x-y scattergraph of points for the two variables being correlated can be better described by a straight line than by any curvilinear function. To the extent that a curvilinear function would be better, Pearson's r and other linear coefficients of correlation will understate the true correlation, sometimes to the point of being useless or misleading.

    Linearity can be checked visually by plotting the data. In SPSS, select Graphs, Scatter/Dots; select Simple Scatter; click Define; let the independent be the x-axis and the dependent be the y-axis; click OK. One may also view many scatterplots simultaneously by asking for a scatterplot matrix: in SPSS, select Graphs, Scatter/Dots, Matrix, Scatter; click Define; move any variables of interest to the Matrix Variable list; click OK.
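The cost of violating linearity can also be illustrated outside SPSS. The following sketch (in Python with numpy, with made-up data; not part of the original, which works in SPSS) constructs a relationship in which y is completely determined by x, yet purely curvilinear, so Pearson's r is near zero:

```python
import numpy as np

# Hypothetical data: y is a perfect (deterministic) function of x,
# but the relationship is a parabola, not a line.
x = np.linspace(-3, 3, 101)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))  # near 0: Pearson's r misses the strong curvilinear relation
```

A scatterplot of these data would reveal the parabola immediately, which is why visual inspection is recommended before trusting the coefficient.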

  3. Homoscedasticity is assumed. That is, the error variance is assumed to be the same at any point along the linear relationship. Otherwise the correlation coefficient is a misleading average of points of higher and lower correlation.

  4. No outliers. Outlier cases can attenuate correlation coefficients. Scatterplots may be used to spot outliers visually (see above). A large difference between Pearsonian correlation and Spearman's rho may also indicate the presence of outliers.
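As an illustration (a Python/numpy sketch with hypothetical data, not from the original), a single wild case can collapse Pearson's r while Spearman's rho, being rank-based, is far less affected; a large gap between the two coefficients is the diagnostic mentioned above:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rho = Pearson's r computed on ranks (no ties in these data)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Hypothetical data: a perfect linear trend plus one wild outlying case.
x = np.arange(1, 21, dtype=float)
y = 2.0 * x
x_out = np.append(x, 21.0)
y_out = np.append(y, -100.0)          # the single outlier

r_clean = np.corrcoef(x, y)[0, 1]     # 1.0
r_out = np.corrcoef(x_out, y_out)[0, 1]   # collapses to roughly .03
rho_out = spearman_rho(x_out, y_out)      # stays roughly .73
print(round(r_clean, 3), round(r_out, 3), round(rho_out, 3))
```

The Pearson-Spearman gap here (about .03 versus about .73) is exactly the warning sign described above.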

  5. Minimal measurement error is assumed since low reliability attenuates the correlation coefficient. By definition, correlation measures the systematic covariance of two variables. Measurement error usually, with rare chance exceptions, reduces systematic covariance and lowers the correlation coefficient. This lowering is called attenuation. Restricted variance, discussed below, also leads to attenuation.
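The attenuation effect can be simulated. In this sketch (Python/numpy; the reliabilities of .5 for both measures are illustrative assumptions), two observed variables measure the same underlying trait, yet measurement error drags the observed correlation down to about .5, and the classical correction for attenuation, r_true = r_obs / sqrt(rel_x * rel_y), recovers the underlying value:

```python
import numpy as np

# Simulation sketch: two measures of the SAME latent trait, each with
# independent measurement error equal in variance to the trait itself,
# giving an assumed reliability of .5 for both measures.
rng = np.random.default_rng(42)
n = 100_000
trait = rng.normal(size=n)
x = trait + rng.normal(size=n)    # observed measure 1
y = trait + rng.normal(size=n)    # observed measure 2

r_obs = np.corrcoef(x, y)[0, 1]   # attenuated: about .5, not 1.0
# Classical correction for attenuation: r_true = r_obs / sqrt(rel_x * rel_y)
r_corrected = r_obs / np.sqrt(0.5 * 0.5)   # close to 1.0
print(round(r_obs, 2), round(r_corrected, 2))
```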

  6. Unrestricted variance is assumed. If variance is truncated or restricted in one or both variables due, for instance, to poor sampling, this too leads to attenuation of the correlation coefficient. The same happens with truncation of the range of variables, as by dichotomizing continuous data or by reducing a 7-point scale to a 3-point scale.
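Both forms of attenuation just described can be seen in a small simulation (a Python/numpy sketch; the population correlation of .8 and the median split are illustrative assumptions):

```python
import numpy as np

# Simulation sketch (hypothetical data): x and y correlate .8 in the
# population; restricting or collapsing x attenuates the observed r.
rng = np.random.default_rng(7)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)   # population correlation = .8

r_full = np.corrcoef(x, y)[0, 1]

# Restricted variance: sample only cases with above-average x
keep = x > 0
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]

# Truncated range: dichotomize x at its median
x_dich = (x > np.median(x)).astype(float)
r_dich = np.corrcoef(x_dich, y)[0, 1]

print(round(r_full, 2), round(r_restricted, 2), round(r_dich, 2))
```

Both the restricted and the dichotomized versions fall well below the full-sample coefficient, even though the underlying relationship is unchanged.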

  7. Similar underlying distributions are assumed for purposes of assessing strength of correlation. That is, if two variables come from unlike distributions, their correlation may be well below +1 even when data pairs are matched as perfectly as they can be while still conforming to the underlying distributions. Thus, the larger the difference in the shape of the distribution of the two variables, the more the attenuation of the correlation coefficient and the more the researcher should consider alternatives such as rank correlation. This assumption may well be violated when correlating an interval variable with a dichotomy or even an ordinal variable.
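To see why unlike distributions cap the coefficient, consider this sketch (Python/numpy; the exponential transform is an illustrative choice): y is a strictly increasing function of x, so the pairs are matched as perfectly as the two shapes allow, yet Pearson's r falls well short of +1 while the rank correlation is exactly 1:

```python
import numpy as np

# Hypothetical data: y is a strictly increasing function of x, so the
# pairing is perfect, but y's distribution is far more skewed than x's.
x = np.linspace(-3, 3, 200)
y = np.exp(2 * x)                 # heavily right-skewed

r = np.corrcoef(x, y)[0, 1]       # well below +1 despite perfect matching
rank_x = np.argsort(np.argsort(x))
rank_y = np.argsort(np.argsort(y))
rho = np.corrcoef(rank_x, rank_y)[0, 1]   # rank correlation = 1.0
print(round(r, 2), round(rho, 2))
```

This is the case where the researcher should consider rank correlation: the ranks carry the full monotone association that the mismatched shapes deny to Pearson's r.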

  8. Common underlying normal distributions, for purposes of assessing significance of correlation. Also, for purposes of assessing strength of correlation, note that for non-normal distributions the range of the correlation coefficient may not be from -1 to +1 (see Shih and Huang (1992), "Evaluating correlation with proper bounds," Biometrics, Vol. 48: 1207-1213). The central limit theorem demonstrates, however, that for large samples, indices used in significance testing will be normally distributed even when the variables themselves are not, and therefore significance testing may be employed. The researcher may wish to use Spearman or another type of nonparametric rank correlation when there are marked violations of this assumption, though this strategy carries the danger of attenuation of correlation.
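For reference, the usual significance test for Pearson's r converts it to a t statistic with n - 2 degrees of freedom (a Python sketch; the r = .30, n = 50 values are hypothetical):

```python
from math import sqrt

# Significance test sketch for Pearson's r: under H0 (rho = 0) the statistic
# t = r * sqrt(n - 2) / sqrt(1 - r^2) follows Student's t with n - 2 df.
r, n = 0.30, 50                    # hypothetical sample values
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
print(round(t, 3))
```

With 48 df the two-tailed .05 critical value is about 2.01, so r = .30 would be judged significant in this hypothetical example.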

  9. Normally distributed error terms. Again, the central limit theorem applies.


Frequently Asked Questions


Bibliography



Copyright 1998, 2008 by G. David Garson.
Last update 1/24/08.