Correlation is a statistical procedure designed to measure the strength and direction of the linear relation between two variables. The most common test statistic for correlation is the Pearson product-moment correlation coefficient, r.
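To make the definition concrete, here is a minimal sketch of how r can be computed by hand. The function name pearson_r and the five pairs of scores are invented for illustration; they are not the questionnaire data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(round(r, 2))  # 0.8
```

The sign of the result gives the direction of the relation and its absolute value gives the strength.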
Data for Example: Gender Questionnaire
The questionnaire included five measures of gender role attitudes:
r ranges in value from -1 to +1. A positive r indicates that high values on one variable tend to be found with high values on another variable. For example, the scatterplot below shows a correlation of r = +0.5 between the Attitudes Toward Women Scale (AWS), a measure of conservative gender roles, and a measure of the belief that women are more "Morally Virtuous" (MVIRT) than men are. You can get this scatterplot by selecting Analysis -> Correlation, putting MVIRT into the Variables box, AWS into the With box, pressing Plots and selecting Scatterplots, and clicking OK and then Run.
The dots in the plot above are slightly transparent so that overlapping points show up as darker. The solid blue line in the graph above is the line of best fit, a line that minimizes the sum of the squared vertical distances between the data points and the line itself. The best fit line is a useful way of representing the linear trend in a scatterplot. It helps capture the pattern indicated by r = +0.5: higher scores on AWS are found with higher scores on MVIRT. The gray shading around the blue line represents the 95% confidence interval around the line of best fit. You can have 95% confidence that the line of best fit for the population (the true best fit line) is within the shaded area. Note that the shaded area does not show where 95% of the data points are - it corresponds to where the line of best fit would be drawn, not to where data are likely to appear.
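The line of best fit can be sketched in a few lines of code. This example assumes the usual least-squares definition of the line (the source does not say how the software computes it) and uses invented scores rather than the AWS and MVIRT data; best_fit_line is a hypothetical helper name.

```python
def best_fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
slope, intercept = best_fit_line(x, y)
print(f"predicted y = {intercept:.2f} + {slope:.2f} * x")
```

A positive slope goes with a positive r, and a negative slope with a negative r.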
You should also get this output:
That output includes a number of pieces of information:
There was a significant positive correlation between AWS and moral virtue, r(201) = +.5, p < .001. This correlation indicates that people with traditional gender role attitudes tend to believe that women are more morally virtuous than men are.
Remember to always add a sentence providing an interpretation after you
report statistical output.
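The number in parentheses in r(201) is the degrees of freedom, which for a Pearson correlation is the sample size minus 2. The sketch below recovers the implied sample size and the t statistic that the significance test is based on. It does not compute the p-value itself, which requires the t distribution and is reported by the software.

```python
import math

r = 0.5    # correlation reported in the write-up
df = 201   # degrees of freedom shown in parentheses: r(201)

n = df + 2                            # sample size implied by df = n - 2
t = r * math.sqrt(df / (1 - r ** 2))  # t statistic for testing r against 0
print(n, round(t, 2))  # 203 8.19
```

A t value this far from zero with 201 degrees of freedom is consistent with the reported p < .001.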
r-squared. One way to express the strength of a correlation is to square
the r-value. r² is the proportion of the variance or "information"
in one variable that can be "explained" or "predicted"
from the other variable, assuming a linear relation between them. If the
correlation between AWS and MVIRT is r = 0.5, then r² is 0.25, which means
that 25% of the variance in people's beliefs about gender differences
in moral virtue can be explained by their belief in conservative gender
roles.
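The "variance explained" reading of r-squared can be checked numerically. In this sketch, with invented scores, the squared correlation equals 1 minus the ratio of the leftover (residual) variation around the best fit line to the total variation in y.

```python
import math

# Hypothetical scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Least-squares line and the variation it leaves unexplained
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained = 1 - ss_residual / syy

print(round(r_squared, 2), round(explained, 2))  # 0.64 0.64
```

The same arithmetic applied to r = 0.5 gives 0.5 * 0.5 = 0.25, or 25% of the variance.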
The correlation between aspirin and relief in the example above is exactly r = 0. Imagine that someone asked what the relation was between aspirin and relief. If you relied only on the correlation coefficient, you might be tempted to answer, "no relation at all." But if you look at the scatterplot, you should reach a different conclusion. The lesson here is that you must always plot your data. If the data are curvilinear (not linear but curved), you should not rely on the correlation coefficient alone - it will not accurately capture the pattern in the data.
So, what do you do if you detect a curvilinear relation? You can report the r-value but make sure you also state that the scatterplot indicated a curvilinear relation and attempt to describe it. In the example above, you could say that there was a generally positive correlation between number of aspirin and relief when number of aspirin increased from 0 to 5, but between 5 and 10 aspirin, the relation became negative.
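The following sketch shows how a strong curved pattern can produce r = 0. The relief scores are invented to rise from 0 to 5 aspirin and fall symmetrically from 5 to 10; they are not the data behind the original plot.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: relief peaks at 5 aspirin, then declines
aspirin = list(range(11))                    # 0, 1, ..., 10
relief = [25 - (a - 5) ** 2 for a in aspirin]

r_all = pearson_r(aspirin, relief)           # whole range
r_rising = pearson_r(aspirin[:6], relief[:6])   # 0 to 5 aspirin
r_falling = pearson_r(aspirin[5:], relief[5:])  # 5 to 10 aspirin

print(round(r_all, 2), round(r_rising, 2), round(r_falling, 2))
```

Over the whole range the positive and negative halves cancel, so r is 0 even though relief depends strongly on the number of aspirin.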
What if the correlation is non-significant and there is no discernible pattern in the scatterplot? In that case, it is better to report the correlation value, identify it as non-significant, and not provide any interpretation: "The correlation between shoe size and IQ was not significant, r(28) = .13, p = .34."
Although it is beyond the scope of this course, there are ways to test
for particular non-linear relations using Analysis -> Linear Model
and adding polynomial terms. For example, the pattern given above is the
result of a fairly common non-linear function called a quadratic, or second-order
polynomial. For now, you can just describe the non-linear pattern in words.
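For readers curious about what adding a polynomial term involves, here is a minimal sketch of fitting a quadratic by least squares. It is not the procedure the course software uses; fit_quadratic is a hypothetical helper, and the aspirin and relief values are invented to follow an exact quadratic.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Columns of the design matrix: constant, x, and x squared
    cols = [[1.0] * len(x), [float(v) for v in x], [float(v) ** 2 for v in x]]
    # Augmented 3x4 system: (X'X | X'y)
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * q for p, q in zip(ci, y))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(3):
        pivot = max(range(i, 3), key=lambda row: abs(a[row][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for k in range(3):
            if k != i:
                factor = a[k][i] / a[i][i]
                a[k] = [v - factor * w for v, w in zip(a[k], a[i])]
    return [a[i][3] / a[i][i] for i in range(3)]

# Hypothetical data: relief = 10*aspirin - aspirin**2
aspirin = list(range(11))
relief = [25 - (a - 5) ** 2 for a in aspirin]

b0, b1, b2 = fit_quadratic(aspirin, relief)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

A negative coefficient on the squared term (b2) is what produces the rise-then-fall shape that a straight line cannot capture.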