# Correlation

Correlation is a statistical procedure designed to measure the strength
and direction of the linear relation between two variables. The most common
test statistic for correlation is the Pearson product-moment correlation
coefficient, r.

The examples in this section use the ChivQues data set. The questionnaire included five measures of gender role attitudes:

- Chivalry (chiv), a measure of the degree to which a person endorses the idea that men have more of an obligation to protect and provide for women than vice versa.
- Moral Virtue (MVIRT), a measure of the degree to which a person believes women are more morally virtuous (have a better conscience, more morally "pure," etc.) than men.
- Sexual Virtue (SVIRT), a measure of the degree to which a person believes that women are more sexually virtuous (think about sex less often, don't think about others in sexual ways, etc.) than men.
- Attitudes toward Women Scale (AWS), a published measure of conservative or traditional gender role attitudes (e.g., women should not work outside the home). I gave this measure to only half of the sample.
- Female Agency (agency), a measure of the degree to which a person believes that women are as competent and as well-suited to positions of authority as men are.
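The Pearson coefficient described at the start of this section is the covariance of the two variables divided by the product of their standard deviations. The Python sketch below is a minimal illustration, not the software used in this course. It computes r from that definition. It then tests r with t = r√(n − 2) / √(1 − r²) on n − 2 degrees of freedom and builds a 95% confidence interval with the Fisher z transformation, which is an assumed (standard, large-sample) method that your software may or may not use. The function names `pearson_r` and `r_test` and the toy scores are hypothetical; they are not the ChivQues data.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson r: the covariance of x and y divided by the product of
    their standard deviations (the 1/(n - 1) factors cancel out)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = mean(x), mean(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))   # sum of cross-products
    sxx = sum(a * a for a in dx)                # sum of squares for x
    syy = sum(b * b for b in dy)                # sum of squares for y
    return sxy / math.sqrt(sxx * syy)


def r_test(r, n, level=0.95):
    """Return t, its degrees of freedom, and a Fisher z confidence interval."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    z = math.atanh(r)                             # Fisher z transformation
    se = 1 / math.sqrt(n - 3)                     # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return t, df, (math.tanh(z - crit * se), math.tanh(z + crit * se))


# Made-up toy scores (NOT the ChivQues data).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")  # r = 0.829, r^2 = 0.687

# Summary values from the AWS/MVIRT output (r = 0.500, N = 201).
t, df, (lo, hi) = r_test(0.5, 201)
print(f"t({df}) = {t:.2f}, 95% CI [{lo:.3f}, {hi:.3f}]")  # t(199) = 8.14, 95% CI [0.388, 0.597]
```

Plugging in the AWS and MVIRT summary values reported later in this section reproduces the confidence interval [0.388, 0.597]. The computed t of about 8.14 differs slightly from the reported 8.143, presumably because r is rounded to three decimals in the output.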
## Positive Correlations
The dots in the plot above are slightly transparent so that overlapping points appear darker. The solid blue line is the “line of best fit”: the line that minimizes the sum of the squared vertical distances between the data points and the line itself. The line of best fit is a useful way of representing the linear trend in a scatterplot. Here it captures the pattern indicated by r = +0.5: higher scores on AWS tend to go with higher scores on MVIRT.

The gray shading around the blue line represents the 95% confidence interval around the line of best fit. You can be 95% confident that the line of best fit for the population (the “true” line of best fit) lies within the shaded area. Note that the shaded area does not show where 95% of the data points fall; it shows where the population line of best fit is likely to be, not where individual data points are likely to appear.

You should also get this output:

| | | MVIRT |
|---|---|---|
| AWS | cor | 0.500 |
| | 95% CI | [0.388, 0.597] |
| | N | 201 |
| | t (df) | 8.143 (199) |
| | p-value* | <0.001 |

That output includes the correlation coefficient (cor), its 95% confidence interval, the sample size (N), the t statistic with its degrees of freedom, and the p-value. You could report it like this:

There was a significant positive correlation between AWS and moral virtue, r(199) = +.50, p < .001. This correlation indicates that people with traditional gender role attitudes tend to believe that women are more morally virtuous than men are.

Remember to always add a sentence interpreting the result after you report statistical output.

**r-squared.** One way to express the strength of a correlation is to square
the r-value. r² is the proportion of the variance, or "information," in one variable that can be "explained" or "predicted" from the other variable, assuming a linear relation between them. If the correlation between AWS and MVIRT is r = 0.5, then r² is 0.25, which means that 25% of the variance in people's beliefs about gender differences in moral virtue can be explained by their belief in conservative gender roles.

The correlation between aspirin and relief in the example above is exactly r = 0. Imagine that someone asked what the relation was between aspirin and relief. If you relied only on the correlation coefficient, you might be tempted to answer, "no relation at all." But if you look at the scatterplot, you should reach a different conclusion. The lesson here is that you must always plot your data. If the data are curvilinear (curved rather than linear), you should not rely on correlation alone, because it will not accurately capture the pattern in the data.

So, what do you do if you detect a curvilinear relation? You can report the r-value, but make sure you also state that the scatterplot indicated a curvilinear relation and attempt to describe it. In the example above, you could say that there was a generally positive correlation between number of aspirin and relief as the number of aspirin increased from 0 to 5, but that between 5 and 10 aspirin, the relation became negative.

What if the correlation is non-significant and there is no discernible pattern in the scatterplot? In that case, it is best to report the correlation value, identify it as non-significant, and not provide any interpretation: "The correlation between shoe size and IQ was not significant, r(28) = .13, p = .34."

Although it is beyond the scope of this course, there are ways to test
for particular non-linear relations using Analysis -> Linear Model
and adding polynomial terms. For example, the aspirin pattern described above is the
result of a fairly common non-linear function called a quadratic, or second-order
polynomial. For now, you can just describe the non-linear pattern in words.
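To see how a curved pattern can hide behind r = 0, the Python sketch below uses made-up relief scores shaped like the aspirin example: relief rises from 0 to 5 aspirin and falls from 5 to 10. These are hypothetical numbers, not the data from the plot. Because the pattern is symmetric around 5 aspirin, Pearson r is exactly zero. The sketch then correlates relief with the squared distance from 5 aspirin, a centered second-order term. This is a rough stand-in, assumed here for illustration, for adding a quadratic term in a linear model.

```python
import math


def pearson_r(x, y):
    """Pearson r from raw sums; the numerator stays exact for integer data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


# Hypothetical scores: relief rises from 0 to 5 aspirin, then falls to 10.
aspirin = list(range(11))
relief = [1, 3, 5, 6, 7, 8, 7, 6, 5, 3, 1]

r_linear = pearson_r(aspirin, relief)
print(f"r(aspirin, relief) = {r_linear:.3f}")       # 0.000

# Squared distance from 5 aspirin: a centered quadratic (second-order) term.
squared = [(a - 5) ** 2 for a in aspirin]
r_quad = pearson_r(squared, relief)
print(f"r(squared term, relief) = {r_quad:.3f}")    # -0.994
print(f"variance explained = {r_quad ** 2:.3f}")    # 0.987
```

In this symmetric design the number of aspirin and its centered square are uncorrelated. A quadratic model containing both terms would therefore explain the same proportion of variance (about 0.987), even though the linear r alone explains none of it.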