One-way ANOVA

Now go back and choose Analyze -> Compare Means -> One-Way ANOVA. A potential factor variable should now be available. Move it into the Factor box, select your dependent variable (DV), and press OK to run the one-way ANOVA.

You should get results that look like this:

These results show you the variance between groups and the variance within groups (Mean Square = "mean squared deviation from the mean" = variance). Use a calculator to divide MS(Between) by MS(Within), then compare that value to the F statistic. The two should be very close, because that is what F is: a ratio of the variance between groups to the variance within groups.
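If you would rather let a script do the division, here is a minimal sketch in Python. It assumes the group sizes, means, and standard deviations of the candy bar example (the values SPSS reports in the Descriptives table for this data set) and rebuilds both mean squares from those summary numbers. The variable names are made up for illustration, and because the inputs are rounded the result only matches SPSS to about two decimals.

```python
# Summary statistics per group: (n, mean, standard deviation),
# copied from the Descriptives output for the candy bar example.
groups = {
    "fun": (10, 5.41, 0.58775),
    "regular": (10, 4.43, 0.87693),
    "king": (10, 5.11, 0.44335),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

# Between-groups sum of squares: spread of the group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-groups sum of squares: pooled spread of scores around their own group mean.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1        # number of groups minus 1
df_within = total_n - len(groups)   # total N minus number of groups

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within    # F = MS(Between) / MS(Within)

print(f"MS(Between) = {ms_between:.3f}, MS(Within) = {ms_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

The F value printed here should round to 5.77 with 2 and 27 degrees of freedom, the same numbers used in the write-up for this example.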

APA Style for One-Way ANOVA Results

To report the results of a one-way ANOVA in APA style, you need to report the F, two degrees of freedom, and the p-value. The two degrees of freedom you need to report are the between-groups df and the within-groups df. You could write:

 A one-way ANOVA was used to test for preference differences among three sizes of a candy bar. Preferences for the candy bar differed significantly across the three sizes, F(2, 27) = 5.77, p = .008.

Note the between-groups df comes first, and the within-groups df comes second.
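If you report many ANOVAs, a small helper can keep the formatting consistent. The sketch below is only an illustration: the function names `format_p` and `format_anova` are hypothetical, and the conventions it applies (two decimals for F, three decimals for p with no leading zero, "p < .001" for very small values) are the usual APA ones, not something SPSS produces for you.

```python
def format_p(p):
    """Format a p-value with three decimals and no leading zero."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_anova(f_value, df_between, df_within, p):
    """Build the statistics string: between-groups df first, then within-groups df."""
    return f"F({df_between}, {df_within}) = {f_value:.2f}, {format_p(p)}"


report = format_anova(5.77, 2, 27, 0.008)
print(report)
```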

Post Hoc Tests

A major limitation of the results from a one-way ANOVA is that you don't know how the means differ; you only know that the means are not all equal to each other. To solve this little mystery, you can use post hoc tests. "Post hoc" means "after this": it is a test you conduct after you already know that there is a difference among the means you are comparing.

Select Analyze -> Compare Means -> One-way ANOVA again, but this time press the "Post Hoc" button. The following screen appears:

As you can see from this screen, there are several post-hoc tests available. The one most appropriate to most of the questions you'll be asking is the "Tukey" test, also known as the Tukey "honestly significant difference (HSD)" comparison. Note: this is "Tukey," named after John Tukey, not "Turkey" as in Thanksgiving.

Given a set of 3 means, the Tukey procedure will test all possible 2-way comparisons: 1&2, 1&3, and 2&3.
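To see how quickly the list of pairwise comparisons grows, here is a short Python sketch that enumerates them. It uses the standard library's `itertools.combinations`; the group numbers 1, 2, 3 are just placeholders for your own groups.

```python
from itertools import combinations

group_ids = [1, 2, 3]

# Every unordered pair of groups, i.e. every 2-way comparison.
pairs = list(combinations(group_ids, 2))
for first, second in pairs:
    print(f"{first} & {second}")


def count_pairwise(k):
    """Number of pairwise comparisons among k means: k * (k - 1) / 2."""
    return k * (k - 1) // 2


print(count_pairwise(3), "comparisons for 3 groups")
print(count_pairwise(5), "comparisons for 5 groups")
```

With 3 groups there are 3 comparisons, but with 5 groups there are already 10, which is why the adjustment for multiple comparisons matters more as the number of groups grows.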

The multiple comparison (fishing expedition) problem

Ordinarily, conducting three comparisons in a row creates a problem: with each additional test, it becomes more likely that you will obtain at least one statistically significant result just by chance. Think of it as a slot machine. If you pull the slot machine arm 4 times, you are almost 4 times as likely to hit the jackpot at least once, given a completely random process. Likewise, if you do 4 tests of statistical significance, you are almost 4 times as likely to obtain at least one p < .05 result when there is no real difference between your means (roughly a 19% chance instead of 5%, if the tests are independent). This is called the "fishing expedition problem" because it's like you're casting your line out again and again, just hoping to snag something that's significant.

There are a number of methods to deal with this problem, and each one is a kind of post hoc test. Each method is optimized for a particular set of circumstances. What every method does is adjust the obtained significance level (p-value) to make it harder for you to obtain p < .05. This is like pulling the slot machine handle 4 times and having the slot machine say "I know you just tried 4 times, so I'm making the odds of winning harder."

The Tukey method is optimized for the situation in which you would like to test all possible pairwise comparisons (comparing sets of two) among your means. Select the Tukey option and press Continue.
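The size of the fishing expedition problem can be put into numbers. The sketch below assumes the tests are independent (pairwise comparisons on the same groups are not strictly independent, so treat the figures as an approximation) and computes the chance of at least one false positive as 1 - (1 - alpha)^m for m tests. The function name is made up for illustration.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one p < alpha result across num_tests independent
    tests when there is no real difference anywhere."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 2, 3, 4):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests} test(s): {rate:.3f}")
```

The rate climbs from .05 for one test to about .14 for three tests and about .19 for four, which is close to, but a little less than, simply multiplying .05 by the number of tests.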

Now click on Options and check the box under Statistics marked "Descriptive". You will need it to report confidence intervals. Click Continue and then OK. You should obtain the following new table:

 Descriptives
 pref
                                                  95% Confidence Interval for Mean
           N    Mean    Std. Deviation  Std. Error  Lower Bound   Upper Bound     Minimum  Maximum
 fun       10   5.4100  .58775          .18586      4.9896        5.8304          4.20     6.20
 regular   10   4.4300  .87693          .27731      3.8027        5.0573          3.40     6.50
 king      10   5.1100  .44335          .14020      4.7928        5.4272          4.50     5.70
 Total     30   4.9833  .76207          .13913      4.6988        5.2679          3.40     6.50

This table provides you with the mean values of your three groups (5.41, 4.43, and 5.11) and also the confidence intervals for each mean. Below that table is the ANOVA table from before, which has not changed, and next is another new table:

 Multiple Comparisons
 pref
 Tukey HSD
                                                                 95% Confidence Interval
 (I) size3  (J) size3  Mean Difference (I-J)  Std. Error  Sig.   Lower Bound  Upper Bound
 fun        regular      .98000*              .29563      .007     .2470       1.7130
            king         .30000               .29563      .574    -.4330       1.0330
 regular    fun         -.98000*              .29563      .007   -1.7130       -.2470
            king        -.68000               .29563      .073   -1.4130        .0530
 king       fun         -.30000               .29563      .574   -1.0330        .4330
            regular      .68000               .29563      .073    -.0530       1.4130
 *. The mean difference is significant at the 0.05 level.

This output identifies your DV ("pref" for preference rating of candy bar) and tells you the type of multiple comparison adjustment it is using ("Tukey HSD"). If you attached text labels to the values of your new IV, you'll get them like I did above.

There are several comparisons listed in the table above. Each pair of groups appears twice, once in each direction, so the second listing repeats the first with the sign of the difference reversed. In the first row, you can see the comparison between fun-sized bars and regular bars. The difference between the means of these two groups is .98. Following this row across, we see that this difference was statistically significant (p = .007). So the significant overall ANOVA we found earlier was due to a difference between just two groups: fun vs. regular. None of the other comparisons are significant.
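The Mean Difference and Std. Error columns can be rebuilt by hand. Here is a minimal Python sketch, assuming the group means and standard deviations from the Descriptives table and equal group sizes of 10. With equal group sizes, the standard error of a difference is the square root of MS(Within) times 2/n. The sketch does not try to reproduce the Sig. column, because that requires the studentized range distribution, which is not in Python's standard library.

```python
from itertools import combinations
from math import sqrt

n_per_group = 10
means = {"fun": 5.41, "regular": 4.43, "king": 5.11}
std_devs = {"fun": 0.58775, "regular": 0.87693, "king": 0.44335}

# With equal group sizes, MS(Within) is the average of the group variances.
ms_within = sum(sd ** 2 for sd in std_devs.values()) / len(std_devs)

# Standard error of the difference between two group means (same for every pair).
se_difference = sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

differences = {}
for group_i, group_j in combinations(means, 2):
    differences[(group_i, group_j)] = means[group_i] - means[group_j]
    print(f"{group_i} - {group_j}: {differences[(group_i, group_j)]:.2f}")

print(f"Std. Error of each difference: {se_difference:.3f}")
```

Because the inputs are rounded, the standard error agrees with the SPSS value of .29563 only to about four decimals.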

This is somewhat consistent with our conclusions from the plot: the regular-sized bar was liked less than either of the other two bars, which did not differ from each other. What is slightly different is that we expected that the regular bar would be liked significantly less than both the fun-sized and the king-sized bar. In fact, the regular-sized bar was liked significantly less than the fun-sized bar (p = .007) but the difference was not quite significant for the king-sized bar (p = .073).

There is another table of output as well:

This table is a handy summary of the major differences among the means. It organizes the means of the three groups into "homogeneous subsets": means that do not differ from each other at p < .05 go together in the same column, and means that do differ go into separate columns. Groups that don't show up in the same column are significantly different from each other at p < .05 according to the Tukey multiple comparison procedure. Notice how the "regular" group and the "fun" group show up in separate columns. This indicates that those groups are significantly different. The king-size group shows up in each column, indicating that it is not significantly different from either of the other two groups.

To report these results, you could write:

 A one-way ANOVA was used to test for preference differences among three sizes of a candy bar. Preferences for the candy bar differed significantly across the three sizes, F(2, 27) = 5.77, p = .008. Tukey post hoc comparisons of the three groups indicated that the fun-size group (M = 5.41, 95% CI [4.99, 5.83]) gave significantly higher preference ratings than the regular-size group (M = 4.43, 95% CI [3.80, 5.06]), p = .007. Comparisons between the king-size group (M = 5.11, 95% CI [4.79, 5.43]) and the other two groups were not statistically significant at p < .05.

Note how the confidence intervals are presented: in brackets, after each mean, with the degree of confidence stated explicitly ("95% CI").
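If you want to check where those intervals come from, each one is the group mean plus or minus a critical t value times the standard error of the mean. The sketch below assumes the means and standard errors from the Descriptives table and hard-codes the two-tailed 95% critical t for 9 degrees of freedom (n - 1 for a group of 10), taken from a standard t table as 2.262. The constant and variable names are made up for illustration.

```python
# Assumption: two-tailed 95% critical t value for df = 9, from a standard t table.
T_CRITICAL_DF9 = 2.262

# (mean, standard error of the mean) for each group, from the Descriptives table.
descriptives = {
    "fun": (5.41, 0.18586),
    "regular": (4.43, 0.27731),
    "king": (5.11, 0.14020),
}

intervals = {}
for label, (mean, std_error) in descriptives.items():
    margin = T_CRITICAL_DF9 * std_error
    intervals[label] = (mean - margin, mean + margin)
    lower, upper = intervals[label]
    print(f"{label}: M = {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The bounds this produces should match the Lower Bound and Upper Bound columns of the Descriptives table once rounded to two decimals.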