Comparing the means of two groups

Plotting the data is a good way to get a feel for differences between groups, but statistics can provide us with two more pieces of information: a confidence interval for the difference between means and a measure of how likely a difference at least this large would be if only chance were at work (statistical significance).

To access these tests, select Analysis -> Two Sample Test. The default test statistic is the one we want: the t-test.

Assumption Icons. You might notice the icons next to each test:

[Image: assumption icons]

These are a reminder of the assumptions of each test. Holding the mouse over an icon will cause text to appear showing you which assumption it indicates. For example, the icons next to the t-test indicate that it assumes "Large Sample" (icon of a large letter "N") and "No Outliers" (icon of a red dot far from the rest of a scatterplot).

Do not assume equal variance

If you click on the gear icon to the right of the t-test checkbox, a dialog opens allowing you to choose between two forms of the t-test. The default setting is "Unequal Variance (Welch) (Recommended)", and that is the one that I recommend as well. Variance is a measure of dispersion, or how spread out the scores are. It is equal to the square of the standard deviation, another common measure of dispersion. Equal and unequal variance refer to the variances of the two groups you are comparing. The "Unequal Variance (Welch)" method works whether your two groups have similar or dissimilar variances, whereas the other option (Student's t-test) is only valid when the two groups have approximately equal variances. If the variances of the two groups differ, the p-value reported by Student's t-test can be artificially high or low, especially when the groups also differ in size. It is safer not to assume equal variances and to use Welch's method, which reports a less biased p-value.
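To make the difference between the two forms concrete, here is a minimal sketch in Python that computes both tests from summary statistics. The function names and the two groups are hypothetical, chosen so that the smaller group has the larger spread; they are not taken from the chocolate data, and this is not the code the program runs.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t and its Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t, which pools the two variances into one estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical summary statistics: (mean, SD, n) for each group.
group_a = (10.0, 1.0, 40)  # large group, small spread
group_b = (11.0, 4.0, 10)  # small group, large spread

welch_result = welch_t(*group_a, *group_b)
student_result = student_t(*group_a, *group_b)

print("Welch   t = %.2f, df = %.2f" % welch_result)
print("Student t = %.2f, df = %d" % student_result)
```

With these made-up numbers, Student's test reports a t that is larger in magnitude and uses far more degrees of freedom than Welch's test, so its p-value would be too small. When the two groups are the same size, the two t values coincide and only the degrees of freedom differ.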

Output

After you enter the appropriate variables into "Factor" and "Outcomes" and click Run, you should get two pieces of output: the descriptive statistics for the two groups in your test and the results of the t-test. If you want less clutter on your screen, you can click the "Element View" tab in the output console to hide the red R command syntax.

Here are the results for the t-test (in Element View):

The standard deviations reported in the descriptive statistics confirm your earlier observation that the tip percentage is more spread out in the Chocolate condition (SD = 2.90) than in the No Chocolate condition (SD = 1.79).

Welch's t-test output adds a calculation of the difference between the means of the two groups (3.12 percentage points in tips) and also the confidence interval for that difference: 2.11 to 4.12. You can be 95% confident that the difference between the population means for the Chocolate and No Chocolate conditions is somewhere between 2.11 and 4.12 percentage points. As before, the confidence interval tells you how precise your estimate is. Serving customers chocolate with their check can be expected to raise your tip percentage by somewhere between 2 and 4 percentage points.
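The interval is built as the difference plus or minus a critical t value times the standard error of the difference. The following sketch works backwards from the rounded numbers reported in the output (including the t statistic of 6.20 from the same output) to show how the pieces fit together. The variable names are only illustrative, and because the inputs are rounded, the results are approximate.

```python
# Values as reported (rounded) in the t-test output.
diff = 3.12                  # difference between the two group means
ci_low, ci_high = 2.11, 4.12 # 95% confidence interval for the difference
t_stat = 6.20                # t statistic

# t is the difference divided by its standard error, so:
standard_error = diff / t_stat

# Half the width of the interval is the margin of error.
margin = (ci_high - ci_low) / 2

# The margin is the critical t value times the standard error.
implied_critical_t = margin / standard_error

# The interval is centered on the difference (up to rounding).
midpoint = (ci_low + ci_high) / 2

print(f"standard error     ~ {standard_error:.3f}")
print(f"margin of error    ~ {margin:.3f}")
print(f"implied critical t ~ {implied_critical_t:.2f}")
print(f"interval midpoint  ~ {midpoint:.3f}")
```

The implied critical value comes out close to 2, which is what you would expect for a 95% interval with this many degrees of freedom.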

Test statistic, df, and p-value

The t statistic is 6.20, the df (degrees of freedom) is 74.87, and the p-value is less than .001. The p-value indicates that a t value at least as extreme as 6.20 occurs less than 1 time in a thousand under the null distribution (that is, assuming no difference between the two population means). Results this extreme would be very surprising if the two groups were truly equal, so you can reject that assumption.
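If you are curious where the p-value comes from, it is the area in both tails of the t distribution beyond the observed t. The sketch below approximates that area with Simpson's rule using only the Python standard library. The function names, the integration span, and the step count are my own choices for illustration; statistical software uses more precise routines.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution at x."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, span=60.0, steps=12000):
    """Approximate two-sided p-value by integrating the upper tail.

    Uses Simpson's rule from |t| to |t| + span (steps must be even).
    The area beyond the span is ignored, which is negligible unless
    df is very small.
    """
    start = abs(t)
    h = span / steps
    total = t_density(start, df) + t_density(start + span, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(start + i * h, df)
    upper_tail = total * h / 3
    return 2 * upper_tail  # both tails

p_value = two_sided_p(6.20, 74.87)
print(f"p = {p_value:.2e}")
print("less than .001:", p_value < 0.001)
```

As a sanity check, a t of 0 gives a p-value of 1 (every possible result is at least that extreme), and a t of 2.228 with 10 degrees of freedom gives a p-value of about .05, matching standard t tables.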

APA Style

To write up the results of this analysis, you could write:

Researchers hypothesized that giving customers chocolate with their bill would increase the tips that waiters received. Tip percentages for the two groups differed significantly according to Welch's t-test, t(74.87) = 6.20, p < .001. On average, customers given chocolate tipped 17.7 percent, while customers not given chocolate tipped 14.6 percent. The 95% confidence interval for the effect of chocolate on tip percentage is between 2.1 and 4.1 percent. These results support the researchers' hypothesis.

Note the formatting for reporting the results of the t-test:

  1. Write in complete sentences.
  2. State the researchers' hypothesis.
  3. Give the means of each group, using the units for the dependent variable (here, the units are percentages because the DV was recorded in terms of tip percentages).
  4. Give the degrees of freedom in parentheses after the letter t (which is italicized).
  5. Give the p-value.
  6. Report the confidence interval.
  7. Decimal places: Use a number of decimal places sufficient to distinguish two values but be aware that values with many decimal places are harder to remember.
  8. State whether the results supported, partially supported, or did not support the researchers' hypothesis.
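The numeric part of the write-up follows a fixed pattern, so it can be generated mechanically. Here is a small sketch of a formatter; the function names are hypothetical, and the rules it encodes (two decimal places for t, three for p with no leading zero, and "p < .001" for very small values) are the common APA conventions as I understand them. Plain text cannot show italics, so the t and p still need to be italicized in your document.

```python
def format_p(p):
    """Format a p-value without a leading zero, APA style."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def format_t_test(t, df, p):
    """Build a string such as 't(74.87) = 6.20, p < .001'."""
    # Welch's df is usually fractional; whole-number df needs no decimals.
    df_text = f"{df:.0f}" if float(df).is_integer() else f"{df:.2f}"
    return f"t({df_text}) = {t:.2f}, {format_p(p)}"

# The chocolate example; the exact p is not reported in the text, so any
# value below .001 stands in for it here.
print(format_t_test(6.2, 74.87, 0.0001))
```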

Effect of Sample Size on Confidence Intervals

Earlier, we looked at how the confidence intervals in your plot got wider when you used a subset of 30 people instead of the original sample of 92 people. Let's look at what the effects of that change would be on t-test results:

Note how the confidence interval for the difference between means is now wider (1.4% to 5.4% instead of 2.1% to 4.1%). You have lost precision in your estimate because you have fewer subjects.
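A rough rule of thumb explains most of this widening: the standard error shrinks with the square root of the sample size, so cutting the sample from 92 to 30 should widen the interval by a factor of about the square root of 92/30. The sketch below compares that expectation with the intervals reported above. It assumes the subset has a similar spread and a similar split between groups, which the text does not confirm.

```python
import math

full_n, subset_n = 92, 30
full_ci = (2.1, 4.1)    # interval with all 92 people
subset_ci = (1.4, 5.4)  # interval with the subset of 30

# Expected widening if only the sample size changed.
expected_ratio = math.sqrt(full_n / subset_n)

# Observed widening from the two reported intervals.
full_width = full_ci[1] - full_ci[0]
subset_width = subset_ci[1] - subset_ci[0]
observed_ratio = subset_width / full_width

print(f"expected widening ~ {expected_ratio:.2f}x")
print(f"observed widening ~ {observed_ratio:.2f}x")
```

The observed widening is somewhat larger than the rule of thumb predicts. That is not surprising, because a smaller sample also has fewer degrees of freedom (and so a larger critical t), and the standard deviations in the subset will not exactly match those in the full sample.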

Negative t values and confidence intervals?

The sign of t is determined by whether the first mean is larger than the second (in which case, t is positive) or whether the second mean is larger than the first (in which case, t is negative). The order of the two means is completely arbitrary, so you should feel free to ignore the sign of the t value. For example, let's say you are comparing students' test scores for two instructors named Smith and Templeton and you get these results:

Because Smith is the first mean (and smaller) and Templeton is the second mean (and larger), t is negative. Here is a perfectly acceptable interpretation:

Researchers hypothesized that exam scores would differ significantly by instructor. Mean exam scores differed significantly by instructor according to Welch's t-test, t(22.27) = 5.83, p < .001. Students in Templeton's class (M = 28.42) scored significantly higher than students in Smith's class (M = 14.57). The 95% confidence interval of the difference is 8.92 to 18.77 points. These results support the researchers' hypothesis.
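Reporting positive values in this write-up amounts to swapping the order of the two groups. Swapping the order negates t and mirrors the confidence interval around zero, so the lower and upper bounds trade places. Here is a minimal sketch; the function name is hypothetical, and the signed input values are an assumption inferred from the write-up above, since the output itself is not reproduced here.

```python
def swap_group_order(t, ci):
    """Return t and the confidence interval with the groups reversed."""
    low, high = ci
    # Negating both bounds reverses their order, so they trade places.
    return -t, (-high, -low)

# Assumed output for Smith minus Templeton (signs inferred, not quoted).
smith_minus_templeton = (-5.83, (-18.77, -8.92))

# The same result expressed as Templeton minus Smith.
templeton_minus_smith = swap_group_order(*smith_minus_templeton)
print(templeton_minus_smith)
```

The p-value and the degrees of freedom are unaffected by the swap. What matters is that the write-up states clearly which group had the higher mean.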