Plotting the data is a good way to get a feel for differences between groups, but statistics can provide two more pieces of information: a confidence interval for the difference between means, and a p-value, which indicates how likely a difference at least this large would be if there were no true difference between the groups (statistical significance).

To access these tests, select Analysis -> Two Sample Test. The default test statistic is the one we want: the t-test.

Assumption Icons

You might notice the icons below each test:


These icons are reminders of each test's assumptions. Hovering the mouse over an icon displays text describing the assumption. For example, the icons below the t-test indicate that it assumes a "Large Sample" (an icon of a large letter "N") and "No Outliers" (an icon of a red dot far from the rest of a scatterplot). If your data include outliers, consider using the Wilcoxon test (pictured on the right), which is much less sensitive to outliers.
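Why does the Wilcoxon test hold up better? It works with ranks (which observation is larger), not with raw distances. The minimal sketch below uses made-up scores to compute the Mann-Whitney U statistic (the rank-sum form of the Wilcoxon test) by counting pairs. It then turns the top score into an extreme outlier: the mean difference changes dramatically, but U does not change at all. The sketch computes only the statistic, not the p-value the software reports.

```python
from statistics import mean

def mann_whitney_u(x, y):
    """U for x: number of (xi, yj) pairs with xi > yj, counting ties as 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)

# Hypothetical scores for two groups
group_a = [12, 14, 15, 16, 18]
group_b = [10, 11, 13, 14, 15]
# Replace the largest score in group_a with an extreme outlier
with_outlier = group_a[:-1] + [95]

print("mean difference:", mean(group_a) - mean(group_b), "->", mean(with_outlier) - mean(group_b))
print("U:", mann_whitney_u(group_a, group_b), "->", mann_whitney_u(with_outlier, group_b))
```

Because 18 and 95 both exceed every score in the other group, they contribute identically to U. The t-test, which works with means, sees the outlier at full strength.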

Do not assume equal variance

Clicking the gear icon to the right of the t-test checkbox opens a dialog that lets you choose between two forms of the t-test. The default setting is "Unequal Variance (Welch) (Recommended)", and I recommend it as well. Variance is a measure of dispersion, or how spread out the scores are; equal and unequal variance refer to the variances of the two groups you are comparing. Welch's method works whether the two groups have similar or dissimilar variances, whereas the other option (Student's t-test) is valid only when the two groups have approximately equal variances. If the variances differ, the p-value reported by Student's t-test can be too high or too low, especially when the two groups are also unequal in size. It is safer not to assume equal variances and to use Welch's method, whose p-value remains approximately correct in either case.
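To make the difference between the two options concrete, here is a minimal Python sketch that computes both versions of the t statistic and their degrees of freedom from summary statistics (mean, SD, and size of each group). The function names and the example numbers are hypothetical. The formulas are the standard Welch-Satterthwaite and pooled-variance ones, which is not necessarily how the software computes them internally.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t: each group keeps its own variance."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard error of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def student_t(m1, s1, n1, m2, s2, n2):
    """Student's t: assumes both groups share one pooled variance."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

# Hypothetical groups: the smaller group has the larger spread
print("Welch:  ", welch_t(17.7, 2.9, 20, 14.6, 1.79, 60))
print("Student:", student_t(17.7, 2.9, 20, 14.6, 1.79, 60))
```

With equal SDs and equal group sizes the two functions return the same t (and Welch's df equals n1 + n2 - 2). In the unequal example, Student's version reports a noticeably larger t than Welch's, which would make its p-value too small. A fractional df, like the 74.87 reported later in this tutorial, is a sign that Welch's method was used.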


After you enter the appropriate variables into "Factor" (the independent variable in your study) and "Outcomes" (the dependent variable in your study) and click Run, you should get two pieces of output: the descriptive statistics for the two groups in your test and the results of the t-test. If you want less clutter on your screen, you can click the "Element View" tab in the output console to hide the red R command syntax.

Here are the results for the t-test (in Element View):

The standard deviations reported in the descriptive statistics confirm your earlier observation that the tip percentage is more spread out in the Chocolate condition (SD = 2.90) than in the No Chocolate condition (SD = 1.79).

Welch's t-test output adds the difference between the means of the two groups (3.12 percentage points of tip) and the 95% confidence interval for that difference: 2.11 to 4.12. You can be 95% confident that the difference between the population means for the Chocolate and No Chocolate conditions lies somewhere between 2.11 and 4.12 percentage points. As before, the confidence interval tells you how precise your estimate is. Serving customers chocolate with their check is estimated to raise the average tip percentage by roughly 2 to 4 percentage points.
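The interval is the mean difference plus or minus a critical t value times the standard error of the difference. As a rough check, this sketch back-calculates the standard error from the reported difference and t statistic (SE = difference / t). It also uses an approximate table value of 1.992 for the two-tailed 95% critical t at about 75 df. Both values are assumptions for illustration, so the bounds match the output only up to rounding.

```python
def confidence_interval(diff, se, t_crit):
    """Two-sided CI: difference plus or minus (critical t times standard error)."""
    margin = t_crit * se
    return diff - margin, diff + margin

diff = 3.12        # reported difference, in percentage points of tip
se = diff / 6.20   # standard error implied by the reported t statistic
t_crit = 1.992     # approximate 97.5th percentile of t with about 75 df

low, high = confidence_interval(diff, se, t_crit)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The result lands within about 0.01 of the reported 2.11 to 4.12; the small gap comes from working with rounded inputs.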

Test statistic, df, and p-value

The t statistic is 6.20, the df (degrees of freedom) is 74.87, and the p-value is less than .001. The p-value means that if there were no difference between the two population means (the null hypothesis), a t value at least as extreme as 6.20 would occur less than 1 time in 1,000. Because a result this extreme would be so unlikely under the null hypothesis, we conclude that the difference between the groups is statistically significant.
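Statistical software computes the p-value from the t distribution with the reported df. The sketch below approximates the same quantity using only Python's standard library. It integrates the t density numerically with Simpson's rule between 0 and |t|, then takes what is left over in the two tails. This is an illustration of what the p-value measures, not the software's algorithm.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """P(|T| >= |t|) under the null hypothesis, via Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t|
    # The distribution is symmetric, so the two tails hold 1 - 2 * central
    return max(0.0, 1.0 - 2.0 * central)

p = two_tailed_p(6.20, 74.87)
print(f"p = {p:.2e}")
```

The printed value is far below .001, which is why the output reports only "p < .001" rather than an exact figure.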

APA Style

To write up the results of this analysis, you could write:

Researchers hypothesized that giving customers chocolate with their bill would increase the tips that waiters received. Tip percentages for the two groups differed significantly according to Welch's t-test, t(74.87) = 6.2, p < .001. On average, customers given chocolate tipped 17.7 percent, while customers not given chocolate tipped 14.6 percent. The 95% confidence interval for the effect of chocolate on tip percentage is between 2.1 and 4.1 percent. These results support the researchers' hypothesis.

Common Error!

In the Strohmetz chocolate study, the dependent variable is in units of percentages because the researchers studied tip percentages. Most studies, however, do not measure their dependent variables in percentages. If the dependent variable were measured in seconds, for example, you would not say that the confidence interval is between 2.1 and 4.1 percent; you would say it is between 2.1 and 4.1 seconds. Many students have copied the paragraph above into their stats assignments without changing the units from percentages to the units of their own dependent variable.

Note the formatting for reporting the results of the t-test:

  1. Write in complete sentences.
  2. State the researchers' hypothesis.
  3. Give the means of each group, using the units for the dependent variable (here, the units are percentages because the DV was recorded in terms of tip percentages).
  4. Give the degrees of freedom in parentheses after the letter t (which is italicized), followed by the value of t.
  5. Give the p-value.
  6. Report the confidence interval.
  7. Decimal places: Use a number of decimal places sufficient to distinguish two values but generally not more than 2 or 3.
  8. State whether the results supported, partially supported, or did not support the researchers' hypothesis.
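If you report many tests, a small helper can keep the numbers in the format described above. The function below is a hypothetical Python sketch, not part of the software. Plain text cannot italicize t or p, so apply italics in your word processor. The `units` argument is required on purpose, as a guard against the common error of leaving "percent" in place when your dependent variable is measured in something else.

```python
def apa_t_test(t, df, p, ci_low, ci_high, units, decimals=2):
    """Return (test statistic clause, CI sentence) in the style shown above."""
    fmt = f".{decimals}f"
    if p < 0.001:
        p_text = "p < .001"
    else:
        # APA style drops the leading zero for p because p cannot exceed 1
        p_text = "p = " + format(p, ".3f").lstrip("0")
    stat = f"t({df:{fmt}}) = {t:{fmt}}, {p_text}"
    ci = (f"The 95% confidence interval for the difference is "
          f"{ci_low:{fmt}} to {ci_high:{fmt}} {units}.")
    return stat, ci

stat, ci = apa_t_test(6.20, 74.87, 0.0000001, 2.11, 4.12, "percent")
print(stat)
print(ci)
```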

Effect of Sample Size on Confidence Intervals

Earlier, we looked at how the confidence intervals in your plot got wider when you used a subset of 30 people instead of the original sample of 92 people. Let's look at what the t-test results would look like if we shrank the sample to 30:

Note how the confidence interval for the difference between means is now wider (1.4% to 5.4% instead of 2.1% to 4.1%). You have lost precision in your estimate because you have fewer subjects.
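The loss of precision follows from the standard error of the difference, which shrinks with the square root of the group sizes. In this sketch the SDs are borrowed from the chocolate example, while the equal group sizes are hypothetical. The critical t value, which also grows as the df shrink, is left out for simplicity.

```python
import math

def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Hypothetical equal group sizes; SDs taken from the chocolate example
for n in (15, 46, 184):
    print(n, round(se_of_difference(2.90, n, 1.79, n), 3))
```

Quadrupling the size of each group halves the standard error, so precision improves more slowly than the sample grows.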

Negative t values and confidence intervals?

The sign of the t value depends on which mean is listed first: if the first mean is larger than the second, t is positive; if the second mean is larger, t is negative. For example, suppose you compare the exam scores of students taught by two instructors, Smith and Templeton, and get these results:

Because Smith's mean comes first and is smaller, and Templeton's mean comes second and is larger, t is negative. Here is a perfectly acceptable interpretation:

Researchers hypothesized that exam scores would differ by instructor. Mean exam scores differed significantly by instructor according to Welch's t-test, t(22.27) = -5.83, p < .001. Students in Templeton's class (M = 28.42) scored significantly higher than students in Smith's class (M = 14.57). The 95% confidence interval of the difference is -18.77 to -8.92 points. These results support the researchers' hypothesis.
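Reversing the order of the groups flips the sign of t and mirrors the confidence interval, but the conclusion stays the same. This sketch uses the Smith and Templeton means. The standard error is back-calculated from the reported t, and 2.074 is an approximate table value for the two-tailed 95% critical t at 22 df. Both are assumptions for illustration.

```python
def t_and_ci(mean_first, mean_second, se, t_crit):
    """t and CI for (first mean - second mean); the order sets the sign."""
    diff = mean_first - mean_second
    margin = t_crit * se
    return diff / se, (diff - margin, diff + margin)

se = (28.42 - 14.57) / 5.83  # standard error implied by the reported t
t_crit = 2.074               # approximate two-tailed 95% critical t for df = 22

smith_first = t_and_ci(14.57, 28.42, se, t_crit)
templeton_first = t_and_ci(28.42, 14.57, se, t_crit)
print("Smith first:    ", smith_first)
print("Templeton first:", templeton_first)
```

With Smith first, the interval is about -18.78 to -8.92; with Templeton first, it is the same interval with the signs reversed. Either way, it excludes zero.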

95% Confidence Intervals Crossing Zero

There is a direct relationship between confidence intervals and the p-value for t-tests. If the 95% confidence interval of the difference between means includes zero, the t-test of that comparison will not be significant at p < .05. This holds when the interval and the test come from the same method, as they do in Welch's output. So if the confidence interval of the difference between means is -3.2 to 7.5 points, you know the t-test will not be significant at p < .05, because the interval from -3.2 to 7.5 includes zero. An interval that includes zero means that zero is a plausible value for the population difference; in other words, the data are consistent with there being no difference at all.
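The reason is that both checks compare the same two quantities. The interval excludes zero exactly when the absolute difference is larger than the critical t times the standard error, which is the same as |t| exceeding the critical t. This sketch checks that equivalence over a few differences. The standard error of 2.675 and critical value of 2.0 are made-up values, chosen so that a difference of 2.15 reproduces the -3.2 to 7.5 interval above.

```python
def significant_by_t(diff, se, t_crit):
    """Test decision: is |t| beyond the critical value?"""
    return abs(diff / se) > t_crit

def significant_by_ci(diff, se, t_crit):
    """CI decision: does the interval exclude zero?"""
    low, high = diff - t_crit * se, diff + t_crit * se
    return not (low <= 0 <= high)

se, t_crit = 2.675, 2.0  # hypothetical values
for diff in (-9.0, -4.0, 0.0, 2.15, 5.0, 12.0):
    print(diff, significant_by_t(diff, se, t_crit), significant_by_ci(diff, se, t_crit))
```

Both columns agree for every difference: whenever the interval contains zero, |t| falls short of the critical value.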

There is a 5-part stats homework assignment on comparing means (and reviewing reliability) that accompanies this tutorial. To begin the assignment, log in to your Moodle account for this course. Don't forget that there are FIVE parts to the homework assignment.