Comparing the means of two groups
Plotting the data is a good way to get a feel for differences between groups, but statistics can provide two more pieces of information: a confidence interval for the difference between the means, and a p-value, which indicates how likely a difference at least as large as the one observed would be if chance alone were at work (statistical significance).
To access these tests, select Analysis -> Two Sample Test. The default test statistic is the one we want: the t-test.
Do not assume equal variance
If you click the gear icon to the right of the t-test checkbox, a dialog opens that lets you choose between two forms of the t-test. The default setting is "Unequal Variance (Welch) (Recommended)", and I recommend it as well.

Variance is a measure of dispersion, or how spread out the scores are. It equals the square of the standard deviation, another common measure of dispersion. "Equal" and "unequal" variance refer to the variances of the two groups you are comparing. Welch's method works whether the two groups have similar or dissimilar variances, whereas the other option, Student's t-test, is valid only when the two groups have approximately equal variances. If the variances differ, the p-value reported by Student's t-test can be too high or too low, especially when the groups also differ in size. It is safer to assume unequal variances and use Welch's method, whose p-value remains approximately accurate in either case.
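To make the difference between the two options concrete, here is a minimal Python sketch (standard library only) of how each form computes its t statistic and degrees of freedom. Welch's version keeps each group's variance separate and uses the Welch-Satterthwaite formula for the degrees of freedom, which is why it can report a fractional df; Student's version pools the two variances into one estimate. The function names are hypothetical, this is not the code the software runs, and the p-value step (which needs the t distribution) is left out.

```python
import math


def mean_and_var(xs):
    """Return the sample mean and sample variance (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, var


def welch_t(a, b):
    """Welch's t and Welch-Satterthwaite df: each group keeps its own variance."""
    m1, v1 = mean_and_var(a)
    m2, v2 = mean_and_var(b)
    se1, se2 = v1 / len(a), v2 / len(b)  # squared standard error of each mean
    t = (m1 - m2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (len(a) - 1) + se2 ** 2 / (len(b) - 1))
    return t, df


def student_t(a, b):
    """Student's t: pools both variances into one estimate, df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    m1, v1 = mean_and_var(a)
    m2, v2 = mean_and_var(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

With equal group sizes the two t statistics come out identical and only the degrees of freedom differ; when the group sizes and the variances both differ, the statistics themselves diverge, which is where the choice matters most.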
After you enter the appropriate variables into "Factor" and "Outcomes" and click Run, you should get two pieces of output: the descriptive statistics for the two groups in your test and the results of the t-test. If you want less clutter on your screen, you can click the "Element View" tab in the output console to hide the red R command syntax.
Here are the results for the t-test (in Element View):
The standard deviations reported in the descriptive statistics confirm your earlier observation that the tip percentage is more spread out in the Chocolate condition (SD = 2.90) than in the No Chocolate condition (SD = 1.79).
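Because variance is the square of the standard deviation, the reported SDs translate directly into group variances. The short Python sketch below does that arithmetic and shows that the Chocolate group's variance is roughly two and a half times that of the No Chocolate group, the kind of gap Welch's test is designed to handle.

```python
# Standard deviations of tip percentage reported in the descriptive statistics.
sd_chocolate = 2.90
sd_no_chocolate = 1.79

# Variance is the square of the standard deviation.
var_chocolate = sd_chocolate ** 2
var_no_chocolate = sd_no_chocolate ** 2

# How many times larger the Chocolate group's variance is.
variance_ratio = var_chocolate / var_no_chocolate
print(f"variances: {var_chocolate:.2f} vs {var_no_chocolate:.2f}, ratio {variance_ratio:.2f}")
# prints: variances: 8.41 vs 3.20, ratio 2.62
```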
Welch's t-test output also reports the difference between the means of the two groups, 3.12 percentage points of tip, and the 95% confidence interval for that difference: 2.11 to 4.12. You can be 95% confident that the difference between the population means for the Chocolate and No Chocolate conditions lies somewhere between 2.11 and 4.12 percentage points. As before, the confidence interval tells you how precise your estimate is. Serving customers chocolate with their check is estimated to raise the tip percentage by roughly 2 to 4 percentage points.
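A confidence interval for a difference is the mean difference plus or minus a critical t value times the standard error of the difference. As a rough consistency check, and not how the software itself computes the interval, the standard error can be backed out of the reported numbers as difference / t. The critical value of about 1.99 (the 97.5th percentile of a t distribution with roughly 75 df) is an approximate table value assumed here rather than computed.

```python
# Values reported in the Welch t-test output.
mean_diff = 3.12  # difference in mean tip percentage (percentage points)
t_stat = 6.20     # Welch's t statistic
t_crit = 1.99     # approx. 97.5th percentile of t with ~75 df (assumed table value)

# t = difference / SE, so SE = difference / t.
se_diff = mean_diff / t_stat

# 95% CI: difference +/- critical value * SE.
half_width = t_crit * se_diff
lower, upper = mean_diff - half_width, mean_diff + half_width
print(f"SE = {se_diff:.3f}, 95% CI ~ {lower:.2f} to {upper:.2f}")
# prints: SE = 0.503, 95% CI ~ 2.12 to 4.12
```

The result matches the reported interval of 2.11 to 4.12 up to the rounding of the inputs.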
Test statistic, df, and p-value
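The t statistic is 6.20, the df (degrees of freedom) is 74.87, and the p-value is less than .001. The p-value indicates that, if there were no difference between the two population means (the null hypothesis), a t value at least as extreme as 6.20 would occur less than 1 time in 1,000. A result this extreme is therefore very unlikely under the null hypothesis, which is strong evidence that the two population means differ. The df is not a whole number because Welch's method estimates it from the two groups' variances and sizes.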
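What "under the null distribution" means can be illustrated by simulation. The Python sketch below repeatedly draws two groups from populations with identical means, computes Welch's t each time, and counts how often |t| reaches the observed 6.20. It uses the reported SDs, but the 46/46 split of the 92 participants and the common mean of 15% are hypothetical choices. This only illustrates the idea; the software's p-value comes from the t distribution, not from simulation.

```python
import math
import random


def welch_t(a, b):
    """Welch's t statistic for two samples (variances not pooled)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)


def simulate_null(n1=46, n2=46, sd1=2.90, sd2=1.79, reps=5000, seed=1):
    """Draw both groups from normal populations with the SAME mean (hypothetical 15.0)."""
    rng = random.Random(seed)
    ts = []
    for _ in range(reps):
        a = [rng.gauss(15.0, sd1) for _ in range(n1)]
        b = [rng.gauss(15.0, sd2) for _ in range(n2)]
        ts.append(welch_t(a, b))
    return ts


null_ts = simulate_null()
# Two-sided: count t values at least as extreme as the observed one in either direction.
share_beyond_observed = sum(abs(t) >= 6.20 for t in null_ts) / len(null_ts)
share_beyond_2 = sum(abs(t) >= 2.0 for t in null_ts) / len(null_ts)
print(f"|t| >= 6.20: {share_beyond_observed:.4f}; |t| >= 2.0: {share_beyond_2:.3f}")
```

Under the null, |t| values of about 2 turn up roughly 5% of the time, while a value as large as 6.20 essentially never appears in the simulated runs.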
To write up the results of this analysis, you could write:
Note the formatting for reporting the results of the t-test:
Effect of Sample Size on Confidence Intervals
Earlier, we looked at how the confidence intervals in your plot got wider when you used a subset of 30 people instead of the original sample of 92. Here is how that change affects the t-test results:
Note how the confidence interval for the difference between means is now wider (1.4% to 5.4% instead of 2.1% to 4.1%). You have lost precision in your estimate because you have fewer subjects.
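Much of this loss of precision follows from how the standard error of the difference depends on sample size. The Python sketch below computes the unpooled standard error using the reported SDs, with equal group splits (46/46 and 15/15) that are hypothetical. With the SDs held fixed, the standard error grows by a factor of the square root of 92/30, about 1.75.

```python
import math


def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means (unpooled)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Reported SDs; the equal group splits below are hypothetical.
sd_choc, sd_none = 2.90, 1.79
se_full = se_of_difference(sd_choc, 46, sd_none, 46)    # 92 people
se_subset = se_of_difference(sd_choc, 15, sd_none, 15)  # 30 people

ratio = se_subset / se_full
print(f"SE full = {se_full:.3f}, SE subset = {se_subset:.3f}, ratio = {ratio:.2f}")
# prints: SE full = 0.502, SE subset = 0.880, ratio = 1.75
```

The reported interval roughly doubles in width (from about 2.0 to about 4.0 points), which is more than the 1.75 factor alone. Part of the gap comes from the larger critical t value at fewer degrees of freedom; the rest presumably reflects the subset's own standard deviations and group sizes, which need not match the full sample's.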
Negative t values and confidence intervals?
The sign of t depends on which mean is listed first: if the first mean is larger than the second, t is positive; if the second mean is larger, t is negative. The confidence interval for the difference follows the same rule, since it is centered on the first mean minus the second. Because the order of the two groups is arbitrary, you can ignore the sign of t as long as you keep track of which group had the larger mean. For example, suppose you are comparing the test scores of students taught by two instructors, Smith and Templeton, and you get these results:
Because Smith is the first mean (and smaller) and Templeton is the second mean (and larger), t is negative. Here is a perfectly acceptable interpretation: