Plotting the data is a good way to get a feel for differences between groups, but statistics can provide us with two more pieces of information: a confidence interval for the difference between means, and a p-value, which tells us how likely a difference at least as large as the one we observed would be if there were no real difference between the groups (statistical significance).
To access these tests, select Analysis > Two Sample Test. The default test statistic is the one we want: the t-test.
Assumption Icons
You might notice the icons next to each test:
These icons are reminders of the assumptions of each test. Holding the mouse over an icon makes text appear that describes the assumption it indicates. For example, the icons next to the t-test indicate that it assumes a "Large Sample" (an icon of a large letter "N") and "No Outliers" (an icon of a red dot far from the rest of a scatterplot).
Do not assume equal variance
If you click on the gear icon to the right of the t-test checkbox, a dialog opens allowing you to choose between two forms of the t-test. The default setting is "Unequal Variance (Welch) (Recommended)", and that is the one I recommend as well. Variance is a measure of dispersion, or how spread out the scores are; it is equal to the square of the standard deviation, another common measure of dispersion. Equal and unequal variance refer to the variances of the two groups you are comparing. The "Unequal Variance (Welch)" method works whether your two groups have similar or dissimilar variances, whereas the other option (Student's t-test) assumes that the two groups have approximately equal variances. If the variances of the two groups differ, especially when the groups are also different sizes, the p-value reported by Student's t-test can be too high or too low. It is safer to allow for unequal variances and use Welch's method, which gives a more accurate p-value in that situation.
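For reference, these are the standard textbook formulas for the two versions (the tutorial does not show the software's internal implementation, so treat this as background rather than a description of the program). Here x̄ᵢ, sᵢ, and nᵢ are the mean, standard deviation, and size of group i, so sᵢ² is that group's variance. Student's version pools the two variances into a single estimate; Welch's keeps them separate and approximates the degrees of freedom, which is why Welch's df is usually not a whole number.

```latex
% Student's t-test (assumes equal variances): pooled variance, t, and df
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
\mathrm{df} = n_1 + n_2 - 2

% Welch's t-test (does not assume equal variances): t and approximate df
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad
\mathrm{df} \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```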
Output
After you enter the appropriate variables into "Factor" and "Outcomes" and click Run, you should get two pieces of output: the descriptive statistics for the two groups in your test and the results of the t-test. If you want less clutter on your screen, you can click the "Element View" tab in the output console to hide the red R command syntax.
Here are the results for the t-test (in Element View):
The standard deviations reported in the descriptive statistics
confirm your earlier observation that the tip percentage is more spread
out in the Chocolate condition (SD = 2.90) than in the No Chocolate condition
(SD = 1.79).
The output for Welch's t-test adds a calculation of the difference between the means of the two groups (3.12 percentage points in tips) and also the confidence interval for that difference: 2.11 to 4.12. You can be 95% confident that the difference between the population means for the Chocolate and No Chocolate conditions is somewhere between 2.11 and 4.12 percentage points. As before, the confidence interval tells you how precise your estimate is: based on these data, serving customers chocolate with their check would be expected to increase your tip percentage by roughly 2 to 4 percentage points.
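As a rough check on where these numbers come from, here is a minimal Python sketch (standard library only) of Welch's confidence interval computed from summary statistics. It assumes 46 customers per condition, a hypothetical even split of the 92-person sample since the group sizes are not shown here, and it uses the rounded means (17.7 and 14.6) and SDs from the output, so its results should land close to, but not exactly on, the reported 2.11 to 4.12. The t distribution is integrated numerically with Simpson's rule rather than taken from a statistics library; the function names are illustrative.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): integrate the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: float) -> float:
    """Value x with P(T <= x) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_ci(m1, s1, n1, m2, s2, n2, level=0.95):
    """Difference in means, Welch df, and confidence interval from summary stats."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard error of each mean
    se = math.sqrt(v1 + v2)              # standard error of the difference
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = m1 - m2
    margin = t_quantile(0.5 + level / 2, df) * se
    return diff, df, (diff - margin, diff + margin)


# Chocolate vs. No Chocolate; n = 46 per group is an ASSUMPTION.
diff, df, (low, high) = welch_ci(17.7, 2.90, 46, 14.6, 1.79, 46)
print(f"difference = {diff:.2f}, df = {df:.2f}, 95% CI: {low:.2f} to {high:.2f}")
```

Under these assumptions the sketch gives a df of about 75 and an interval of roughly 2.1 to 4.1, in line with the output; the small discrepancies come from the rounded means and the guessed group sizes.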
Test statistic, df, and p-value
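The t statistic is 6.20, the df (degrees of freedom) is 74.87, and the p-value is less than .001. (Welch's method usually produces fractional degrees of freedom, as it does here.) The p-value indicates that, if there were no difference between the two groups (the null hypothesis), a t value at least as extreme as 6.20 would occur less than 1 time in 1,000. Because a difference this large would be very unlikely if the population means were equal, we reject the null hypothesis and conclude that the difference between the groups is statistically significant.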
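The p-value is the area in both tails of a t distribution with 74.87 df beyond ±6.20. Here is a minimal standard-library sketch that integrates the t density numerically with Simpson's rule (a stand-in for the more precise special functions statistical software uses); the function names are illustrative, and the check is that the result falls below .001 rather than that it matches the software digit for digit.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): integrate the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def two_sided_p(t_stat: float, df: float) -> float:
    """Probability of a t at least as extreme as t_stat (either sign) under the null."""
    return 2 * (1 - t_cdf(abs(t_stat), df))


p = two_sided_p(6.20, 74.87)
print(f"p = {p:.1e}")
print("p < .001" if p < 0.001 else f"p = {p:.3f}")
```

The same function gives a p-value of about .05 for t = 2.228 with 10 df, the familiar two-tailed critical value from a t table, which is a quick way to confirm the integration is behaving.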
APA Style
To write up the results of this analysis, you could write:
Researchers hypothesized that giving customers chocolate with their bill would increase the tips that waiters received. Tip percentages for the two groups differed significantly according to Welch's t-test, t(74.87) = 6.20, p < .001. On average, customers given chocolate tipped 17.7 percent, while customers not given chocolate tipped 14.6 percent. The 95% confidence interval for the effect of chocolate on tip percentage is 2.1 to 4.1 percentage points. These results support the researchers' hypothesis.
Note the formatting for reporting the results of the t-test:
- Write in complete sentences.
- State the researchers' hypothesis.
- Give the means of each group, using the units of the dependent variable (here, the units are percentages because the DV was recorded as tip percentages).
- Give the degrees of freedom in parentheses after the letter t (which is italicized).
- Give the p-value (report very small values as p < .001).
- Report the confidence interval.
- Decimal places: use enough decimal places to distinguish two values, but be aware that values with many decimal places are harder to remember.
- State whether the results supported, partially supported, or did not support the researchers' hypothesis.
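If you write up many of these tests, a small helper can keep the t, df, and p formatting consistent with the example above. This is a hypothetical Python function (format_welch_apa is not part of any package); it reports the absolute value of t because the sign only reflects which group was listed first, and it cannot produce italics, which you still add in your word processor.

```python
def format_welch_apa(t: float, df: float, p: float) -> str:
    """Hypothetical helper: build a 't(df) = t, p ...' string in the style above."""
    if p < 0.001:
        p_text = "p < .001"
    else:
        # APA drops the leading zero for values that cannot exceed 1.
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    # The sign of t only reflects which group came first, so report |t|.
    return f"t({df:.2f}) = {abs(t):.2f}, {p_text}"


# Placeholder p-values below .001, matching the "p < .001" in the examples.
print(format_welch_apa(6.20, 74.87, 3e-8))   # t(74.87) = 6.20, p < .001
print(format_welch_apa(-5.83, 22.27, 1e-5))  # t(22.27) = 5.83, p < .001
```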
Effect of Sample Size on Confidence Intervals
Earlier, we looked at how the confidence intervals in your plot got wider when you used a subset of 30 people instead of the original sample of 92 people. Here is how that change affects the t-test results:
Note how the confidence interval for the difference between means is now wider (1.4% to 5.4% instead of 2.1% to 4.1%), about twice as wide as before. You have lost precision in your estimate because with fewer subjects, the standard error of the difference between means is larger.
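Why the interval widens: the width of Welch's interval is proportional to the standard error of the difference, sqrt(s1²/n1 + s2²/n2), which grows as the group sizes shrink. The sketch below holds the full-sample standard deviations fixed and assumes 46 vs. 15 people per group (even splits of 92 and 30, which the text does not confirm), so it isolates the effect of sample size rather than reproducing the subset's actual output.

```python
import math


def welch_se_df(s1, n1, s2, n2):
    """Standard error of the difference in means and Welch's degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


# Same SDs as the full sample (2.90 and 1.79); group sizes are ASSUMED.
se_full, df_full = welch_se_df(2.90, 46, 1.79, 46)
se_small, df_small = welch_se_df(2.90, 15, 1.79, 15)

print(f"46 per group: SE = {se_full:.3f}, df = {df_full:.1f}")
print(f"15 per group: SE = {se_small:.3f}, df = {df_small:.1f}")
# With equal n in both groups, the SE scales with 1/sqrt(n), so the ratio is sqrt(46/15).
print(f"SE ratio = {se_small / se_full:.2f}")
```

Fewer degrees of freedom also mean a slightly larger critical t value, which widens the interval a little further on top of the larger standard error.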
Negative t values and confidence intervals?
The sign of t depends on whether the first mean is larger than the second (in which case t is positive) or the second mean is larger than the first (in which case t is negative). The order of the two means is completely arbitrary, so you can ignore the sign of the t value as long as your write-up makes clear which group scored higher. The same is true of the confidence interval for the difference: reversing the order of the groups flips the signs of both endpoints. For example, suppose you are comparing students' test scores from two instructors named Smith and Templeton and you get these results:
Because Smith's mean comes first (and is smaller) and Templeton's comes second (and is larger), t is negative. Here is a perfectly acceptable interpretation:
Researchers hypothesized that exam scores would differ by instructor. Mean exam scores differed significantly by instructor according to Welch's t-test, t(22.27) = 5.83, p < .001. Students in Templeton's class (M = 28.42) scored significantly higher than students in Smith's class (M = 14.57). The 95% confidence interval of the difference is 8.92 to 18.77 points. These results support the researchers' hypothesis.
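The sign flip can be checked numerically. The Smith and Templeton output's standard deviations and sample sizes are not given here, so this sketch reuses the chocolate summary statistics instead, again assuming a hypothetical 46 customers per group, and uses 1.99 as an approximate critical t for about 75 df. Swapping which group comes first negates t and negates and swaps the interval's endpoints, while the magnitudes stay the same.

```python
import math


def welch_t_and_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """Welch t for (group 1 - group 2) and the interval diff +/- t_crit * SE."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = m1 - m2
    return diff / se, (diff - t_crit * se, diff + t_crit * se)


# Chocolate summary statistics; n = 46 per group is an ASSUMPTION.
# 1.99 approximates the 97.5th percentile of t with about 75 df.
T_CRIT = 1.99
choc = (17.7, 2.90, 46)
no_choc = (14.6, 1.79, 46)

t1, ci1 = welch_t_and_ci(*choc, *no_choc, T_CRIT)
t2, ci2 = welch_t_and_ci(*no_choc, *choc, T_CRIT)
print(f"Chocolate first:    t = {t1:.2f}, CI = {ci1[0]:.2f} to {ci1[1]:.2f}")
print(f"No Chocolate first: t = {t2:.2f}, CI = {ci2[0]:.2f} to {ci2[1]:.2f}")
```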
