Statistics: Lab
#2
Chapter 3:
Descriptive Statistics with two variables and Chapter 4: Probabilities
You will have two files for this lab, they are
separated below as parts A and B.
This lab will use Microsoft Excel to analyze the relationships between two variables as well as calculating marginal, joint and conditional probabilities.
Part A: Covariance and Correlation Coefficients
To retrieve your
Data.
We will be using the data that
accompanies the “Heavenly
chocolates” Case Problem 1 on page 161 of the textbook. Follow the
instructions in this lab, but you should read the background of this case in
the text.
1. Open an Excel file that you can now
save and begin working on.
2. Once you have opened the data,
type your name (and any other people in your group) in cell A1 of the first
sheet and the date in cell A2.
3. Save your Excel file as
"Lab1 Chocolate, yourname". If you have more than one person in the group, save it as "Lab
1 Chocolate, allnames". Save early and
often!!!
Initial Trends and
Observations
Creating a Scatter-Plot
A. Before we begin, do you predict a positive or negative relationship between all three pairs of variables (Pages Viewed (x) and Time (y), Time (x) and Amount Spent (y), and Pages Viewed (x) and Amount Spent (y))? Why? (6 points)
B. Create three different scatter plots, one for each of the relationships listed in part A. Note: the variable in part A designated with (x) should be on the x-axis and the variable with the (y) should be on the y-axis. Give each scatter-plot and graph axes appropriate titles. Add a linear trend line to each of the scatter plots and locate each graph as a “New sheet”.
Each scatter plot is worth up to 2 points.
Save your file.
C. Do your scatter-plots support your initial response to question number 1? Why or why not? (2 points)
Obtaining a Covariance and a Correlation Coefficient.
The Covariance
We wish to compute a covariance for
each of the three pairs of variables.
1. Select the Tools pull-down menu.
2. Select Data Analysis option. IF YOU DO NOT HAVE THIS OPTION, FOLLOW THE INSTRUCTIONS GIVEN IN LAB #1.
3. The input range is all of the columns of data. Do not include the column of years. Select your input range from the label in the first column to the end of the last column. Make sure the option of Columns is selected.
4. Question 3: Interpret the covariance between each of the three pairs of variables from part A above. (2 points)
Save your file.
The
Correlation Coefficient
Follow the same instructions above given for the Covariance except select the Correlation option from within the Data Analysis window. Select the entire array of variables, check the box for Labels and place your output in a logical location.
Question 4: What does your correlation coefficient table tell you about the relationship between the three pairs of variables from part A above? Does anything surprise you? (2 points)
Accurately calculating the covariance and correlation coefficients is worth up to 2 points.
Spending by Day of the Week
Use instructions in Lab #1 to create a pivot table that shows the total number of transactions that occur on each day of the week. The table should have the days of the week as the row labels and should have the count of amount spent in the ∑ Values part of the table.
Now convert this table to a column bar chart. Again, use instructions from Lab #1. Give the chart an appropriate title and labels for each axis and data labels on each column. (2 points for table and chart)
Summarize what you have learned about the relationships between the three quantitative variables and what you see with regards to the weekly shopping patterns of these customers. (3 points)
Printing. You do NOT need to print your Excel file; you must email it to me as an attached file.
Part B: Probabilities
The scoop: Today we will be working with the Winter 2015 Student Database to draw statistical inferences about the population of undergraduates at Hanover College and to compute probabilities.
Accessing the Student Database
Your first task: create a file of your own to work with. You will not be using all of this data so you will need to copy and paste the appropriate columns into another blank workbook.
NOTE: It’s always good to look for blank responses. Delete any row (the
entire row!) that has a blank response. They are not very useful from this
point forward.
Save your file.
Probabilities. If you compute the mean (use the =average formula) of your remaining observations you should be able to answer the two questions below simply by looking at the mean. Include these with the rest of your typed responses.
Note: any question that is phrased as Yes or No received a 1 for Yes, and
0 for No. In the Female column, female students were given a 1, men a 0.
Probability that a person drinks zero days in a typical
week.
Joint Probabilities. Find the answer to the question: What is the probability that a randomly selected student is both a non-drinker and has a GPA of more than 3.2 hours (the median value)?
Conditional Probability #1. Find the answer to the question: Given that the student is a non-drinker, what is the probability that the student has a GPA above the median? Use the multiplication rule and previously calculated probabilities to answer this question. (3 points)
Conditional Probability #2. Find the answer to the question: Given that the student is a male, what is the probability that the student is a drinker?
IMPORTANT: TO AVOID WASTING PAPER BY PRINTING OFF THE
ENTIRE DATABASE, I DON’T WANT YOU TO PRINT THIS SPREADSHEET. I CAN CHECK YOUR
FORMULAS WHEN YOU SEND IT TO ME.
Questions:
1. Are the events ND and HGPA independent events? Are they mutually exclusive? Use probabilities to answer each question and explain your findings. Show your work. (3 points)
2. Are gender and drinking independent events? Are they mutually exclusive? Use probabilities to answer each question and explain your findings. Show your work. (3 points)
Summary
Now suppose you have the job of writing a letter to the Director of Counseling. Advise the college counseling staff about the relationships between gender, drinking, drug use and a student’s GPA. Use all of the information that you have compiled in this portion of the lab to address their concerns. Compute additional probabilities or use appropriate descriptive statistics to make your letter more informative. More thorough and informative letters will earn more points. (7 points)
OK, you have carefully completed both parts A and B to this lab. You have
double-checked that you have your typed responses to questions. Bring the typed
responses to class on the due date. Your emailed Excel files must also be sent
by the beginning of class on the due date. When you email your file, make sure
your name is in the filename. Late assignments or those sent incorrectly will
be penalized. Check the syllabus or ask for help if you are uncertain of how to
do any of this.