Statistics: Lab #2

Chapter 3: Descriptive Statistics with two variables and Chapter 4: Probabilities

You will have two files for this lab, they are separated below as parts A and B.

This lab will use Microsoft Excel to analyze the relationships between two variables as well as calculating marginal, joint and conditional probabilities.

 

Part A: Covariance and Correlation Coefficients

To retrieve your Data.

We will be using the data that accompanies the “Heavenly chocolates” Case Problem 1 on page 161 of the textbook. Follow the instructions in this lab, but you should read the background of this case in the text.

1. Open an Excel file that you can now save and begin working on.

2. Once you have opened the data, type your name (and any other people in your group) in cell A1 of the first sheet and the date in cell A2.

3. Save your Excel file as "Lab1 Chocolate, yourname". If you have more than one person in the group, save it as "Lab 1 Chocolate, allnames". Save early and often!!!

Initial Trends and Observations

Creating a Scatter-Plot

A. Before we begin, do you predict a positive or negative relationship between all three pairs of variables (Pages Viewed (x) and Time (y), Time (x) and Amount Spent (y), and Pages Viewed (x) and Amount Spent (y))? Why? (6 points)

B. Create three different scatter plots, one for each of the relationships listed in part A. Note: the variable in part A designated with (x) should be on the x-axis and the variable with the (y) should be on the y-axis. Give each scatter-plot and graph axes appropriate titles. Add a linear trend line to each of the scatter plots and locate each graph as a “New sheet”.

Each scatter plot is worth up to 2 points.

Save your file.

C. Do your scatter-plots support your initial response to question number 1? Why or why not? (2 points)

Obtaining a Covariance and a Correlation Coefficient.

The Covariance

We wish to compute a covariance for each of the three pairs of variables.

1. Select the Tools pull-down menu.

2. Select Data Analysis option. IF YOU DO NOT HAVE THIS OPTION, FOLLOW THE INSTRUCTIONS GIVEN IN LAB #1.

3. The input range is all of the columns of data. Do not include the column of years.  Select your input range from the label in the first column to the end of the last column. Make sure the option of Columns is selected.

    1. You should have included the labels on your columns, so make sure the option of Labels in first row is selected.
    2. Select Output range to tell Excel where to print your results. Somewhere near the top of this worksheet would be a good place.
    3. Finally, click OK to perform the analysis. You may need to adjust the column widths to see the labels better.

4. Question 3: Interpret the covariance between each of the three pairs of variables from part A above. (2 points)

Save your file.

The Correlation Coefficient

Follow the same instructions above given for the Covariance except select the Correlation option from within the Data Analysis window. Select the entire array of variables, check the box for Labels and place your output in a logical location.

 Question 4: What does your correlation coefficient table tell you about the relationship between the three pairs of variables from part A above? Does anything surprise you? (2 points)

Accurately calculating the covariance and correlation coefficients is worth up to 2 points.

 

Spending by Day of the Week

Use instructions in Lab #1 to create a pivot table that shows the total number of transactions that occur on each day of the week. The table should have the days of the week as the row labels and should have the count of amount spent in the ∑ Values part of the table.

Now convert this table to a column bar chart. Again, use instructions from Lab #1. Give the chart an appropriate title and labels for each axis and data labels on each column.  (2 points for table and chart)

Summarize what you have learned about the relationships between the three quantitative variables and what you see with regards to the weekly shopping patterns of these customers. (3 points)

 

Printing. You do NOT need to print your Excel file; you must email it to me as an attached file.

 

Part B: Probabilities

The scoop: Today we will be working with the Winter 2015 Student Database to draw statistical inferences about the population of undergraduates at Hanover College and to compute probabilities.

Accessing the Student Database

Your first task: create a file of your own to work with. You will not be using all of this data so you will need to copy and paste the appropriate columns into another blank workbook.

NOTE: It’s always good to look for blank responses. Delete any row (the entire row!) that has a blank response. They are not very useful from this point forward. 

Save your file.

Probabilities. If you compute the mean (use the =average formula) of your remaining observations you should be able to answer the two questions below simply by looking at the mean. Include these with the rest of your typed responses.

Note: any question that is phrased as Yes or No received a 1 for Yes, and 0 for No. In the Female column, female students were given a 1, men a 0.

Probability that a person drinks zero days in a typical week.

Joint Probabilities. Find the answer to the question: What is the probability that a randomly selected student is both a non-drinker and has a GPA of more than 3.2 hours (the median value)?

Conditional Probability #1. Find the answer to the question: Given that the student is a non-drinker, what is the probability that the student has a GPA above the median? Use the multiplication rule and previously calculated probabilities to answer this question. (3 points)

Conditional Probability #2. Find the answer to the question: Given that the student is a male, what is the probability that the student is a drinker?

IMPORTANT: TO AVOID WASTING PAPER BY PRINTING OFF THE ENTIRE DATABASE, I DON’T WANT YOU TO PRINT THIS SPREADSHEET. I CAN CHECK YOUR FORMULAS WHEN YOU SEND IT TO ME.

Questions:

1. Are the events ND and HGPA independent events? Are they mutually exclusive? Use probabilities to answer each question and explain your findings. Show your work. (3 points)

2. Are gender and drinking independent events? Are they mutually exclusive? Use probabilities to answer each question and explain your findings. Show your work. (3 points)

Summary

Now suppose you have the job of writing a letter to the Director of Counseling. Advise the college counseling staff about the relationships between gender, drinking, drug use and a student’s GPA. Use all of the information that you have compiled in this portion of the lab to address their concerns. Compute additional probabilities or use appropriate descriptive statistics to make your letter more informative.  More thorough and informative letters will earn more points. (7 points)

OK, you have carefully completed both parts A and B to this lab. You have double-checked that you have your typed responses to questions. Bring the typed responses to class on the due date. Your emailed Excel files must also be sent by the beginning of class on the due date. When you email your file, make sure your name is in the filename. Late assignments or those sent incorrectly will be penalized. Check the syllabus or ask for help if you are uncertain of how to do any of this.