Brent Flowers
It is my goal to obtain meaningful answers to these questions so that I may discover which activities are advantageous and which are detrimental to my education, and attempt to modify my actions accordingly.
II. Preliminary Information
I derived the data for my study from the results of a larger survey completed by my statistics class. A survey containing a multitude of personal questions was distributed to several classes, which yields a higher rate of completion than other means, such as random distribution to campus mailboxes. In all, 213 surveys were returned, a sample large enough that the sampling distribution of the mean should be approximately normal and our sample means should lie reasonably close to the (unknown) population means. Because our sample of students was drawn at random, several freshmen completed the survey; unfortunately, because each participant’s GPA was needed, the 45 surveys completed by freshmen were expunged from the results. Two additional surveys were removed because of missing responses in the “Hours of TV” and “Hours of Study” blanks. However, the summary statistics listed below include all 213 responses because of the value of the other data provided; 168 results were used in the simple regression of GPA on gender, as the two additional expunged surveys had no bearing on that result.
III. Summary Statistics
Below I have reproduced the Excel Summary Statistics, which speak for themselves. Of note are the mean GPA of 2.9698; the mean gender of .5472, indicating a slight preponderance of females in our sample; the mean job of .5498, indicating that slightly more students are employed than not; the mean varsity sport and dating values of less than .50, indicating that the majority of our sample does not participate in a varsity sport and is not currently involved in a relationship; the mean weekly hours of television of 9.1986; the mean weekly hours of study of 16.5769; and the mean weekly hours of Internet use of 2.7972.
| | G.P.A. | Gender | Job | Sport | Dating | Hrs. of TV | Hrs. of study | Hrs. Internet |
|---|---|---|---|---|---|---|---|---|
| Mean | 2.969764706 | 0.5471698 | 0.549763 | 0.35714285 | 0.4952830 | 9.1985645 | 16.576886 | 2.7971698 |
| Standard Error | 0.034093701 | 0.0342679 | 0.034332 | 0.03314401 | 0.0344198 | 0.6193108 | 0.5595803 | 0.2825435 |
| Median | 2.975 | 1 | 1 | 0 | 0 | 6 | 15 | 1.5 |
| Mode | 2.7 | 1 | 1 | 0 | 0 | 10 | 20 | 1 |
| Standard Deviation | 0.444527477 | 0.4989481 | 0.498700 | 0.48030236 | 0.5011611 | 8.9532733 | 8.14761231 | 4.113895 |
| Sample Variance | 0.197604678 | 0.2489493 | 0.2487023 | 0.23069036 | 0.2511624 | 80.161103 | 66.383586 | 16.924138 |
| Range | 2.33 | 1 | 1 | 1 | 1 | 70 | 44.7 | 30 |
| Minimum | 1.58 | 0 | 0 | 0 | 0 | 0 | 0.3 | 0 |
| Maximum | 3.91 | 1 | 1 | 1 | 1 | 70 | 45 | 30 |
| Count | 170 | 212 | 211 | 210 | 212 | 209 | 212 | 212 |
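The entries in the summary statistics are tied together by a few standard formulas, and a short Python sketch can confirm that the Excel figures are internally consistent. The function names below are my own and are not part of the original analysis; the inputs are copied from the G.P.A. and Gender columns.

```python
import math

# Values copied from the G.P.A. and Gender columns of the summary statistics.
gpa_sd, gpa_n = 0.444527477, 170
gender_mean, gender_n = 0.5471698, 212


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return sd / math.sqrt(n)


def binary_sample_variance(p: float, n: int) -> float:
    """Sample variance (n - 1 denominator) of a 0/1 variable whose mean is p."""
    return p * (1 - p) * n / (n - 1)


gpa_se = standard_error(gpa_sd, gpa_n)
gpa_var = gpa_sd ** 2  # the variance is the square of the standard deviation
gender_var = binary_sample_variance(gender_mean, gender_n)

print(f"GPA standard error:     {gpa_se:.6f}")
print(f"GPA sample variance:    {gpa_var:.6f}")
print(f"Gender sample variance: {gender_var:.6f}")
```

Each computed value can be compared with the matching "Standard Error" or "Sample Variance" row. For a 0/1 variable such as gender, the mean alone fixes the variance, which is why the binary columns carry little information beyond their means.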
IV. Gender’s Influence on Grades
After reviewing the summary statistics, we can finally proceed to our first point of interest, namely the effect of gender on GPA. Our survey asked respondents to indicate their gender with (0) if male and (1) if female. Before proceeding, I hypothesize that being female is positively correlated with GPA. To test this hypothesis, I first had to create a simple linear regression model. The first step in creating this model is the assignment of one independent variable (x) and one dependent variable (y). Gender is the obvious independent variable, as one’s sex is determined prior to birth; thus GPA is thought to be dependent on gender. We shall now create an equation to represent our model:
y(GPA) = β0 + β1(Gender)
However, since we do not have the gender and corresponding GPA data for the entire population of college students, we are forced to estimate β0 and β1 with the sample statistics b0 and b1, computed from the sample that our survey provides. Thus the equation:
y-hat_i = b0 + b1(x_i)
Using the least squares method via Excel, our resulting equation is:
y-hat_i = 2.895 + (.1466)(x_i)
Here 2.895 is the point at which the line (were these results to be graphed) intersects the y-axis, and .1466 is the slope, the change in y when x changes by one unit. In more meaningful terms, were the survey respondent a male (x = 0), the predicted mean GPA would be 2.895. Were the respondent a female (x = 1), the predicted mean GPA would be 3.0418 (2.8952 + (.1466)(1)). Therefore, my earlier hypothesis that being female has a beneficial influence on one’s GPA seems to be correct. However, 3.0418 cannot be hailed as the definitive female mean GPA: it is only a sample estimate, and because gender is the sole variable in the model, b0 and b1 absorb the effects of any other factors that happen to differ between the sexes.
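Because gender is coded 0 or 1, the fitted line can take only two values, one per group. The sketch below evaluates it with the unrounded coefficients that Excel reports (2.895180 and 0.146584); the function name is my own.

```python
# Coefficients as reported by Excel for the regression of GPA on gender.
INTERCEPT = 2.895180  # b0: fitted mean GPA when Gender = 0 (male)
SLOPE = 0.146584      # b1: fitted change in mean GPA when Gender goes from 0 to 1


def predicted_gpa(gender: int) -> float:
    """Fitted GPA from the simple regression; gender is 0 (male) or 1 (female)."""
    if gender not in (0, 1):
        raise ValueError("gender must be 0 (male) or 1 (female)")
    return INTERCEPT + SLOPE * gender


male_gpa = predicted_gpa(0)
female_gpa = predicted_gpa(1)

print(f"Predicted mean GPA, male:   {male_gpa:.4f}")
print(f"Predicted mean GPA, female: {female_gpa:.4f}")
```

The gap between the two predictions is exactly the slope, which is all that a coefficient on a 0/1 variable can express.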
Excel also furnishes us with more data, which help one better understand the value of the model created. The number labeled “R Square” (the coefficient of determination) under “Regression Statistics” in the output shown below gives the proportion of the total variation in GPA that is explained by gender. Our model has a coefficient of determination of .0270, meaning that 2.70% of the total variation in GPA can be explained by our regression equation relating gender and GPA. While this percentage is quite small, it is nonetheless interesting that a factor completely out of one’s control, gender, appears to account for 2.70% of the variation in GPA.
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.164436 |
| R Square | 0.027039 |
| Adjusted R Square | 0.021178 |
| Standard Error | 0.442258 |
| Observations | 168 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 0.902320 | 0.9023203 | 4.6132735 | 0.03317587 |
| Residual | 166 | 32.46830 | 0.1955922 | | |
| Total | 167 | 33.37062 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.895180 | 0.048544 | 59.640179 | 4.8567E-114 |
| Gender | 0.146584 | 0.068246 | 2.1478532 | 0.0331758 |
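The coefficient of determination follows directly from the sums of squares in the ANOVA portion of the output. As a rough check, the Python sketch below recomputes it two equivalent ways from the Excel figures; the variable names are my own.

```python
# Sums of squares copied from the ANOVA portion of the gender regression output.
ss_regression = 0.902320  # variation in GPA explained by gender
ss_residual = 32.46830    # variation left unexplained
ss_total = 33.37062       # total variation in GPA

# R Square is the explained share of the total variation...
r_squared = ss_regression / ss_total

# ...or, equivalently, one minus the unexplained share.
r_squared_alt = 1 - ss_residual / ss_total

print(f"R Square (explained / total):       {r_squared:.6f}")
print(f"R Square (1 - unexplained / total): {r_squared_alt:.6f}")
```

Both routes should agree with the "R Square" entry up to rounding in the printed sums of squares.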
In order to determine the overall significance of the model, we must perform a 2-tailed hypothesis test, the null hypothesis being that β1 = 0, meaning that there exists no statistical relationship between gender and GPA; the alternative hypothesis is that β1 ≠ 0, meaning that there indeed exists a statistical relationship between gender and GPA. We find a critical value t(α/2) and compare it with the test statistic furnished by Excel; if the absolute value of the test statistic is greater than the critical value, the null is rejected and it may be said that a statistical relationship exists between gender and GPA. Our test statistic of 2.1479 is greater than the critical value at the 95% degree of confidence (equivalently, the P-value of .0332 is less than .05), so we may reject the null; we may then say that gender has a statistically significant effect on one’s GPA.
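The test just described can be traced in a few lines of Python. This sketch recomputes the t statistic from the coefficient and its standard error, then applies the decision rule through the P-value that Excel reports, which is equivalent to comparing the absolute t statistic with the critical value. The variable names are my own, and the P-value is taken from the output rather than recomputed.

```python
# Figures copied from the "Gender" row and the ANOVA row of the Excel output.
coefficient = 0.146584
standard_error = 0.068246
p_value = 0.0331758     # two-tailed P-value reported by Excel
f_statistic = 4.6132735

# The t statistic is the estimated coefficient divided by its standard error.
t_stat = coefficient / standard_error

# Rejecting when p < alpha is the same decision as rejecting when |t| > t(alpha/2).
decisions = {alpha: p_value < alpha for alpha in (0.10, 0.05, 0.01)}

print(f"t Stat: {t_stat:.4f}")
print(f"t squared: {t_stat ** 2:.4f} (F from the ANOVA table: {f_statistic:.4f})")
for alpha, reject in decisions.items():
    verdict = "reject" if reject else "fail to reject"
    print(f"alpha = {alpha:.2f}: {verdict} the null hypothesis")
```

With a single independent variable, the square of the t statistic equals the F statistic, so the t test on the slope and the F test on the model are one and the same.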
V. Effect of External Activities
We may now turn our attention to factors that are controllable by the student and that may or may not have an effect on GPA. The following questions were asked by our survey: do you have an on/off campus job?; do you participate in varsity athletics?; are you currently involved in a relationship (dating) with someone?; how many hours of television do you watch per week?; how many hours per week do you spend studying?; and finally, how many hours per week do you spend using the internet? Prior to using Excel to analyze the survey results, I hypothesized that watching television, using the internet, dating, and having a job would all negatively impact GPA, while studying and playing a sport (due to the necessitated “budgeting” of time) would increase GPA. Once again I decided to use GPA as the dependent (y) variable, believing it to be dependent upon the several independent variables mentioned above. Once again, since the population data are unknown, we are forced to use sample data, which our survey provides. Since more than one independent variable is being studied in relation to the dependent variable, a multiple regression must be used. Our equation is thus:
y-hat = b0 + b1(job) + b2(varsity sport) + b3(dating) + b4(hrs. tv) + b5(hrs. study) + b6(hrs. internet)
Excel reports the following result:
y-hat = 2.9466 + .06556x1 + .01566x2 - .02641x3 - .00820x4 + .00554x5 - .00495x6
Interpretation of the data reveals that the intercept is 2.9466, meaning that were all the variables zero, this would be the predicted GPA of the student. Yet again, we cannot place too much credence in this number, as it absorbs many of the effects on y. Each coefficient represents the change in y when its xi is changed by one unit, holding the other variables constant. Employment has a positive association with GPA, our results stating that a job increases predicted GPA by .06556. Participation in varsity athletics also has a positive association with GPA, our survey results showing that predicted GPA is .01566 higher for a varsity athlete. Dating has a negative association with GPA, having a boyfriend or girlfriend decreasing predicted GPA by .02641. Watching television also appears to negatively influence GPA: each additional hour of television watched per week decreases predicted GPA by .00820.
On the positive side, studying appears to have a positive association with GPA: each additional hour spent studying per week increases predicted GPA by .0055. Finally, use of the internet appears to negatively impact GPA: each additional hour spent on the internet per week decreases predicted GPA by .0049.
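To see how the coefficients combine, the Python sketch below evaluates the fitted equation for a single student. The student's values are hypothetical, chosen only for illustration, and are not drawn from the survey; the coefficients are the unrounded ones Excel reports, and the names are my own.

```python
# Coefficients copied from the Excel multiple-regression output.
INTERCEPT = 2.9465569
COEFFICIENTS = {
    "job": 0.0655604,
    "sport": 0.0156648,
    "dating": -0.0264096,
    "hrs_tv": -0.0082007,
    "hrs_study": 0.0055390,
    "hrs_net": -0.0049478,
}


def predicted_gpa(student: dict) -> float:
    """Fitted GPA: the intercept plus each coefficient times the student's value."""
    missing = set(COEFFICIENTS) - set(student)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * student[name] for name in COEFFICIENTS)


# Hypothetical student (NOT a survey respondent): employed, no varsity sport,
# not dating, 10 hrs of TV, 20 hrs of study, and 2 hrs of internet per week.
example_student = {"job": 1, "sport": 0, "dating": 0,
                   "hrs_tv": 10, "hrs_study": 20, "hrs_net": 2}
example_gpa = predicted_gpa(example_student)

# Holding everything else constant, one more weekly hour of TV shifts the
# prediction by exactly the TV coefficient.
one_more_tv_hour = predicted_gpa({**example_student, "hrs_tv": 11}) - example_gpa

print(f"Predicted GPA for the example student: {example_gpa:.4f}")
print(f"Change from one additional hour of TV: {one_more_tv_hour:.4f}")
```

The second calculation is the "holding the other variables constant" reading of a coefficient made concrete.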
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.2417079 |
| R Square | 0.0584227 |
| Adjusted R Square | 0.0228914 |
| Standard Error | 0.4413899 |
| Observations | 166 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 6 | 1.9220632 | 0.32034 | 1.64426 | 0.1382864 |
| Residual | 159 | 30.977184 | 0.19482 | | |
| Total | 165 | 32.899248 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.9465569 | 0.1119822 | 26.3127 | 8.17E-60 |
| Job | 0.0655604 | 0.0706795 | 0.92757 | 0.35503 |
| Sport | 0.0156648 | 0.0751454 | 0.20846 | 0.83513 |
| Dating | -.0264096 | 0.0699791 | -0.37739 | 0.70638 |
| Hrs. TV | -.0082007 | 0.0036951 | -2.21935 | 0.02787 |
| Hrs. Study | 0.0055390 | 0.0043745 | 1.26620 | 0.20729 |
| Hrs. Net | -.0049478 | 0.0077203 | -0.64087 | 0.52252 |
Analysis of the significance of our model requires an F test, the null hypothesis being that β1 = β2 = ... = β6 = 0, meaning that our overall model has no significance in the explanation of the variation of GPA, and the alternative hypothesis being that at least one βj ≠ 0, meaning that our overall model is significant. Excel reveals that the model’s test F is 1.644, and when this is compared to the critical F values at the 90%, 95%, and 99% degrees of confidence, it is apparent that the test F exceeds none of them (equivalently, the Significance F of .1383 is greater than .10); thus I cannot reject the null, and my overall model is not significant. This is consistent with my adjusted R2, which is used in multiple regression instead of R2 in order to compensate for the number of independent variables in the model, and which reveals that my model accounts for a mere 2.29% of the variation in GPA.
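The overall F test and the adjusted R2 can both be rebuilt from the multiple-regression output. The Python sketch below does so using the Excel figures; the variable names are my own, and the decision uses the Significance F reported by Excel in place of a table of critical F values.

```python
# Figures copied from the multiple-regression output.
ms_regression = 0.32034       # regression mean square
ms_residual = 0.19482         # residual mean square
significance_f = 0.1382864    # P-value of the overall F test, as reported by Excel
r_squared = 0.0584227
n_obs, n_predictors = 166, 6

# The F statistic is the ratio of explained to unexplained variation per degree of freedom.
f_statistic = ms_regression / ms_residual

# Adjusted R Square penalizes R Square for the number of independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# The model is significant at level alpha only if Significance F falls below alpha.
model_significant = {alpha: significance_f < alpha for alpha in (0.10, 0.05, 0.01)}

print(f"F: {f_statistic:.4f}")
print(f"Adjusted R Square: {adjusted_r_squared:.6f}")
for alpha, significant in model_significant.items():
    print(f"alpha = {alpha:.2f}: model significant? {significant}")
```

The adjustment is what shrinks the R Square of .0584 to roughly .0229: with six independent variables and 166 observations, part of the apparent fit is what chance alone would be expected to produce.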
Despite the finding that the overall model is insignificant, we can still test for the significance of the individual coefficients of the independent variables. For each of the six variables, a 2-tailed hypothesis test is completed, the null hypothesis being that the variable’s coefficient equals zero and the alternative hypothesis being that it does not. A rejection of the null indicates that the coefficient is significant. The absolute value of each variable’s t Stat is compared with t(α/2); if it is greater than t(α/2), the null may be rejected and the variable is significant. At none of the 90%, 95%, or 99% degrees of confidence is the absolute value of the t Stat for employment, varsity sport, dating, hours studying, or hours on the internet greater than t(α/2); thus these variables are not significant. However, the absolute value of the t Stat for hours watching television (2.219) is greater than t(α/2) at the 90% and 95% degrees of confidence, though not at 99%, and thus this variable is significant at those two levels.
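The six individual tests follow one pattern, so they can be run in a loop. The Python sketch below recomputes each t statistic from the Excel output and flags which variables are significant at each level; it relies on the P-values Excel reports, which give the same decision as comparing the absolute t statistic with t(α/2). The names are my own.

```python
# (coefficient, standard error, two-tailed P-value) copied from the Excel output.
COEFFICIENT_TABLE = {
    "Job": (0.0655604, 0.0706795, 0.35503),
    "Sport": (0.0156648, 0.0751454, 0.83513),
    "Dating": (-0.0264096, 0.0699791, 0.70638),
    "Hrs. TV": (-0.0082007, 0.0036951, 0.02787),
    "Hrs. Study": (0.0055390, 0.0043745, 0.20729),
    "Hrs. Net": (-0.0049478, 0.0077203, 0.52252),
}


def t_statistic(name: str) -> float:
    """t Stat for one variable: its coefficient divided by its standard error."""
    coefficient, standard_error, _ = COEFFICIENT_TABLE[name]
    return coefficient / standard_error


def significant_at(alpha: float) -> list:
    """Names of the variables whose P-value falls below alpha."""
    return [name for name, (_, _, p) in COEFFICIENT_TABLE.items() if p < alpha]


for name in COEFFICIENT_TABLE:
    print(f"{name:<11} t = {t_statistic(name):8.4f}")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: significant variables = {significant_at(alpha)}")
```

Running six separate tests at the 95% degree of confidence raises the chance of at least one spurious rejection, so a lone significant coefficient in an otherwise insignificant model deserves some caution.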
VI. Deductive Inferences
The results of my analysis of the surveys were somewhat different than expected. Some initial hypotheses, such as the advantage of females over males in terms of GPA and the harm that television viewing does to GPA, were supported by the survey. Of interest is the fact that 2.70% of the variance in GPA appears to be explained by gender, which is a non-controllable factor for the student. This leaves roughly 97% of the variance in GPA to be explained by other factors, some of which are presumably choices made by the student. Unfortunately, my attempt to ascertain what some of these factors are failed, because my multiple regression model is not significant. According to my model, the only variable that appears to influence GPA (in a negative fashion) is the number of hours of television watched per week. Thus, my only advice to students, teachers, and administrators is to watch fewer hours of television if one wishes to increase GPA. Factors not covered by the survey, such as intelligence quotient, memory, and classroom attendance, may be greater factors in GPA than the variables used in my model; thus, more surveying and analysis must be completed in order to gain a better notion of the influences on the variance of GPA.