Statistical Analysis of Factors Influencing Scholastic Performance

Brent Flowers

I.  Prolegomenous Comments
Throughout the history of the collegiate education system, students, teachers, and administrators have consistently asked: “Exactly what elements affect a student’s performance at college?”  Performance is measured here by the cumulative average of the grades a student earns throughout his or her college career, referred to herein as the Grade Point Average (G.P.A.).  Setting aside factors particular to the institution, such as teaching quality or cost of facilities, and substituting elements that influence all students regardless of institution, I seek to discover the external influences on G.P.A.  Is a student’s G.P.A. dependent upon factors fixed at birth, namely gender?  Are males or females naturally advantaged or disadvantaged by their sex?  What about factors within a student’s control: employment, participation in varsity athletics, dating, television viewing, hours spent studying per week, and hours spent using the Internet?

            It is my goal to obtain meaningful answers to these questions so that I may discover which activities are advantageous and which are detrimental to my education, and modify my actions accordingly.

 

II.  Preliminary Information

 

            I derived the data for my study from the results of a larger study completed by my statistics class.  A survey containing a multitude of personal questions was distributed to several classes, thus ensuring a higher rate of completion than other means, such as random distribution to campus mailboxes.  213 surveys were returned, a sample large enough that the sampling distribution of the mean should be approximately normal and our sample means should be reasonable estimates of the (unknown) population means.  Because the survey was not restricted by class year, a number of freshmen completed it; unfortunately, due to our need for each participant’s GPA, the 45 surveys completed by freshmen were expunged from the regression results, leaving 168.  Two additional surveys were removed from the multiple regression because of missing responses in the “Hours of TV” and “Hours of Study” blanks, leaving 166.  The summary statistics listed below nevertheless draw on all 213 returned surveys, given the value of the other data provided (the count shown for each variable reflects the responses actually given); 168 results were used in the simple regression of GPA on gender, as the two additional expunged surveys had no bearing on that result.

 

III.  Summary Statistics

 

Below I have reproduced the Excel Summary Statistics.  Of note are the mean GPA of 2.9698; the mean gender of .5472, indicating a slight preponderance of females in our sample; the mean job of .5498, indicating that slightly more students are employed than not; the mean varsity sport of .3571 and mean dating of .4953, both below .50, indicating that the majority of our sample does not participate in a varsity sport and that a slight majority is not currently involved in a relationship; mean weekly hours of television of 9.1986; mean weekly hours of study of 16.5769; and mean weekly hours of Internet use of 2.7972.

 

 

 

G.P.A.

Statistic            Value
Mean                 2.969764706
Standard Error       0.034093701
Median               2.975
Mode                 2.7
Standard Deviation   0.444527477
Sample Variance      0.197604678
Range                2.33
Minimum              1.58
Maximum              3.91
Count                170

 

 

 

 

Statistic            Gender       Job
Mean                 0.5471698    0.549763
Standard Error       0.0342679    0.034332
Median               1            1
Mode                 1            1
Standard Deviation   0.4989481    0.498700
Sample Variance      0.2489493    0.2487023
Range                1            1
Minimum              0            0
Maximum              1            1
Count                212          211

 

 

 

 

 

Statistic            Sport        Dating
Mean                 0.35714285   0.4952830
Standard Error       0.03314401   0.0344198
Median               0            0
Mode                 0            0
Standard Deviation   0.48030236   0.5011611
Sample Variance      0.23069036   0.2511624
Range                1            1
Minimum              0            0
Maximum              1            1
Count                210          212

 

 

 

 

 

Statistic            Hrs. of TV   Hrs. of study   Hrs. Internet
Mean                 9.1985645    16.576886       2.7971698
Standard Error       0.6193108    0.5595803       0.2825435
Median               6            15              1.5
Mode                 10           20              1
Standard Deviation   8.9532733    8.14761231      4.113895
Sample Variance      80.161103    66.383586       16.924138
Range                70           44.7            30
Minimum              0            0.3             0
Maximum              70           45              30
Count                209          212             212
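
As a rough check on how Excel's summary figures relate to one another, the sketch below recomputes the standard error and sample variance of GPA from the reported standard deviation and count, and recovers the sample variance of the 0/1 gender variable from its mean alone.  The function names are illustrative and are not part of the original analysis.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)

def binary_sample_variance(p, n):
    """Sample variance of a 0/1 variable whose mean is p: n/(n-1) * p * (1-p)."""
    return n / (n - 1) * p * (1 - p)

# Values copied from the Excel summary statistics.
gpa_se = standard_error(0.444527477, 170)            # reported: 0.034093701
gpa_var = 0.444527477 ** 2                           # reported: 0.197604678
gender_var = binary_sample_variance(0.5471698, 212)  # reported: 0.2489493

print(round(gpa_se, 6), round(gpa_var, 6), round(gender_var, 6))
```

For a 0/1 variable the mean is simply the proportion of respondents coded 1, which is why the mean of gender can be read directly as the share of females in the sample.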

 

 

 

 

 

 

 

IV.  Gender’s Influence on Grades

 

            After reviewing the summary statistics, we may proceed to our first point of interest, namely the effect of gender on GPA.  Our survey asked respondents to indicate their gender with (0) if male and (1) if female.  Before proceeding, I hypothesize that being female is positively correlated with GPA.  To test this hypothesis, I created a simple linear regression model.  The first step in creating this model is to assign one independent variable (x) and one dependent variable (y).  Gender is the obvious independent variable, as one’s sex is determined prior to birth; thus GPA is treated as dependent on gender.  We shall now create an equation to represent our model:

                                               

                                                y (GPA) = β0 + β1(Gender)

 

However, since we do not have the gender and corresponding GPA data for the entire population of college students, we must estimate β0 and β1 with the sample statistics b0 and b1, which our survey provides.  Thus the estimated equation:

                                                ŷi = b0 + b1xi

Using the least squares method via Excel, our resulting equation is:

                                                ŷ = 2.895 + (.1465)x

2.895 is the point at which the line (were these results to be graphed) intersects the y-axis, and .1465 is the slope: the change in ŷ when x changes by one unit.  In more meaningful terms, were the survey respondent a male (x = 0), the estimated mean GPA would be 2.895.  Were the respondent a female (x = 1), the estimated mean GPA would be 3.0415 (2.895 + (.1465)(1)).  Therefore, my earlier hypothesis that being female has a beneficial influence on one’s GPA seems to be correct.  However, 3.0415 cannot be hailed as the definitive female mean GPA, as b0 tends to absorb many of the effects on the dependent variable that the model omits.
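
To make this interpretation concrete: with a 0/1 independent variable, the least squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means.  The sketch below illustrates this on a tiny invented data set (the six GPA values are hypothetical, not survey data) and then evaluates the fitted equation using the full-precision coefficients from the Excel output.  The function names are my own.

```python
def simple_ols(x, y):
    """Least squares estimates (b0, b1) for y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: 0 = male, 1 = female.
gender = [0, 0, 0, 1, 1, 1]
gpa = [2.5, 3.0, 2.9, 3.1, 3.3, 2.8]
b0, b1 = simple_ols(gender, gpa)

male_mean = sum(gpa[:3]) / 3
female_mean = sum(gpa[3:]) / 3
print(b0, male_mean)                # intercept matches the male mean
print(b1, female_mean - male_mean)  # slope matches the difference in means

def predict_gpa(x, intercept=2.895180, slope=0.146584):
    """Fitted GPA from the coefficients Excel reports for the survey."""
    return intercept + slope * x

print(predict_gpa(0), predict_gpa(1))
```

With the unrounded coefficients the female estimate comes to about 3.0418; the figure of 3.0415 in the text results from rounding the coefficients before adding them.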

            Excel also furnishes more data, which help one better understand the value of the model created.  Labeled “R Square” (the coefficient of determination) under the “Regression Statistics” shown below, this number gives the proportion of the total variation in GPA that is accounted for by gender.  Our model has a coefficient of determination of .0270, meaning that 2.70% of the total variation in GPA can be explained by our regression equation relating gender and GPA.  While this percentage is quite small, it is nonetheless interesting that a factor completely out of one’s control, gender, accounts for 2.70% of the total variation in GPA.

 

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.164436
R Square             0.027039
Adjusted R Square    0.021178
Standard Error       0.442258
Observations         168

ANOVA
              df     SS          MS          F           Significance F
Regression    1      0.902320    0.9023203   4.6132735   0.03317587
Residual      166    32.46830    0.1955922
Total         167    33.37062

              Coefficients   Standard Error   t Stat      P-value
Intercept     2.895180       0.048544         59.640179   4.8567E-114
Gender        0.146584       0.068246         2.1478532   0.0331758
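
The entries in Excel's output are tied together by a few standard identities, which the following sketch checks using the figures above: R Square is the regression sum of squares divided by the total sum of squares, the t Stat is the coefficient divided by its standard error, and in a regression with a single independent variable the square of that t Stat equals F.  Variable names are illustrative.

```python
# Figures copied from the Excel output for the regression of GPA on gender.
ss_regression = 0.902320
ss_total = 33.37062
b1, se_b1 = 0.146584, 0.068246
f_reported = 4.6132735

r_squared = ss_regression / ss_total  # reported: 0.027039
t_stat = b1 / se_b1                   # reported: 2.1478532
t_squared = t_stat ** 2               # should be close to F

print(round(r_squared, 6), round(t_stat, 4), round(t_squared, 3))
```

Small differences in the last digits arise only because the inputs are the rounded values Excel displays.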

 

 

            In order to determine the significance of the model, we must perform a 2-tailed hypothesis test, the null hypothesis being that β1 = 0, meaning that there exists no statistical relationship; the alternative hypothesis is that β1 ≠ 0, meaning that there indeed exists a statistical relationship between gender and GPA.  We find a critical value t(α/2) and compare it with the test statistic furnished by Excel; if the absolute value of the test statistic is greater than the critical value, the null is rejected and it may be said that a statistical relationship exists between gender and GPA.  Our test statistic of 2.1479 is greater than the critical value at the 95% confidence level (approximately 1.97 with 166 degrees of freedom), so we may reject the null; equivalently, the P-value of .0332 is less than .05.  We may then say that gender has a statistically significant effect on one’s GPA.
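
The P-value Excel reports can be reproduced from the test statistic.  The sketch below assumes it is the usual two-tailed probability under a t distribution with n − 2 = 166 degrees of freedom, and obtains it by numerically integrating the t density with Simpson's rule so that only the standard library is needed.  The helper names are my own, not Excel's.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|] (steps is even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area

p_value = two_tailed_p(2.1478532, 166)
reject_at_95 = p_value < 0.05
print(round(p_value, 4), reject_at_95)
```

Rejecting when the P-value is below α is the same decision as rejecting when the absolute value of the t Stat exceeds t(α/2).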

 

V.  Effect of External Activities

            We may now turn our attention to factors that are controllable by the student and which may or may not have an effect on GPA.  The following questions were asked by our survey: do you have an on/off campus job?; do you participate in varsity athletics?; are you currently involved in a relationship (dating) with someone?; how many hours of television do you watch per week?; how many hours per week do you spend studying?; and finally, how many hours per week do you spend using the internet?  Prior to using Excel to analyze the survey results, I hypothesized that watching television, using the internet, dating, and having a job would all negatively impact GPA, while studying and playing a sport (due to the necessitated “budgeting” of time) would increase GPA.  Once again I used GPA as the dependent (y) variable, believing it to be dependent upon the several independent variables mentioned above.  Once again, since the population data are unknown, we are forced to use sample data, which our survey provides.  Since more than one independent variable is being studied in relation to the dependent variable, a multiple regression must be used.  Our equation is thus:

 

ŷ = b0 + b1(job) + b2(varsity sport) + b3(dating) + b4(hrs. tv) + b5(hrs. study) + b6(hrs. internet)

 

Excel reports the following result:

                ŷ = 2.9466 + .06556x1 + .01566x2 − .02641x3 − .00820x4 + .00554x5 − .00494x6

 

Interpretation of the data reveals that the intercept is 2.9466, meaning that were all the variables zero, this would be the estimated GPA of the student.  Yet again, we cannot place too much credence in this number, as it absorbs many of the effects on y.  Each coefficient represents the change in ŷ when its variable is changed by one unit, holding the other variables constant.  Employment has a positive coefficient, our results stating that holding a job is associated with a GPA higher by .06556.  Participation in varsity athletics also has a positive coefficient, our survey results showing a GPA higher by .01566 for a varsity athlete.  Dating has a negative coefficient, having a boyfriend or girlfriend being associated with a GPA lower by .02641.  Watching television also appears to negatively influence GPA: each additional hour of television watched per week decreases estimated GPA by .00820.

On the positive side, studying has a positive coefficient: each additional hour spent studying per week increases estimated GPA by .0055.  Finally, use of the internet appears to negatively impact GPA: each additional hour spent on the internet per week decreases estimated GPA by .0049.
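
As an illustration of how the fitted equation is used, the sketch below evaluates it for one invented student (employed, not a varsity athlete, dating, 10 hours of television, 15 hours of study, and 3 hours of internet use per week).  The coefficients are those reported by Excel; the student profile and the function name are hypothetical.

```python
# Coefficients copied from the Excel multiple regression output.
INTERCEPT = 2.9465569
COEFFICIENTS = {
    "job": 0.0655604,
    "sport": 0.0156648,
    "dating": -0.0264096,
    "hrs_tv": -0.0082007,
    "hrs_study": 0.0055390,
    "hrs_net": -0.0049478,
}

def predict_gpa(student):
    """Estimated GPA: intercept plus each coefficient times its variable."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in student.items())

# Hypothetical student profile, not a survey respondent.
student = {"job": 1, "sport": 0, "dating": 1,
           "hrs_tv": 10, "hrs_study": 15, "hrs_net": 3}
predicted = predict_gpa(student)

# Effect of one more hour of television per week, all else held constant.
one_more_tv_hour = predict_gpa({**student, "hrs_tv": 11}) - predicted

print(round(predicted, 4), round(one_more_tv_hour, 5))
```

The second figure is simply the television coefficient, which is what "holding the other variables constant" means in practice.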

 

 

 

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.2417079
R Square             0.0584227
Adjusted R Square    0.0228914
Standard Error       0.4413899
Observations         166

ANOVA
              df     SS           MS        F         Significance F
Regression    6      1.9220632    0.32034   1.64426   0.1382864
Residual      159    30.977184    0.19482
Total         165    32.899248

              Coefficients   Standard Error   t Stat     P-value
Intercept     2.9465569      0.1119822        26.3127    8.17E-60
Job           0.0655604      0.0706795        0.92757    0.35503
Sport         0.0156648      0.0751454        0.20846    0.83513
Dating        -0.0264096     0.0699791        -0.37739   0.70638
Hrs. TV       -0.0082007     0.0036951        -2.21935   0.02787
Hrs. Study    0.0055390      0.0043745        1.26620    0.20729
Hrs. Net      -0.0049478     0.0077203        -0.64087   0.52252

 

 

            Analysis of the significance of our model requires an F test of overall significance, the null hypothesis being that β1 = β2 = … = β6 = 0, meaning that our overall model has no significance in the explanation of the variation of GPA, and the alternative hypothesis being that at least one of these coefficients is not zero, meaning that our overall model is significant.  Excel reveals that the model’s test F is 1.644, and when this is compared to the critical F values at the 90%, 95%, and 99% confidence levels, it is apparent that the test F exceeds none of them (equivalently, Significance F of .1383 is larger than .10); thus I cannot reject the null, and my overall model is not significant.  This may explain why my adjusted R², which is used in multiple regression instead of R² in order to compensate for the number of independent variables in the model, reveals that my model accounts for a mere 2.29% of the variation in GPA.
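
The test F and adjusted R² can both be rebuilt from the sums of squares in the ANOVA table.  The following sketch does so, and applies the decision rule using Excel's Significance F rather than a table of critical values; the variable names are illustrative.

```python
# Figures copied from the Excel multiple regression output.
ss_regression, ss_residual, ss_total = 1.9220632, 30.977184, 32.899248
n, k = 166, 6              # observations and independent variables
significance_f = 0.1382864

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual  # reported: 1.64426

r_squared = ss_regression / ss_total  # reported: 0.0584227
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)  # reported: 0.0228914

# The overall model is significant at level alpha only if Significance F < alpha.
rejected_at = [alpha for alpha in (0.10, 0.05, 0.01) if significance_f < alpha]

print(round(f_stat, 4), round(r_squared, 4), round(adj_r_squared, 4), rejected_at)
```

The adjustment multiplies the unexplained share (1 − R²) by (n − 1)/(n − k − 1), which is why adding six weak variables pulls R² of 5.84% down to an adjusted 2.29%.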

Despite the finding that the overall model is insignificant, we can still test for the significance of the individual coefficients of the independent variables.  For each of the six variables, a hypothesis test is completed, the null hypothesis stating that the variable’s coefficient equals zero, and the alternative hypothesis stating that it does not equal zero.  A rejection of the null indicates that the coefficient is significant.  Each variable’s t Stat is compared with t(α/2); if the absolute value of the t Stat is greater than t(α/2), the null may be rejected and the variable is significant.  At none of the 90%, 95%, or 99% confidence levels do the t Stats for employment, varsity sport, dating, hours studying, or hours on the internet exceed t(α/2) in absolute value; thus these variables are not significant.  However, the absolute value of the t Stat for hours watching television is greater than t(α/2) at the 90% and 95% confidence levels (though not at 99%), and thus this variable is significant at those levels.
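
Comparing the absolute value of each t Stat with t(α/2) leads to the same decision as comparing its two-tailed P-value with α, so the six tests can be summarized directly from Excel's P-value column.  The sketch below does this; the dictionary and function names are my own.

```python
# Two-tailed P-values copied from the Excel coefficient table.
p_values = {
    "Job": 0.35503,
    "Sport": 0.83513,
    "Dating": 0.70638,
    "Hrs. TV": 0.02787,
    "Hrs. Study": 0.20729,
    "Hrs. Net": 0.52252,
}

ALPHAS = (0.10, 0.05, 0.01)  # 90%, 95%, and 99% confidence levels

def significant_levels(p_value, alphas=ALPHAS):
    """Return the alpha levels at which the null (coefficient = 0) is rejected."""
    return [alpha for alpha in alphas if p_value < alpha]

decisions = {name: significant_levels(p) for name, p in p_values.items()}
for name, levels in decisions.items():
    print(name, levels if levels else "not significant")
```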

 

 VI.  Deductive Inferences

 

            My analysis of the surveys produced somewhat different results than expected.  Some initial hypotheses, such as the advantage of females over males in terms of GPA and the harm that television viewing does to GPA, were supported by the survey.  Of interest is the finding that 2.70% of the variance in GPA is accounted for by gender, which is a non-controllable factor for the student.  Yet this also means that roughly 97% of the variance in GPA is left to other factors, many of them presumably choices made by the student.  Unfortunately, my attempt to ascertain what some of these factors are failed, because my overall model is not significant.  According to my model, the only variable that appears to influence GPA (in a negative fashion) is the number of hours of television watched per week.  Thus, my only advice to students, teachers, and administrators is to watch fewer hours of television if one wishes to increase GPA.  Factors not covered by the survey, such as intelligence quotient, memory, and classroom attendance, may be greater factors in GPA than the variables used in my model; thus, more surveying and analysis must be completed in order to gain a better notion of the influences on the variance of GPA.