Problem set 3:   OLS assumptions and omitted variable bias

Part A:

1.  OLS assumption:  The error term and the independent variables are not correlated.

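If this assumption is violated, OLS generates biased estimates (the expected value of Beta-hat is not equal to B).   Biased estimates are systematically wrong: averaged over many samples, Beta-hat would still miss the true B.
When we use OLS to estimate the Beta-hats, it ALWAYS forces the sample correlation between the independent variables and the residuals to be zero (when the regression includes an intercept).  Therefore, if the error in the true model is correlated with the independent variable(s), OLS cannot reproduce that correlation in its residuals; the part of Y that actually comes from the error gets attributed to the independent variable(s), and the Beta-hats are biased.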
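The "ALWAYS" follows from the first-order conditions that define OLS. A short derivation for a simple regression Y = b0 + b1*X + u with an intercept (a sketch; in a multiple regression the same condition holds for every independent variable):

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2
\begin{aligned}
\frac{\partial}{\partial b_0}: &\quad -2\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} u_i = 0 \\
\frac{\partial}{\partial b_1}: &\quad -2\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} X_i u_i = 0 \\
\widehat{\operatorname{Cov}}(X, u) &= \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u)
  = \frac{1}{n}\sum_{i=1}^{n} X_i u_i - \bar X \bar u = 0
\end{aligned}
```

Nothing in these conditions refers to the true error e, so the residuals come out uncorrelated with X even when e is correlated with X.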

This assumption is likely to be violated when estimating demand and/or supply equations (there is a way to deal with this, referred to as two-stage least squares).  This assumption can also be violated when relevant independent variables are left out of the regression, but IF and ONLY IF the omitted variables are correlated with the independent variables already in the regression model (this is referred to as "omitted variable bias").   You may well face this problem in your project.  The example below shows this bias when we omit a variable.

Suppose the true model is
         Y = B0 + B1*X1 + B2*X2 + e    (equation 1)
and  X1 = B3*X2 + ex1                       (equation 2)
 where Bs are the true coefficients and e and ex1 are error terms. 
The true model indicates that X1 and X2 are correlated (through B3 - that's why we need equation 2).

a) If we estimate the following regression, which leaves X1 out:  Y = b0 + b2*X2 + u1
(lower-case b denotes B-hat), is the expected value of b2 equal to B2?
Show algebraically what expected b2 is.  Show every step of your algebraic work.

NOTE: If we write equation (1) as Y = B0 + B2*X2 + e2, where e2 = (B1*X1 + e), it is easy to see why e2 and X2 are correlated: whenever X2 changes, X1 changes (equation 2), and therefore e2 changes too.  So X2 and e2 are correlated, which violates the OLS assumption above.
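To make the NOTE concrete, here is the covariance between X2 and the composite error e2. It assumes (this is not stated above) that X2 is uncorrelated with both e and ex1; B3 denotes the slope in equation 2. The result is nonzero whenever B1 ≠ 0 and B3 ≠ 0, i.e. when the omitted variable is both relevant and correlated with X2.

```latex
\begin{aligned}
\operatorname{Cov}(X_2, e_2) &= \operatorname{Cov}(X_2,\, B_1 X_1 + e) \\
  &= B_1 \operatorname{Cov}(X_2, X_1) + \operatorname{Cov}(X_2, e) \\
  &= B_1 \operatorname{Cov}(X_2,\, B_3 X_2 + e_{x1}) + 0 \\
  &= B_1 B_3 \operatorname{Var}(X_2) + B_1 \operatorname{Cov}(X_2, e_{x1}) \\
  &= B_1 B_3 \operatorname{Var}(X_2)
\end{aligned}
```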
b) If we estimate the following regression instead:  Y = b0 + b1*X1 + b2*X2 + u
are the expected b1 and b2 equal to B1 and B2, respectively?  Why or why not (hint: are we omitting any relevant variables)?

2.  Estimations

Use data from this Excel file for the following exercises (copy the Excel data into EViews, or copy the data into a new Excel file and read that file into EViews; use EViews for all estimations below):
The data in the Excel file is from a simulation of the following model (this is the 'true' model):
                 Y = 1.0 + 2*X1 + 2*X2 + e1
        and     X1 = (-3)*X2 + e2
where X2, e1 and e2 are all random variables, and e1 and e2 are error terms.

a) Generate a correlation matrix (or compute the pairwise correlations one by one) for the dependent and independent variables:  Y, X1, X2.  Is Y correlated with X1 and X2? Are X1 and X2 correlated?

b) Estimate the following regression:  Y = b0 + b1*X1 + b2*X2 + u1       where u1 are the OLS residuals

Are the estimated slope coefficients (b1 and b2) close to the true coefficients (a.k.a. population coefficients)?
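As a cross-check on the EViews estimates, here is a minimal sketch of the long regression computed from the normal equations written in deviations from the means (Cramer's rule on a 2x2 system). The data come from a hypothetical simulation of the true model (n = 1000, independent standard-normal X2, e1, e2, fixed seed), not from the Excel file.

```python
import random


def simulate(n=1000, seed=42):
    """ASSUMPTION: X2, e1, e2 independent standard normals; n = 1000."""
    rng = random.Random(seed)
    y, x1, x2 = [], [], []
    for _ in range(n):
        x2_i = rng.gauss(0.0, 1.0)
        x1_i = -3.0 * x2_i + rng.gauss(0.0, 1.0)                   # X1 = (-3)*X2 + e2
        y_i = 1.0 + 2.0 * x1_i + 2.0 * x2_i + rng.gauss(0.0, 1.0)  # Y = 1 + 2*X1 + 2*X2 + e1
        y.append(y_i)
        x1.append(x1_i)
        x2.append(x2_i)
    return y, x1, x2


def ols_two_regressors(y, x1, x2):
    """OLS of y on an intercept, x1 and x2 via the normal equations in deviations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2          # zero only if X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2         # the fitted line passes through the means
    return b0, b1, b2


y, x1, x2 = simulate()
b0, b1, b2 = ols_two_regressors(y, x1, x2)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}  (true: 1, 2, 2)")
```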

c)  Estimate the following regression:   Y = b0 + b2*X2 + u2             where u2 are the OLS residuals
Is the estimated b2 close to the true coefficient?

d)  Compute the correlation between X2 and u2 (the residuals from the regression in c).  Is the correlation zero (in a statistical sense)? Does that mean the OLS assumption is not violated?  Why?

Part B:   t-tests on the estimated coefficients
a)  Use the estimates from 2b above to test the null hypothesis B1 = 2, and separately test B2 = 2 (we are testing whether the estimated coefficients are equal, in a statistical sense, to the true coefficients).
Are the estimated coefficients significantly different from the true coefficients (which we know)?  Use a two-sided test at the 10% level of significance.
Clearly state the null and the alternative hypotheses, show how you compute the t-statistics, and state the critical t-value, the degrees of freedom, your criterion for rejection and your conclusion.
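The t-statistic for H0: B = 2 is t = (b - 2) / se(b), with n - 3 degrees of freedom in the long regression. A minimal sketch on hypothetical simulated data (n = 1000, independent standard-normal X2, e1, e2, fixed seed) using conventional homoskedastic standard errors. The critical value here is a labeled assumption: with several hundred degrees of freedom the t critical value is close to the standard-normal 1.645, but for small samples take t(0.05, df) from a t table.

```python
import math
import random
from statistics import NormalDist


def simulate(n=1000, seed=42):
    """ASSUMPTION: X2, e1, e2 independent standard normals; n = 1000."""
    rng = random.Random(seed)
    y, x1, x2 = [], [], []
    for _ in range(n):
        x2_i = rng.gauss(0.0, 1.0)
        x1_i = -3.0 * x2_i + rng.gauss(0.0, 1.0)                   # X1 = (-3)*X2 + e2
        y_i = 1.0 + 2.0 * x1_i + 2.0 * x2_i + rng.gauss(0.0, 1.0)  # Y = 1 + 2*X1 + 2*X2 + e1
        y.append(y_i)
        x1.append(x1_i)
        x2.append(x2_i)
    return y, x1, x2


def ols_with_se(y, x1, x2):
    """Long regression Y = b0 + b1*X1 + b2*X2 + u with conventional standard errors."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ssr = sum((c - b0 - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    s2 = ssr / (n - 3)                 # error-variance estimate, df = n - 3
    se1 = math.sqrt(s2 * s22 / det)    # diagonal of s2 * inverse of the deviation cross-product matrix
    se2 = math.sqrt(s2 * s11 / det)
    return b1, b2, se1, se2, n - 3


y, x1, x2 = simulate()
b1, b2, se1, se2, df = ols_with_se(y, x1, x2)

# H0: B = 2 versus H1: B != 2, two-sided at the 10% level (5% in each tail).
t_crit = NormalDist().inv_cdf(0.95)   # large-df approximation of t(0.05, df)
for name, b, se in [("B1", b1, se1), ("B2", b2, se2)]:
    t_stat = (b - 2.0) / se
    verdict = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
    print(f"{name}: t = ({b:.3f} - 2) / {se:.3f} = {t_stat:.2f}, df = {df}, "
          f"critical = {t_crit:.3f} -> {verdict}")
```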

b) Use the estimates from 2c above to test the null hypothesis B2 = 2.
Can you reject the null?  Use a two-sided test at the 10% level.
Clearly state the null and the alternative hypotheses, show how you compute the t-statistic, and state the critical t-value, the degrees of freedom, your criterion for rejection and your conclusion.