This checklist was last updated on Wednesday April 16, 2003.


 

MULTIPLE REGRESSION MODEL CHECKLIST

 

I.         Specify the functional relationship suggested by economic theory using the general form of:

 

Dependent variable (Y) = f(All theoretically-relevant Independent Variables: X1, X2, …, Xk)

 

II.       Write the model into the following generalized form using the actual variables for the particular model:

 

estimated Y = b0 + b1*X1 + b2*X2 + ... + bk*Xk

 

III.      Specify the hypothesis tests that you will conduct after obtaining your regression results.  These should include the following:

 

A.      Hypothesis tests on each OLS-estimated coefficient (b0, b1, b2... bk).

 

1.      Ha should reflect the relationship suggested by economic theory.

 

2.      Use the appropriate t-test to evaluate the hypotheses.

 

3.      Typically, the underlying economic theory will suggest a one-tail t-test for each coefficient.  But if theory does not predict the direction of the relationship between the dependent variable and a given independent variable, use a two-tail test for that coefficient.
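
As a rough illustration of items 1 through 3, the sketch below computes a t-statistic for one coefficient and compares it with a critical value.  The coefficient, standard error, and degrees of freedom are hypothetical numbers chosen only for the example, and the function names are not from any particular software package.  The critical values (1.701 one-tail and 2.048 two-tail, at the 5% level with 28 degrees of freedom) come from a standard t-table.

```python
def t_statistic(b, se, hypothesized=0.0):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (b - hypothesized) / se


def reject_null(t, t_crit, alternative):
    """Decide whether to reject H0.

    t_crit is the positive critical value from a t-table for the chosen
    significance level and number of tails.
    alternative is 'greater' (Ha: b > 0), 'less' (Ha: b < 0), or 'two-sided'.
    """
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    if alternative == "two-sided":
        return abs(t) > t_crit
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical result: a price coefficient that theory says should be negative.
t = t_statistic(b=-0.84, se=0.35)

one_tail_decision = reject_null(t, t_crit=1.701, alternative="less")
two_tail_decision = reject_null(t, t_crit=2.048, alternative="two-sided")
```

In this made-up case both tests reject H0, since the t-statistic is about -2.4.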

 

B.     Hypothesis tests on the overall goodness of fit for the model, evaluating the following hypotheses:

 

                                                           H0: b1 = b2 = ... = bk = 0

                                         Ha: at least one coefficient is not equal to zero.

 

1.      Use an F-test to evaluate the joint significance of all independent variables. 

 

2.      Rejection of H0 implies that at least one of the independent variables adds explanatory power to the model.  Further testing is needed to determine which one(s).
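
The F-statistic for this test can be computed from R2, the number of observations (n), and the number of independent variables (k).  The following is a minimal sketch; the function name and the sample figures are assumptions made for illustration, and most regression software reports this value directly.

```python
def f_statistic(r_squared, n, k):
    """Overall F-statistic: (R2 / k) / ((1 - R2) / (n - k - 1)).

    n is the number of observations and k is the number of independent
    variables (not counting the intercept).
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


# Hypothetical model: R2 = 0.60 with 34 observations and 3 independent variables.
f = f_statistic(0.60, n=34, k=3)
```

The result is compared with the critical F value for k and n - k - 1 degrees of freedom at the chosen level of significance.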

 

C.     Hypothesis tests on the correlation between the dependent variable and any one of the independent variables.  Perform these tests after the “overall goodness of fit” F test.

 

1.      Remember that each one of these tests is evaluating if the independent variable in question adds significant explanatory power to the model.

 

2.      The appropriate test in this case is the two-tail t-test associated with each coefficient.

 

NOTE: Given the similarity of the tests in parts A and C, you may want to run both at the same time: the one-tail test to evaluate the theoretical relationship and the two-tail test to evaluate the explanatory power of each independent variable.
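
Because both tests use the same t-statistic, the one-tail p-value can be obtained from the two-tail p-value that most software reports.  The sketch below assumes H0: b = 0 and relies on the symmetry of the t distribution; the function name and the numbers are hypothetical.

```python
def one_tail_p(two_tail_p, estimate, expected_sign):
    """Convert a two-tail p-value into a one-tail p-value.

    expected_sign is +1 if theory predicts a positive coefficient and -1 if
    it predicts a negative one.  If the estimate has the predicted sign, the
    one-tail p-value is half the two-tail value; otherwise it is one minus
    that half.
    """
    half = two_tail_p / 2.0
    sign_matches = (estimate > 0) == (expected_sign > 0)
    return half if sign_matches else 1.0 - half


# Hypothetical output: coefficient of -0.84 with a reported two-tail p of 0.08.
p_as_predicted = one_tail_p(0.08, estimate=-0.84, expected_sign=-1)
p_wrong_sign = one_tail_p(0.08, estimate=0.84, expected_sign=-1)
```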

 

 

IV.    Collect the necessary data.  If you find there is not an adequate data series for some of the independent variables, you may need to respecify the model to include only the estimable variables.  Adjust the hypothesis tests in part III as necessary.

 

V.      Perform a regression analysis of the above-specified model.
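
In practice this step is done with a spreadsheet or statistical package.  For readers who want to see what the software is doing, the following is a minimal sketch of OLS that solves the normal equations (X'X)b = X'y using only the Python standard library.  The function name and the small data set are made up for illustration; the data were generated exactly from Y = 1 + 2*X1 - 3*X2, so the fit should return those coefficients.

```python
def ols(X, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals.

    X is a list of rows [x1, ..., xk]; y is the list of observed Y values.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    p = len(rows[0])

    # Augmented matrix for the normal equations: [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]

    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda i: abs(A[i][c]))
        if abs(A[pivot_row][c]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]

    return [A[i][p] for i in range(p)]


# Hypothetical data generated from Y = 1 + 2*X1 - 3*X2 with no error term.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [-3, 2, -5, 0, -4]
coefs = ols(X, y)  # [b0, b1, b2]
```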

 

VI.    Interpret the results.

 

A.      Determine what each regression coefficient says about the relationship between the dependent variable and the corresponding independent variable.

 

B.     Remember how to interpret the regression coefficients.  For a linear model, a one-unit change in the Xi variable corresponds to a change of bi units in Y, holding all of the other independent variables constant.

 

C.     Interpret the adjusted-R2 regarding the degree to which the model explains the variation in the dependent variable.
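
For reference, R2 and the adjusted-R2 can be computed as sketched below.  The function names and the sample figures are assumptions made for illustration; regression software normally reports both values.

```python
def r_squared(y, y_hat):
    """R2 = 1 - (sum of squared residuals / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1).

    n is the number of observations and k is the number of independent
    variables.  The adjustment penalizes adding variables that contribute
    little explanatory power.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: R2 = 0.60 with 34 observations and 3 independent variables.
adj = adjusted_r_squared(0.60, n=34, k=3)
```

Here the adjusted-R2 is about 0.56, which would be read as the model explaining roughly 56% of the variation in the dependent variable after adjusting for the number of independent variables.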

 

VII.   Evaluate the statistical significance of the results.

 

A.      For the individual regression coefficients:

 

1.      Determine if the hypothesized sign was obtained.

 

2.      Conduct the hypothesis tests and determine, at the appropriate level of significance, whether to reject or not reject the null hypothesis.

 

3.      Evaluate the p-value obtained.

 

B.     Evaluate the F-stat and determine the overall significance of the model.

 

C.     After using the F-test to reject H0, assess which of the independent variables contribute significantly to the model’s ability to explain the variation in the dependent variable.

 

Since it is likely that you will not obtain statistically “perfect” results after the first run, you may want to eliminate variables that did not produce the predicted results.  Before eliminating all such variables, however, you will want to conduct an additional test on the initial model.

 

D.     Evaluate your model results for multicollinearity.

 

1.      A rough test of the presence of multicollinearity is a “high” R2 and “small” t-stats and “incorrect” signs for the b’s.

 

2.      If those results occur, the next step is to perform a formal test using the correlation matrix of the dependent and independent variables.

 

3.      If the correlation between any two independent variables is greater than the correlation between either one of the independent variables and the dependent variable, then multicollinearity is serious and will be a problem.
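
The rule in item 3 can be sketched as follows.  This is only one reading of the rule: it assumes that "either one" means the correlation between two independent variables exceeds both of their correlations with the dependent variable, and it compares absolute values.  The function names and the data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def flag_collinear_pairs(y, xs):
    """Return pairs of independent variables that fail the rule of thumb.

    xs maps each variable name to its list of observations.  A pair is
    flagged when |r(Xi, Xj)| is larger than both |r(Xi, Y)| and |r(Xj, Y)|
    (assumed interpretation; a stricter reading would use min instead of max).
    """
    flagged = []
    for (name_i, x_i), (name_j, x_j) in combinations(xs.items(), 2):
        r_ij = abs(pearson(x_i, x_j))
        if r_ij > max(abs(pearson(x_i, y)), abs(pearson(x_j, y))):
            flagged.append((name_i, name_j))
    return flagged


# Hypothetical data: x2 is almost exactly twice x1.
y = [3, 8, 2, 9, 5, 10]
xs = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "x3": [5, 3, 6, 2, 4, 1],
}
flagged = flag_collinear_pairs(y, xs)
```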

 

E.     Remedies for multicollinearity include:

 

(1)   Do nothing if the multicollinearity does not reduce the t-stats to make them insignificant or change the coefficients enough to make them different from what you would expect, especially with regard to their signs;

 

(2)   Drop one or more of the multicollinear independent variables.  However, you must weigh this solution against omitting relevant independent variables from the model; or

 

(3)   Increase the sample size, which is the best solution since multicollinearity is a sample problem.  However, since you should include all available data from the beginning, this may not be a feasible option.

 

 

Please note that there are many additional statistical problems that can arise, but they are beyond the scope of this course.

 


 

The OLS technique used thus far has been applied to a linear model.  However, many economic relationships are better described by a non-linear form, which requires a different specification.  Perhaps the most frequently used transformation is the double-log transformation.

 

Here the natural log of both the dependent and independent variables is taken so that the model looks like the following:

 

lnY =  b0 + b1*lnX1 + b2*lnX2 + ... + bk*lnXk

 

This transformation is appropriate when the underlying relationship between Y and the X’s is multiplicative, with each X raised to a power, as in the cases of demand and supply equations and the Cobb-Douglas production function.

 

In the case of demand and supply models, economic theory often assumes constant price, cross-price, and income elasticities while allowing the slopes of the curves to be non-constant.  In other words, non-linear models are assumed to better capture the underlying theory, leading to the following equation for the demand for a product:

 

Yd = (e^B0 * I^B2 * Ps^B3) * P^B1

 

In its present form, this equation cannot be estimated using ordinary least squares.  However, by taking the natural log of both sides, the model now becomes

 

ln(Yd) = B0 + B1*ln(P) + B2*ln(I) + B3*ln(Ps)

 

which can now be estimated using OLS.  With this model, the B’s now reflect the corresponding elasticities, so that the interpretation of, for example, B1 is that a one percent change in price will correspond to a B1 percent change in the quantity demanded, holding the other variables constant.
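
The following is a minimal sketch of the double-log approach, simplified to a single independent variable (price) so that it fits in a few lines.  The data are hypothetical and were generated exactly from Yd = e^B0 * P^B1 with B0 = 5 and B1 = -1.5, so the regression on the logged data should recover those values.  The function name is an assumption made for illustration.

```python
from math import exp, log


def simple_ols(x, y):
    """OLS with one independent variable; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


# Hypothetical data generated from Yd = e^5 * P^(-1.5), with no error term.
prices = [1.0, 2.0, 4.0, 8.0, 16.0]
quantities = [exp(5.0) * p ** -1.5 for p in prices]

# Regress ln(Yd) on ln(P); the slope b1 is the estimated price elasticity.
b0, b1 = simple_ols([log(p) for p in prices], [log(q) for q in quantities])

# Exact percent change in quantity implied by a 1% price increase.
# For small changes this is close to b1 percent.
pct_change_q = (1.01 ** b1 - 1.0) * 100.0
```

With an elasticity of -1.5, a 1% increase in price corresponds to a decrease in quantity demanded of a little under 1.5%.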