Omitted Variable Bias
Omitted variable bias is present when two conditions are met (a simulation sketch follows this list):
- (1) the omitted variable is correlated with one or more of the independent variables included in the model, and
- (2) the omitted variable is a determinant of the dependent variable.
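To make both conditions concrete, here is a minimal simulation sketch (numpy; the variable names and coefficient values are made up for illustration): an omitted variable z is correlated with the included regressor x and also drives y, so the short regression's slope on x is biased.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# z is the omitted variable; x is correlated with z (condition 1).
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)

# z is also a determinant of y (condition 2). True slope on x is 1.0.
y = 2.0 + 1.0 * x + 1.5 * z + rng.normal(size=n)

# Full regression of y on [1, x, z]: slope on x is close to 1.0.
X_full = np.column_stack([np.ones(n), x, z])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Misspecified regression omitting z: the slope on x absorbs part of
# z's effect and is biased upward.
X_short = np.column_stack([np.ones(n), x])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

print("slope on x with z included:", round(b_full[1], 3))   # ~1.0
print("slope on x with z omitted: ", round(b_short[1], 3))  # ~1.7, biased
```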
The General Multiple Linear Regression Model
The multiple regression equation specifies a dependent variable as a linear function of two or more independent variables:
Yi = B0 + B1X1i + B2X2i + … + BkXki + εi
The intercept term is the value of the dependent variable when the independent variables are equal to zero. Each slope coefficient is the estimated change in the dependent variable for a one-unit change in that independent variable, holding the other independent variables constant.
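As a sketch of how such a model is estimated and read in practice, assuming simulated data and illustrative coefficient values (statsmodels):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Two independent variables and made-up "true" coefficients.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 4.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# A column of ones provides the intercept term.
X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()

# params[0] is the intercept; params[1] and params[2] are the slope
# coefficients, each holding the other independent variable constant.
print(results.params)  # approximately [4.0, 2.0, -3.0]
```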
The Slope Coefficient in Multiple Regression
In a multiple regression, each slope coefficient is interpreted as a partial slope coefficient: it measures the effect on the dependent variable of a change in the associated independent variable, holding the other independent variables constant.
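One way to see the "partial" interpretation is the Frisch–Waugh–Lovell result (an illustration, not something named in the reading): the multiple-regression slope on x1 equals the slope from regressing Y on the part of x1 left over after removing its linear relation to x2. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# x1 and x2 are correlated, so simple and partial slopes differ.
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(n)

# Multiple regression: partial slope on x1.
b = ols(np.column_stack([ones, x1, x2]), y)

# Residualize x1 on x2, then regress y on those residuals alone.
g = ols(np.column_stack([ones, x2]), x1)
x1_resid = x1 - np.column_stack([ones, x2]) @ g
b_partial = ols(np.column_stack([ones, x1_resid]), y)

print(round(b[1], 4), round(b_partial[1], 4))  # the two slopes match
```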
Homoskedasticity and Heteroskedasticity in Multiple Regression
In multiple regression, homoskedasticity and heteroskedasticity are extensions of the definitions discussed in the previous reading. Homoskedasticity refers to the condition that the variance of the error term is constant across all observations: Var(εi | Xi) = σ2 for i = 1 to n. Heteroskedasticity means that the dispersion of the error terms varies over the sample. It may take the form of conditional heteroskedasticity, in which the variance of the error term is a function of the independent variables.
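A minimal sketch of conditional heteroskedasticity, assuming an arbitrary error-variance function of x chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
x = rng.uniform(0.5, 3.0, size=n)

# Conditional heteroskedasticity: Var(eps | x) = (0.5 * x)^2, so the
# error dispersion is a function of the independent variable.
eps = rng.normal(scale=0.5 * x)
y = 1.0 + 2.0 * x + eps

# Fit OLS and compare residual spread in the low-x and high-x halves.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

lo, hi = x < np.median(x), x >= np.median(x)
print("residual std, low x: ", round(resid[lo].std(), 3))
print("residual std, high x:", round(resid[hi].std(), 3))  # clearly larger
```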
Measures of Fit in Multiple Regression
Multiple regression estimates the intercept and slope coefficients such that the sum of the squared residuals is minimized. The estimators of these coefficients are known as ordinary least squares (OLS) estimators and are typically computed with statistical software.
The standard error of the regression (SER) is the standard deviation of the error terms about the regression line, measuring how far, on average, observed values of the dependent variable fall from the regression's predicted values:
SER = √[SSE / (n − k − 1)]
where SSE is the sum of squared residuals, n is the number of observations, and k is the number of independent variables.
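A sketch of what the software computes under the hood, assuming simulated data: the OLS estimators from the normal equations, then the SER with the n − k − 1 divisor above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2

X_raw = rng.normal(size=(n, k))
y = 0.5 + 1.0 * X_raw[:, 0] - 2.0 * X_raw[:, 1] + rng.normal(scale=1.5, size=n)

# OLS estimators: b = (X'X)^(-1) X'y, with a leading column of ones.
X = np.column_stack([np.ones(n), X_raw])
b = np.linalg.solve(X.T @ X, X.T @ y)

# SER = sqrt(SSE / (n - k - 1)), where SSE is the sum of squared residuals.
resid = y - X @ b
sse = resid @ resid
ser = np.sqrt(sse / (n - k - 1))

print("coefficients:", np.round(b, 3))  # roughly [0.5, 1.0, -2.0]
print("SER:", round(ser, 3))            # close to the true error std, 1.5
```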
The coefficient of determination, R2, is the percentage of the variation in Y that is explained by the set of independent variables.
- R2 increases (or at least does not fall) as independent variables are added, even when the added variables have little true explanatory power, so R2 alone can overstate the quality of a model's fit.
- The adjusted R2 corrects R2 for the number of independent variables: adjusted R2 = 1 − [(n − 1) / (n − k − 1)] × (1 − R2). It can decrease when a variable that adds little explanatory power is included (a numerical comparison follows this list).
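The comparison below (statsmodels, simulated data) adds a pure-noise variable to a one-regressor model: R2 rises mechanically, while adjusted R2 penalizes the extra regressor.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 60

x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)        # unrelated to y by construction
y = 1.0 + 0.8 * x1 + rng.normal(size=n)

fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise_var]))).fit()

# R2 can only rise when a regressor is added; adjusted R2 penalizes
# the irrelevant variable and will typically fall here.
print("R2:     ", round(fit1.rsquared, 4), "->", round(fit2.rsquared, 4))
print("adj. R2:", round(fit1.rsquared_adj, 4), "->", round(fit2.rsquared_adj, 4))
```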
Assumptions of the Multiple Linear Regression Model
Assumptions of multiple regression mostly pertain to the error term, εi (a sketch of simple residual checks follows this list).
- A linear relationship exists between the dependent and independent variables.
- The independent variables are not random, and there is no exact linear relation between any two or more independent variables.
- The expected value of the error term is zero.
- The variance of the error terms is constant.
- The error for one observation is not correlated with that of another observation.
- The error term is normally distributed.
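Several of these error-term assumptions can be spot-checked from the residuals of a fitted model. A minimal sketch on simulated well-behaved data, so each check should come out clean:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1_000

X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

resid = sm.OLS(y, X).fit().resid

# Expected value of the error term is zero.
print("mean residual:", round(resid.mean(), 4))

# Constant variance: compare dispersion across two subsamples.
half = n // 2
print("std, 1st half:", round(resid[:half].std(), 3),
      "| 2nd half:", round(resid[half:].std(), 3))

# No correlation across observations: adjacent residuals
# should be roughly uncorrelated.
print("lag-1 autocorrelation:",
      round(np.corrcoef(resid[:-1], resid[1:])[0, 1], 3))
```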
Multicollinearity
Perfect multicollinearity exists when one of the independent variables is a perfect linear combination of the other independent variables. Imperfect multicollinearity arises when two or more independent variables are highly correlated, but less than perfectly correlated.
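Both cases can be demonstrated directly (numpy and statsmodels on simulated data; the construction of x2 is an illustrative assumption): perfect multicollinearity makes the design matrix rank-deficient, while near-perfect correlation leaves the model estimable but inflates the slope coefficients' standard errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(size=n)

# Perfect multicollinearity: x2 is an exact linear combination of x1,
# so X is rank-deficient and the coefficients are not uniquely identified.
x2_perfect = 2.0 * x1 + 3.0
X_perfect = sm.add_constant(np.column_stack([x1, x2_perfect]))
print("rank of X:", np.linalg.matrix_rank(X_perfect),
      "of", X_perfect.shape[1], "columns")

# Imperfect multicollinearity: x2 is highly but not perfectly correlated
# with x1; estimation works, but standard errors are inflated.
x2_near = 2.0 * x1 + rng.normal(scale=0.01, size=n)
y = 1.0 + 1.0 * x1 + 1.0 * x2_near + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2_near]))).fit()
print("slope standard errors:", np.round(fit.bse[1:], 2))  # inflated
```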