Two-stage least squares regression

The two-stage least squares method is used to handle models with endogenous explanatory variables in a linear regression framework.

Principle of two-stage least squares

An endogenous variable is a variable that is correlated with the error term of the regression model. Using endogenous variables violates the assumptions of linear regression. Such variables are typically encountered when variables are measured with error.

The general principle of the two-stage least squares approach is to use instrumental variables to estimate the model parameters. These instrumental variables are correlated with the endogenous variables but not with the error term of the model.
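
As an illustration, here is a minimal NumPy sketch of the two stages on simulated data (all names are hypothetical; this is the generic 2SLS recipe, not XLSTAT's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Instrument z is correlated with the endogenous regressor x but not with e.
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.8 * z + 0.5 * e + rng.normal(size=n)   # x is endogenous: corr(x, e) != 0
y = 1.0 + 2.0 * x + e

# Stage 1: regress the endogenous variable on the instrument (plus constant).
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress y on the fitted values from stage 1.
X_hat = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print(beta_2sls)   # close to [1.0, 2.0]; plain OLS of y on x would be biased
```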

Results of two-stage least squares in XLSTAT

Goodness of fit statistics: The statistics relating to the fit of the regression model are shown in this table (a computational sketch of the main quantities is given after the list):

  • Observations: The number of observations used in the calculations. In the formulas shown below, n is the number of observations.
  • Sum of weights: The sum of the weights of the observations used in the calculations. In the formulas shown below, W is the sum of the weights.
  • DF: The number of degrees of freedom for the chosen model (corresponding to the error part).
  • R²: The determination coefficient for the model. This coefficient, whose value is between 0 and 1, is only displayed if the constant of the model has not been fixed by the user. The R² is interpreted as the proportion of the variability of the dependent variable explained by the model. The nearer R² is to 1, the better the model. The limitation of the R² is that it does not take into account the number of variables used to fit the model.
  • Adjusted R²: The adjusted determination coefficient for the model. The adjusted R² can be negative if the R² is close to zero. This coefficient is only calculated if the constant of the model has not been fixed by the user. The adjusted R² is a correction to the R² which takes into account the number of variables used in the model.
  • MSE: The mean of the squares of the errors (MSE).
  • RMSE: The root mean square of the errors (RMSE) is the square root of the MSE.
  • MAPE: The Mean Absolute Percentage Error.
  • DW: The Durbin-Watson statistic. This statistic is based on the order-1 autocorrelation of the residuals and is used to check that the residuals of the model are not autocorrelated, the independence of the residuals being one of the basic hypotheses of linear regression. The user can refer to a table of Durbin-Watson statistics to check whether the independence hypothesis for the residuals is acceptable.
  • Cp: Mallows’ Cp coefficient. The nearer the Cp coefficient is to p* (the number of parameters in the model), the less the model is biased.
  • AIC: Akaike’s Information Criterion. This criterion, proposed by Akaike (1973), is derived from information theory and uses Kullback and Leibler's measure (1951). It is a model selection criterion which penalizes models for which adding new explanatory variables does not supply sufficient information, the information being measured through the MSE. The aim is to minimize the AIC criterion.
  • SBC: Schwarz’s Bayesian Criterion. This criterion, proposed by Schwarz (1978), is similar to the AIC, and the aim is to minimize it.
  • PC: Amemiya’s Prediction Criterion. This criterion, proposed by Amemiya (1980), is used, like the adjusted R², to take into account the parsimony of the model.
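
As a hedged illustration, here is a minimal Python sketch of how the main statistics above can be computed for unweighted observations (so W = n), with y the observed values, y_pred the model predictions and p the number of estimated parameters. The AIC and SBC are written up to an additive constant, and XLSTAT's exact conventions may differ:

```python
import numpy as np

def fit_statistics(y, y_pred, p):
    resid = y - y_pred
    n = len(y)
    df = n - p                                   # DF (error part)
    sse = np.sum(resid ** 2)                     # sum of squared errors
    sst = np.sum((y - y.mean()) ** 2)            # total sum of squares
    r2 = 1 - sse / sst                           # R²
    r2_adj = 1 - (1 - r2) * (n - 1) / df         # adjusted R²
    mse = sse / df                               # MSE
    rmse = np.sqrt(mse)                          # RMSE
    mape = 100 * np.mean(np.abs(resid / y))      # MAPE (y must be nonzero)
    dw = np.sum(np.diff(resid) ** 2) / sse       # Durbin-Watson statistic
    aic = n * np.log(sse / n) + 2 * p            # AIC (up to a constant)
    sbc = n * np.log(sse / n) + p * np.log(n)    # SBC (up to a constant)
    return {"DF": df, "R2": r2, "R2_adj": r2_adj, "MSE": mse,
            "RMSE": rmse, "MAPE": mape, "DW": dw, "AIC": aic, "SBC": sbc}
```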

Analysis of variance table: This table is used to evaluate the explanatory power of the explanatory variables. Where the constant of the model is not set to a given value, the explanatory power is evaluated by comparing the fit (in the least-squares sense) of the final model with the fit of the rudimentary model that includes only a constant equal to the mean of the dependent variable. Where the constant of the model is set, the comparison is made with respect to the model for which the dependent variable is equal to the fixed constant.
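
For the case where the constant is not fixed, here is a minimal sketch of this comparison (a hypothetical helper using standard least-squares algebra, not necessarily XLSTAT's exact computation):

```python
import numpy as np
from scipy import stats

def anova_f(y, y_pred, p):
    # p = number of estimated parameters, including the constant
    n = len(y)
    sse = np.sum((y - y_pred) ** 2)       # residual SS of the final model
    sst = np.sum((y - y.mean()) ** 2)     # SS of the mean-only model
    ssm = sst - sse                       # SS explained by the model
    f = (ssm / (p - 1)) / (sse / (n - p))
    p_value = stats.f.sf(f, p - 1, n - p)
    return f, p_value
```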

The parameters of the model table: It displays the estimates of the parameters, the corresponding standard errors, the Student’s t statistics, the corresponding probabilities, as well as the confidence intervals.
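
A minimal sketch of how these quantities can be obtained for a 2SLS fit, assuming the usual convention that residuals are computed with the original regressors X while the covariance uses the stage-2 design X_hat (names hypothetical):

```python
import numpy as np
from scipy import stats

def parameter_table(X, X_hat, y, beta, alpha=0.05):
    n, p = X.shape
    resid = y - X @ beta                       # residuals use the original X
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    se = np.sqrt(np.diag(cov))                 # standard errors
    t = beta / se                              # Student's t statistics
    p_val = 2 * stats.t.sf(np.abs(t), n - p)   # two-sided probabilities
    half = stats.t.ppf(1 - alpha / 2, n - p) * se
    return beta, se, t, p_val, beta - half, beta + half
```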

Model equation: The equation of the model is then displayed to make it easier to read or re-use the model.

Standardized coefficients table: The table of standardized coefficients is used to compare the relative weights of the variables. The higher the absolute value of a coefficient, the more important the weight of the corresponding variable. When the confidence interval around a standardized coefficient contains 0 (this can easily be seen on the chart of standardized coefficients), the weight of the variable in the model is not significant.
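
A minimal sketch, assuming the usual definition of standardized (beta) coefficients as the estimates rescaled by the ratio of standard deviations:

```python
import numpy as np

def standardized_coefficients(X, y, beta):
    # Assumes the constant is the column of ones in position 0, so skip it.
    sx = X[:, 1:].std(ddof=1, axis=0)
    return beta[1:] * sx / y.std(ddof=1)
```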

Predictions and residuals table: The predictions and residuals table shows, for each observation, its weight, the value of the qualitative explanatory variable if there is only one, the observed value of the dependent variable, the model's prediction, the residuals, and the confidence intervals together with the fitted prediction. Two types of confidence interval are displayed: a confidence interval around the mean (corresponding to the case where the prediction would be made for an infinite number of observations with a given set of values for the explanatory variables) and an interval around an isolated prediction (corresponding to a single prediction for the given values of the explanatory variables). The second interval is always wider than the first, as the uncertainty for an isolated prediction is larger. If validation data have been selected, they are displayed at the end of the table.
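
A minimal sketch of the two intervals under standard least-squares algebra (for 2SLS the second-stage design matrix would be used; names hypothetical, and XLSTAT's exact formulas may differ):

```python
import numpy as np
from scipy import stats

def prediction_intervals(X, y, beta, x0, alpha=0.05):
    # x0 is one row of explanatory values, including the constant term.
    n, p = X.shape
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    leverage = x0 @ np.linalg.inv(X.T @ X) @ x0
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    y0 = x0 @ beta
    half_mean = t_crit * np.sqrt(sigma2 * leverage)        # interval on the mean
    half_pred = t_crit * np.sqrt(sigma2 * (1 + leverage))  # isolated prediction
    return (y0 - half_mean, y0 + half_mean), (y0 - half_pred, y0 + half_pred)
```

The extra "1 +" in the leverage term is why the interval around an isolated prediction is always wider than the interval around the mean.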

Graphical results of two-stage least squares in XLSTAT:

The charts which follow show the results mentioned above.

If there is only one explanatory variable in the model, the first chart displayed shows the observed values, the regression line and both types of confidence interval around the predictions.

The second chart shows the normalized residuals as a function of the explanatory variable. In principle, the residuals should be distributed randomly around the X-axis. If there is a trend or a shape, this shows a problem with the model.

The next three charts show, respectively, the standardized residuals as a function of the dependent variable, the distance between the predictions and the observations (for an ideal model, the points would all lie on the bisector), and the standardized residuals as a bar chart. The last chart quickly shows whether an abnormal number of values lie outside the interval ]-2, 2[, which, assuming the residuals are normally distributed, should contain about 95% of the data.
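
As a quick numerical counterpart to the last chart, a minimal sketch computing the share of standardized residuals outside ]-2, 2[ (it should be close to 5% under normality):

```python
import numpy as np

def outlier_share(resid):
    # Residuals are assumed centered (a constant is included in the model).
    std_resid = resid / resid.std(ddof=1)
    return np.mean(np.abs(std_resid) > 2)
```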
