Ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters of a linear regression model. The goal is to minimize the differences between the observed responses in a dataset and the responses predicted by the linear approximation of the data; visually, these differences appear as the vertical distances between each data point and the corresponding point on the regression line, and the smaller the differences, the better the model fits the data.
In general terms, OLS is an approach to fitting a model to observed data: the model is specified by an equation with "free" parameters, and the method finds the function that most closely approximates the data (a "best fit"). In technical terms, the least squares method fits a straight line through a set of data points so that the sum of the squared vertical distances from the line to the actual data points (called residuals) is minimized.
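The closed-form OLS solution for a straight line can be sketched in a few lines; the data below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical data: five (x, y) points lying near a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates for the line y = a + b*x:
#   b = cov(x, y) / var(x),   a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# The residuals are the vertical distances the method minimizes (squared).
residuals = y - (a + b * x)
print(round(b, 3), round(a, 3))          # estimated slope and intercept
print(round(np.sum(residuals ** 2), 3))  # the minimized sum of squared residuals
```

No other choice of slope and intercept can make the sum of squared residuals smaller, which is exactly the "best fit" criterion described above.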
Some Assumptions in Linear Regression
Regression assumes that variables have normal distributions (strictly speaking, the significance tests assume normally distributed errors). Non-normally distributed variables (highly skewed or kurtotic variables, or variables with substantial outliers) can distort relationships and significance tests.
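Skewness is one quick numeric check for non-normality. The sketch below, using simulated data as an assumption, compares a roughly normal sample against a strongly right-skewed one:

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(v):
    # Sample skewness: the third standardized moment.
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

symmetric = rng.normal(size=10_000)    # roughly normal: skewness near 0
skewed = rng.exponential(size=10_000)  # right-skewed: skewness near 2

print(round(skewness(symmetric), 2))
print(round(skewness(skewed), 2))
```

A skewness far from zero (or substantial outliers visible in a histogram) signals that the variable may distort the regression results.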
Standard multiple regression can accurately estimate the relationship between the dependent and independent variables only if that relationship is linear. Because non-linear relationships are common in the social sciences (e.g., the curvilinear relationship between anxiety and performance), it is essential to examine analyses for non-linearity. If the relationship between the independent variables (IVs) and the dependent variable (DV) is not linear, the regression analysis will under-estimate the true relationship.
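The under-estimation is easy to demonstrate with a deliberately extreme, hypothetical case: a perfectly predictable U-shaped relationship that a straight line cannot see at all:

```python
import numpy as np

# Hypothetical U-shaped relationship (e.g., performance vs. anxiety).
x = np.linspace(-3, 3, 61)
y = x ** 2  # purely deterministic: y is perfectly predictable from x

# A straight-line fit reports essentially no relationship:
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]
print(round(slope, 6), round(r, 6))  # both ~0: the linear model misses the curve

# Adding an x**2 term recovers the relationship exactly.
coeffs = np.polyfit(x, y, 2)
print(np.round(coeffs, 6))  # ~[1, 0, 0]: the quadratic fit is perfect
```

A residual plot (residuals vs. x) would show the U shape clearly, which is why plotting residuals is the standard check for non-linearity.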
Homoscedasticity means that the variance of the errors is the same across all levels of the independent variables. When the variance of the errors differs at different values of the independent variables, heteroscedasticity is indicated.
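A simple way to spot heteroscedasticity is to compare the spread of the residuals at low versus high values of a predictor. The data below are simulated with error variance that grows with x, purely to illustrate the pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
x = rng.uniform(1, 10, n)

# Heteroscedastic setup: the error's standard deviation grows with x.
y = 2 + 3 * x + rng.normal(scale=0.5 * x)

# Fit OLS, then compare residual spread in the lower and upper halves of x.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
low, high = resid[x < 5.5], resid[x >= 5.5]
print(round(low.std(), 2), round(high.std(), 2))  # high-x residuals are much more spread out
```

Under homoscedasticity the two spreads would be roughly equal; here the widening fan of residuals at high x is the signature of heteroscedasticity.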
Reliability of Values
The nature of educational and social science research means that many variables of interest are difficult to measure, making measurement error a particular concern. In simple correlation and regression, unreliable measurement causes relationships to be under-estimated, increasing the risk of Type II errors. In the case of multiple regression or partial correlation, the effect sizes of other variables can be over-estimated if a covariate is not reliably measured, because the full effect of the covariate(s) would not be removed.
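The attenuation effect can be sketched with simulated data: adding measurement noise to a predictor (i.e., lowering its reliability) shrinks the observed correlation toward zero, even though the true relationship is unchanged. The effect sizes and noise levels below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# True relationship between the underlying construct and the outcome.
true_x = rng.normal(size=n)
y = 0.8 * true_x + rng.normal(scale=0.6, size=n)

# An unreliable measure of x adds noise (lower reliability = more noise).
noisy_x = true_x + rng.normal(scale=1.0, size=n)

r_true = np.corrcoef(true_x, y)[0, 1]
r_noisy = np.corrcoef(noisy_x, y)[0, 1]
print(round(r_true, 2), round(r_noisy, 2))  # correlation shrinks toward zero
```

This is the classic attenuation result: the observed correlation is roughly the true correlation multiplied by the square root of the measure's reliability.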
Interpreting the Regression Output from Excel
To use the regression facility in Excel, first be sure your data are entered in columns. Click on Tools, then on Data Analysis. From the list of options, choose Regression. A dialog box will open that allows you to designate the range for the x variable and the range for the y variable. Include the data label at the top of each column in these ranges and place a check mark in the "Labels" box in the dialog. If you do not designate an area on the open worksheet to place the output, Excel will by default create a new worksheet to receive it. Generally speaking, it is best to click the radio button for output range and choose a space next to your data to receive the output. Your output will look like this: