Statistics
Textbooks
Boundless Statistics
Correlation and Regression
The Regression Line
Concept Version 7
Created by Boundless

Least-Squares Regression

The criterion for determining the least squares regression line is that the sum of the squared errors is made as small as possible.

Learning Objective

  • Describe how ordinary least squares (OLS) is implemented in linear regression


Key Points

    • If two variables are linearly related, linear regression lets you use one variable to predict values of the other.
    • The least squares regression method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation.
    • Least squares regression provides minimum-variance, mean-unbiased estimation among linear estimators when the errors are uncorrelated and have equal, finite variances.

Terms

  • least squares regression

    a statistical technique based on fitting a straight line to the observed data; it is used to estimate changes in a dependent variable that is linearly related to one or more independent variables

  • sum of squared errors

    a measure of how far the data points fall from the line of best fit; found by squaring the vertical distance between each data point and the line and then summing all of the squares

  • homoscedastic

    describes a sequence or vector of random variables that all have the same finite variance


Full Text

Least Squares Regression

The process of fitting the best fit line is called linear regression. Finding the best fit line is based on the assumption that the data are scattered about a straight line. The criterion for the best fit line is that the sum of squared errors (SSE) is made as small as possible. Any other potential line would have a higher SSE than the best fit line. Therefore, this best fit line is called the least squares regression line.
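To make the SSE criterion concrete, here is a minimal sketch in Python. The data set and the candidate lines are made up for illustration and are not taken from the test score example. The function name `sum_squared_errors` is likewise an assumption of this sketch. For this particular data set the least squares line works out to y = 2.2 + 0.6x, and the sketch compares its SSE with that of a few arbitrary alternatives.

```python
def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# For this data the least squares line is y = 2.2 + 0.6x.
best_sse = sum_squared_errors(xs, ys, 2.2, 0.6)

# A few other candidate lines as (intercept, slope), chosen arbitrarily.
other_lines = [(2.0, 0.6), (2.2, 0.7), (1.5, 0.8)]
other_sses = [sum_squared_errors(xs, ys, a, b) for a, b in other_lines]

print("SSE of least squares line:", best_sse)
print("SSE of other lines:", other_sses)
```

Every alternative line gives a larger SSE than the least squares line, which is exactly what the criterion requires.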

Here is a scatter plot that shows a correlation between ordinary test scores and final exam test scores for a statistics class:

Test Score Scatter Plot

This graph shows the various scattered data points of test scores.

The following figure shows how a best fit line can be drawn through the scatter plot:

Best Fit Line

This shows a best fit line drawn through the scattered data points, suggesting that the two sets of scores may be correlated.

Ordinary Least Squares Regression

Ordinary Least Squares (OLS) regression (or simply "regression") is a useful tool for examining the relationship between two or more interval/ratio variables, assuming there is a linear relationship between those variables. If the relationship is not linear, OLS regression may not be the ideal tool for the analysis, or modifications to the variables or the analysis may be required. If there is a linear relationship between two variables, you can use one variable to predict values of the other variable. For example, because there is a linear relationship between height and weight, if you know someone's height, you can better estimate their weight. Using the equation of a line (predicted value = intercept + slope × independent variable), you can calculate predicted values of your dependent variable from your independent variable, allowing you to make better predictions.
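As a small illustration of predicting with a line formula, the sketch below evaluates intercept + slope × x for one input. The coefficients are hypothetical placeholders chosen only to keep the arithmetic easy. They are not estimated from real height and weight data, and the function name `predict` is an assumption of this sketch.

```python
def predict(x, intercept, slope):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x


# Hypothetical coefficients, NOT estimated from any real data set.
intercept = -100.0  # made-up intercept, in kilograms
slope = 1.0         # made-up slope, in kilograms per centimeter

# Predicted weight (kg) for a person who is 170 cm tall under this made-up line.
predicted_weight = predict(170.0, intercept, slope)
print(predicted_weight)
```

In practice the intercept and slope would come from fitting the line to observed data rather than being picked by hand.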

This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation. The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side. The OLS estimator is consistent when the regressors are exogenous and there is no perfect multicollinearity. It is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, and provided the errors have finite variances, OLS provides minimum-variance, mean-unbiased estimation. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in fields such as economics (econometrics), political science, and electrical engineering (control theory and signal processing), among others.
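For a single regressor, the simple formula takes the standard closed form: the slope is the sum of (x − x̄)(y − ȳ) divided by the sum of (x − x̄)², and the intercept is ȳ minus the slope times x̄. The following is a minimal sketch of that calculation in plain Python. The function name `fit_simple_ols` and the data are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least squares line for one regressor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data, for illustration only.
intercept, slope = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(intercept, slope)
```

With more than one regressor the same idea is usually expressed in matrix form and handed to a numerical library rather than coded by hand.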


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.