Concept Version 7
Created by Boundless

Evaluating Model Utility

The results of multiple regression should be viewed with caution.

Learning Objective

  • Evaluate the potential drawbacks of multiple regression.


Key Points

    • Before trusting a multiple regression, you should examine the linear regression of the dependent variable on each independent variable, one at a time; examine the linear regressions between each pair of independent variables; and consider what you know about the subject matter.
    • You should probably treat multiple regression as a way of suggesting patterns in your data, rather than as rigorous hypothesis testing.
    • If independent variables $A$ and $B$ are both correlated with $Y$, and $A$ and $B$ are highly correlated with each other, only one may contribute significantly to the model. It would be incorrect to conclude from this alone that the variable dropped from the model is unimportant.

Terms

  • multiple regression

    regression model used to find an equation that best predicts the $Y$ variable as a linear function of multiple $X$ variables

  • dependent variable

    in an equation, the variable whose value depends on one or more variables in the equation

  • independent variable

    in an equation, any variable whose value is not dependent on any other in the equation


Full Text

Multiple regression is useful because it can describe the relationships among more than two variables at once; however, its results should not be taken at face value.

It is easy to throw a big data set at a multiple regression and get an impressive-looking output. But many people are skeptical of the usefulness of multiple regression, especially for variable selection, and you should view the results with caution. Before trusting the output, you should do three things: examine the linear regression of the dependent variable on each independent variable, one at a time; examine the linear regressions between each pair of independent variables; and consider what you know about the subject matter. You should probably treat multiple regression as a way of suggesting patterns in your data, rather than as rigorous hypothesis testing.
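The screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `pearson_r` and `screen_predictors` and the data values are made up for this example, and the Pearson correlation $r$ is used as a stand-in for each one-variable regression (for a simple linear regression, $r^2$ equals the regression's $R^2$ and $r$ has the same sign as the slope).

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(y, predictors):
    """Return (r of Y with each predictor, r between each pair of predictors)."""
    with_y = {name: pearson_r(xs, y) for name, xs in predictors.items()}
    between = {
        (a, b): pearson_r(predictors[a], predictors[b])
        for a, b in combinations(predictors, 2)
    }
    return with_y, between


# Made-up illustrative numbers, not real measurements.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
predictors = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 1.9, 3.2, 3.9, 5.1],
}
with_y, between = screen_predictors(y, predictors)
for name, r in with_y.items():
    print(f"r(Y, {name}) = {r:.3f}")
for (a, b), r in between.items():
    print(f"r({a}, {b}) = {r:.3f}")
```

A large value of $r$ between two independent variables is a warning that the multiple regression may not be able to separate their effects, which is the situation discussed next.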

If independent variables $A$ and $B$ are both correlated with $Y$, and $A$ and $B$ are highly correlated with each other, only one may contribute significantly to the model, but it would be incorrect to blindly conclude that the variable that was dropped from the model has no biological importance. For example, let's say you did a multiple regression on vertical leap in children five to twelve years old, with height, weight, age, and score on a reading test as independent variables. All four independent variables are highly correlated in children, since older children are taller, heavier, and more literate, so it's possible that once you've added weight and age to the model, there is so little variation left that the effect of height is not significant. It would be biologically silly to conclude that height had no influence on vertical leap. Because reading ability is correlated with age, it's possible that it would contribute significantly to the model; this might suggest some interesting follow-up experiments on children all of the same age, but it would be unwise to conclude that there was a real effect of reading ability on vertical leap based solely on the multiple regression.
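One way to see why a variable such as height can lose significance is to look at how correlation between predictors inflates the uncertainty of each coefficient. In an ordinary least squares model with two predictors whose correlation is $r$, the sampling variance of each coefficient is multiplied by $1/(1-r^2)$ compared with uncorrelated predictors that have the same spread and the same residual variance. This quantity is usually called the variance inflation factor, a term the text above does not use. The sketch below simply tabulates it; the function name `variance_inflation` and the example correlations are assumptions chosen for illustration.

```python
import math


def variance_inflation(r_ab):
    """Factor by which correlation r_ab between two predictors inflates
    the sampling variance of each coefficient in a two-predictor model."""
    if not -1.0 < r_ab < 1.0:
        raise ValueError("r_ab must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r_ab ** 2)


# Hypothetical predictor correlations, chosen only for illustration.
for r_ab in (0.0, 0.5, 0.9, 0.99):
    vif = variance_inflation(r_ab)
    print(f"r = {r_ab:4.2f}  variance x {vif:6.2f}  std. error x {math.sqrt(vif):5.2f}")
```

As $r$ approaches 1 the factor grows without bound, so a predictor with a real effect can easily fail a significance test once a highly correlated predictor is already in the model.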

Figure (Linear Regression): Random data points and their linear regression.


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.