Concept Version 6
Created by Boundless

Multiple Regression Models

Multiple regression is used to find an equation that best predicts the $Y$ variable as a linear function of multiple $X$ variables.

Learning Objective

  • Describe how multiple regression can be used to predict an unknown $Y$ value from a corresponding set of $X$ values, or to understand the functional relationships between the dependent and independent variables.


Key Points

    • One use of multiple regression is prediction or estimation of an unknown $Y$ value corresponding to a set of $X$ values.
    • A second use of multiple regression is to try to understand the functional relationships between the dependent and independent variables, to try to see what might be causing the variation in the dependent variable.
    • The main null hypothesis of a multiple regression is that there is no relationship between the $X$ variables and the $Y$ variable; that is, the fit of the observed $Y$ values to those predicted by the multiple regression equation is no better than what you would expect by chance.

Terms

  • multiple regression

    regression model used to find an equation that best predicts the $Y$ variable as a linear function of multiple $X$ variables

  • null hypothesis

    A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.


Full Text

When To Use Multiple Regression

You use multiple regression when you have three or more measurement variables. One of the measurement variables is the dependent ($Y$) variable. The rest of the variables are the independent ($X$) variables. The purpose of a multiple regression is to find an equation that best predicts the $Y$ variable as a linear function of the $X$ variables.
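
An equation of this kind is commonly written in the form below. The symbols are chosen here for illustration: $\hat{Y}$ is the predicted value of the dependent variable, $a$ is the intercept, and $b_1, \dots, b_k$ are the slopes for the $k$ independent variables $X_1, \dots, X_k$.

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$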

Multiple Regression For Prediction

One use of multiple regression is prediction or estimation of an unknown $Y$ value corresponding to a set of $X$ values. For example, let's say you're interested in finding a suitable habitat to reintroduce the rare beach tiger beetle, Cicindela dorsalis dorsalis, which lives on sandy beaches on the Atlantic coast of North America. You've gone to a number of beaches that already have the beetles and measured the density of tiger beetles (the dependent variable) and several biotic and abiotic factors, such as wave exposure, sand particle size, beach steepness, and the density of amphipods and other prey organisms. Multiple regression would give you an equation that expresses tiger beetle density as a function of all the other variables. Then, if you went to a beach that didn't have tiger beetles and measured all the independent variables (wave exposure, sand particle size, etc.), you could use the multiple regression equation to predict the density of tiger beetles that could live there if you introduced them.

Atlantic Beach Tiger Beetle

This is the Atlantic beach tiger beetle (Cicindela dorsalis dorsalis), which is the subject of the multiple regression example in this section.
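
The prediction workflow described above can be sketched in a few lines of Python. This is only a minimal illustration: the measurements are randomly generated (no real beetle data are used), the variable names and the "true" coefficients are hypothetical, and ordinary least squares via NumPy's `lstsq` is assumed as the fitting method.

```python
import numpy as np

# Made-up data for illustration only; these are NOT real beetle measurements.
rng = np.random.default_rng(0)
n_beaches = 30
wave_exposure = rng.uniform(0.0, 10.0, n_beaches)
sand_size = rng.uniform(0.1, 1.0, n_beaches)
steepness = rng.uniform(1.0, 15.0, n_beaches)

# Design matrix: a column of ones (intercept) plus one column per X variable.
X = np.column_stack([np.ones(n_beaches), wave_exposure, sand_size, steepness])

# Hypothetical relationship used only to generate example Y values.
true_coefs = np.array([5.0, 1.5, 20.0, -0.8])
beetle_density = X @ true_coefs + rng.normal(0.0, 1.0, n_beaches)

# Fit the multiple regression equation by ordinary least squares.
coefs, *_ = np.linalg.lstsq(X, beetle_density, rcond=None)

# Predict density for a new beach: [1 (intercept), wave, sand size, steepness].
new_beach = np.array([1.0, 4.0, 0.5, 6.0])
predicted_density = new_beach @ coefs

print("intercept and slopes:", coefs)
print("predicted density at the new beach:", predicted_density)
```

The prediction is only as trustworthy as the fitted equation, and it is most reasonable when the new beach's measurements fall within the range of the beaches used to fit it.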

Multiple Regression For Understanding Causes

A second use of multiple regression is to try to understand the functional relationships between the dependent and independent variables, to try to see what might be causing the variation in the dependent variable. For example, if you did a regression of tiger beetle density on sand particle size by itself, you would probably see a significant relationship. If you did a regression of tiger beetle density on wave exposure by itself, you would probably also see a significant relationship. However, sand particle size and wave exposure are correlated; beaches with bigger waves tend to have bigger sand particles. Maybe sand particle size is really important, and the correlation between it and wave exposure is the only reason for a significant regression between wave exposure and beetle density. Multiple regression is a statistical way to try to control for this; it can answer questions like, "If sand particle size (and every other measured variable) were the same, would the regression of beetle density on wave exposure be significant?"
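
The scenario above can be imitated with simulated data. In this sketch, which rests entirely on made-up numbers, beetle density is generated from sand particle size alone, while sand particle size is generated to be correlated with wave exposure. The helper `fit_ols` and all variable names are hypothetical. The code compares slope sizes only; it does not run a significance test.

```python
import numpy as np

# Simulated data for illustration only; units and coefficients are arbitrary.
rng = np.random.default_rng(1)
n_beaches = 200
wave_exposure = rng.uniform(0.0, 10.0, n_beaches)
# Sand size is built to be correlated with wave exposure.
sand_size = 1.0 + 0.1 * wave_exposure + rng.normal(0.0, 0.2, n_beaches)
# Density depends on sand size only; wave exposure has no direct effect.
beetle_density = 10.0 * sand_size + rng.normal(0.0, 1.0, n_beaches)

def fit_ols(predictors, y):
    """Least-squares fit; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Regression of density on wave exposure by itself.
wave_slope_alone = fit_ols([wave_exposure], beetle_density)[1]

# Multiple regression on wave exposure and sand size together.
wave_slope_controlled = fit_ols([wave_exposure, sand_size], beetle_density)[1]

print("wave slope, wave exposure alone:   ", wave_slope_alone)
print("wave slope, sand size also in model:", wave_slope_controlled)
```

With data built this way, the slope for wave exposure is clearly nonzero when it is the only predictor, and it shrinks toward zero once sand particle size is included in the model.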

Null Hypothesis

The main null hypothesis of a multiple regression is that there is no relationship between the $X$ variables and the $Y$ variable; in other words, that the fit of the observed $Y$ values to those predicted by the multiple regression equation is no better than what you would expect by chance. A multiple regression also has a null hypothesis for each $X$ variable: that adding that $X$ variable to the multiple regression does not improve the fit of the multiple regression equation any more than expected by chance.
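
One way to make "no better than what you would expect by chance" concrete is a permutation test, sketched below on made-up data. This is a swapped-in illustration rather than the standard procedure: statistical software usually tests the main null hypothesis with an $F$-test, and only the $F$ statistic (not its p-value) is computed here. The function `r_squared` and all variable names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Share of the variation in y explained by a least-squares fit on X.

    X must already include a column of ones for the intercept.
    """
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    ss_residual = residuals @ residuals
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_residual / ss_total

# Made-up data: Y really does depend on both X variables.
rng = np.random.default_rng(2)
n, k = 40, 2  # n observations, k independent variables
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 2.0 + 1.5 * x1 + 1.0 * x2 + rng.normal(size=n)

observed_r2 = r_squared(X, y)

# Overall F statistic computed from R^2.
f_statistic = (observed_r2 / k) / ((1.0 - observed_r2) / (n - k - 1))

# Permutation test: shuffling y breaks any real link between X and Y,
# so the shuffled fits show how good a fit arises by chance alone.
n_shuffles = 2000
shuffled_r2 = np.array(
    [r_squared(X, rng.permutation(y)) for _ in range(n_shuffles)]
)
p_value = (np.sum(shuffled_r2 >= observed_r2) + 1) / (n_shuffles + 1)

print("observed R^2:", observed_r2)
print("F statistic:", f_statistic)
print("permutation p-value:", p_value)
```

A small p-value means that few or none of the shuffled data sets fit as well as the real one, which is evidence against the main null hypothesis.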


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.