Concept Version 8
Created by Boundless

Estimating and Making Inferences About the Slope

The purpose of a multiple regression is to find an equation that best predicts the $Y$ variable as a linear function of the $X$ variables.

Learning Objective

  • Discuss how partial regression coefficients (slopes) allow us to predict the value of $Y$ given measured $X$ values.


Key Points

    • Partial regression coefficients (the slopes) and the intercept are chosen when fitting a regression equation so that they minimize the sum of squared deviations between the expected and observed values of $Y$.
    • If you had the partial regression coefficients and measured the $X$ variables, you could plug them into the equation and predict the corresponding value of $Y$.
    • The standard partial regression coefficient of an $X$ variable, say $X_1$, is the number of standard deviations that $Y$ would change for every one standard deviation change in $X_1$, if all the other $X$ variables could be kept constant.

Terms

  • p-value

    The probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.

  • partial regression coefficient

    A value indicating the effect of one independent variable on the dependent variable with all the remaining independent variables held constant. Each coefficient is the slope of the dependent variable with respect to its own independent variable.

  • standard partial regression coefficient

    The number of standard deviations that $Y$ would change for every one standard deviation change in a given $X$ variable (for example $X_1$), if all the other $X$ variables could be kept constant.


Full Text

You use multiple regression when you have three or more measurement variables. One of the measurement variables is the dependent ($Y$) variable. The rest of the variables are the independent ($X$) variables. The purpose of a multiple regression is to find an equation that best predicts the $Y$ variable as a linear function of the $X$ variables.

How It Works

The basic idea is to find an equation of this form:

$Y_{\text{exp}} = a+ b_1X_1 + b_2X_2 + b_3X_3 + \cdots$

$Y_{\text{exp}}$ is the expected value of $Y$ for a given set of $X$ values. $b_1$ is the estimated slope of a regression of $Y$ on $X_1$, if all of the other $X$ variables could be kept constant; the same holds for $b_2$, $b_3$, and so on. $a$ is the intercept. The values of $b_1$, $b_2$, and so on (the "partial regression coefficients") and the intercept are chosen to minimize the sum of squared deviations between the expected and observed values of $Y$.
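As a minimal sketch of how such coefficients can be found, the following pure-Python example builds and solves the normal equations, which is one standard way to minimize the sum of squared deviations. The function names and the small data set are made up for illustration, and statistical software generally uses more numerically robust methods than this.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple_regression(X_rows, y):
    """Return [a, b1, b2, ...] minimizing the sum of squared deviations."""
    # Prepend a constant 1 to each row so the first coefficient is the intercept a.
    design = [[1.0] + list(row) for row in X_rows]
    p = len(design[0])
    # Normal equations: (D^T D) beta = D^T y
    DtD = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    Dty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(DtD, Dty)


def predict(coefs, x_values):
    """Plug measured X values into Y_exp = a + b1*X1 + b2*X2 + ..."""
    a, slopes = coefs[0], coefs[1:]
    return a + sum(b * x for b, x in zip(slopes, x_values))


# Made-up data generated exactly by Y = 1 + 2*X1 + 3*X2 (no noise).
X_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X_rows]

coefs = fit_multiple_regression(X_rows, y)   # approximately [1, 2, 3]
y_new = predict(coefs, (6, 2))               # approximately 1 + 12 + 6 = 19
```

Because the example data contain no noise, the fitted intercept and slopes recover the generating values up to floating-point rounding; with real measurements the fit would only approximate the observed $Y$ values.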

How well the equation fits the data is expressed by $R^2$, the "coefficient of multiple determination." It ranges from 0 (no linear relationship between the $X$ variables and $Y$) to 1 (a perfect fit, i.e., no difference between the observed and expected $Y$ values). The $p$-value for the overall fit is a function of $R^2$, the number of observations, and the number of $X$ variables.
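To make the dependence on $R^2$, the number of observations, and the number of $X$ variables concrete, here is a small sketch. It assumes the usual least-squares setting with an intercept, where the overall test uses an $F$ statistic with $k$ and $n - k - 1$ degrees of freedom ($k$ being the number of $X$ variables and $n$ the number of observations). The function names and the numbers are illustrative only; the expected values come from a one-variable least-squares fit of the observed values on $X = 1, \ldots, 5$.

```python
def r_squared(y_observed, y_expected):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_observed) / len(y_observed)
    ss_total = sum((y - mean_y) ** 2 for y in y_observed)
    ss_residual = sum((y - e) ** 2 for y, e in zip(y_observed, y_expected))
    return 1.0 - ss_residual / ss_total


def f_statistic(r2, n_observations, n_x_variables):
    """Overall F statistic computed from R^2, n, and the number of X variables."""
    df_model = n_x_variables
    df_residual = n_observations - n_x_variables - 1
    return (r2 / df_model) / ((1.0 - r2) / df_residual)


# Made-up observed values and their least-squares expected values.
y_observed = [2.0, 4.0, 5.0, 4.0, 5.0]
y_expected = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y_observed, y_expected)
f_value = f_statistic(r2, n_observations=5, n_x_variables=1)
```

The $p$-value is the upper-tail probability of `f_value` under the $F$ distribution with those degrees of freedom. The Python standard library has no $F$ distribution, so that last step is left out here; if SciPy is available, `scipy.stats.f.sf(f_value, df_model, df_residual)` would give it.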

Importance of Slope (Partial Regression Coefficients)

When the purpose of multiple regression is prediction, the important result is an equation containing partial regression coefficients (slopes). If you had the partial regression coefficients and measured the $X$ variables, you could plug them into the equation and predict the corresponding value of $Y$. The magnitude of a partial regression coefficient depends on the units used for each variable, so by itself it does not tell you anything about the relative importance of each variable.
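As a small worked illustration of this unit dependence (the units here are hypothetical): suppose $X_1$ is a length in meters with fitted slope $b_1$. If the same lengths are instead recorded in centimeters, so that $X_1^{*} = 100X_1$, then

$b_1X_1 = \frac{b_1}{100}X_1^{*}$

The slope becomes 100 times smaller while every predicted value $Y_{\text{exp}}$ stays exactly the same. The size of a raw slope therefore reflects the choice of units rather than the strength of the relationship.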

When the purpose of multiple regression is understanding functional relationships, the important result is an equation containing standard partial regression coefficients, like this:

$y'_{\text{exp}} = a + b'_1x'_1 + b'_2x'_2 + b'_3x'_3 + \cdots$

Here $b'_1$ is the standard partial regression coefficient of $Y$ on $X_1$: the number of standard deviations that $Y$ would change for every one standard deviation change in $X_1$, if all the other $X$ variables could be kept constant. The magnitude of the standard partial regression coefficients tells you something about the relative importance of different variables; $X$ variables with larger standard partial regression coefficients (in absolute value) have a stronger relationship with the $Y$ variable.
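One way to obtain standard partial regression coefficients from ordinary ones is the identity $b'_i = b_i \, s_{X_i} / s_Y$, where $s_{X_i}$ and $s_Y$ are the sample standard deviations of $X_i$ and $Y$. The sketch below applies it to made-up data generated exactly by $Y = 1 + 2X_1 + 0.03X_2$, so the raw slopes 2 and 0.03 are known in advance; the function name is hypothetical.

```python
from statistics import stdev


def standardize_coefficients(slopes, X_columns, y):
    """Convert raw slopes b_i into standard ones: b'_i = b_i * sd(X_i) / sd(Y)."""
    s_y = stdev(y)
    return [b * stdev(column) / s_y for b, column in zip(slopes, X_columns)]


# Made-up data: X2 is measured on a much larger scale than X1.
x1 = [1, 2, 3, 4, 5]
x2 = [100, 300, 200, 500, 400]
y = [1 + 2 * a + 0.03 * b for a, b in zip(x1, x2)]

raw_slopes = [2.0, 0.03]  # known exactly, because y was generated from them
standard_slopes = standardize_coefficients(raw_slopes, [x1, x2], y)
```

In this example the raw slope for $X_2$ is far smaller than the one for $X_1$, yet its standard coefficient is the larger of the two, which is why the standardized form is the one to compare across variables.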

Figure: Linear Regression. A graphical representation of a best-fit line for simple linear regression.


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.