Chapter 11

Correlation and Regression

Book Version 1
By Boundless
Boundless Statistics
Section 1
Correlation
An Intuitive Approach to Relationships

Correlation refers to any of a broad class of statistical relationships involving dependence.

Scatter Diagram

A scatter diagram is a type of mathematical diagram using Cartesian coordinates to display values for two variables in a set of data.

Coefficient of Correlation

The correlation coefficient is a measure of the linear dependence between two variables $X$ and $Y$, taking a value between $-1$ and $+1$ inclusive.
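
As a rough illustration, the Pearson correlation coefficient can be computed as the sum of products of deviations divided by the square root of the product of the summed squared deviations. The sketch below assumes two equal-length numeric sequences; the function name pearson_r and the sample data are made up for this example and do not come from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: one perfectly increasing and one perfectly decreasing line.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

Points that lie exactly on an upward line give a value at the $+1$ end of the range, and points on a downward line give a value at the $-1$ end.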

Coefficient of Determination

The coefficient of determination provides a measure of how well observed outcomes are replicated by a model.
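
One common way to compute the coefficient of determination is $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared differences between observed and predicted values and $SS_{tot}$ is the sum of squared deviations of the observed values from their mean. The sketch below assumes that definition; the function name r_squared and the data are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


observed = [1, 2, 3, 4]
perfect = r_squared(observed, [1, 2, 3, 4])           # model reproduces the data
mean_only = r_squared(observed, [2.5, 2.5, 2.5, 2.5])  # model predicts the mean
print(perfect, mean_only)
```

A model that reproduces every observation scores 1, while a model that always predicts the mean of the observations scores 0.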

Line of Best Fit

The trend line (line of best fit) is a line that can be drawn on a scatter diagram representing a trend in the data.

Other Types of Correlation Coefficients

Other types of correlation coefficients include intraclass correlation and the concordance correlation coefficient.

Variation and Prediction Intervals

A prediction interval is an estimate of an interval in which future observations will fall with a certain probability given what has already been observed.

Rank Correlation

A rank correlation is a statistic used to measure the relationship between rankings of ordinal variables or different rankings of the same variable.
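
One widely used rank correlation is Spearman's rho. The sketch below uses the shortcut formula $\rho = 1 - 6\sum d^2 / (n(n^2 - 1))$, where $d$ is the difference between the two ranks of each item. That formula is only valid when there are no tied values, which this example assumes; the function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this shortcut formula assumes no tied values")
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# A monotone but non-linear relationship still has perfectly matching ranks.
rho_same = spearman_rho([10, 20, 30, 40, 50], [1, 8, 27, 64, 125])
rho_reversed = spearman_rho([10, 20, 30, 40, 50], [125, 64, 27, 8, 1])
print(rho_same, rho_reversed)
```

Because only the ordering matters, any strictly increasing relationship gives the same result as a straight line would.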

Section 2
More About Correlation
Ecological Fallacy

An ecological fallacy is an interpretation of statistical data where inferences about individuals are deduced from inferences about the group as a whole.

Correlation is Not Causation

The conventional dictum "correlation does not imply causation" means that correlation alone cannot be used to infer a causal relationship between variables.

Section 3
Regression
Predictions and Probabilistic Models

Regression models are often used to predict a response variable $y$ from an explanatory variable $x$.

A Graph of Averages

A graph of averages and the least-squares regression line are both good ways to summarize the data in a scatterplot.

The Regression Method

The regression method utilizes the average from known data to make predictions about new data.

The Regression Fallacy

The regression fallacy fails to account for natural fluctuations and instead ascribes a cause where none exists.

Section 4
The Regression Line
Slope and Intercept

In the regression line equation $y = mx + b$, the constant $m$ is the slope of the line and $b$ is the $y$-intercept.

Two Regression Lines

ANCOVA can be used to compare regression lines by testing the effect of a categorical variable on a dependent variable while controlling for a continuous covariate.

Least-Squares Regression

The criterion for determining the least-squares regression line is that the sum of the squared errors is made as small as possible.
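
For a single explanatory variable, the line $y = mx + b$ that minimizes the sum of squared errors has a closed-form solution: $m = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $b = \bar{y} - m\bar{x}$. The sketch below applies those formulas; the function name least_squares_line and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data scattered around an upward trend.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, m, b)
# Nudging the slope or intercept away from the fit can only increase the error.
worse_slope = sum_squared_errors(xs, ys, m + 0.1, b)
worse_intercept = sum_squared_errors(xs, ys, m, b + 0.1)
print(m, b, best, worse_slope, worse_intercept)
```

The comparison at the end is a quick check of the criterion: any other slope or intercept produces a larger sum of squared errors than the fitted line.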

Model Assumptions

Standard linear regression models with standard estimation techniques make a number of assumptions.

Making Inferences About the Slope

The slope of the best fit line tells us how the dependent variable $y$ changes for every one unit increase in the independent variable $x$, on average.

Regression Toward the Mean: Estimation and Prediction

Regression toward the mean says that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second.
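
A small simulation can make this effect visible. The sketch below assumes a made-up model in which each measurement is a stable underlying value plus independent random noise; the means, standard deviations, and sample size are arbitrary choices for illustration and do not come from the text.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n = 10_000
# Hypothetical model: each measurement = stable value + independent noise.
stable = [random.gauss(100, 10) for _ in range(n)]
first = [s + random.gauss(0, 10) for s in stable]
second = [s + random.gauss(0, 10) for s in stable]

# Select the individuals whose first measurement fell in the top 10%.
cutoff = sorted(first)[int(0.9 * n)]
top = [i for i in range(n) if first[i] >= cutoff]

mean_first = sum(first[i] for i in top) / len(top)
mean_second = sum(second[i] for i in top) / len(top)
overall_mean = sum(first) / n

print(f"top group, first measurement:  {mean_first:.1f}")
print(f"top group, second measurement: {mean_second:.1f}")
print(f"overall mean:                  {overall_mean:.1f}")
```

Under these assumptions the group selected for extreme first measurements scores closer to the overall average the second time, although still above it, without any cause acting on the group between measurements.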

Section 5
R.M.S. Error for Regression
Computing R.M.S. Error

RMS error measures the differences between values predicted by a model or an estimator and the values actually observed.
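
As a minimal sketch, the RMS error can be computed by squaring each residual (observed minus predicted), averaging the squares, and taking the square root. This version divides by the number of observations $n$; some texts divide by $n - 2$ for a regression line instead, so treat the divisor as an assumption. The function name rms_error and the data are hypothetical.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square of the residuals (observed - predicted), dividing by n."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Residuals here are 1, -1, and 0, so the mean squared residual is 2/3.
error = rms_error([2, 4, 6], [1, 5, 6])
print(error)
```

The result is in the same units as the observed values, which makes it easier to interpret than the mean squared error.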

Plotting the Residuals

The residual plot illustrates how far away each of the values on the graph is from the expected value (the value on the line).

Homogeneity and Heterogeneity

By drawing vertical strips on a scatter plot and analyzing the spread of the resulting new data sets, we are able to judge the degree of homoscedasticity.

Section 6
Multiple Regression
Multiple Regression Models

Multiple regression is used to find an equation that best predicts the $Y$ variable as a linear function of multiple $X$ variables.

Estimating and Making Inferences About the Slope

The purpose of a multiple regression is to find an equation that best predicts the $Y$ variable as a linear function of the $X$ variables.

Evaluating Model Utility

The results of multiple regression should be viewed with caution.

Using the Model for Estimation and Prediction

Standard multiple regression involves several independent variables predicting the dependent variable.

Interaction Models

In regression analysis, an interaction may arise when considering the relationship among three or more variables.

Polynomial Regression

The goal of polynomial regression is to model a non-linear relationship between the independent and dependent variables.

Qualitative Variable Models

Dummy (qualitative) variables often act as independent variables in regression and affect the results for the dependent variable.

Models with Both Quantitative and Qualitative Variables

A regression model that contains a mixture of quantitative and qualitative variables is called an Analysis of Covariance (ANCOVA) model.

Comparing Nested Models

Multilevel (nested) models are appropriate for research designs where data for participants are organized at more than one level.

Stepwise Regression

Stepwise regression is a method of regression modeling in which the choice of predictive variables is carried out by an automatic procedure.

Checking the Model and Assumptions

There are a number of assumptions that must be made when using multiple regression models.

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Some problems with multiple regression include multicollinearity, variable selection, and improper extrapolation assumptions.


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.