Pearson's correlation coefficient

(noun)

a measure of the linear correlation (dependence) between two variables $X$ and $Y$, giving a value between $-1$ and $+1$ inclusive, where $+1$ is total positive correlation, 0 is no linear correlation, and $-1$ is total negative correlation

Examples of Pearson's correlation coefficient in the following topics:

Coefficient of Correlation
- The most common coefficient of correlation is known as the Pearson product-moment correlation coefficient, or Pearson's $r$.
- Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
- Pearson's correlation coefficient when applied to a population is commonly represented by the Greek letter $\rho$ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient.
- Pearson's correlation coefficient when applied to a sample is commonly represented by the letter $r$ and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient.
- This fact holds for both the population and sample Pearson correlation coefficients.
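
The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the height and weight values are made up for the example, and the sample ($n - 1$) divisor is assumed throughout.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance / (sd_x * sd_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations, all using the n - 1 divisor.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data, invented for this illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 91]
r = pearson_r(heights, weights)
print(f"r = {r:.4f}")
```

Because the same divisor appears in the numerator and the denominator, it cancels, so the result would be identical if $n$ were used in place of $n - 1$.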
Rank Correlation
- It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions.
- However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the Pearson product-moment correlation coefficient.
- In the same way, if $y$ always decreases when $x$ increases, the rank correlation coefficients will be $-1$ while the Pearson product-moment correlation coefficient may or may not be close to $-1$.
- This graph shows a Spearman rank correlation of 1 and a Pearson correlation coefficient of 0.88.
- In contrast, this does not give a perfect Pearson correlation.
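
The contrast between rank correlation and Pearson's coefficient can be illustrated with a relationship that is strictly increasing but not linear. The sketch below uses made-up data ($y = e^x$) and hypothetical helper names, and it computes Spearman's coefficient as the Pearson correlation of the ranks, assuming there are no ties.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations (the divisor cancels)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; this sketch assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]
y_decreasing = [-v for v in y]  # y always decreases as x increases

print("Spearman:", spearman_rho(x, y), "Pearson:", pearson_r(x, y))
print("Spearman (decreasing):", spearman_rho(x, y_decreasing))
```

For these values the rank correlation is 1 (or $-1$ for the decreasing case), while the Pearson coefficient falls noticeably short of 1 because the points do not lie on a line.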
Values of the Pearson Correlation
- Give the symbols for Pearson's correlation in the sample and in the population
- The Pearson product-moment correlation coefficient is a measure of the strength of the linear relationship between two variables.
- It is referred to as Pearson's correlation or simply as the correlation coefficient.
- The symbol for Pearson's correlation is "$\rho$" when it is measured in the population and "$r$" when it is measured in a sample.
- Pearson's r can range from -1 to 1.
Hypothesis Tests with the Pearson Correlation
- Pearson's correlation coefficient, $r$, tells us about the strength of the linear relationship between $x$ and $y$ points on a regression plot.
- We decide this based on the sample correlation coefficient $r$ and the sample size $n$.
- If the test concludes that the correlation coefficient is significantly different from 0, we say that the correlation coefficient is "significant."
- If the test concludes that the correlation coefficient is not significantly different from 0 (it is close to 0), we say that the correlation coefficient is "not significant."
- Use a hypothesis test in order to determine the significance of Pearson's correlation coefficient.
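
One common way to carry out such a test, assumed here because the excerpts above do not name the statistic, is to convert $r$ and $n$ into $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and compare it with a $t$ distribution on $n - 2$ degrees of freedom under the null hypothesis $\rho = 0$. The sketch below computes only the statistic; the values $r = 0.5$ and $n = 27$ are hypothetical, and the critical value is taken from a standard $t$ table rather than computed.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical sample result, invented for this illustration.
r, n = 0.5, 27
t = correlation_t_statistic(r, n)

# Assumed two-tailed critical value for alpha = 0.05 and 25 degrees of
# freedom, read from a standard t table (approximately 2.060).
t_critical = 2.060
significant = abs(t) > t_critical
print(f"t = {t:.3f}, df = {n - 2}, significant at 0.05: {significant}")
```

The same $r$ can be significant with a large sample and not significant with a small one, which is why the decision depends on both $r$ and $n$.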
Other Types of Correlation Coefficients
- Other types of correlation coefficients include intraclass correlation and the concordance correlation coefficient.
- For example, in a paired data set where each "pair" is a single measurement made for each of two units (e.g., weighing each twin in a pair of identical twins) rather than two different measurements for a single unit (e.g., measuring height and weight for each individual), the ICC is a more natural measure of association than Pearson's correlation.
- Thus, if we are correlating $X$ and $Y$, where, say, $Y=2X+1$, the Pearson correlation between $X$ and $Y$ is 1: a perfect correlation.
- Whereas Pearson's correlation coefficient is immune to whether the biased or unbiased version for estimation of the variance is used, the concordance correlation coefficient is not.
- Distinguish the intraclass and concordance correlation coefficients from previously discussed correlation coefficients.
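
The $Y = 2X + 1$ example can be checked numerically. The sketch below assumes Lin's form of the concordance correlation coefficient, $2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$, which is not spelled out in the excerpts above. The function names and the `ddof` parameter (0 for the biased divisor $n$, 1 for the unbiased divisor $n - 1$) are hypothetical choices for this illustration.

```python
import math

def moments(xs, ys, ddof):
    """Means, variances and covariance using divisor n - ddof."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - ddof)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - ddof)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)
    return mean_x, mean_y, var_x, var_y, cov

def pearson_r(xs, ys, ddof=1):
    _, _, var_x, var_y, cov = moments(xs, ys, ddof)
    return cov / math.sqrt(var_x * var_y)

def concordance_ccc(xs, ys, ddof=0):
    """Concordance correlation coefficient (Lin's form, assumed here)."""
    mean_x, mean_y, var_x, var_y, cov = moments(xs, ys, ddof)
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)

x = [1, 2, 3, 4, 5]
y = [2 * v + 1 for v in x]  # Y = 2X + 1

print("Pearson, biased:    ", pearson_r(x, y, ddof=0))
print("Pearson, unbiased:  ", pearson_r(x, y, ddof=1))
print("Concordance, biased:  ", concordance_ccc(x, y, ddof=0))
print("Concordance, unbiased:", concordance_ccc(x, y, ddof=1))
```

Pearson's coefficient is 1 under either divisor, whereas the concordance coefficient is well below 1 (the values of $Y$ do not agree with those of $X$) and changes with the choice of divisor.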
The Correlation Coefficient r
- Use the correlation coefficient as another indicator (besides the scatterplot) of the strength of the relationship between x and y.
- The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is a numerical measure of the strength of association between the independent variable x and the dependent variable y.
- If r = 1, there is perfect positive correlation.
- We say "correlation does not imply causation."
- The correlation coefficient r is the bottom item in the output screens for the LinRegTTest on the TI-83, TI-83+, or TI-84+ calculator (see previous section for instructions).
Properties of Pearson's r
- State the relationship between the correlation of Y with X and the correlation of X with Y
- A basic property of Pearson's r is that its possible range is from -1 to 1.
- Pearson's correlation is symmetric in the sense that the correlation of X with Y is the same as the correlation of Y with X.
- For example, the correlation of Weight with Height is the same as the correlation of Height with Weight.
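- A critical property of Pearson's r is that it is unaffected by linear transformations with a positive slope: adding a constant to a variable, or multiplying it by a positive constant, leaves r unchanged (multiplying by a negative constant reverses its sign).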
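
These properties are easy to verify on a small data set. In the sketch below the height and weight values, the unit conversions, and the function name `pearson_r` are all made up for illustration. The sign reversal under multiplication by a negative constant is included as a standard fact.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations (the divisor cancels)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, invented for this illustration.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 91]

r_hw = pearson_r(height_cm, weight_kg)
r_wh = pearson_r(weight_kg, height_cm)           # symmetry

height_in = [h / 2.54 for h in height_cm]        # rescale: cm to inches
weight_shifted = [w + 10 for w in weight_kg]     # shift by a constant
r_transformed = pearson_r(height_in, weight_shifted)

height_negated = [-h for h in height_cm]         # negative multiplier
r_negated = pearson_r(height_negated, weight_kg)

print(r_hw, r_wh, r_transformed, r_negated)
```

The first three values agree up to floating-point rounding, and the last is the same number with the opposite sign.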
Statistical Literacy
- The graph below showing the relationship between age and sleep is based on a graph that appears on this web page (http://www.shmoop.com/basic-statistics-probability/scatter-plots-correlation-examples.html).
- Why might Pearson's correlation not be a good way to describe the relationship?
- Pearson's correlation measures the strength of the linear relationship between two variables.
Randomization Tests: Association (Pearson's r)
- A significance test for Pearson's r is described in the section inferential statistics for b and r.
- The approach is to consider the X variable fixed and compare the correlation obtained in the actual data to the correlations that could be obtained by rearranging the Y variable.
- For the data shown in Table 1, the correlation between X and Y is 0.385.
- There is only one arrangement of Y that would produce a higher correlation.
- Therefore, there are two arrangements of Y that lead to correlations as high or higher than the actual data.
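
The rearrangement procedure can be sketched by enumerating every ordering of $Y$ while holding $X$ fixed. Table 1 is not reproduced here, so the data below are hypothetical and will not give the 0.385 quoted above. The function names are also made up for this illustration, and the one-tailed proportion is computed as the share of arrangements with a correlation as high as or higher than the observed one.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations (the divisor cancels)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def randomization_test(xs, ys):
    """Count arrangements of ys whose r is at least the observed r."""
    observed = pearson_r(xs, ys)
    total = 0
    as_high = 0
    for arrangement in permutations(ys):
        total += 1
        # Small tolerance so that ties with the observed value are counted.
        if pearson_r(xs, arrangement) >= observed - 1e-12:
            as_high += 1
    return observed, as_high, total

# Hypothetical data (not Table 1), kept small so full enumeration is cheap.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

observed, as_high, total = randomization_test(x, y)
print(f"observed r = {observed:.3f}")
print(f"{as_high} of {total} arrangements are as high or higher")
print(f"one-tailed proportion = {as_high / total:.4f}")
```

Full enumeration grows as $n!$, so for larger samples a random subset of arrangements would typically be drawn instead.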
Confidence Intervals