Multiple Comparisons of Means

ANOVA is useful in the multiple comparisons of means due to its reduction in the Type I error rate.

Learning Objective

  • Explain the issues that arise when researchers aim to make a number of formal comparisons, and give examples of how these issues can be resolved.


Key Points

    • "Multiple comparisons" arise when a statistical analysis encompasses a number of formal comparisons, with the presumption that attention will focus on the strongest differences among all comparisons that are made.
    • As the number of comparisons increases, it becomes more likely that the groups being compared will appear to differ in terms of at least one attribute.
    • Doing multiple two-sample $t$-tests would result in an increased chance of committing a Type I error.

Terms

  • ANOVA

    Analysis of variance—a collection of statistical models used to analyze the differences between group means and their associated procedures (such as "variation" among and between groups).

  • null hypothesis

    A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

  • Type I error

    An error occurring when the null hypothesis ($H_0$) is true, but is rejected.


Full Text

The multiple comparisons problem occurs when one considers a set of statistical inferences simultaneously, or infers a subset of parameters selected based on the observed values. Errors in inference, including confidence intervals that fail to include their corresponding population parameters or hypothesis tests that incorrectly reject the null hypothesis, are more likely to occur when one considers the set as a whole. Several statistical techniques have been developed to prevent this, allowing significance levels for single and multiple comparisons to be compared directly. These techniques generally require a stronger level of observed evidence in order for an individual comparison to be deemed "significant," so as to compensate for the number of inferences being made.

The Problem

In research, we typically deal with comparisons of two groups, such as a treatment group and a control group. "Multiple comparisons" arise when a statistical analysis encompasses a number of formal comparisons, with the presumption that attention will focus on the strongest differences among all comparisons that are made. Failure to compensate for multiple comparisons can have important real-world consequences.

As the number of comparisons increases, it becomes more likely that the groups being compared will appear to differ in terms of at least one attribute. Our confidence that a result will generalize to independent data should generally be weaker if it is observed as part of an analysis that involves multiple comparisons, rather than an analysis that involves only a single comparison.

For example, if one test is performed at the 5% level, there is only a 5% chance of incorrectly rejecting the null hypothesis if the null hypothesis is true. However, for 100 tests where all null hypotheses are true, the expected number of incorrect rejections is 5. If the tests are independent, the probability of at least one incorrect rejection is 99.4%. These errors are called false positives, or Type I errors.
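
As a quick check of these numbers, the following sketch computes the expected number of false rejections and the probability of at least one, under the stated assumptions (100 independent tests, all null hypotheses true, each test at the 5% level).

```python
# Family-wise Type I error for many independent tests, assuming every null
# hypothesis is true and each test is carried out at alpha = 0.05.
alpha = 0.05
m = 100  # number of independent tests

expected_false_rejections = m * alpha      # 100 * 0.05 = 5
p_at_least_one = 1 - (1 - alpha) ** m      # 1 - 0.95**100 ≈ 0.994

print(expected_false_rejections)           # 5.0
print(round(p_at_least_one, 3))            # 0.994
```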

Techniques have been developed to control the false positive error rate associated with performing multiple statistical tests. Similarly, techniques have been developed to adjust confidence intervals so that the probability of at least one of the intervals not covering its target value is controlled.
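
One widely used technique of this kind (not named in the text, so offered here only as an illustrative assumption) is the Bonferroni correction, which tests each of $m$ comparisons at level $\alpha/m$ so that the probability of any false rejection across the whole family stays at or below $\alpha$. A minimal sketch:

```python
# Bonferroni correction (illustrative): test each of m comparisons at alpha/m
# so that the family-wise false positive rate is held at or below alpha.
alpha = 0.05
p_values = [0.001, 0.012, 0.030, 0.200]   # hypothetical per-comparison p-values
m = len(p_values)

adjusted_alpha = alpha / m                 # 0.0125
rejections = [p <= adjusted_alpha for p in p_values]
print(rejections)                          # [True, True, False, False]
```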

Analysis of Variance (ANOVA) for Comparing Multiple Means

In order to compare the means of more than two samples coming from different treatment groups that are normally distributed with a common variance, an analysis of variance is often used. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal. Therefore, it generalizes the $t$-test to more than two groups. Doing multiple two-sample $t$-tests would result in an increased chance of committing a Type I error. For this reason, ANOVAs are useful in comparing (testing) three or more means (groups or variables) for statistical significance.
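
As an illustration, the sketch below runs a one-way ANOVA with SciPy on three invented treatment groups; the data values are made up purely for demonstration.

```python
from scipy import stats

# Hypothetical measurements for three treatment groups (invented data).
group_a = [23, 20, 25, 22, 21]
group_b = [28, 30, 27, 26, 29]
group_c = [22, 24, 23, 25, 21]

# One-way ANOVA tests H0: all group means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)   # a large F with a small p-value suggests the means differ
```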

The following table summarizes the calculations that need to be done, which are explained below:

ANOVA Calculation Table

This table summarizes the calculations necessary in an ANOVA for comparing multiple means.

Letting $x_{ij}$ be the $j$th measurement in the $i$th sample (where $j = 1, 2, \cdots, n_i$), and letting $n$ be the total number of observations, then:

$\displaystyle \text{Total SS} = \sum\limits_{i,j} (x_{ij} - \bar{x})^2 = \sum x_{ij}^2 - \frac{(\sum x_{ij})^2}{n} = \sum x_{ij}^2 -\text{CM}$

and the sum of the squares of the treatments is:

$\displaystyle \text{SST} = \sum n_i(\bar{x}_i-\bar{x})^2 = \sum\frac{T_i^2}{n_i} - \text{CM}$

where $T_i$ is the total of the observations in treatment $i$, $n_i$ is the number of observations in sample $i$, and CM is the correction for the mean:

$T_i= \sum\limits_j x_{ij}$

$\displaystyle \text{CM}= \frac{(\sum x_{ij})^2}{n}$

The sum of squares for error, SSE, is given by:

$\text{SSE}= \text{Total SS} - \text{SST}$

Dividing each sum of squares by its degrees of freedom gives the mean squares, $\text{MST} = \text{SST}/(k-1)$ and $\text{MSE} = \text{SSE}/(n-k)$, where $k$ is the number of treatments and $n$ is the total number of observations. The test statistic is their ratio:

$F = \frac{\text{MST}}{\text{MSE}}$
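
To make these formulas concrete, the following sketch computes Total SS, SST, SSE, the mean squares, and $F$ directly from the definitions above, using the same invented three-group data as before.

```python
# Hand computation of the ANOVA quantities defined above (invented data).
groups = [
    [23, 20, 25, 22, 21],
    [28, 30, 27, 26, 29],
    [22, 24, 23, 25, 21],
]
all_obs = [x for g in groups for x in g]
n = len(all_obs)      # total number of observations
k = len(groups)       # number of treatments

cm = sum(all_obs) ** 2 / n                             # correction for the mean
total_ss = sum(x ** 2 for x in all_obs) - cm           # Total SS
sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm   # SST (treatments)
sse = total_ss - sst                                   # SSE (error)

mst = sst / (k - 1)   # mean square for treatments
mse = sse / (n - k)   # mean square for error
f_stat = mst / mse
print(total_ss, sst, sse, f_stat)
```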

Example

An example of the effect of breakfast on attention span (in minutes) for small children is summarized in the table below:


Breakfast and Children's Attention Span

This table summarizes the effect of breakfast on attention span (in minutes) for small children.

The hypothesis test would be:

$H_0: \mu_1 = \mu_2 = \mu_3$

versus:

$H_a: \mu_1 \neq \mu_2 \text{ or } \mu_2 \neq \mu_3 \text{ or } \mu_1 \neq \mu_3$

The solution to the test can be seen in the figure below:


Excel Solution

This image shows the solution to our ANOVA example performed in Excel.

The test statistic $F$ is equal to 4.9326. The corresponding right-tail probability is 0.027, which means that if the significance level is 0.05, the test statistic would be in the rejection region, and therefore the null hypothesis would be rejected.

Hence, this indicates that the means are not equal (i.e., the sample values give sufficient evidence that not all means are the same). In terms of the example, this means that breakfast (and its size) does have an effect on children's attention span.
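
As a check on the reported numbers, the right-tail probability of $F = 4.9326$ can be recovered from the $F$ distribution. The degrees of freedom are not shown in the extracted table; assuming three breakfast groups and 15 children in total (2 and 12 degrees of freedom) reproduces the stated value of 0.027.

```python
from scipy import stats

f_stat = 4.9326
dfn, dfd = 2, 12   # assumed: 3 groups, 15 observations in total (not shown above)

p_value = stats.f.sf(f_stat, dfn, dfd)   # right-tail probability of F
critical = stats.f.ppf(0.95, dfn, dfd)   # rejection threshold at the 5% level
print(round(p_value, 3))    # ≈ 0.027
print(round(critical, 2))   # ≈ 3.89, and 4.9326 exceeds it, so H0 is rejected
```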

