Statistics
Textbooks
Boundless Statistics
Estimation and Hypothesis Testing
Comparing More than Two Means
Concept Version 7
Created by Boundless

Elements of a Designed Study

The problem of comparing more than two means results from the increase in Type I error that occurs when statistical tests are used repeatedly.

Learning Objective

  • Discuss the increasing Type I error that accompanies comparisons of more than two means and the various methods of correcting this error.


Key Points

    • Unless the tests are perfectly dependent, the familywise error rate increases as the number of comparisons increases.
    • Multiple testing correction refers to re-calculating probabilities obtained from a statistical test which was repeated multiple times.
    • In order to retain a prescribed familywise error rate $\alpha$ in an analysis involving more than one comparison, the error rate for each comparison must be more stringent than $\alpha$.
    • The most conservative method of controlling the familywise error rate, free of independence and distribution assumptions, is known as the Bonferroni correction.
    • Multiple comparison procedures are commonly used in an analysis of variance after obtaining a significant omnibus test result, like the ANOVA $F$-test.

Terms

  • ANOVA

    Analysis of variance—a collection of statistical models used to analyze the differences between group means and their associated procedures (such as "variation" among and between groups).

  • Boole's inequality

    a result in probability theory stating that, for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events

  • Bonferroni correction

    a method used to counteract the problem of multiple comparisons; considered the simplest and most conservative method to control the familywise error rate


Full Text

For hypothesis testing, the problem of comparing more than two means results from the increase in Type I error that occurs when statistical tests are used repeatedly. If $n$ independent comparisons are performed, the experiment-wide significance level $\bar{\alpha}$, also termed FWER for familywise error rate, is given by:

$\bar{\alpha} = 1 - (1 - \alpha_{\text{per comparison}})^n$

Hence, unless the tests are perfectly dependent, $\bar{\alpha}$ increases as the number of comparisons increases. If we do not assume that the comparisons are independent, then we can still say:

$\bar{\alpha} \le n \cdot \alpha_{\text{per comparison}}$
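As a quick numerical illustration (a minimal Python sketch, not part of the original text), the exact familywise rate for independent comparisons can be tabulated alongside the distribution-free Boole upper bound:

```python
# Familywise error rate (FWER) for n independent comparisons at a fixed
# per-comparison level alpha, next to the Boole upper bound n * alpha.
alpha = 0.05  # per-comparison significance level

for n in (1, 5, 10, 20):
    fwer_independent = 1 - (1 - alpha) ** n  # exact, assuming independence
    boole_bound = n * alpha                  # holds without independence
    print(f"n={n:2d}  FWER={fwer_independent:.3f}  bound={boole_bound:.2f}")
```

Already at 10 comparisons the chance of at least one false positive exceeds 40%, which is why a correction is needed.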

There are different ways to assure that the familywise error rate is at most $\bar{\alpha}$. The most conservative method, free of independence and distribution assumptions, is the Bonferroni correction $\alpha_{\text{per comparison}} = \frac{\bar{\alpha}}{n}$. A more sensitive correction can be obtained by solving the equation for the familywise error rate of $n$ independent comparisons for $\alpha_{\text{per comparison}}$.

This yields $\alpha_{\text{per comparison}} = 1 - (1 - \bar{\alpha})^{1/n}$, which is known as the Šidák correction. Another procedure is the Holm–Bonferroni method, which uniformly delivers more power than the simple Bonferroni correction by testing only the most extreme $p$-value ($i = 1$) against the strictest criterion, and the others ($i > 1$) against progressively less strict criteria.
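The three corrections can be sketched in Python as follows; the function names are illustrative, and the Holm procedure is written in its standard step-down form:

```python
def bonferroni_alpha(fwer, n):
    """Per-comparison level under the Bonferroni correction: fwer / n."""
    return fwer / n

def sidak_alpha(fwer, n):
    """Per-comparison level under the Sidak correction: 1 - (1 - fwer)^(1/n)."""
    return 1 - (1 - fwer) ** (1 / n)

def holm_reject(p_values, fwer=0.05):
    """Holm-Bonferroni step-down test; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # rank 0 faces the strictest criterion fwer/m, rank 1 faces fwer/(m-1), ...
        if p_values[idx] <= fwer / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-significant test
    return reject
```

For example, `holm_reject([0.01, 0.04, 0.03, 0.005])` rejects the first and last hypotheses: 0.005 is tested against 0.05/4 and 0.01 against 0.05/3, but 0.03 fails against 0.05/2, stopping the procedure.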

Methods

Multiple testing correction refers to re-calculating probabilities obtained from a statistical test which was repeated multiple times. In order to retain a prescribed familywise error rate $\alpha$ in an analysis involving more than one comparison, the error rate for each comparison must be more stringent than $\alpha$. Boole's inequality implies that if each test is performed to have type I error rate $\alpha/n$, the total error rate will not exceed $\alpha$. This is called the Bonferroni correction and is one of the most commonly used approaches for multiple comparisons.
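In practice this means comparing each $p$-value against $\alpha/n$, or equivalently comparing the adjusted value $\min(n \cdot p, 1)$ against $\alpha$. A minimal sketch (the helper name is hypothetical):

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(n * p, 1) for each of the n tests."""
    n = len(p_values)
    return [min(n * p, 1.0) for p in p_values]

# Three hypothetical raw p-values from repeated tests:
raw = [0.004, 0.020, 0.300]
adjusted = bonferroni_adjust(raw)
# 0.004 * 3 = 0.012 stays below alpha = 0.05, so only the first
# hypothesis is still rejected after correction.
```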

Because simple techniques such as the Bonferroni method can be too conservative, there has been a great deal of attention paid to developing better techniques, such that the overall rate of false positives can be maintained without inflating the rate of false negatives unnecessarily. Such methods can be divided into general categories:

  • Methods where total alpha can be proved to never exceed 0.05 (or some other chosen value) under any conditions. These methods provide "strong" control against Type I error, in all conditions including when the null hypothesis is only partially correct.
  • Methods where total alpha can be proved not to exceed 0.05 except under certain defined conditions.
  • Methods which rely on an omnibus test before proceeding to multiple comparisons. Typically these methods require a significant ANOVA/Tukey's range test before proceeding to multiple comparisons. These methods have "weak" control of Type I error.
  • Empirical methods, which control the proportion of Type I errors adaptively, utilizing correlation and distribution characteristics of the observed data.

Post-Hoc Testing of ANOVA

Multiple comparison procedures are commonly used in an analysis of variance after obtaining a significant omnibus test result, like the ANOVA $F$-test. The significant ANOVA result suggests rejecting the global null hypothesis $H_0$ that the means are the same across the groups being compared. Multiple comparison procedures are then used to determine which means differ. In a one-way ANOVA involving $K$ group means, there are $\frac{K(K-1)}{2}$ pairwise comparisons.
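The pairwise-comparison count can be verified with a short sketch (the group labels are hypothetical):

```python
from itertools import combinations

# With K group means, a one-way ANOVA admits K*(K-1)/2 pairwise comparisons.
groups = ["A", "B", "C", "D"]  # K = 4 hypothetical treatment groups
pairs = list(combinations(groups, 2))

K = len(groups)
assert len(pairs) == K * (K - 1) // 2  # 6 pairwise comparisons for K = 4
```

Each of these pairs would be tested by the chosen post-hoc procedure, which is exactly why the per-comparison level must be tightened as $K$ grows.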


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.