Concept Version 7
Created by Boundless

Tests of Significance

Tests of significance are a statistical technique used for ascertaining the likelihood of empirical data and, from there, for inferring a real effect.

Learning Objective

  • Examine the idea of statistical significance and the fundamentals behind the corresponding tests.


Key Points

    • In relation to Fisher, statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance.
    • In statistical testing, a result is deemed statistically significant if it is so extreme that such a result would be expected to arise simply by chance only in rare circumstances.
    • Statistical significance refers to two separate notions: the $p$-value and the Type I error rate $\alpha$.
    • A typical test of significance comprises two related elements: the calculation of the probability of the data and an assessment of the statistical significance of that probability.

Terms

  • null hypothesis

    A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

  • statistical significance

    A measure of how unlikely it is that a result has occurred by chance.


Full Text

Tests of significance are a statistical technique used for ascertaining the likelihood of empirical data and, from there, for inferring a real effect, such as a correlation between variables or the effectiveness of a new treatment. Beginning circa 1925, Sir Ronald Fisher, an English statistician, evolutionary biologist, geneticist, and eugenicist (pictured below), standardized the interpretation of statistical significance and was the main driving force behind the popularity of tests of significance in empirical research, especially in the social and behavioral sciences.

Sir Ronald Fisher

Figure: Sir Ronald Fisher, the English statistician, evolutionary biologist, geneticist, and eugenicist who standardized the interpretation of statistical significance (starting around 1925) and drove the popularity of tests of significance in empirical research.

Statistical significance refers to two separate notions:

  1. the $p$-value: the probability that the observed data would occur by chance, given a true null hypothesis; or
  2. the Type I error rate $\alpha$ (false positive rate) of a statistical hypothesis test (the probability of incorrectly rejecting a given null hypothesis in favor of a second alternative hypothesis).
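To make the first notion concrete, the $p$-value can be estimated directly by simulating the null hypothesis. The following sketch uses an invented coin-flip scenario (60 heads in 100 flips of a supposedly fair coin); the counts and the two-sided definition of "at least as extreme" are illustrative assumptions, not from the text:

```python
import random

# Hypothetical example: is a coin fair, given 60 heads in 100 flips?
# Under the null hypothesis (fair coin), simulate many experiments and
# estimate the p-value as the fraction of simulated results at least as
# extreme as the observed one.
random.seed(0)

observed_heads = 60
n_flips = 100
n_sims = 20_000

extreme = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # two-sided: count results as far from 50 heads as the observed 60
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

p_value = extreme / n_sims
print(f"estimated p-value: {p_value:.3f}")
```

The exact two-sided binomial probability here is about 0.057, so at the conventional 5% level this result would narrowly fail to reach significance.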

In relation to Fisher, statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance. The fundamental challenge is that any partial picture of a given hypothesis, poll, or question is subject to random error. In statistical testing, a result is deemed statistically significant if it is so extreme (absent external variables that would influence the results of the test) that such a result would be expected to arise by chance only in rare circumstances. Hence, the result provides enough evidence to reject the hypothesis of "no effect."

Reading Tests of Significance

A typical test of significance comprises two related elements:

  1. the calculation of the probability of the data, and
  2. an assessment of the statistical significance of that probability.

Probability of the Data

The probability of the data is normally reported using two related statistics:

  1. a test statistic ($z$, $t$, $F$…), and
  2. an associated probability ($p$, $^*$).

The test statistic by itself provides little immediately usable information and can be set aside in most cases. The associated probability, on the other hand, tells how probable the test results are under the null hypothesis and forms the basis for assessing statistical significance.
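These two elements can be sketched for a one-sample $z$-test; the sample values and known population standard deviation below are invented for illustration, and the normal tail probability is computed from the complementary error function:

```python
import math

# Sketch of the two elements of a significance test, assuming a
# one-sample z-test with known population standard deviation.
# (All numbers are made up for illustration.)
sample_mean, mu0, sigma, n = 103.0, 100.0, 10.0, 50

# 1. the test statistic
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# 2. its associated probability under the null hypothesis (two-sided),
#    using P(Z > z) = erfc(z / sqrt(2)) / 2 for a standard normal
p = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p = {p:.4f}")
```

Here the reported pair would read $[z = 2.121,\ p \approx 0.034]$: the statistic locates the sample mean in standard-error units, and the probability is what gets compared against the significance threshold.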

Statistical Significance

The statistical significance of the results depends on criteria set up by the researcher beforehand. A result is deemed statistically significant if the probability of the data is small enough, conventionally smaller than 5% ($\text{sig} \leq 0.05$). However, conventional thresholds for significance vary across disciplines and researchers. For example, the health sciences commonly accept 10% ($\text{sig} \leq 0.10$), while individual researchers may adopt more stringent levels, such as 1% ($\text{sig} \leq 0.01$). In any case, $p$-values ($p$, $^*$) larger than the selected threshold are considered non-significant and are typically excluded from further discussion, while $p$-values smaller than or equal to the threshold are considered statistically significant and interpreted accordingly. A statistically significant result normally leads to an inference of real effects, unless there are suspicions that the results may be anomalous. Note that the criteria used for assessing statistical significance may not be made explicit in a research article when the researcher is using conventional assessment criteria.

As an example, consider the following test statistics: 

$[z=1.96, p=0.025]$

$[F = 13.140, p<0.01]$

$[r = 0.60^*]$

In this example, the test statistics are $z$ (normality test), $F$ (equality-of-variance test), and $r$ (correlation). Each $p$-value indicates, with more or less precision, the probability of its test statistic under the corresponding null hypothesis. Assuming a conventional 5% level of significance ($\text{sig} \leq 0.05$), all three tests are statistically significant. We can thus infer that we have measured real effects rather than random fluctuations in the data. When interpreting the results, the correlation statistic provides directly usable information: we could infer a medium-to-high correlation between the two variables. The test statistics $z$ and $F$, on the other hand, do not provide immediately useful information, and any further interpretation requires descriptive statistics. For example, skewness and kurtosis are needed to interpret non-normality ($z$), and group means and variances are needed to describe group differences ($F$).
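The first reported pair above can be checked numerically: for a standard normal variable, the upper-tail (one-tailed) probability beyond $z = 1.96$ is about 0.025, matching $[z = 1.96,\ p = 0.025]$. A short verification using only the standard library (the function name is an assumption for illustration):

```python
import math

def upper_tail(z):
    # Survival function of the standard normal via the complementary
    # error function: P(Z > z) = erfc(z / sqrt(2)) / 2
    return math.erfc(z / math.sqrt(2)) / 2

print(round(upper_tail(1.96), 3))  # one-tailed probability beyond z = 1.96
```

Note that the corresponding two-tailed probability would be about 0.05, so how a reported $p$-value was obtained (one- or two-tailed) matters when reading such pairs.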


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.