Creating a Hypothesis Test

Creating a hypothesis test generally follows a five-step procedure.

Learning Objective

Design a hypothesis test utilizing the five steps listed in this text.

Key Points

The first step is to set up or assume a null hypothesis.
The second step is to decide on an appropriate level of significance for assessing results.
The third step is to decide between a one-tailed or a two-tailed statistical test.
The fourth step is to interpret your results -- namely, your $p$-value and observed test statistics.
The final step is to write a report summarizing the statistical significance of your results.

Term

null hypothesis
A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

Full Text

The creation of a hypothesis test generally follows a five-step procedure as detailed below:

1. Set up or assume a statistical null hypothesis ($H_0$). Setting up a null hypothesis helps clarify the aim of the research. Such a hypothesis can also be assumed, given that null hypotheses are, in general, nil hypotheses and can be easily "reconstructed." Examples of null hypotheses include:

$H_0$: Given our sample results, we will be unable to infer a significant correlation between the dependent and independent research variables.
$H_0$: It will not be possible to infer any statistically significant mean differences between the treatment and the control groups.
$H_0$: We will not be able to infer that this variable's distribution significantly departs from normality.

2. Decide on an appropriate level of significance for assessing results. Conventional levels are 5% ($\text{sig}<0.05$, meaning that results this extreme occur under the null hypothesis less than 1 time in 20) or 1% ($\text{sig}<0.01$, meaning that results this extreme occur under the null hypothesis less than 1 time in 100). However, the level of significance can be any "threshold" the researcher considers appropriate for the intended research (thus, it could be 0.02, 0.001, 0.0001, etc.). If required, label the level of significance as "significance" or "sig" (e.g., $\text{sig}<0.05$). Avoid labeling it as "$p$" (so as not to confuse it with $p$-values) or as "alpha" or "$\alpha$" (so as not to confuse it with alpha tolerance errors).
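As a rough illustration of what a 5% level means, the sketch below simulates many test statistics under a true null hypothesis and counts how often their $p$-values fall under the threshold. The simulated $z$ statistics, the fixed random seed, and the function name are assumptions made for this example only.

```python
import random
from statistics import NormalDist

def rate_below_threshold(n_trials=10_000, sig=0.05, seed=1):
    """Share of simulated null results whose two-tailed p-value is below sig."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    std_normal = NormalDist()          # standard normal: mean 0, sd 1
    hits = 0
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)        # a z statistic drawn under the null
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p < sig:
            hits += 1
    return hits / n_trials

rate = rate_below_threshold()
print(f"Share of null results with p < 0.05: {rate:.3f}")
```

The printed share should land close to 0.05, that is, roughly 1 result in 20, which is what the threshold is meant to express.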

3. Decide between a one-tailed or a two-tailed statistical test. A one-tailed test assesses whether the observed results are either significantly higher or significantly lower than expected under the null hypothesis, but not both. Thus, one-tailed tests are appropriate when testing that results will only be higher or only be lower than null results, or when the only interest is in interventions that will result in higher or lower outputs. A two-tailed test, on the other hand, assesses both possibilities at once. It does so by dividing the total level of significance between both tails, which also implies that it is more difficult to get significant results than with a one-tailed test. Thus, two-tailed tests are appropriate when the direction of the results is not known, or when the researcher wants to check both possibilities in order to avoid mistakes.

Two-Tailed Statistical Test

This image shows a graph representation of a two-tailed hypothesis test.
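The difference between one-tailed and two-tailed tests can be sketched numerically. The example below assumes the test statistic is a $z$ score following a standard normal distribution under the null hypothesis; the observed value of 1.8 and the function name are hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """Return one-tailed and two-tailed p-values for an observed z statistic."""
    std_normal = NormalDist()                # mean 0, standard deviation 1
    upper = 1.0 - std_normal.cdf(z)          # one-tailed: results higher than the null
    lower = std_normal.cdf(z)                # one-tailed: results lower than the null
    two_sided = 2.0 * min(upper, lower)      # two-tailed: both directions at once
    return {"upper": upper, "lower": lower, "two_sided": two_sided}

result = p_values(1.8)  # hypothetical observed z statistic
for name, p in result.items():
    print(f"{name}: p = {p:.4f}")
```

For this hypothetical value, the upper one-tailed $p$-value falls under 0.05 while the two-tailed $p$-value does not, which shows why significance is harder to reach with a two-tailed test.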

4. Interpret results:

Obtain and report the probability of the data. It is recommended to use the exact probability of the data, that is, the "$p$-value" (e.g., $p=0.011$, or $p=0.51$). This exact probability is normally provided together with the pertinent test statistic ($z$, $t$, $F$…).
$p$-values can be interpreted as the probability of getting the observed or more extreme results under the null hypothesis (e.g., $p=0.033$ means that 3.3 times in 100, or about 1 time in 30, we will obtain the same or more extreme results as normal [or random] fluctuation under the null).
$p$-values are considered statistically significant if they are equal to or smaller than the chosen significance level. This is the actual test of significance, as it interprets those $p$-values falling beyond the threshold as "rare" enough to deserve attention.
If results are accepted as statistically significant, it can be inferred that the null hypothesis is not explanatory enough for the observed data.
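The interpretation step can be sketched end to end with a simple two-tailed one-sample $z$ test. This is a minimal sketch under stated assumptions, not a prescribed procedure: the sample values, the null mean `mu0`, and the known standard deviation `sigma` are all invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, sig=0.05):
    """Two-tailed z test of H0: population mean equals mu0 (sigma assumed known)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))    # observed test statistic
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))      # exact two-tailed p-value
    return z, p, p <= sig                           # significant if p <= threshold

# Hypothetical measurements; mu0 and sigma are assumptions for this example.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.2, 5.5]
z, p, significant = one_sample_z_test(sample, mu0=5.0, sigma=0.5)
print(f"z = {z:.3f}, p = {p:.4f}, significant at 0.05: {significant}")
```

When `significant` is true, the inference is the one described above: the null hypothesis is not explanatory enough for the observed data.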

5. Write up the report:

All test statistics and associated exact $p$-values can be reported as descriptive statistics, independently of whether they are statistically significant or not.
Significant results can be reported in the line of "either an exceptionally rare chance has occurred, or the theory of random distribution is not true."
Significant results can also be reported in the line of "without the treatment I administered, experimental results as extreme as the ones I obtained would occur only about 3 times in 1000. Therefore, I conclude that my treatment has a definite effect." Likewise: "this correlation is so extreme that it would only occur about 1 time in 100 ($p=0.01$). Thus, it can be inferred that there is a significant correlation between these variables."
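The reporting conventions above can be sketched as a small formatting helper that prints the test statistic, the exact $p$-value, and its "1 time in N" reading, whether or not the result is significant. The function names and the example values are hypothetical, and the helper assumes $p$ is greater than zero.

```python
def describe_p_value(p):
    """Express a p-value (0 < p <= 1) as an approximate '1 time in N' frequency."""
    return f"about 1 time in {round(1 / p)}"

def report_line(stat_name, stat_value, p, sig=0.05):
    """Build one report line; exact p-values are shown even when not significant."""
    verdict = "statistically significant" if p <= sig else "not statistically significant"
    return (f"{stat_name} = {stat_value:.2f}, p = {p:.3f} "
            f"({describe_p_value(p)} under the null hypothesis), "
            f"{verdict} at sig < {sig}")

# Hypothetical results, used only to show the output format.
print(report_line("z", 2.13, 0.033))
print(report_line("t", 0.66, 0.51))
```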
