Concept Version 9
Created by Boundless

Example: Test for Goodness of Fit

The chi-square test for goodness of fit compares expected and observed frequencies to determine how well an experimenter's predictions fit the data.

Learning Objective

  • Support the use of Pearson's chi-squared test to measure goodness of fit


Key Points

    • Pearson's chi-squared test uses a measure of goodness of fit: the sum, over all categories, of the squared differences between observed and expected outcome frequencies, each divided by the expected frequency.
    • If the value of the chi-square test statistic is greater than the critical value in the chi-square table (for the chosen significance level and degrees of freedom), then the null hypothesis is rejected.
    • In this text, we examine a goodness of fit test as follows: for a population of employees, do the days with the highest number of absences occur with equal frequencies during a five-day work week?

Term

  • null hypothesis

    A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.


Full Text

Pearson's chi-squared test uses a measure of goodness of fit: the sum, over all bins, of the squared differences between observed and expected outcome frequencies (that is, counts of observations), each divided by the expected frequency:

$\displaystyle{{ \chi }^{ 2 }=\sum _{ i=1 }^{ n }{ \dfrac { { \left( { O }_{ i }-{ E }_{ i } \right) }^{ 2 } }{ { E }_{ i } } }}$

where $O_i$ is the observed frequency (i.e., count) for bin $i$, $E_i$ is the expected (theoretical) frequency for bin $i$ asserted by the null hypothesis, and $n$ is the number of bins.
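
As a minimal sketch of the statistic just defined, the sum can be written in a few lines of Python. The function name `chi_square_statistic` and the counts in the usage lines are illustrative assumptions, not part of the original text.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three bins, chosen only to illustrate the call.
example_observed = [18, 22, 20]
example_expected = [20, 20, 20]
example_statistic = chi_square_statistic(example_observed, example_expected)
print(example_statistic)
```

A larger value means the observed counts sit further from what the null hypothesis predicts.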

When the distribution being tested is described by a cumulative distribution function, the expected frequency is calculated by:

$E_i = [F(Y_u)-F(Y_l)] \cdot N$

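where $F$ is the cumulative distribution function for the distribution being tested, $Y_u$ is the upper limit for class $i$, $Y_l$ is the lower limit for class $i$, and $N$ is the sample size. For categorical data, the expected frequency is simply $E_i = N \cdot p_i$, where $p_i$ is the probability of category $i$ under the null hypothesis.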
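
The CDF-based formula for $E_i$ can be sketched as follows. This is only an illustration under stated assumptions: the helper name `expected_frequencies`, the standard normal null distribution, the bin edges, and the sample size of 100 are all hypothetical and do not come from the original text.

```python
import math
from statistics import NormalDist


def expected_frequencies(cdf, edges, sample_size):
    """E_i = [F(Y_u) - F(Y_l)] * N for each pair of consecutive bin edges."""
    return [
        (cdf(upper) - cdf(lower)) * sample_size
        for lower, upper in zip(edges, edges[1:])
    ]


# Assumed setup: standard normal null distribution, four bins, N = 100.
bin_edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
null_cdf = NormalDist(mu=0.0, sigma=1.0).cdf
expected = expected_frequencies(null_cdf, bin_edges, 100)

for lower, upper, e in zip(bin_edges, bin_edges[1:], expected):
    print(f"({lower}, {upper}]: expected count = {e:.2f}")
```

Because the bins cover the whole real line, the expected counts add up to the sample size.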

Example

Employers want to know on which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally often on each day of the week. Suppose a random sample of 60 managers was asked on which day of the week they had the highest number of employee absences. The results were distributed as follows:

  • Monday: 15
  • Tuesday: 12
  • Wednesday: 9
  • Thursday: 9
  • Friday: 15

Solution

The null and alternative hypotheses are:

$H_0$: The absent days occur with equal frequencies—that is, they fit a uniform distribution.

$H_a$: The absent days occur with unequal frequencies—that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies then, out of $60$ absent days (the total in the sample: $15 + 12 + 9 + 9 + 15 = 60$), there would be $60/5 = 12$ absences on each of the five days, Monday through Friday. These numbers are the expected ($E$) values. The values in the list above are the observed ($O$) values or data.

Calculate the $\chi^2$ test statistic. Make a chart with the following column headings and fill in the cells:

  • Expected ($E$) values ($12$, $12$, $12$, $12$, $12$)
  • Observed ($O$) values ($15$, $12$, $9$, $9$, $15$)
  • $\left( O-E \right)$
  • ${ \left( O-E \right) }^{ 2 }$
  • $\dfrac { { \left( O-E \right) }^{ 2 } }{ E }$

Now add (sum) the values of the last column. Verify that this sum is $3$. This is the $\chi^2$ test statistic.
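
The chart and its column sum can be checked with a short script. This is a sketch using the observed counts from the example and the expected value of $12$ per day; the variable names are illustrative.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [15, 12, 9, 9, 15]
total = sum(observed)
expected = [total / len(days)] * len(days)  # 60 / 5 = 12 per day

chi_square = 0.0
print(f"{'Day':<10}{'E':>6}{'O':>6}{'O-E':>6}{'(O-E)^2':>9}{'(O-E)^2/E':>11}")
for day, o, e in zip(days, observed, expected):
    diff = o - e
    contribution = diff ** 2 / e
    chi_square += contribution  # running sum of the last column
    print(f"{day:<10}{e:>6.0f}{o:>6d}{diff:>6.0f}{diff ** 2:>9.0f}{contribution:>11.2f}")

print(f"chi-square test statistic = {chi_square:.2f}")
```

Four of the five days each contribute $9/12 = 0.75$ and Tuesday contributes $0$, which gives the total of $3$.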

The degrees of freedom are one fewer than the number of categories: $df = 5-1 = 4$.

To find the $p$-value, calculate $P(\chi^2>3)$ for a chi-square distribution with $4$ degrees of freedom. This test is right-tailed, and $p=0.5578$.
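
The right-tail probability for $\chi^2 = 3$ with $df = 4$ can be checked numerically. The sketch below relies on a closed form that is valid only for even degrees of freedom, which happens to cover this example; the function name `chi_square_sf_even_df` is an assumption made for illustration.

```python
import math


def chi_square_sf_even_df(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half_x = x / 2.0
    terms = (half_x ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half_x) * sum(terms)


df = 5 - 1  # one fewer than the number of categories
p_value = chi_square_sf_even_df(3.0, df)
print(f"df = {df}, p-value = {p_value:.4f}")  # p-value = 0.5578
```

For arbitrary degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.sf` would be the more usual choice.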

Conclusion

The decision is to not reject the null hypothesis, because the $p$-value ($0.5578$) is greater than the significance level ($0.05$). At a $5\%$ level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
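
The decision can be sketched using both the $p$-value approach and the table approach mentioned in the key points. The critical value $9.488$ is the standard chi-square table entry for $df = 4$ with a $5\%$ right tail; it does not appear in the original text and is included here as an assumption for illustration, as are the variable names.

```python
alpha = 0.05            # significance level from the conclusion
chi_square = 3.0        # test statistic from the example
p_value = 0.5578        # right-tail probability reported in the example
critical_value = 9.488  # assumed table value for df = 4, alpha = 0.05

# Two equivalent ways to state the rejection rule for a right-tailed test.
reject_by_p_value = p_value < alpha
reject_by_critical_value = chi_square > critical_value

decision = "reject H0" if reject_by_p_value else "do not reject H0"
print(f"p-value rule: {reject_by_p_value}, critical-value rule: {reject_by_critical_value}")
print(f"Decision: {decision}")
```

Both rules agree here: the statistic is well below the critical value, and the $p$-value is well above $0.05$.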


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.