Statistics
Textbooks
Boundless Statistics
Other Hypothesis Tests
The Chi-Squared Test
Concept Version 9
Created by Boundless

Example: Test for Goodness of Fit

The Chi-square test for goodness of fit compares the expected and observed values to determine how well an experimenter's predictions fit the data.

Learning Objective

  • Support the use of Pearson's chi-squared test to measure goodness of fit


Key Points

    • Pearson's chi-squared test uses a measure of goodness of fit, which is the sum of differences between observed and expected outcome frequencies, each squared and divided by the expectation.
    • If the value of the chi-square test statistic is greater than the critical value in the chi-square table (for the chosen significance level and degrees of freedom), then the null hypothesis is rejected.
    • In this text, we examine a goodness of fit test as follows: for a population of employees, do the days for the highest number of absences occur with equal frequencies during a five-day work week?

Term

  • null hypothesis

    A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.


Full Text

Pearson's chi-squared test uses a measure of goodness of fit, which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency (i.e., count) for bin $i$, $E_i$ is the expected (theoretical) frequency for bin $i$ asserted by the null hypothesis, and $n$ is the number of bins.
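The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `chi_squared_statistic` and the three-bin counts are made up for the example and do not come from the text.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical three-bin counts, chosen only to exercise the function.
example_statistic = chi_squared_statistic([18, 22, 20], [20, 20, 20])
print(example_statistic)
```

The check for positive expected frequencies is there because each term divides by $E_i$; a bin with zero expected count would make the statistic undefined.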

The expected frequency is calculated by:

$$E_i = [F(Y_u) - F(Y_l)] \cdot N$$

where $F$ is the cumulative distribution function for the distribution being tested, $Y_u$ is the upper limit for class $i$, $Y_l$ is the lower limit for class $i$, and $N$ is the sample size.
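As a rough sketch of how the expected frequencies follow from a cumulative distribution function, the Python below applies $E_i = [F(Y_u) - F(Y_l)] \cdot N$ to consecutive class limits. The helper names (`expected_frequencies`, `uniform_cdf`) and the choice of five unit-width classes on $[0, 5]$ with $N = 60$ are assumptions made for illustration, not something the text prescribes.

```python
def expected_frequencies(cdf, class_limits, n):
    """E_i = [F(Y_u) - F(Y_l)] * N for each pair of consecutive class limits."""
    return [
        (cdf(upper) - cdf(lower)) * n
        for lower, upper in zip(class_limits, class_limits[1:])
    ]


def uniform_cdf(y, low=0.0, high=5.0):
    """CDF of a continuous uniform distribution on [low, high] (assumed range)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


# Assumed setup: five equal-width classes and a sample of size 60.
limits = [0, 1, 2, 3, 4, 5]
expected = expected_frequencies(uniform_cdf, limits, n=60)
print(expected)
```

Each class receives one fifth of the probability mass, so every expected frequency comes out at 12 up to floating-point rounding.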

Example

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers were asked which day of the week had the highest number of employee absences. The results were distributed as follows:

  • Monday: 15
  • Tuesday: 12
  • Wednesday: 9
  • Thursday: 9
  • Friday: 15

Solution

The null and alternate hypotheses are:

$H_0$: The absent days occur with equal frequencies; that is, they fit a uniform distribution.

$H_a$: The absent days occur with unequal frequencies; that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies then, out of $60$ absent days (the total in the sample: $15 + 12 + 9 + 9 + 15 = 60$), there would be $60 / 5 = 12$ absences on each of Monday, Tuesday, Wednesday, Thursday, and Friday. These numbers are the expected ($E$) values. The counts listed above are the observed ($O$) values or data.

Calculate the $\chi^2$ test statistic. Make a chart with the following column headings and fill in the cells:

  • Expected ($E$) values ($12$, $12$, $12$, $12$, $12$)
  • Observed ($O$) values ($15$, $12$, $9$, $9$, $15$)
  • $(O - E)$
  • $(O - E)^2$
  • $\dfrac{(O - E)^2}{E}$

Now add (sum) the values of the last column. Verify that this sum is $3$. This is the $\chi^2$ test statistic.
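The chart described above can be filled in programmatically. The sketch below uses the observed counts from the example and derives the expected counts under the uniform null hypothesis; the variable names and the printed layout are illustrative choices only.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [15, 12, 9, 9, 15]

# Under the uniform null hypothesis every day gets the same expected count.
n = sum(observed)
expected = [n / len(observed)] * len(observed)

# One row per day: E, O, (O - E), (O - E)^2, (O - E)^2 / E.
rows = []
for day, o, e in zip(days, observed, expected):
    diff = o - e
    rows.append((day, e, o, diff, diff ** 2, diff ** 2 / e))

print(f"{'Day':<10}{'E':>6}{'O':>6}{'O-E':>7}{'(O-E)^2':>10}{'(O-E)^2/E':>11}")
for day, e, o, diff, sq, term in rows:
    print(f"{day:<10}{e:>6.0f}{o:>6d}{diff:>7.0f}{sq:>10.0f}{term:>11.2f}")

# The test statistic is the sum of the last column.
chi_squared = sum(row[-1] for row in rows)
print("chi-squared statistic:", chi_squared)  # 3.0
```

Only Monday, Wednesday, Thursday, and Friday contribute, each with $9 / 12 = 0.75$, which gives the total of $3$.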

The degrees of freedom are one fewer than the number of cells: $df = 5 - 1 = 4$.

To find the $p$-value, calculate $P(\chi^2 > 3)$ for a chi-square distribution with $4$ degrees of freedom. This test is right-tailed, and $p = 0.5578$.
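The quoted $p$-value can be checked without a table. The sketch below relies on the closed form of the chi-square right-tail probability that holds when the degrees of freedom are even, $P(\chi^2 > x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} \frac{(x/2)^k}{k!}$; the function name `chi2_sf_even_df` is a hypothetical helper, and a statistics library would normally be used for general degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail probability P(chi^2 > x); valid only for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to even, positive df")
    if x <= 0:
        return 1.0
    half = x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


df = 5 - 1  # one fewer than the number of cells
p_value = chi2_sf_even_df(3.0, df)
print(round(p_value, 4))  # 0.5578
```

For $df = 4$ the sum has two terms, so the probability reduces to $e^{-1.5}(1 + 1.5)$.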

Conclusion

Because the $p$-value ($0.5578$) is greater than the $5\%$ level of significance, the decision is to not reject the null hypothesis. At a $5\%$ level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
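The same decision can be reached in two equivalent ways: comparing the $p$-value with the significance level, or comparing the test statistic with the critical value from a chi-square table. The sketch below does both for $4$ degrees of freedom. It uses the closed-form tail probability specific to $df = 4$ and finds the critical value by bisection; both helpers (`sf_df4`, `critical_value_df4`) are illustrative assumptions rather than a standard API.

```python
import math

ALPHA = 0.05  # 5% level of significance


def sf_df4(x):
    """P(chi^2 > x) for 4 degrees of freedom (closed form for this df only)."""
    return math.exp(-x / 2) * (1 + x / 2)


def critical_value_df4(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection search for x with sf_df4(x) = alpha; sf_df4 decreases in x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sf_df4(mid) > alpha:
            lo = mid  # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2


statistic = 3.0
p_value = sf_df4(statistic)
critical = critical_value_df4(ALPHA)

reject_by_p_value = p_value < ALPHA
reject_by_critical_value = statistic > critical

print(f"p-value = {p_value:.4f}, critical value = {critical:.3f}")
print("reject H0:", reject_by_p_value)
```

Both comparisons agree: the statistic of $3$ falls well below the critical value of roughly $9.488$, so the null hypothesis is not rejected.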


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.