Example: Test for Goodness of Fit

The Chi-square test for goodness of fit compares the expected and observed values to determine how well an experimenter's predictions fit the data.

Learning Objective

Support the use of Pearson's chi-squared test to measure goodness of fit

Key Points

Pearson's chi-squared test uses a measure of goodness of fit: the sum of the squared differences between observed and expected outcome frequencies, each divided by the expected frequency.
If the value of the chi-square test statistic is greater than the critical value in the chi-square table for the chosen significance level and degrees of freedom, then the null hypothesis is rejected.
The worked example asks: for a population of employees, do the days with the highest number of absences occur with equal frequency during a five-day work week?

Term

null hypothesis
A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

Full Text

Pearson's chi-squared test uses a measure of goodness of fit, which is the sum of the squared differences between observed and expected outcome frequencies (that is, counts of observations), each divided by the expected frequency:

$\displaystyle{{ \chi }^{ 2 }=\sum _{ i=1 }^{ n }{ \dfrac { { \left( { O }_{ i }-{ E }_{ i } \right) }^{ 2 } }{ { E }_{ i } } }}$

where $O_i$ is the observed frequency (i.e., count) for bin $i$, $E_i$ is the expected (theoretical) frequency for bin $i$ asserted by the null hypothesis, and $n$ is the number of bins.
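As a rough illustration, the statistic can be computed directly from two lists of counts. The sketch below is a minimal Python version; the function name `chi_square_statistic` and the three-bin counts are hypothetical and chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three bins (illustration only, not from the text)
illustrative_observed = [8, 12, 10]
illustrative_expected = [10, 10, 10]
statistic = chi_square_statistic(illustrative_observed, illustrative_expected)
print(statistic)
```

Bins whose observed count equals the expected count contribute nothing to the sum, so only the first two bins matter here.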

The expected frequency is calculated by:

$E_i = [F(Y_u)-F(Y_l)] \cdot N$

where $F$ is the cumulative distribution function for the distribution being tested, $Y_u$ is the upper limit for class $i$, $Y_l$ is the lower limit for class $i$, and $N$ is the sample size.
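The formula can be sketched in Python as follows. The helper names `expected_frequencies` and `uniform_cdf`, the interval $[0, 5]$, and the five equal-width classes are assumptions made for illustration; any cumulative distribution function could be passed in instead.

```python
def expected_frequencies(cdf, class_limits, sample_size):
    """Apply E_i = [F(Y_u) - F(Y_l)] * N to each class (Y_l, Y_u)."""
    return [(cdf(upper) - cdf(lower)) * sample_size
            for lower, upper in class_limits]


def uniform_cdf(y):
    """CDF of a continuous uniform distribution on [0, 5] (assumed example)."""
    return min(max(y / 5.0, 0.0), 1.0)


# Five equal-width classes covering [0, 5], with a sample of N = 60
classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
expected = expected_frequencies(uniform_cdf, classes, 60)
print(expected)  # each entry is 12, up to floating-point rounding
```

Because the classes cover the whole support of the assumed distribution, the expected frequencies add up to the sample size.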

Example

Employers want to know on which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers was asked on which day of the week they had the highest number of employee absences. The results were distributed as follows:

Monday: 15
Tuesday: 12
Wednesday: 9
Thursday: 9
Friday: 15

Solution

The null and alternative hypotheses are:

$H_0$: The absent days occur with equal frequencies—that is, they fit a uniform distribution.

$H_a$: The absent days occur with unequal frequencies—that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies then, out of the $60$ reported days (the total in the sample: $15 + 12 + 9 + 9 + 15 = 60$), there would be $60/5 = 12$ for Monday, $12$ for Tuesday, $12$ for Wednesday, $12$ for Thursday, and $12$ for Friday. These numbers are the expected ($E$) values. The counts listed above are the observed ($O$) values or data.

Calculate the $\chi^2$ test statistic. Make a chart with the following column headings and fill in the cells:

Expected ($E$) values ($12$, $12$, $12$, $12$, $12$)
Observed ($O$) values ($15$, $12$, $9$, $9$, $15$)
$\left( O-E \right)$
${ \left( O-E \right) }^{ 2 }$
$\dfrac { { \left( O-E \right) }^{ 2 } }{ E }$

Now add (sum) the values of the last column. Verify that this sum is $3$. This is the $\chi^2$ test statistic.
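The chart and its column sum can be checked with a short Python sketch. The observed counts come from the example; the variable names are illustrative.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [15, 12, 9, 9, 15]

# Under the null hypothesis every day has the same expected count
expected = [sum(observed) / len(observed)] * len(observed)

# One row per day: (day, E, O, O - E, (O - E)^2, (O - E)^2 / E)
rows = [(day, e, o, o - e, (o - e) ** 2, (o - e) ** 2 / e)
        for day, o, e in zip(days, observed, expected)]

for row in rows:
    print(row)

# The test statistic is the sum of the last column
chi_square = sum(row[-1] for row in rows)
print(chi_square)
```

Monday, Wednesday, Thursday, and Friday each contribute $9/12 = 0.75$ and Tuesday contributes $0$, which gives the sum of $3$.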

The degrees of freedom are one fewer than the number of cells: $df = 5-1 = 4$.

To find the $p$-value, calculate $P(\chi^2>3)$ for a chi-square distribution with $4$ degrees of freedom. This test is right-tailed, and $p=0.5578$.
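The reported $p$-value can be reproduced without a statistics library. The sketch below assumes the closed form of the right-tail probability that holds when the degrees of freedom are even, $P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{df/2-1} (x/2)^k / k!$; the function name `chi_square_sf_even_df` is hypothetical, and a statistics package would normally be used for general degrees of freedom.

```python
import math


def chi_square_sf_even_df(x, df):
    """Right-tail probability P(chi^2 > x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Test statistic 3 with df = 5 - 1 = 4, as in the example
p_value = chi_square_sf_even_df(3.0, 4)
print(round(p_value, 4))
```

For $df = 4$ the sum has two terms, so the probability reduces to $e^{-1.5}(1 + 1.5)$.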

Conclusion

Because the $p$-value ($0.5578$) is greater than $0.05$, the decision is to not reject the null hypothesis. At a $5\%$ level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
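The decision can be reached either from the $p$-value or from the chi-square table, and both routes agree. In the sketch below, the test statistic and $p$-value are the ones from the example; the critical value $9.488$ is an assumption in the sense that it is the standard table entry for $df = 4$ at the $5\%$ level and is not quoted in the text.

```python
chi_square = 3.0        # test statistic from the example
p_value = 0.5578        # right-tail probability reported in the example
alpha = 0.05            # significance level used in the conclusion
critical_value = 9.488  # assumed: standard table value for df = 4, alpha = 0.05

# Rule 1: reject the null hypothesis when the p-value is below alpha
reject_by_p_value = p_value < alpha

# Rule 2: reject when the statistic exceeds the critical value
reject_by_critical_value = chi_square > critical_value

print(reject_by_p_value, reject_by_critical_value)
```

Neither rule rejects the null hypothesis, which matches the conclusion.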
