The coefficient of determination (denoted $r^2$) measures how much of the variation in the dependent variable is explained by the regression line.
The Math
A data set has observed values $y_i$ and modelled values $\hat{y}_i$, sometimes known as predicted (or fitted) values. The "variability" of the data set is measured through different sums of squares, such as:
- the total sum of squares, $SS_{tot} = \sum_i (y_i - \bar{y})^2$ (proportional to the sample variance);
- the regression sum of squares, $SS_{reg} = \sum_i (\hat{y}_i - \bar{y})^2$ (also called the explained sum of squares); and
- the sum of squares of residuals, $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ (also called the residual sum of squares).
The most general definition of the coefficient of determination is:
$$r^2 = 1 - \frac{SS_{res}}{SS_{tot}},$$
where $SS_{res}$ and $SS_{tot}$ are the residual and total sums of squares defined above.
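This definition can be sketched directly in code. The following is a minimal illustration (the function and variable names are my own, not from the text), computing $r^2$ from the two sums of squares:

```python
def coefficient_of_determination(observed, predicted):
    """r^2 = 1 - SS_res / SS_tot, per the general definition above."""
    mean_y = sum(observed) / len(observed)
    # Total sum of squares: spread of the observed values about their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    # Residual sum of squares: spread of the observed values about the model.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot

ys = [1.0, 2.0, 3.0, 4.0]
print(coefficient_of_determination(ys, ys))         # perfect fit -> 1.0
print(coefficient_of_determination(ys, [2.5] * 4))  # predicting the mean -> 0.0
```

A model that predicts every observation exactly has $SS_{res} = 0$ and so $r^2 = 1$; a model that only ever predicts the mean does no better than $SS_{tot}$ itself and gets $r^2 = 0$.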
Properties and Interpretation of $r^2$
The coefficient of determination is actually the square of the correlation coefficient $r$. It is usually stated as a percent, rather than in decimal form. In the context of data, $r^2$, when expressed as a percent, represents the percent of variation in the dependent variable $y$ that can be explained by variation in the independent variable $x$ using the regression (best-fit) line. $1 - r^2$, when expressed as a percent, represents the percent of variation in $y$ that is NOT explained by variation in $x$ using the regression line. This unexplained variation can be seen as the scattering of the observed data points about the regression line.
So $r^2$ always lies between 0 and 1: since $-1 \le r \le 1$, squaring gives $0 \le r^2 \le 1$.
In many (but not all) instances where $r^2$ is used, the regression line is fit by ordinary least squares, that is, by minimizing $SS_{res}$; in that case $r^2$ lies between 0 and 1.
Note that $r^2$ does NOT indicate whether:
- the independent variables are a cause of the changes in the dependent variable;
- omitted-variable bias exists;
- the correct regression was used;
- the most appropriate set of independent variables has been chosen;
- there is collinearity present in the data on the explanatory variables; or
- the model might be improved by using transformed versions of the existing set of independent variables.
Example
Consider the third exam/final exam example introduced in the previous section, where the third-exam score is the independent variable and the final-exam score is the dependent variable. Squaring the correlation coefficient $r$ computed there gives the coefficient of determination $r^2$.
The interpretation of $r^2$ in the context of this example: expressed as a percent, $r^2$ is the percent of the variation in the final-exam scores that can be explained by the variation in the third-exam scores using the best-fit regression line; the remaining $1 - r^2$ of the variation is not explained by the line and appears as scatter of the data points about it.