The Normal Distribution

The normal distribution is symmetric with scores more concentrated in the middle than in the tails.

Learning Objective

Recognize the normal distribution from its characteristics

Key Points

Physical quantities that are expected to be the sum of many independent processes (such as measurement errors) often have a distribution very close to normal.
The simplest case of a normal distribution, known as the standard normal distribution, has expected value zero and variance one.
If a data set truly follows a normal distribution and its mean and standard deviation are known, then one knows essentially as much about its distribution as if one had access to every point in the data set.
The empirical rule is a handy quick estimate of the spread of the data, given the mean and standard deviation of a data set that follows a normal distribution.
The normal distribution is the most used statistical distribution, since normality arises naturally in many physical, biological, and social measurement situations.

Terms

empirical rule
That a normal distribution has 68% of its observations within one standard deviation of the mean, 95% within two, and 99.7% within three.
entropy
A measure which quantifies the expected value of the information contained in a message.
cumulant
Any of a set of quantities that characterize a probability distribution, obtained from the logarithm of its moment-generating function; the first two cumulants are the mean and the variance.

Full Text

Normal distributions are a family of distributions all having the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped.

The normal distribution is a continuous probability distribution, defined by the formula:

$\displaystyle f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

The parameter $\mu$ in this formula is the mean or expectation of the distribution (and also its median and mode). The parameter $\sigma$ is its standard deviation; its variance is therefore $\sigma^2$. If $\mu = 0$ and $\sigma = 1$, the distribution is called the standard normal distribution or the unit normal distribution, and a random variable with that distribution is a standard normal deviate.
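
As a minimal sketch, the density can be evaluated directly in Python. The function name `normal_pdf` and the example values $\mu = 10$, $\sigma = 2$ are illustrative assumptions, not part of the text. Note the negative sign in the exponent, which makes the curve decay on both sides of $\mu$. The result is compared against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Illustrative parameters (assumed for this example only).
mu, sigma = 10.0, 2.0
for x in (6.0, 10.0, 13.0):
    # The hand-written density and the library density should agree.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

With the defaults $\mu = 0$ and $\sigma = 1$, the same function evaluates the standard normal density.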

Normal distributions are extremely important in statistics, and are often used in the natural and social sciences for real-valued random variables whose distributions are not known. One reason for their popularity is the central limit theorem, which states that (under mild conditions) the mean of a large number of random variables independently drawn from the same distribution is distributed approximately normally, irrespective of the form of the original distribution. Thus, physical quantities expected to be the sum of many independent processes (such as measurement errors) often have a distribution very close to normal. Another reason is that a large number of results and methods can be derived analytically, in explicit form, when the relevant variables are normally distributed.
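
The central limit theorem can be illustrated with a small simulation. The sketch below is only one possible demonstration: the exponential population, the sample sizes, the number of trials, and the helper names are all assumptions chosen for illustration. It draws many samples from a strongly skewed distribution and reports what fraction of the sample means lies within one standard deviation of their own average.

```python
import random
import statistics


def sample_means(n: int, trials: int, rng: random.Random) -> list[float]:
    """Means of `trials` samples, each of n draws from an exponential distribution (rate 1)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


def fraction_within_one_sd(values: list[float]) -> float:
    """Fraction of values within one standard deviation of their mean."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (1, 5, 50):
    results[n] = fraction_within_one_sd(sample_means(n, 10_000, rng))
    print(f"n = {n:2d}: fraction within one sd = {results[n]:.3f}")
```

For single draws from this exponential population the fraction is near $1 - e^{-2} \approx 0.86$. As $n$ grows it should move toward roughly 0.68, the value expected for a normal distribution.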

The normal distribution is the only absolutely continuous distribution whose cumulants, other than the mean and variance, are all zero. It is also the continuous distribution with the maximum entropy for a given mean and variance.
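
The maximum-entropy property can be checked against two other familiar continuous distributions. The sketch below uses the standard closed-form differential entropies (in nats) of the normal, uniform, and Laplace distributions, each scaled to the same standard deviation. The choice of comparison distributions and the function names are assumptions made for illustration; they do not come from the text.

```python
import math


def entropy_normal(sigma: float) -> float:
    """Differential entropy of a normal distribution: 0.5 * ln(2 * pi * e * sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)


def entropy_uniform(sigma: float) -> float:
    """A uniform distribution of width w has variance w^2 / 12 and entropy ln(w)."""
    width = sigma * math.sqrt(12.0)
    return math.log(width)


def entropy_laplace(sigma: float) -> float:
    """A Laplace distribution with scale b has variance 2 * b^2 and entropy 1 + ln(2 * b)."""
    b = sigma / math.sqrt(2.0)
    return 1.0 + math.log(2.0 * b)


sigma = 1.0  # any positive value gives the same ordering
print("normal :", entropy_normal(sigma))
print("laplace:", entropy_laplace(sigma))
print("uniform:", entropy_uniform(sigma))
```

For equal variance, the normal distribution has the largest entropy of the three, consistent with the statement above. Two comparisons do not prove the general result, which covers every continuous distribution with that mean and variance.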

Standard Normal Distribution

The simplest case of a normal distribution, known as the standard normal distribution, has expected value zero and variance one. This is written as $N(0, 1)$, and is described by this probability density function:

$\displaystyle \phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$

The $\frac{1}{\sqrt{2\pi}}$ factor in this expression ensures that the total area under the curve $\phi(x)$ is equal to one. The $\frac{1}{2}$ in the exponent ensures that the distribution has unit variance (and therefore also unit standard deviation). This function is symmetric around $x=0$, where it attains its maximum value $\frac{1}{\sqrt{2\pi}}$, and it has inflection points at $x=+1$ and $x=-1$.
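
These properties can be verified numerically. The following sketch integrates with a simple trapezoidal rule over $[-10, 10]$. The integration range, the step count, and the helper names are assumptions made for this example; the tails beyond that range are negligible. The inflection points are checked through the second derivative $\phi''(x) = (x^2 - 1)\phi(x)$, which changes sign at $x = \pm 1$.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(f, a: float, b: float, steps: int = 20_000) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def phi_second_derivative(x: float) -> float:
    """Second derivative of phi: (x^2 - 1) * phi(x)."""
    return (x * x - 1.0) * phi(x)


area = integrate(phi, -10.0, 10.0)
variance = integrate(lambda x: x * x * phi(x), -10.0, 10.0)  # the mean is 0

print("area     :", area)      # should be very close to 1
print("variance :", variance)  # should be very close to 1
print("curvature at 0.5, 1.0, 1.5:",
      phi_second_derivative(0.5), phi_second_derivative(1.0), phi_second_derivative(1.5))
```

The curvature is negative inside $(-1, 1)$, zero at $x = 1$, and positive beyond it, which is what an inflection point at $x = 1$ means.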

Characteristics of the Normal Distribution

It is a continuous distribution.
It is symmetrical about the mean. Each half of the distribution is a mirror image of the other half.
It is asymptotic to the horizontal axis.
It is unimodal.
The area under the curve is 1.

The normal distribution carries assumptions with it, and it can be completely specified by two parameters: the mean $\mu$ and the standard deviation $\sigma$. This is written as $N(\mu, \sigma^2)$, where the second argument is the variance. If a data set truly follows a normal distribution and its mean and standard deviation are known, then one knows essentially as much about its distribution as if one had access to every point in the data set.

The empirical rule is a handy quick estimate of the spread of the data, given the mean and standard deviation of a data set that follows a normal distribution. It states that:

About 68% of the data will fall within 1 standard deviation of the mean.
About 95% of the data will fall within 2 standard deviations of the mean.
Almost all (about 99.7%) of the data will fall within 3 standard deviations of the mean.
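
The percentages in the empirical rule are rounded values of exact normal probabilities. As a short sketch, the probability that a normal variable lies within $k$ standard deviations of its mean equals $\operatorname{erf}(k/\sqrt{2})$, which Python's `math.erf` can evaluate. The function name `coverage` is an assumption made for this example.

```python
import math


def coverage(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} sd: {coverage(k):.4%}")
# within 1 sd: 68.2689%
# within 2 sd: 95.4500%
# within 3 sd: 99.7300%
```

The result does not depend on $\mu$ or $\sigma$, because the interval is measured in units of the standard deviation.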

The strengths of the normal distribution are that:

it is probably the most widely known and used of all distributions,
it is an infinitely divisible probability distribution, and
it is a strictly stable probability distribution.

A weakness of the normal distribution appears in reliability calculations. Because the distribution extends to negative infinity, it assigns some probability to negative values, even for quantities such as lifetimes that cannot be negative. This can produce negative values for some of the results.
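
The reliability concern can be made concrete with a hypothetical example. Suppose a component lifetime is modeled as normal with a mean of 1000 hours and a standard deviation of 500 hours. Both numbers are invented purely for illustration. The model then assigns a noticeable probability to an impossible negative lifetime:

```python
from statistics import NormalDist

# Hypothetical lifetime model; the parameters are assumptions for illustration only.
lifetime = NormalDist(mu=1000.0, sigma=500.0)

# Probability mass the model places on negative (impossible) lifetimes.
p_negative = lifetime.cdf(0.0)
print(f"P(lifetime < 0) = {p_negative:.4f}")  # 0.0228
```

Zero lies two standard deviations below the mean here, so about 2.3% of the modeled lifetimes are negative. The problem shrinks when the mean is many standard deviations above zero.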

Importance and Application

Many things are normally distributed, or very close to it. For example, height and scores on intelligence tests are approximately normally distributed.
The normal distribution is easy to work with mathematically. In many practical cases, the methods developed using normal theory work quite well even when the distribution is not normal.
There is a very strong connection between the size of a sample $N$ and the extent to which a sampling distribution approaches the normal form. Many sampling distributions based on a large $N$ can be approximated by the normal distribution even though the population distribution itself is not normal.
The normal distribution is the most used statistical distribution, since normality arises naturally in many physical, biological, and social measurement situations.

In addition, normality is important in statistical inference. The normal distribution has applications in many areas of business administration. For example:

Modern portfolio theory commonly assumes that the returns of a diversified asset portfolio follow a normal distribution.
In human resource management, employee performance is sometimes considered to be normally distributed.
