Normal Distributions
Figure 1
In studies of public health, information is frequently collected on variables that are measured on a continuous scale. Examples of such variables include age, weight, and blood pressure. The shape of the distribution associated with these variables is useful for describing how often values fall within different ranges. More specifically, a distribution allows the probability that a variable takes a value within a specified range to be calculated, and it provides estimates of the average and range of possible values. The normal distribution is the most widely used distribution for describing continuous variables. It is also frequently referred to as the Gaussian distribution, after the well-known German mathematician Carl Friedrich Gauss (1777–1855).
Normal distributions are a family of distributions characterized by the same general shape. These distributions are symmetrical, with the measured values of the variable more concentrated in the middle than in the tails, and they are frequently referred to as "bell-shaped." The area under the curve of a normal distribution represents the combined probability of all possible values of the variable. In other words, the total area under a normal curve is equal to one. The shape of the normal distribution is specified mathematically by only two parameters: the mean (µ) and the standard deviation (σ). The standard deviation specifies the amount of dispersion around the mean, whereas the mean is the center of the distribution.
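The density of a normal curve is f(x) = exp(−(x − µ)²/(2σ²)) / (σ√(2π)), so µ and σ alone fix its shape. The Python sketch below is an illustration rather than part of the original article: it evaluates this density and approximates the total area under two different curves with the trapezoidal rule. The helper names, the ±10σ integration range, and the step count are arbitrary choices made for the example, and the second parameter pair simply reuses the birth-weight values from the next paragraph.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def total_area(mu: float, sigma: float, width: float = 10.0, steps: int = 100_000) -> float:
    """Trapezoidal-rule approximation of the area under the curve over mu +/- width*sigma.

    The tails beyond 10 standard deviations hold a negligible share of the area.
    """
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    area = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        area += normal_pdf(lo + i * h, mu, sigma)
    return area * h


# Different means and standard deviations, same bell shape, same total area.
for mu, sigma in [(0.0, 1.0), (7.2, 2.1)]:
    print(f"mu={mu}, sigma={sigma}: area = {total_area(mu, sigma):.6f}")
```

Both curves enclose an area of 1 to six decimal places; a larger σ makes the curve wider and flatter without changing its total area.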
Several biological variables are approximately normally distributed (e.g., blood pressure, serum cholesterol, height, and weight). The normal curve can be used to estimate probabilities associated with these variables. For example, in a population where the birth weight of infants is normally distributed with a mean of 7.2 pounds and a standard deviation of 2.1 pounds, one might wish to find the probability that a randomly chosen infant will have a birth weight of less than 3 pounds. Such information might help in planning future obstetric services.
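Under the article's illustrative assumption that birth weight is normal with mean 7.2 pounds and standard deviation 2.1 pounds, the requested probability is the area under that curve to the left of 3 pounds. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later):

```python
from statistics import NormalDist

# Illustrative population from the example: mean 7.2 lb, SD 2.1 lb.
birth_weight = NormalDist(mu=7.2, sigma=2.1)

# P(X < 3): cumulative area under the curve to the left of 3 pounds.
p_below_3 = birth_weight.cdf(3.0)
print(f"P(birth weight < 3 lb) = {p_below_3:.4f}")
```

The result is roughly 0.023, so a little over 2 percent of infants in such a population would be expected to weigh less than 3 pounds.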
Since a normal distribution can have an infinite number of possible values for its mean and standard deviation, it is impossible to tabulate the areas under each and every curve. Instead, probabilities are tabulated for a single curve whose mean is zero and whose standard deviation is one. This curve is referred to as the standard normal distribution (Z). A random variable (X) that is normally distributed with mean (µ) and standard deviation (σ) can easily be transformed to the standard normal distribution by the formula Z = (X − µ)/σ.
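As a worked instance of the transformation, the 3-pound cutoff in the birth-weight example corresponds to Z = (3 − 7.2)/2.1 = −2.0, so the desired probability is the standard normal area to the left of −2. The sketch below computes that area from the error function, using the identity P(Z ≤ z) = ½[1 + erf(z/√2)]; the function names are illustrative.

```python
import math


def standardize(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma: distance from the mean in standard deviation units."""
    return (x - mu) / sigma


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal distribution (mean 0, SD 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = standardize(3.0, mu=7.2, sigma=2.1)
print(f"z = {z:.2f}")                                    # z = -2.00
print(f"P(Z < {z:.2f}) = {standard_normal_cdf(z):.4f}")  # P(Z < -2.00) = 0.0228
```

Because only one curve is involved after standardizing, the same `standard_normal_cdf` serves every normal variable, whatever its mean and standard deviation.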
The normal distribution is important to statistical work because many of the most commonly used hypothesis tests assume that the random variable being considered has an underlying normal distribution. Fortunately, these tests work very well even if the distribution of the variable is only approximately normal. Examples of such tests include those based on the t, F, or chi-square statistics. If the variable is not normal, alternative nonparametric tests should be considered; however, such tests can be inconvenient because they are typically less powerful and less flexible in terms of the types of conclusions that can be drawn. Alternatively, mathematical theory (e.g., the central limit theorem) shows that hypothesis tests based on the normal distribution can be performed if the sample size is sufficiently large. This latter option rests on an important principle that is largely responsible for the popularity of tests based on the normal distribution: if the samples are large enough, the sampling distribution of the mean approaches a normal shape even when the distribution of the variable itself is not normal.
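The central limit theorem can be illustrated by simulation. The sketch below assumes a deliberately non-normal population, an exponential distribution with mean 1 and standard deviation 1, which is strongly right-skewed, and draws repeated samples of increasing size. The population, sample sizes, number of replications, and seed are arbitrary choices for illustration. For this population the mean of n observations has standard deviation 1/√n and skewness 2/√n, so the skewness column should shrink toward 0, the value for a symmetric normal curve.

```python
import random
import statistics


def skewness(values: list) -> float:
    """Sample skewness; close to 0 for a symmetric shape such as the normal curve."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means(n: int, reps: int, rng: random.Random) -> list:
    """Means of `reps` independent samples of size n from an exponential(rate=1) population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(2024)  # fixed seed so the run is reproducible
results = {}
for n in (1, 5, 30, 200):
    means = sample_means(n, reps=4000, rng=rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    avg, sd, skew = results[n]
    print(
        f"n={n:3d}  mean={avg:.3f}  SD={sd:.3f} (theory {1 / n ** 0.5:.3f})  "
        f"skewness={skew:.2f} (theory {2 / n ** 0.5:.2f})"
    )
```

The exact figures depend on the seed, but the pattern is what matters: the average of the sample means stays near the population mean of 1, while their spread and skewness both shrink as the sample size grows.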
PAUL J. VILLENEUVE
(SEE ALSO: Chi-Square Test; Sampling; Statistics for Public Health)
