A binomial distribution can be used to describe the number of times an event will occur in a group of patients, a series of clinical trials, or any other sequence of observations. This event is a binary variable: It either occurs or it doesn't. For example, when patients are treated with a new drug they are either cured or not; when a coin is flipped, the result is either a head or tail. The binary outcome associated with each event is typically referred to as either a "success" or a "failure." In general, a binomial distribution is used to characterize the number of successes over a series of observations (or trials), where each observation is referred to as a "Bernoulli trial."
In a series of n Bernoulli trials, the binomial distribution can be used to calculate the probability of obtaining k successful outcomes. If the variable X represents the total number of successes in n trials, it can only take on an integer value from 0 to n. The probability of obtaining k successes in n trials is calculated as follows:
$$P(X = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$$
where $0 \le p \le 1$ is the probability of success, and $n! = 1 \times 2 \times 3 \times \cdots \times (n-2) \times (n-1) \times n$.
The above formula assumes that the experiment consists of n identical trials that are independent from one another, and that there are only two possible outcomes for each trial (success or failure). The probability of success (p) is also assumed to be the same in each of the trials.
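These assumptions can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate_successes`, the seed, the number of repetitions, and the values n = 10 and p = 0.3 are assumptions chosen for the example. Each trial is drawn independently with the same success probability, and the successes are counted.

```python
import random


def simulate_successes(n: int, p: float, rng: random.Random) -> int:
    """Count successes in n independent Bernoulli trials, each with probability p."""
    # rng.random() returns a float in [0, 1); it falls below p with probability p.
    return sum(rng.random() < p for _ in range(n))


# Illustrative values (assumptions): 10 trials per experiment, p = 0.3.
rng = random.Random(2024)  # fixed seed so the run is repeatable
counts = [simulate_successes(10, 0.3, rng) for _ in range(20000)]

# With many repeated experiments, the average count should settle near n * p.
average_successes = sum(counts) / len(counts)
print(f"average successes per experiment: {average_successes:.2f}")
```

Every simulated count lies between 0 and n, and the long-run average should be close to the product of n and p.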
To illustrate the application of the above formula, suppose that a drug that cures 30 percent of all patients is administered to ten patients. The probability that exactly four patients would be cured is:
$$P(X = 4) = \frac{10!}{4!\,6!}\,(0.3)^{4}(0.7)^{6} = 210 \times 0.0081 \times 0.117649 \approx 0.200$$
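The calculation for the drug example can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the function name `binomial_pmf` is an assumption introduced for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0  # counts outside 0..n are impossible
    # comb(n, k) equals n! / (k! * (n - k)!)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Drug example from the text: n = 10 patients, p = 0.30, k = 4 cured.
prob_four_cured = binomial_pmf(4, 10, 0.30)
print(round(prob_four_cured, 4))  # 0.2001

# The probabilities for k = 0, 1, ..., n should add up to 1.
total_probability = sum(binomial_pmf(k, 10, 0.30) for k in range(11))
```

In other words, a group of ten patients would be expected to contain exactly four cured patients about one time in five.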
Like other distributions, the binomial distribution can be described in terms of a mean and the spread, or variance, of values. The mean value of a binomial random variable X (i.e., the average number of successes in n trials) is obtained by multiplying the number of trials by the probability of success (np). In the drug example, the average number of persons cured in any group of ten patients would thus be 10 × 0.3 = 3. The variance of a binomial distribution is np(1−p). The variance is largest for p = 0.5, and it decreases as p approaches 0 or 1. Intuitively, this makes sense, since when p is close to 0 or 1 nearly all the outcomes take on the same value. Returning to the example, for a drug that cured every patient p would equal one, while for a drug that cured no one p would equal zero; in either case the outcome is certain and the variance is zero. In contrast, if the drug was effective in curing only half of the population (p = 0.5), it would be more difficult to predict the outcome in any particular patient, and in this case the variability is relatively large.
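The behaviour of the mean and variance can be tabulated directly from the expressions np and np(1−p). The sketch below is illustrative; the function names and the particular grid of p values are assumptions chosen for the example.

```python
def binomial_mean(n: int, p: float) -> float:
    """Mean number of successes in n trials: n * p."""
    return n * p


def binomial_variance(n: int, p: float) -> float:
    """Variance of the number of successes in n trials: n * p * (1 - p)."""
    return n * p * (1 - p)


n = 10  # group of ten patients, as in the drug example

mean_cured = binomial_mean(n, 0.30)
var_cured = binomial_variance(n, 0.30)
print(f"p = 0.30: mean = {mean_cured:.2f}, variance = {var_cured:.2f}")

# Variance over a grid of success probabilities (grid values are an assumption).
variances = {p: binomial_variance(n, p) for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0)}
for p, v in variances.items():
    print(f"p = {p:.1f}: variance = {v:.2f}")

# Probability on the grid at which the variance peaks.
p_with_largest_variance = max(variances, key=variances.get)
```

On this grid the variance is zero at p = 0 and p = 1 and reaches its maximum at p = 0.5, in line with the description above.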
In studies of public health, the binomial distribution is used when a researcher is interested in the occurrence of an event rather than in its magnitude. For instance, an evaluation of a smoking cessation intervention may focus on whether a smoker quit smoking altogether, rather than on daily reductions in the number of cigarettes smoked. The binomial distribution plays an important role in statistics, as it is likely the most frequently used distribution to describe discrete data.
PAUL J. VILLENEUVE
(SEE ALSO: Statistics for Public Health)
Pagano, M., and Gauvreau, K. (2000). Principles of Biostatistics, 2nd edition. Pacific Grove, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of Biostatistics, 5th edition. Pacific Grove, CA: Duxbury Press.