Sampling Distributions
Suppose that we draw all possible samples of size n from a given
population. Suppose further that we compute a
statistic (e.g., a mean, proportion, standard deviation) for each
sample. The probability
distribution of this statistic is called a sampling distribution. And
the standard deviation of this statistic is called the standard error.
Variability of a Sampling Distribution
The variability of a sampling distribution is measured by its
variance or its
standard deviation. The variability of a sampling
distribution depends on three factors:
- N: The number of observations in the population.
- n: The number of observations in the sample.
- The way that the random sample is chosen.
If the population size is much larger than the sample size, then the sampling
distribution has roughly the same standard error, whether we sample
with or
without replacement. On the other hand, if the sample represents a
significant fraction (say, more than 1/20) of the population size, the standard error
will be meaningfully smaller when we sample without replacement.
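The effect of the sampling method can be checked by brute force on a tiny population. The sketch below is illustrative only: the five population values are hypothetical, and the sample deliberately covers a large share of the population (2 of 5) so that the difference is easy to see.

```python
from itertools import combinations, product
from statistics import mean, pstdev

# Hypothetical toy population (N = 5); the values are illustrative only.
population = [2, 4, 6, 8, 10]
n = 2

# Mean of every ordered sample drawn with replacement (N**n samples).
means_with = [mean(s) for s in product(population, repeat=n)]
# Mean of every unordered sample drawn without replacement (N choose n samples).
means_without = [mean(s) for s in combinations(population, n)]

# The standard error is the standard deviation of the sample means.
se_with = pstdev(means_with)
se_without = pstdev(means_without)

print(f"SE with replacement:    {se_with:.4f}")
print(f"SE without replacement: {se_without:.4f}")
```

Because every possible sample is enumerated, the two results are exact for this toy population rather than simulation estimates.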
Sampling Distribution of the Mean
Suppose we draw all possible samples of size n from a population of size N.
Suppose further that we compute a mean score for each sample. In this way, we
create a sampling distribution of the mean.
We know the following about the sampling distribution of the mean.
The mean of the sampling distribution (μ_{x})
is equal to the mean of the population (μ).
And the standard error of the sampling distribution (σ_{x})
is determined by the standard deviation of the population (σ),
the population size (N), and the sample size (n). These relationships are shown in the
equations below:
μ_{x} = μ
σ_{x} = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
In the standard error formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite population correction or fpc.
When the population size is very large relative to the sample size, the fpc is approximately
equal to one; and the standard error formula can be approximated by:
σ_{x} = σ / sqrt(n).
You often see this "approximate" formula in introductory statistics texts. As a general rule, it is
safe to use the approximate formula when the sample size is no bigger than 1/20 of the
population size.
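As a rough illustration of the 1/20 rule, the following sketch compares the exact and approximate standard errors for several population sizes. The function name standard_error_mean and the input values (σ = 15, n = 100) are assumptions chosen for the example, not values from this lesson.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean; applies the fpc only when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return se

sigma = 15.0  # hypothetical population standard deviation
n = 100       # hypothetical sample size

for N in (500, 2_000, 100_000):
    exact = standard_error_mean(sigma, n, N)
    approx = standard_error_mean(sigma, n)
    print(f"N={N:>7}: exact={exact:.4f}  approx={approx:.4f}  fpc={exact / approx:.4f}")
```

When the sample is exactly 1/20 of the population (N = 2,000 here), the fpc is about 0.975, so the approximate formula overstates the standard error by roughly 2.5%.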
Sampling Distribution of the Proportion
In a population of size N, suppose that the probability of the occurrence
of an event (dubbed a "success") is P; and the probability of the event's
non-occurrence (dubbed a "failure") is Q. From this population, suppose that we
draw all possible samples of size n. And finally, within each sample,
suppose that we determine the proportion of successes p and failures q.
In this way, we create a sampling distribution of the proportion.
We find that the mean of the sampling distribution of the proportion (μ_{p})
is equal to the probability of success in the population (P). And the standard
error of the sampling distribution (σ_{p})
is determined by the standard deviation of the population (σ),
the population size, and the sample size. These relationships are shown in the
equations below:
μ_{p} = P
σ_{p} = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
σ_{p} = sqrt[ PQ/n ] * sqrt[ (N - n ) / (N - 1) ]
where σ = sqrt[ PQ ].
Like the formula for the standard error of the mean, the formula for the standard error of the
proportion uses the finite population correction, sqrt[ (N - n ) / (N - 1) ].
When the population size is very large relative to the sample size, the fpc is approximately
equal to one; and the standard error formula can be approximated by:
σ_{p} = sqrt[ PQ/n ]
You often see this "approximate" formula in introductory statistics texts. As a general rule, it is
safe to use the approximate formula when the sample size is no bigger than 1/20 of the
population size.
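The same calculation for a proportion might be sketched as follows. The function name standard_error_proportion and the inputs (P = 0.3, n = 200, N = 1,000) are hypothetical and serve only to show the formula in use.

```python
import math

def standard_error_proportion(P, n, N=None):
    """Standard error of a sample proportion; applies the fpc only when N is given."""
    Q = 1 - P
    se = math.sqrt(P * Q / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return se

# Hypothetical inputs: the sample is 1/5 of the population, so the fpc matters.
print(f"approximate: {standard_error_proportion(0.3, 200):.4f}")
print(f"with fpc:    {standard_error_proportion(0.3, 200, 1_000):.4f}")
```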
Central Limit Theorem
The central limit theorem states that the
sampling distribution of the mean of
independent observations of a
random variable will be normal or nearly normal,
if the sample size is large enough.
How large is "large enough"? The answer depends on two factors.
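- Requirements for accuracy. The more closely the sampling distribution
needs to resemble a normal distribution, the more sample points will be required.
- The shape of the underlying population. The more closely the
original population resembles a normal distribution, the fewer sample points will
be required.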
In practice, some statisticians say that a sample size of 30 is large enough
when the population distribution is roughly bell-shaped. Others recommend a sample
size of at least 40. But if the original population is distinctly not normal
(e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers
like the sample size to be even larger.
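A small simulation can make the role of sample size concrete. This is a sketch under stated assumptions: the population is taken to be exponential (strongly right-skewed), the seed and the number of trials are arbitrary, and skewness is used as a simple stand-in for "how far from normal" the sampling distribution is.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric shape)."""
    m = sum(values) / len(values)
    s = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, trials=5_000):
    """Means of `trials` random samples of size n from an exponential population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

# Skewness of the simulated sampling distribution for three sample sizes.
results = {n: skewness(sample_means(n)) for n in (2, 10, 40)}
for n, skew in results.items():
    print(f"n={n:>2}: skewness of sample means = {skew:.2f}")
```

For this population the theoretical skewness of the sample mean is 2/sqrt(n), so the simulated values should shrink toward zero as n grows, without ever reaching it exactly.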
How to Choose Between T-Distribution and Normal Distribution
The t distribution and the normal distribution can both
be used with statistics that have a bell-shaped distribution. This suggests that we might use either the t-distribution
or the normal distribution to analyze sampling distributions. Which should we choose?
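Guidelines exist to help you make that choice. Some focus on the population standard deviation.
- If the population standard deviation is known, use the normal distribution.
- If the population standard deviation is unknown, use the t-distribution.
Other guidelines focus on sample size.
- If the sample size is large, use the normal distribution.
- If the sample size is small, use the t-distribution.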
In practice, researchers employ a mix of the above guidelines. On this site, we use the normal
distribution when the population standard deviation is known and the sample size is large.
We might use either distribution when standard deviation is unknown and the sample size is very large.
We use the t-distribution when the sample size is small, unless the underlying
distribution is not normal. The t distribution should not be used with small samples from populations
that are not approximately normal.
Test Your Understanding
In this section, we offer two examples that illustrate how sampling distributions are used
to solve common statistical problems. In each of these problems, the population standard deviation is
known; and the sample size is large. So you can use the Normal Distribution
Calculator, rather than the t-Distribution Calculator, to compute probabilities for these problems.
Normal Distribution Calculator
The normal calculator solves common statistical problems, based on the normal
distribution. The calculator computes cumulative probabilities, based on three
simple inputs. Simple instructions guide you to an accurate solution, quickly
and easily. If anything is unclear, frequently-asked questions and sample
problems provide straightforward explanations. The
calculator is free. It can be found in the Stat Trek
main menu under the Stat Tools tab.
Would it be wrong to use the t-distribution when you know the population standard deviation and the sample
size is large? Not at all. When the sample size is large, the t-distribution and the normal distribution
yield approximately the same results.
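To see how closely the two distributions agree, the sketch below compares lower-tail probabilities at the same cutoff for several degrees of freedom. The standard library has no t-distribution, so t_cdf here is a hand-rolled numerical integration (Simpson's rule) written for this illustration; the cutoff of -1.77 and the degrees of freedom are arbitrary choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2_000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero.

    `steps` must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

cutoff = -1.77  # arbitrary standardized score for the comparison
p_normal = NormalDist().cdf(cutoff)
for df in (4, 29, 499):
    print(f"df={df:>3}: t={t_cdf(cutoff, df):.4f}  normal={p_normal:.4f}")
```

With only a few degrees of freedom the t-distribution puts noticeably more probability in the tail; with several hundred, the two probabilities differ by well under 0.001.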
Example 1
Assume that a school district has 10,000 6th graders. In this district, the
average weight of a 6th grader is 80 pounds, with a standard deviation of 20
pounds. Suppose you draw a random sample of 50 students. What is the
probability that the average weight of a sampled student will be less than 75
pounds?
Solution: To solve this problem, we need to define the sampling
distribution of the mean. Because our sample size is greater than
30, the Central Limit Theorem tells us that the sampling distribution will
approximate a normal distribution.
To define our normal distribution, we need to know both the mean of the sampling
distribution and the standard deviation. Finding the mean of the sampling
distribution is easy, since it is equal to the mean of the population. Thus,
the mean of the sampling distribution is equal to 80.
The standard deviation of the sampling distribution can be computed using the
following formula.
σ_{x} = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
σ_{x} = [ 20 / sqrt(50) ] * sqrt[ (10,000 - 50 ) / (10,000 - 1) ]
σ_{x} = (20/7.071) * sqrt(0.995) = (2.828) * (0.9975) = 2.82
Let's review what we know and what we want to know. We know that the sampling
distribution of the mean is normally distributed with a mean of 80 and a
standard deviation of 2.82. We want to know the probability that a sample mean
is less than or equal to 75 pounds.
Because we know the population standard
deviation and the sample size is large, we'll use the normal distribution to find
the probability. To solve the problem, we plug these inputs
into the Normal Probability Calculator: mean = 80, standard deviation = 2.82,
and normal random variable = 75. The Calculator tells us that the probability that the average
weight of a sampled student is less than 75 pounds is equal to 0.038.
Note: Since the population size is more than 20 times greater than the sample size,
we could have used the "approximate" formula σ_{x} = [ σ / sqrt(n) ]
to compute the standard error. Had we done that, we would have found a standard error equal to
[ 20 / sqrt(50) ] or 2.83.
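For readers who prefer code to a calculator, Example 1 can be reproduced with Python's standard library. This is a sketch, not the method used by the Normal Distribution Calculator; it assumes Python 3.8 or later for statistics.NormalDist, and the variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist

N, n = 10_000, 50     # population size and sample size
mu, sigma = 80, 20    # population mean and standard deviation (pounds)

fpc = math.sqrt((N - n) / (N - 1))       # finite population correction
se_exact = sigma / math.sqrt(n) * fpc    # standard error with the fpc
se_approx = sigma / math.sqrt(n)         # "approximate" standard error

# Probability that the sample mean falls below 75 pounds.
p_exact = NormalDist(mu, se_exact).cdf(75)
p_approx = NormalDist(mu, se_approx).cdf(75)

print(f"fpc = {fpc:.4f}")
print(f"exact:       SE = {se_exact:.2f}, P(mean < 75) = {p_exact:.4f}")
print(f"approximate: SE = {se_approx:.2f}, P(mean < 75) = {p_approx:.4f}")
```

Both versions give a probability of roughly 0.038, which shows how little the fpc matters when the sample is a small fraction of the population.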
Example 2
Find the probability that of the next 120 births, no more than 40% will be
boys. Assume equal probabilities for the births of boys and girls. Assume
also that the number of births in the population (N) is very large, essentially
infinite.
Solution: The Central Limit Theorem tells us that the proportion of boys
in 120 births will be approximately normally distributed.
The mean of the sampling distribution will be equal to the mean of the
population distribution. In the population, half of the births result in boys;
and half, in girls. Therefore, the probability of boy births in the population
is 0.50. Thus, the mean proportion in the sampling distribution should also be
0.50.
The standard deviation of the sampling distribution (i.e., the standard error) can be computed using the
following formula.
σ_{p} = sqrt[ PQ/n ] * sqrt[ (N - n ) / (N - 1) ]
Here, the finite population correction is equal to 1.0, since the population
size (N) was assumed to be infinite. Therefore, the standard error formula reduces to:
σ_{p} = sqrt[ PQ/n ]
σ_{p} = sqrt[ (0.5)(0.5)/120 ] = sqrt[0.25/120 ] = 0.04564
Let's review what we know and what we want to know. We know that the sampling
distribution of the proportion is normally distributed with a mean of 0.50 and
a standard deviation of 0.04564. We want to know the probability that no more
than 40% of the sampled births are boys.
Because we know the population standard
deviation and the sample size is large, we'll use the normal distribution to find
the probability. To solve the problem, we plug these
inputs into the Normal Probability Calculator: mean = .5, standard deviation =
0.04564, and the normal random variable = .4. The Calculator tells us that the probability that no
more than 40% of the sampled births are boys is equal to 0.014.
Note: This problem can also be treated as a
binomial experiment. Elsewhere, we showed
how to analyze a binomial experiment. The binomial experiment
is actually the more exact analysis. It produces a probability
of 0.018 (versus a probability of 0.014 that we found using the normal distribution). Without a computer,
the binomial approach is computationally demanding. Therefore,
many statistics texts emphasize the approach presented above,
which uses the normal distribution to approximate the binomial.
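With a computer, both approaches take only a few lines. The sketch below, which assumes Python 3.8 or later for math.comb and statistics.NormalDist, computes the normal approximation from Example 2 alongside the exact binomial probability of at most 48 boys in 120 births. The variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist

n, P = 120, 0.5   # number of births and probability of a boy
cutoff = 0.40     # "no more than 40% boys"

# Normal approximation, as in the worked solution.
se = math.sqrt(P * (1 - P) / n)
p_normal = NormalDist(P, se).cdf(cutoff)

# Exact binomial: sum P(X = k) for k = 0 .. 48 (48 is 40% of 120).
k_max = round(cutoff * n)
p_binomial = sum(
    math.comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(k_max + 1)
)

print(f"standard error       = {se:.5f}")
print(f"normal approximation = {p_normal:.3f}")
print(f"exact binomial       = {p_binomial:.3f}")
```

Most of the gap between the two answers arises because the normal curve is continuous while the count of boys is discrete.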