Sampling Distribution of the Mean
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a mean score for each sample. The probability distribution of this statistic is the sampling distribution of the mean.
Shape of Sampling Distribution
When the sampling method is simple random sampling, the sampling distribution of the mean will often be shaped like a t-distribution or a normal distribution, centered over the mean of the population. The mean of the sampling distribution equals the mean of the population distribution.
μs = μp
where μs is the mean of the sampling distribution and μp is the mean of the population.
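The equality can be checked directly on a small example. The sketch below uses a made-up five-value population (the values and the sample size are assumptions chosen only for illustration), enumerates every possible sample of size 2 drawn without replacement, and compares the mean of the sample means with the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population; the values are illustrative only.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
sample_means = [mean(sample) for sample in combinations(population, n)]

mu_p = mean(population)    # mean of the population
mu_s = mean(sample_means)  # mean of the sampling distribution

print("population mean:", mu_p)
print("mean of sample means:", mu_s)
```

The two printed values agree, which is what μs = μp states.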
When to Use the t-Distribution
It is safe to assume that the shape of the sampling distribution for a mean will be close to a t-distribution when any of the following conditions are true:
- Population values are approximately normally distributed.
- Sample size is smaller than 15; and the plot of sample data is symmetric, unimodal, without outliers.
- Sample size is between 15 and 40; and the plot of sample data is unimodal, without outliers, and only moderately skewed.
- Sample size is greater than 40, without outliers.
When to Use the Normal Distribution
The central limit theorem predicts that the sampling distribution will be approximately normally distributed when the sample size is sufficiently large.
If the population distribution is already approximately normal, a sample size of 30 will produce a sampling distribution that is approximately normal. If the population distribution is highly skewed, a sample size of 50 or more may be needed to produce a sampling distribution that is approximately normal.
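The effect of sample size on shape can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: it uses an exponential population as an example of a highly skewed distribution, and the seed, replicate count, and helper names (skewness, simulate_sample_means) are arbitrary choices, not part of this lesson. It measures the skewness of the simulated sample means, which should move toward zero (the skewness of a normal distribution) as sample size grows.

```python
import random

def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    count = len(values)
    center = sum(values) / count
    m2 = sum((v - center) ** 2 for v in values) / count
    m3 = sum((v - center) ** 3 for v in values) / count
    return m3 / m2 ** 1.5

def simulate_sample_means(n, replicates, rng):
    """Means of `replicates` samples of size n from an exponential population."""
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replicates)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
skew_small = skewness(simulate_sample_means(2, 4000, rng))
skew_large = skewness(simulate_sample_means(50, 4000, rng))

print("skewness of sample means, n = 2: ", round(skew_small, 2))
print("skewness of sample means, n = 50:", round(skew_large, 2))
```

With the larger sample size, the skewness of the sample means is much closer to zero, consistent with the central limit theorem.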
Normal Distribution or t-Distribution?
When the sample size is large, the t-distribution is almost identical to the normal distribution. In that case, you could use either distribution for analysis. Here are guidelines for choosing between the two.
- If the population standard deviation is unknown and sample size is large, use the t-distribution with degrees of freedom equal to sample size minus one.
- If the population standard deviation is known and sample size is large, use the normal distribution.
Standard Deviation of the Sampling Distribution
Suppose we draw all possible simple random samples of size n from a population of size N. Suppose further that we compute a mean score x for each sample. In this way, we create a sampling distribution of the mean.
We know the following about the sampling distribution of the mean. The mean of the sampling distribution (μx) is equal to the mean of the population (μ). And the standard deviation of the sampling distribution (σx) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n), as shown in the equation below:
σx = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
In the standard deviation formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one; and the standard deviation formula can be approximated by:
σx = σ / sqrt(n).
You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
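As a rough check on the two formulas, the sketch below compares them with an exact enumeration. The toy population and the function name sd_of_sample_mean are assumptions made for illustration. Because the sample here is 2 out of only 5 values, far more than 1/20 of the population, the finite population correction matters.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def sd_of_sample_mean(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by the fpc when a population size N is given."""
    sd = sigma / math.sqrt(n)
    if N is not None:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd

# Hypothetical toy population; the values are illustrative only.
population = [2, 4, 6, 8, 10]
N, n = len(population), 2
sigma = pstdev(population)  # population standard deviation

# Exact standard deviation of the mean over all samples drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]
exact = pstdev(sample_means)

with_fpc = sd_of_sample_mean(sigma, n, N)
without_fpc = sd_of_sample_mean(sigma, n)

print("exact:      ", round(exact, 4))
print("with fpc:   ", round(with_fpc, 4))
print("without fpc:", round(without_fpc, 4))
```

The formula with the fpc reproduces the exact value, while the approximate formula overstates it for this small population.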
Standard Error of the Sampling Distribution
Often, we don't know the value for population standard deviation σ. And, if we don't know the population standard deviation, we cannot compute the standard deviation of the sampling distribution of the mean (σx).
However, we can use the sample standard deviation s to estimate the unknown population standard deviation. Substituting s into the equation for σx, we get:
s = sqrt[ Σ ( xi - x )^2 / ( n - 1 ) ]
SEm = [ s / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
where s is the sample standard deviation, x is the sample mean, xi is the ith element from the sample, n is the number of elements in the sample, and SEm is a sample estimate of σx, the standard deviation of the sampling distribution. SEm is the standard error of the sampling distribution of the mean.
And when the population size is very large relative to the sample size, the standard error formula can be approximated by:
SEm = s / sqrt(n)
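The standard error formulas above can be sketched in a few lines. The sample values and the function name standard_error are hypothetical, chosen only to show the calculation. Python's statistics.stdev uses the n - 1 denominator, matching the formula for s.

```python
import math
from statistics import stdev

def standard_error(sample, N=None):
    """Estimate the standard deviation of the sampling distribution of the mean.

    Applies the finite population correction only when a population size N is given.
    """
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical sample of eight weights, in pounds.
weights = [78, 82, 75, 90, 85, 70, 80, 88]

se_large_population = standard_error(weights)
se_small_population = standard_error(weights, N=40)

print("SE, large population: ", round(se_large_population, 3))
print("SE, population of 40: ", round(se_small_population, 3))
```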
In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. It will allow us to compute confidence intervals for mean scores and to test hypotheses about mean scores.
How to Find Probability
The sampling distribution of a sample mean is a probability distribution. You can use the sampling distribution to find a cumulative probability for any sample mean. Specifically, you can find:
P(x ≤ d)
where x is a sample mean and d is a constant called the critical value.
Finding the probability that a sample mean will be no greater than the critical value d is a four-step process:
Step 1: Find Mean of Distribution
The mean of the sampling distribution of sample mean equals the mean of the population from which the sample was drawn. Thus,
μs = μp
where μs is the mean of the sampling distribution and μp is the mean of the population.
Step 2: Find Standard Deviation
Earlier in this lesson (see above), we explained how to compute standard deviation of the sampling distribution when you know population variance. And we showed how to estimate the standard deviation with the standard error when you don't know the population variance. When population size is big relative to sample size, you can use these formulas for standard deviation and standard error:
σx = σ / sqrt(n)
SE = s / sqrt(n)
where σx is the standard deviation of the sampling distribution, SE is the standard error, σ is the population standard deviation, s is the sample estimate of the population standard deviation, and n is sample size.
Step 3: Transform d Into z- or t-Score
If you know the standard deviation of the sampling distribution and sample size is large (30 or more), compute a z-score using this formula:
z = (d - μs) / σx
where d is the critical value for which we want to find a probability, μs is the mean of the sampling distribution, and σx is the standard deviation of the sampling distribution.
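If you don't know the standard deviation of the sampling distribution and must estimate it with the standard error, compute a t-score using this formula. The choice matters most when sample size is small (less than 30); with larger samples, the t-score and the z-score lead to nearly the same probability.
t = (d - μs) / SE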
where SE is the standard error of the sampling distribution.
If you compute a t-score, you will also need to find the degrees of freedom. For the sampling distribution of a mean, degrees of freedom equals sample size minus one.
df = n - 1
where df is degrees of freedom.
Step 4: Find Probability
Find the probability for the z-score or a t-score that you calculated in Step 3; and you have found the probability that a sample mean will be no greater than the critical value, d.
You can find the probability for the z-score or a t-score from a handheld graphing calculator, from a written probability table commonly found in the appendix of introductory statistics texts, or from an online probability calculator, like Stat Trek's normal distribution calculator and t distribution calculator.
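For the z-score case, the four steps can also be sketched in code. This is a minimal illustration, not a replacement for the calculators mentioned above: the function name and the example numbers are assumptions, the approximate formula without the fpc is used, and the cumulative probability comes from NormalDist in Python's standard library.

```python
import math
from statistics import NormalDist

def prob_sample_mean_at_most(d, mu_p, sigma, n):
    """P(sample mean <= d) when the population standard deviation is known."""
    mu_s = mu_p                     # Step 1: mean of the sampling distribution
    sigma_x = sigma / math.sqrt(n)  # Step 2: standard deviation (no fpc)
    z = (d - mu_s) / sigma_x        # Step 3: transform d into a z-score
    return NormalDist().cdf(z)      # Step 4: cumulative probability

# Hypothetical inputs: population mean 100, sigma 15, sample size 36, d = 95.
p = prob_sample_mean_at_most(d=95, mu_p=100, sigma=15, n=36)
print(round(p, 4))
```

With these inputs the standard deviation of the sampling distribution is 2.5, so d sits two standard deviations below the mean.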
Test Your Understanding
Here are two problems to illustrate how to use the sampling distribution of the sample mean to solve common statistical problems. In the first problem, we compute a z-score and use a normal distribution calculator to arrive at a solution. In the second problem, we compute a t-score and use a t distribution calculator.
Problem 1
Assume that a school district has 10,000 6th graders. In this district, the
average weight of a 6th grader is 80 pounds, with a population standard deviation of 20
pounds. Suppose you draw a random sample of 50 students. What is the
probability that the sample mean will be less than 75 pounds?
Solution: Here is the four-step solution to solve this problem.
- Step 1. Find the mean of the sampling distribution. The mean of the sampling distribution (μs) will equal the mean of the population (μp). Thus, the mean of the sampling distribution is equal to 80.
μs = μp
μs = 80
- Step 2. Find the standard deviation of the sampling distribution. The standard deviation of the sampling distribution can be computed using the following formula.
σx = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
σx = [ 20 / sqrt(50) ] * sqrt[ (10,000 - 50 ) / (10,000 - 1) ]
σx = (20/7.071) * (0.9975) = 2.82
Note: Because population size (10,000) is large relative to sample size (50), we could have used this simpler formula to compute standard deviation:
σx = σ / sqrt(n)
We'll demonstrate the simpler formula in the next problem.
- Step 3. Transform d into a z- or t-score. In this problem, d is 75, the critical value for which we want to find a cumulative probability.
Because sample size is large and we know the population standard deviation, we compute a z-score.
z = (d - μs)/σx = (75 - 80)/2.82 = -1.77
- Step 4. Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -1.77, for the z-score; 0, for the mean; and 1, for the standard deviation. (It is not necessary to compute the mean or standard deviation of the z-score, because every z-score has a mean of 0 and a standard deviation of 1.)
The Calculator tells us that the probability that the sample mean will be less than 75 pounds is about 0.038. Not very likely.
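The arithmetic in Problem 1 can be retraced with a short script. This is a sketch for checking the steps, using NormalDist from Python's standard library in place of an online calculator; the variable names are illustrative. Because it carries full precision through every step, its output can differ slightly in the last digit from values rounded by hand.

```python
import math
from statistics import NormalDist

N, n = 10_000, 50            # population size and sample size
mu_p, sigma, d = 80, 20, 75  # population mean, population sd, critical value

mu_s = mu_p                           # Step 1
fpc = math.sqrt((N - n) / (N - 1))    # finite population correction
sigma_x = sigma / math.sqrt(n) * fpc  # Step 2
z = (d - mu_s) / sigma_x              # Step 3
p = NormalDist().cdf(z)               # Step 4

print("fpc:    ", round(fpc, 4))
print("sigma_x:", round(sigma_x, 2))
print("z:      ", round(z, 2))
print("P:      ", round(p, 3))
```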
Problem 2
Let's revisit Problem 1, with a twist. Here is the problem now. Assume that a school district has 10,000 6th graders. In this district, the
average weight of a 6th grader is 80 pounds. Suppose you draw a random sample of 50 students and find the sample standard deviation to be 20
pounds. If you drew another random sample of 50 students, what is the
probability that the sample mean in the second sample would be less than 75 pounds?
Solution: Here is the four-step solution to solve this problem.
- Step 1. Find the mean of the sampling distribution. The mean of the sampling distribution (μs) will equal the mean of the population (μp). Thus, the mean of the sampling distribution is equal to 80.
μs = μp
μs = 80
- Step 2. Find the standard deviation of the sampling distribution. Since we don't know the standard deviation of the population (σ), we cannot compute the standard deviation of the sampling distribution. But we do know the standard deviation of the sample (s); so we can compute the standard error (SE), and we use the standard error to estimate the standard deviation of the sampling distribution.
Since population size (10,000) is large relative to sample size (50), we can use this simple formula to compute standard error:
SE = s / sqrt(n)
SE = [ 20 / sqrt(50) ] = 2.83
- Step 3. Transform d into a z- or t-score. In this problem, d is 75, the critical value for which we want to find a cumulative probability.
Because we are using the sample standard deviation to estimate the population standard deviation, we compute a t-score.
t = (d - μs)/SE = (75 - 80)/2.83 = -1.77
And we find the degrees of freedom for this t-score to be:
df = n - 1 = 50 - 1 = 49
- Step 4. Find the probability. To find this probability, we use Stat Trek's t Distribution Calculator. Specifically, we enter the following inputs: -1.77 for the t-score and 49 for the degrees of freedom.
The Calculator tells us that the probability that the mean weight in the second sample will be less than 75 pounds is 0.041.
Note: As sample size increases, the t distribution more closely resembles the normal distribution. Since the sample size (n=50) in Problem 1 and Problem 2 is relatively large, it is not surprising that we get a similar result, whether we use a normal distribution calculator or a t distribution calculator. In both cases, we find that the probability that the sample mean will be less than 75 pounds is approximately 0.04.
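The steps of Problem 2 can be retraced in code as well. Python's standard library has no t distribution, so the sketch below approximates the cumulative probability by integrating the t density numerically with Simpson's rule. The helpers t_pdf and t_cdf are illustrative assumptions written for this example; in practice a library routine such as SciPy's scipy.stats.t.cdf would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about zero.

    `steps` must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

n = 50
mu_p, s, d = 80, 20, 75  # population mean, sample sd, critical value

mu_s = mu_p             # Step 1
se = s / math.sqrt(n)   # Step 2 (approximate formula, no fpc)
t = (d - mu_s) / se     # Step 3
df = n - 1
p = t_cdf(t, df)        # Step 4

print("SE:", round(se, 2))
print("t: ", round(t, 2))
print("df:", df)
print("P: ", round(p, 3))
```

The script does not round the standard error before computing the t-score, so its probability may differ in the third decimal place from a value obtained with rounded inputs.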