Stat Trek

Teach yourself statistics


Difference Between Proportions

Statistics problems often involve comparisons between two independent sample proportions. This lesson explains how to compute probabilities associated with differences between proportions.

Difference Between Proportions: Theory

Suppose we have two populations with proportions equal to P1 and P2. Suppose further that we take all possible samples of size n1 from the first population and all possible samples of size n2 from the second. Finally, suppose that the following assumptions are valid.

  • The size of each population is large relative to the sample drawn from the population. That is, N1 is large relative to n1, and N2 is large relative to n2. (In this context, populations are considered to be large if they are at least 20 times bigger than their sample.)
  • The samples from each population are big enough to justify using a normal distribution to model differences between proportions. The sample sizes will be big enough when the following conditions are met: n1P1 > 10, n1(1 - P1) > 10, n2P2 > 10, and n2(1 - P2) > 10. (Even when P1 and P2 equal 0.5, this criterion requires more than 20 observations from each population. When P1 or P2 is farther from 0.5, even more observations are required.)
  • The samples are independent; that is, observations in the sample from population 1 are not affected by observations in the sample from population 2, and vice versa.
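The sample-size and population-size conditions can be written as a small check. This is a minimal sketch in Python, not part of any Stat Trek tool; the function name `normal_approx_ok` and its optional `N1`/`N2` population-size parameters are hypothetical. Uppercase `P1` and `P2` are the population proportions, matching the text. Independence cannot be checked from these numbers, so it is left to the sampling design.

```python
def normal_approx_ok(n1, P1, n2, P2, N1=None, N2=None):
    """Return True if the normal-approximation conditions above hold.

    n1, n2: sample sizes; P1, P2: population proportions.
    N1, N2: optional population sizes (hypothetical parameters); when
    omitted, the populations are simply assumed to be large.
    """
    # Each expected count must exceed 10.
    counts = [n1 * P1, n1 * (1 - P1), n2 * P2, n2 * (1 - P2)]
    if not all(c > 10 for c in counts):
        return False
    # Each population must be at least 20 times bigger than its sample.
    for N, n in ((N1, n1), (N2, n2)):
        if N is not None and N < 20 * n:
            return False
    return True


print(normal_approx_ok(100, 0.52, 100, 0.47))  # True
print(normal_approx_ok(20, 0.5, 20, 0.5))      # False: 20 * 0.5 = 10 is not > 10
```

Returning a single boolean keeps the sketch short; a fuller version might report which condition failed.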

Given these assumptions, we know the following.

  • The set of differences between sample proportions will be approximately normally distributed. We know this from the central limit theorem.
  • The expected value of the difference between all possible sample proportions is equal to the difference between population proportions. Thus, E(p1 - p2) = P1 - P2.
  • The standard deviation of the difference between sample proportions (σd) is approximately equal to:

    σd = sqrt{ [P1(1 - P1) / n1] + [P2(1 - P2) / n2] }

It is straightforward to derive the last bullet point from material covered in previous lessons. The derivation starts by recognizing that the variance of the difference between independent random variables is equal to the sum of the individual variances. Thus,

$$\sigma_d^2 = \sigma^2_{p_1 - p_2} = \sigma_1^2 + \sigma_2^2$$

where σ1² and σ2² are the variances of the sample proportions p1 and p2. If the population sizes N1 and N2 are both large relative to n1 and n2, respectively, then

$$\sigma_1^2 = \frac{P_1(1 - P_1)}{n_1} \qquad \text{and} \qquad \sigma_2^2 = \frac{P_2(1 - P_2)}{n_2}$$

Therefore,

$$\sigma_d^2 = \frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}$$

and

$$\sigma_d = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
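One way to build confidence in the mean and standard deviation results is to simulate many pairs of samples and compare the observed spread of p1 - p2 with the formula. The Python sketch below assumes sampling with replacement (each observation is an independent draw), which approximates populations that are very large relative to the samples. The function names `sigma_d` and `simulate_differences`, the 10,000 repetitions, and the seed are illustrative choices.

```python
import math
import random
import statistics


def sigma_d(P1, n1, P2, n2):
    """Standard deviation of p1 - p2, from the formula derived above."""
    return math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)


def simulate_differences(P1, n1, P2, n2, reps=10_000, seed=1):
    """Draw `reps` pairs of independent samples and return each p1 - p2.

    Every observation is an independent draw (sampling with replacement),
    which mimics populations that are very large relative to the samples.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < P1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < P2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


diffs = simulate_differences(0.52, 100, 0.47, 100)
print(round(sigma_d(0.52, 100, 0.47, 100), 4))  # 0.0706
print(statistics.fmean(diffs), statistics.stdev(diffs))  # near 0.05 and 0.0706
```

The simulated mean and standard deviation should land close to P1 - P2 and σd; with more repetitions they get closer.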

Difference Between Proportions: Sample Problem

In this section, we work through a sample problem to show how to apply the theory presented above. In this example, we will use Stat Trek's Normal Distribution Calculator to compute probabilities.

Normal Distribution Calculator

The normal calculator solves common statistical problems based on the normal distribution. It computes cumulative probabilities from three simple inputs. Simple instructions guide you to an accurate solution quickly and easily, and if anything is unclear, frequently asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.

Sample Problem

In one state, 52% of the voters are Republicans, and 48% are Democrats. In a second state, 47% of the voters are Republicans, and 53% are Democrats. Suppose 100 voters are surveyed from each state. Assume the survey uses simple random sampling.

What is the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state?

(A) 0.04
(B) 0.05
(C) 0.24
(D) 0.71
(E) 0.76

Solution

The correct answer is C. For this analysis, let P1 = the proportion of Republican voters in the first state, P2 = the proportion of Republican voters in the second state, p1 = the proportion of Republican voters in the sample from the first state, and p2 = the proportion of Republican voters in the sample from the second state. The number of voters sampled from the first state (n1) = 100, and the number of voters sampled from the second state (n2) = 100.

The solution involves four steps.

  • Make sure the samples from each population are big enough to model differences with a normal distribution. Because n1P1 = 100 * 0.52 = 52, n1(1 - P1) = 100 * 0.48 = 48, n2P2 = 100 * 0.47 = 47, and n2(1 - P2) = 100 * 0.53 = 53 are each greater than 10, the samples are large enough.
  • Find the mean of the difference in sample proportions: E(p1 - p2) = P1 - P2 = 0.52 - 0.47 = 0.05.
  • Find the standard deviation of the difference.

    σd = sqrt{ [ P1(1 - P1) / n1 ] + [ P2(1 - P2) / n2 ] }

    σd = sqrt{[(0.52)(0.48) / 100] + [(0.47)(0.53) / 100]}

    σd = sqrt(0.002496 + 0.002491)

    σd = sqrt(0.004987) = 0.0706

  • Find the probability. This problem requires us to find the probability that p1 is less than p2. This is equivalent to finding the probability that p1 - p2 is less than zero. To find this probability, we need to transform the random variable (p1 - p2) into a z-score. That transformation appears below.

    $$z_{p_1 - p_2} = \frac{x - \mu_{p_1 - p_2}}{\sigma_d}$$

    $$z_{p_1 - p_2} = \frac{0 - 0.05}{0.0706} = -0.7082$$
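The four steps above can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic, not Stat Trek's implementation; the variable names are our own.

```python
import math

# Inputs from the sample problem.
P1, n1 = 0.52, 100  # first state: proportion Republican, sample size
P2, n2 = 0.47, 100  # second state: proportion Republican, sample size

# Step 1: every expected count must exceed 10.
assert min(n1 * P1, n1 * (1 - P1), n2 * P2, n2 * (1 - P2)) > 10

# Step 2: mean of the sampling distribution of p1 - p2.
mu_d = P1 - P2

# Step 3: standard deviation of p1 - p2.
sigma_d = math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

# Step 4: z-score for p1 - p2 = 0, the boundary of the event "p1 < p2".
z = (0 - mu_d) / sigma_d

print(round(mu_d, 2), round(sigma_d, 4), round(z, 4))
```

Carrying full precision gives z ≈ -0.7080 instead of -0.7082, because the text rounds σd to 0.0706 before dividing; the difference does not change the answer.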

We can use Stat Trek's Normal Distribution Calculator to find the probability that a z-score is -0.7082 or less. We know that the z-score is -0.7082, the mean of a z-score is 0, and the standard deviation of a z-score is 1, so we enter those three values into the calculator.

The calculator tells us that the probability of finding a z-score less than -0.7082 is 0.23941. Therefore, the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state is about 0.24.
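If the calculator is not at hand, the same cumulative probability can be computed with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available, and cross-checks the result against the error-function identity Φ(z) = (1 + erf(z / √2)) / 2.

```python
import math
from statistics import NormalDist

z = -0.7082  # z-score from the sample problem

# P(Z <= z) for a standard normal (mean 0, standard deviation 1).
prob = NormalDist(mu=0, sigma=1).cdf(z)

# Same quantity via the error function.
prob_erf = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(prob, 4))      # 0.2394
print(round(prob_erf, 4))  # 0.2394
```

Rounded to two decimals, this gives 0.24, answer (C).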

Note: Some analysts might have used the t-distribution to compute probabilities for this problem. We chose the normal distribution because the population variance was known and the sample size was large. But it would not have been wrong to use the t-distribution. In a previous lesson, we offered some guidelines for choosing between the normal and the t-distribution.