Analysis of Simple Random Samples

The key to analyzing data from simple random samples is computing the variability of the sample estimate. Once you know that variability, you can construct confidence intervals and test hypotheses.

Notation

The following notation is helpful when we talk about analyzing data from simple random samples.

  • σ: The known standard deviation of the population.
  • σ²: The known variance of the population.
  • P: The true population proportion.
  • N: The number of observations in the population.
  • x̄: The sample estimate of the population mean.
  • s: The sample estimate of the standard deviation of the population.
  • s²: The sample estimate of the population variance.
  • p: The proportion of successes in the sample.
  • n: The number of observations in the sample.
  • SD: The standard deviation of the sampling distribution.
  • SE: The standard error. (This is an estimate of the standard deviation of the sampling distribution.)
  • Σ: The summation symbol, used to compute sums over the sample. (To illustrate its use, Σ xᵢ = x₁ + x₂ + x₃ + ... + xₙ₋₁ + xₙ.)

The Variability of the Estimate

The precision of a sample design is directly related to the variability of the estimate. Two common measures of variability are the standard deviation (SD) of the estimate and the standard error (SE) of the estimate. The tables below show how to compute both measures, assuming that the sampling method is simple random sampling.

The first table shows how to compute variability for a mean score. Note that the table shows four sample designs. In two of the designs, the true population variance is known; and in two, it is estimated from sample data. Also, in two of the designs, the researcher sampled with replacement; and in two, without replacement.

Population variance | Replacement strategy | Variability
Known               | With replacement     | SD = sqrt [ σ² / n ]
Known               | Without replacement  | SD = sqrt { [ ( N - n ) / ( N - 1 ) ] * σ² / n }
Estimated           | With replacement     | SE = sqrt [ s² / n ]
Estimated           | Without replacement  | SE = sqrt { [ ( N - n ) / ( N - 1 ) ] * s² / n }
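
The four rows differ only in which variance is plugged in (σ² or s²) and in whether the factor ( N - n ) / ( N - 1 ) is applied. A minimal Python sketch of that logic follows. The function name mean_variability and the input values are hypothetical, chosen only for illustration.

```python
import math


def mean_variability(variance, n, N=None):
    """Variability of a sample mean under simple random sampling.

    variance: the known population variance (result is the SD) or the
              sample estimate of it (result is the SE).
    n:        number of observations in the sample.
    N:        number of observations in the population; pass it only
              when sampling without replacement.
    """
    if N is None:
        correction = 1.0  # with replacement: no correction factor
    else:
        correction = (N - n) / (N - 1)  # without replacement
    return math.sqrt(correction * variance / n)


# Hypothetical inputs: variance of 100, sample of 25, population of 1,000.
with_replacement = mean_variability(100, 25)
without_replacement = mean_variability(100, 25, N=1000)
print(with_replacement, without_replacement)
```

Sampling without replacement never yields the larger value of the two, because the correction factor is at most 1.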

The next table shows how to compute variability for a proportion. Like the previous table, this table shows four sample designs. In two of the designs, the true population proportion is known; and in two, it is estimated from sample data. Also, in two of the designs, the researcher sampled with replacement; and in two, without replacement.

Population proportion | Replacement strategy | Variability
Known                 | With replacement     | SD = sqrt [ P * ( 1 - P ) / n ]
Known                 | Without replacement  | SD = sqrt { [ ( N - n ) / ( N - 1 ) ] * P * ( 1 - P ) / n }
Estimated             | With replacement     | SE = sqrt [ p * ( 1 - p ) / ( n - 1 ) ]
Estimated             | Without replacement  | SE = sqrt { [ ( N - n ) / ( N - 1 ) ] * p * ( 1 - p ) / n }
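
The proportion formulas can be sketched in Python in the same way. The helper below follows the table rows exactly as printed, including the ( n - 1 ) divisor that appears only in the estimated, with-replacement row. The name proportion_variability, its parameters, and the inputs are assumptions made for illustration.

```python
import math


def proportion_variability(prop, n, N=None, estimated=False):
    """Variability of a sample proportion under simple random sampling.

    prop:      the known population proportion P, or the sample proportion p.
    n:         number of observations in the sample.
    N:         population size; pass it only when sampling without replacement.
    estimated: True when prop is the sample proportion p (result is the SE).
    """
    spread = prop * (1 - prop)
    if N is not None:
        # Without replacement: both table rows divide by n.
        return math.sqrt((N - n) / (N - 1) * spread / n)
    # With replacement: the table divides by n - 1 when p is estimated.
    divisor = n - 1 if estimated else n
    return math.sqrt(spread / divisor)


# Hypothetical inputs: proportion of 0.5, sample of 100, population of 1,000.
sd_known = proportion_variability(0.5, 100)
se_estimated = proportion_variability(0.5, 100, estimated=True)
se_finite = proportion_variability(0.5, 100, N=1000, estimated=True)
print(sd_known, se_estimated, se_finite)
```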

Sample Problem

This section presents a sample problem that illustrates how to analyze survey data when the sampling method is simple random sampling. (In a subsequent lesson, we revisit this problem and see how simple random sampling compares to other sampling methods.)

Problem 1

At the end of every school year, the state administers a reading test to a simple random sample drawn without replacement from a population of 20,000 third graders. This year, the sample consisted of 36 students. The test score from each sampled student is shown below:

50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 70, 70, 72, 72, 73, 73, 75, 75,
75, 78, 78, 78, 78, 80, 80, 80, 82, 82, 85, 85, 85, 88, 88, 90, 90, 90 

Using sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

Solution: Elsewhere on this website, we described how to compute the confidence interval for a mean score. We follow that process below.

  • Identify a sample statistic. Since we are trying to estimate a population mean, we choose the sample mean as the sample statistic. The sample mean is:

    x̄ = Σ ( xᵢ ) / n

    x̄ = ( 50 + 55 + 60 + ... + 90 + 90 + 90 ) / 36 = 75

    Therefore, based on data from the simple random sample, we estimate that the mean reading achievement level in the population is equal to 75.
  • Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 95% confidence level.
  • Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.
    • Find the standard error of the sampling distribution. First, we estimate the variance of the test scores (s²). Then, we compute the standard error (SE).

      s² = Σ ( xᵢ - x̄ )² / ( n - 1 )
      s² = [ (50 - 75)² + (55 - 75)² + (60 - 75)² + ... + (90 - 75)² + (90 - 75)² ] / 35 = 98.97

      SE = sqrt { [ ( N - n ) / ( N - 1 ) ] * s² / n }
      SE = sqrt [ ( 0.998 ) * 98.97 / 36 ] = 1.66

    • Find the critical value. The critical value is a factor used to compute the margin of error. Based on the central limit theorem, we can assume that the sampling distribution of the mean is approximately normal. Therefore, we express the critical value as a z-score. To find the critical value, we take these steps.
      • Compute alpha (α):

        α = 1 - (confidence level / 100)

        α = 1 - 95/100 = 0.05

      • Find the critical probability (p*):

        p* = 1 - α/2 = 1 - 0.05/2 = 0.975

      • The critical value is the z-score having a cumulative probability equal to 0.975. From the Normal Distribution Calculator, we find that the critical value is 1.96.
    • Compute margin of error (ME):

      ME = critical value * standard error

      ME = 1.96 * 1.66 = 3.25

  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. The uncertainty is denoted by the confidence level.

Therefore, the 95% confidence interval is 71.75 to 78.25, and the margin of error is equal to 3.25. That is, we are 95% confident that the true population mean is in the range defined by 75 ± 3.25.
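
The whole solution can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative, and NormalDist().inv_cdf stands in for the Normal Distribution Calculator mentioned in the solution.

```python
import math
from statistics import NormalDist, mean, variance

# Test scores from Problem 1.
scores = [
    50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 70, 70, 72, 72, 73, 73, 75, 75,
    75, 78, 78, 78, 78, 80, 80, 80, 82, 82, 85, 85, 85, 88, 88, 90, 90, 90,
]
N = 20_000          # population size (third graders)
n = len(scores)     # sample size
confidence_level = 95

x_bar = mean(scores)    # sample mean
s2 = variance(scores)   # sample variance, divisor n - 1

# Standard error for an estimated variance, sampling without replacement.
se = math.sqrt((N - n) / (N - 1) * s2 / n)

# Critical value: the z-score with cumulative probability 1 - alpha / 2.
alpha = 1 - confidence_level / 100
z = NormalDist().inv_cdf(1 - alpha / 2)

me = z * se                            # margin of error
lower, upper = x_bar - me, x_bar + me  # confidence interval

print(f"mean = {x_bar}, s2 = {s2:.2f}, SE = {se:.2f}")
print(f"z = {z:.2f}, ME = {me:.2f}, CI = {lower:.2f} to {upper:.2f}")
```

Rounded to two decimal places, the printed values agree with the hand calculation: a mean of 75, a variance of 98.97, a standard error of 1.66, and a margin of error of 3.25.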