How to Analyze Survey Data

When you analyze survey sample data, the goal of the analysis is to describe a population parameter. Because you are dealing with incomplete data (a sample versus a census), your description will be tinged with uncertainty.

Therefore, the results you report should include a point estimate of the population parameter, plus a measure of the uncertainty inherent in your estimate. The formulas you use will vary, depending on the sampling method and the population parameter. However, the logic of the analysis does not change.

This lesson covers the logic of the analysis. We will cover analytical formulas in future lessons.

The Logic of the Analysis

In a big-picture sense, the analysis of survey sampling data is easy. Whenever you conduct a survey, the analysis includes the same seven steps:

Estimate a population parameter.
Estimate population variance.
Compute standard error.
Specify a confidence level.
Find the critical value (often a z-score or a t-score).
Compute margin of error.
Define confidence interval.

It doesn't matter whether the sampling method is simple random sampling, stratified sampling, or cluster sampling. And it doesn't matter whether the parameter of interest is a mean score, a proportion, or a total score. The analysis of survey sampling data always includes the same seven steps.
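As a rough illustration, the seven steps can be sketched end to end for one case. This sketch assumes simple random sampling from a large population, a mean as the parameter of interest, and a z-score as the critical value; the data and the function name analyze_mean are invented for illustration, and other sampling methods or parameters would need different formulas.

```python
import math
from statistics import NormalDist, mean, variance

# Hypothetical sample of 40 survey scores, generated deterministically
# so the example is reproducible (not real data).
sample = [60 + (7 * i) % 31 for i in range(40)]


def analyze_mean(data, confidence_level=95):
    """Run the seven steps for a mean, assuming simple random sampling
    from a large population and a z-score critical value."""
    n = len(data)
    estimate = mean(data)                     # 1. estimate the population parameter
    var = variance(data)                      # 2. estimate variance (n - 1 divisor)
    se = math.sqrt(var / n)                   # 3. standard error of the mean
    alpha = 1 - confidence_level / 100        # 4. confidence level, expressed as alpha
    cv = NormalDist().inv_cdf(1 - alpha / 2)  # 5. critical value (z-score)
    me = cv * se                              # 6. margin of error
    ci = (estimate - me, estimate + me)       # 7. confidence interval
    return {"estimate": estimate, "se": se, "cv": cv, "me": me, "ci": ci}


result = analyze_mean(sample)
print(result)
```

Each numbered comment corresponds to one of the seven steps; only the lines for steps 1 through 3 would change for a different sampling method or parameter.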

Let's look a little bit closer at each step - what we do in each step and why we do it. When you understand what is really going on, it will be easier for you to apply formulas correctly and to interpret analytical findings.

Estimating a Population Parameter

The first step in the analysis of survey data is to estimate the value of a population parameter. For example, we might use sample data to develop a point estimate for a population mean, a population total, or a population proportion.
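The sketch below shows what point estimates of a mean, a proportion, and a total might look like. It assumes simple random sampling, where the sample mean and sample proportion serve as the estimates and the total is the population size times the sample mean. The data and the population size are hypothetical.

```python
# Hypothetical responses from 10 sampled households (not real data).
household_sizes = [2, 4, 3, 1, 5, 2, 3, 4, 2, 4]
owns_car = [True, False, True, True, False, True, True, False, True, True]

# Assumed number of households in the population (hypothetical).
population_size = 5000

n = len(household_sizes)

# Point estimate of the population mean: the sample mean.
mean_estimate = sum(household_sizes) / n

# Point estimate of the population proportion: the sample proportion.
proportion_estimate = sum(owns_car) / n

# Point estimate of the population total: population size times sample mean.
total_estimate = population_size * mean_estimate

print(mean_estimate, proportion_estimate, total_estimate)
```

Here the estimated mean household size is 3.0, the estimated proportion of car owners is 0.7, and the estimated total number of people is 15,000.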

Because different samples can produce different point estimates, you can be fairly sure that the estimate from your sample does not equal the true value of the population parameter exactly.

Therefore, you need a way to express the uncertainty inherent in your estimate. The remaining six steps in the analysis are geared toward quantifying the uncertainty in your estimate.

Estimating Population Variance

The variance is a numerical value used to measure the variability of observations in a group. If individual observations vary greatly from the group mean, the variance is big; and vice versa.
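A small sketch can make this concrete. The two groups below are invented so that they share the same mean but differ in spread. The example uses the sample variance (with an n - 1 divisor) from Python's statistics module, which is one common way to estimate variance from sample data.

```python
from statistics import mean, variance

# Two hypothetical groups with the same mean (50) but different spread.
tight = [49, 50, 51, 50, 50]
spread = [10, 90, 30, 70, 50]

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
tight_variance = variance(tight)
spread_variance = variance(spread)

print(mean(tight), mean(spread))
print(tight_variance, spread_variance)
```

The first group's variance is 0.5 and the second group's is 1000, even though both groups have a mean of 50.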

Why do we care about variance? The variance is needed to compute the standard error. And why do we care about the standard error? Read on.

Computing Standard Error

The standard error is possibly the most important output from our analysis. It allows us to compute the margin of error and the confidence interval.

Think of the standard error as the standard deviation of a sample statistic. In survey sampling, there are usually many different subsets of the population that we might choose for analysis. Each different sample might produce a different estimate of the value of a population parameter. The standard error provides a quantitative measure of the variability of those estimates.
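The idea can be checked with a small simulation. The sketch below builds a hypothetical population, draws many simple random samples, and measures how much the sample means vary. It then compares that spread with the usual large-population value for the mean, the population standard deviation divided by the square root of the sample size. All numbers are invented for illustration.

```python
import math
import random
from statistics import mean, pstdev, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores (invented for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]
n = 50  # size of each sample

# Draw many simple random samples and record the mean of each one.
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

# The spread of those means is what the standard error measures.
empirical_se = stdev(sample_means)

# Large-population value for the standard error of the mean.
theoretical_se = pstdev(population) / math.sqrt(n)

print(round(empirical_se, 3), round(theoretical_se, 3))
```

The two printed values should be close to each other. In practice only one sample is available, so the standard error is estimated from that sample rather than observed directly.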

Specifying Confidence Level

In survey sampling, different samples can be randomly selected from the same population; and each sample can often produce a different confidence interval. Some confidence intervals include the true population parameter; others do not.

A confidence level refers to the percentage of all possible samples that produce confidence intervals that include the true population parameter. For example, suppose all possible samples were selected from the same population, and a confidence interval were computed for each sample. A 95% confidence level implies that 95% of the confidence intervals would include the true population parameter.
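Enumerating all possible samples is rarely feasible, but a simulation with a large number of random samples gives the same picture. The sketch below uses a hypothetical population, builds a 95% confidence interval for the mean from each sample (assuming simple random sampling and a z-score), and counts how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population (invented for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = mean(population)

n = 50
z = NormalDist().inv_cdf(0.975)  # critical z-score for a 95% confidence level

trials = 2_000
hits = 0
for _ in range(trials):
    sample = rng.sample(population, n)
    estimate = mean(sample)
    se = stdev(sample) / math.sqrt(n)
    # Does this sample's interval include the true population mean?
    if estimate - z * se <= true_mean <= estimate + z * se:
        hits += 1

coverage = hits / trials
print(coverage)
```

The printed share should land near 0.95. It will not match exactly, because the simulation uses a finite number of samples and a z-score in place of a t-score.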

As part of the analysis, survey researchers choose a confidence level. The most frequently chosen confidence level is probably 95%.

Finding Critical Value

Often expressed as a t-score or a z-score, the critical value is a factor used to compute the margin of error. To find the critical value, follow these steps:

Compute alpha (α): α = 1 - (confidence level / 100)
Find the critical probability (p*): p* = 1 - α/2
To express the critical value as a z-score, find the z-score having a cumulative probability equal to the critical probability (p*).
To express the critical value as a t statistic, follow these steps:
- Find the degrees of freedom (df). Often, df is equal to the sample size minus one.
- The critical t statistic (t*) is the t statistic having degrees of freedom equal to df and a cumulative probability equal to the critical probability (p*).

Researchers typically use a t-score when the sample size is small and a z-score when it is large (a common rule of thumb is at least 30 observations). You can use the Normal Distribution Calculator to find the critical z-score, and the t Distribution Calculator to find the critical t statistic.
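The z-score steps above can also be carried out in code. The sketch below uses NormalDist from Python's standard library; the function name critical_z is invented for illustration. A critical t statistic needs the inverse cumulative distribution of the t distribution, which the standard library does not provide, so it is left out here (a third-party library such as SciPy offers one).

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Return the critical z-score for a confidence level given in percent."""
    alpha = 1 - confidence_level / 100   # alpha
    p_star = 1 - alpha / 2               # critical probability p*
    return NormalDist().inv_cdf(p_star)  # z-score with cumulative probability p*


for level in (90, 95, 99):
    print(level, round(critical_z(level), 3))
```

For confidence levels of 90%, 95%, and 99%, the critical z-scores are approximately 1.645, 1.960, and 2.576.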

Computing Margin of Error

The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter. To be meaningful, the margin of error should be qualified by a probability statement (often expressed in the form of a confidence level).

For example, a pollster might report that 50% of voters will choose the Democratic candidate. To indicate the quality of the survey result, the pollster might add that the margin of error is 5%, with a confidence level of 90%. This means that if the survey were repeated many times with different samples, the sample estimate would fall within 5 percentage points of the true percentage of Democratic voters about 90% of the time.

Here is the formula for computing margin of error:

ME = SE * CV

where ME is margin of error, SE is standard error, and CV is the critical value.
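As a quick numerical sketch of the formula, the code below uses a hypothetical standard error of 0.03, chosen so that the result lands near the 5% margin in the pollster example, together with the critical z-score for a 90% confidence level.

```python
from statistics import NormalDist


def margin_of_error(se, cv):
    """ME = SE * CV."""
    return se * cv


se = 0.03                        # hypothetical standard error of a proportion
cv = NormalDist().inv_cdf(0.95)  # z-score for 90% confidence (alpha = 0.10, p* = 0.95)
me = margin_of_error(se, cv)

print(round(me, 3))  # about 0.049, roughly 5 percentage points
```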

Defining Confidence Interval

Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.

Here is how to compute the minimum and maximum values for a confidence interval.

Estimate the population parameter (P).
Compute the standard error (SE).
Find the critical value (CV), given a particular confidence level.
Compute minimum and maximum values for the confidence interval:

CI_min = P - SE * CV

CI_max = P + SE * CV

Thus, the confidence interval is an interval estimate that ranges between CI_min and CI_max.
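The two formulas translate directly into a small function. The inputs below are hypothetical: a mean estimate of 75, a standard error of 2, and the 95% critical z-score rounded to 1.96.

```python
def confidence_interval(p, se, cv):
    """Return (CI_min, CI_max) = (P - SE * CV, P + SE * CV)."""
    margin = se * cv
    return p - margin, p + margin


# Hypothetical inputs (not taken from real survey data).
ci_min, ci_max = confidence_interval(75.0, 2.0, 1.96)

print(round(ci_min, 2), round(ci_max, 2))  # 71.08 78.92
```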

And here is how to interpret a confidence interval. The confidence level describes uncertainty associated with the interval estimate. For example, a 95% confidence level results in a 95% confidence interval. A 95% confidence interval implies that if we used the same sampling method to select different samples and computed a 95% confidence interval for each sample, we would expect the true population parameter to fall within the interval estimate 95% of the time.

What About the Formulas?

You probably noticed that this discussion omitted three important formulas - an equation to estimate the population parameter, an equation to estimate the population variance, and an equation to compute the standard error of the estimate.

These equations will vary depending on the parameter of interest (e.g., a mean score, a proportion, or a total score) and the sampling method (e.g., simple random sampling, stratified sampling, cluster sampling). To find the right equation for each situation, click the appropriate link in the table below.

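Sampling method	Parameter	Analysis plan
Simple random sampling	Mean	Link
Simple random sampling	Proportion	Link
Simple random sampling	Total	Link
Stratified sampling	Mean	Link
Stratified sampling	Proportion	Link
Stratified sampling	Total	Link
Cluster sampling	Mean	Link
Cluster sampling	Proportion	Link
Cluster sampling	Total	Link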

For every situation, the logic of the analysis is identical. Only the equations change. So, once you understand the logic (i.e., the seven steps described above), you can apply it to any of these situations by choosing the right equations.

Test Your Understanding

Problem

Which of the following statements are true?

I. We use the standard error to find the critical value.
II. We use the standard error to find the margin of error.
III. We use the standard error to define the confidence interval.

(A) I and II are true.
(B) I and III are true.
(C) II and III are true.
(D) None of the statements are true.
(E) All of the statements are true.

Solution

The correct answer is (C). The standard error is a term in the formula to compute the margin of error. And it is a term in the formulas to define the minimum and maximum values of the confidence interval. But the standard error is not needed to find the critical value.
