What is Hypothesis Testing?

A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal procedures used by statisticians to determine whether there is enough evidence in a sample of data to support a specific claim (i.e., a hypothesis) about a population parameter.

Most surveys are designed to estimate a population parameter, but some surveys also test a hypothesis about that parameter. This lesson, as well as the next few lessons, covers topics that researchers should understand when they use survey data for hypothesis testing.

Statistical Hypotheses

The best way to determine whether a statistical hypothesis is true would be to examine the entire population. Since that is often impractical, researchers typically examine a random sample from the population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.

There are two types of statistical hypotheses.

Null hypothesis. The null hypothesis, denoted by H₀, is usually the hypothesis that sample observations result purely from chance.
Alternative hypothesis. The alternative hypothesis, denoted by H₁ or H_a, is the hypothesis that sample observations are influenced by some non-random cause.

Note: The null and alternative hypotheses are statements about population parameters, not sample statistics. Sample statistics are used as evidence to test these hypotheses.

For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that half the flips would result in Heads and half in Tails. The alternative hypothesis might be that the proportion of Heads would differ from one half. Symbolically, these hypotheses would be expressed as

H₀: P = 0.5
H_a: P ≠ 0.5

where P is the population proportion of coin flips that land on Heads.

Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be inclined to reject the null hypothesis. We would conclude, based on the evidence, that the coin was probably not fair and balanced.
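To put a number on "inclined to reject," the sketch below computes an exact two-sided binomial P-value for 40 Heads in 50 flips under H₀: P = 0.5. The function name binomial_two_sided_p_value is a hypothetical helper written for this lesson. The two-sided rule it uses adds up every outcome that is no more likely than the observed one. That is one common convention for exact tests, not the only one.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int, p0: float = 0.5) -> float:
    """Exact two-sided P-value for H0: P = p0, given `heads` successes in `flips` trials.

    Adds up the probability of every outcome that is no more likely under H0
    than the observed outcome (one common convention for two-sided exact tests).
    """
    # Binomial probabilities of 0, 1, ..., flips Heads when H0 is true.
    pmf = [comb(flips, k) * p0**k * (1 - p0) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # A tiny relative tolerance keeps outcomes whose probability ties the
    # observed one from being dropped because of floating-point rounding.
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))


p_value = binomial_two_sided_p_value(heads=40, flips=50)
print(f"40 Heads in 50 flips: P-value = {p_value:.2e}")
```

The P-value is on the order of 10⁻⁵, far below common significance levels such as 0.05 or 0.01. In practice, a library routine such as SciPy's scipy.stats.binomtest would typically be used instead of a hand-written function.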

Can We Accept the Null Hypothesis?

Some researchers say that a hypothesis test can have one of two outcomes: you accept the null hypothesis or you reject the null hypothesis. Many statisticians, however, take issue with the notion of "accepting the null hypothesis." Instead, they say: you reject the null hypothesis or you fail to reject the null hypothesis.

Why the distinction between "acceptance" and "failure to reject"? Acceptance implies that the null hypothesis is true. Failure to reject implies that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis.

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of five steps.

State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.
Choose the significance level (also called alpha, or α). The significance level is the probability of rejecting the null hypothesis when it is true. (Researchers frequently choose α = 0.05 or α = 0.01.)
Compute the test statistic. Choose an appropriate statistical test based on the type of data and the hypothesis being tested, and use sample data to compute a test statistic (e.g., t-score, z-score, chi-square).
Find the P-value. The P-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one computed from the sample.
Interpret results. Compare the P-value to the significance level (α). If the P-value is less than α, reject the null hypothesis; otherwise, fail to reject it.
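The five steps can be sketched for the coin example with a one-proportion z-test. This is a minimal illustration built on stated assumptions. The helper one_proportion_z_test is hypothetical, and the outcome of 29 Heads is invented data. The test uses the normal approximation to the binomial, which is reasonable here because n × P₀ = 50 × 0.5 = 25 for both Heads and Tails.

```python
from math import erfc, sqrt


def one_proportion_z_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Two-sided one-proportion z-test using the normal approximation.

    Assumes n * p0 and n * (1 - p0) are both large enough for that approximation.
    """
    # Step 1: state the hypotheses.  H0: P = p0  versus  Ha: P != p0.
    # Step 2: choose the significance level; alpha is passed in (default 0.05).
    # Step 3: compute the test statistic from the sample proportion.
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Step 4: find the two-sided P-value; erfc(|z| / sqrt(2)) = 2 * (1 - Phi(|z|)),
    # where Phi is the standard normal CDF.
    p_value = erfc(abs(z) / sqrt(2))
    # Step 5: interpret the results by comparing the P-value with alpha.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


for heads in (29, 40):  # hypothetical outcomes of 50 flips of the coin
    z, p_value, decision = one_proportion_z_test(heads, n=50, p0=0.5)
    print(f"{heads} Heads: z = {z:.2f}, P-value = {p_value:.4f} -> {decision}")
```

With 29 Heads the P-value is about 0.26, so we fail to reject H₀. With 40 Heads it is far below 0.05, so we reject H₀.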

Decision Errors

Two types of errors can result from a hypothesis test.

Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α.
Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of not committing a Type II error is called the power of the test. Power equals 1 - β.
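Both error probabilities can be computed exactly once a rejection region is fixed. The sketch below assumes a hypothetical rule for 50 coin flips: reject H₀: P = 0.5 if there are 17 or fewer Heads, or 33 or more. For the power calculation it assumes that the true proportion is P = 0.7. Both the rule and that value are illustrative choices, not part of the lesson.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_reject(n: int, cutoff: int, p: float) -> float:
    """P(X <= cutoff or X >= n - cutoff) when X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k <= cutoff or k >= n - cutoff)


N = 50
CUTOFF = 17  # hypothetical rule: reject H0 for 17 or fewer Heads, or 33 or more

type_i_error = prob_reject(N, CUTOFF, p=0.5)  # alpha: H0 (P = 0.5) is true but rejected
power = prob_reject(N, CUTOFF, p=0.7)         # H0 is false (assume P = 0.7) and rejected
beta = 1 - power                              # Type II error: H0 is false but not rejected
print(f"alpha = {type_i_error:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

With this rule, α comes out a little over 0.03 and the power is close to 0.8. Because the binomial distribution is discrete, the actual Type I error rate of such a rule is usually not exactly a round number like 0.05.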

Decision Rules

The analysis plan for a hypothesis test must include decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a P-value or with reference to a region of acceptance.

P-value. The P-value measures the strength of evidence against the null hypothesis: the smaller the P-value, the stronger the evidence. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.
The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the null hypothesis has been rejected at the α level of significance.
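The two decision rules can be checked against each other for a two-sided z-test. The sketch assumes that the test statistic z follows a standard normal distribution under H₀. The z values in the loop are hypothetical. The region of acceptance is [-c, c], where c is the value that leaves α/2 in each tail.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def reject_by_p_value(z: float, alpha: float) -> bool:
    """P-value rule: reject H0 when the two-sided P-value is below alpha."""
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return p_value < alpha


def reject_by_region(z: float, alpha: float) -> bool:
    """Region rule: reject H0 when z falls outside the region of acceptance [-c, c]."""
    c = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # c is about 1.96 when alpha = 0.05
    return abs(z) > c


for z in (0.5, 1.5, 1.95, 1.97, 2.5, -3.0):  # hypothetical test statistics
    print(f"z = {z:5.2f}: P-value rule rejects? {reject_by_p_value(z, 0.05)}; "
          f"region rule rejects? {reject_by_region(z, 0.05)}")
```

For every z, the two rules reach the same decision, because a P-value below α corresponds exactly to a test statistic outside the region of acceptance.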

Both methods are valid and, for the same test and significance level, they always lead to the same decision. The P-value approach is the one taught in the AP Statistics curriculum, and it is generally preferred in modern practice.

One-Tailed and Two-Tailed Tests

The choice between a one-tailed and two-tailed test depends on the directionality of the alternative hypothesis.

One-tailed test. A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis might be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers sufficiently greater than 10. Or the alternative hypothesis might be that the mean is less than 10, with the region of rejection on the left side of the sampling distribution. A one-tailed test would look like this:

H₀: μ = 10
H_a: μ > 10

or

H₀: μ = 10
H_a: μ < 10

Two-tailed test. A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is not equal to 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers sufficiently less than 10 and partly of numbers sufficiently greater than 10. A two-tailed test would look like this:

H₀: μ = 10
H_a: μ ≠ 10

With a two-tailed test, the significance level (α) is split equally between the two tails of the distribution. With a one-tailed test, the significance level is entirely in one tail of the distribution.
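The effect of placing α in one tail or splitting it between two can be seen with a z-test for H₀: μ = 10. The sketch makes two assumptions. First, the test statistic z = (x̄ - 10) / (σ/√n) is standard normal under H₀, which requires a known σ. Second, the observed value z = 1.80 is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

# Two-tailed (Ha: mu != 10): alpha/2 in each tail, so reject if |z| > z_(1 - alpha/2).
two_tailed_critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.960
# One-tailed (Ha: mu > 10): all of alpha in the right tail, so reject if z > z_(1 - alpha).
one_tailed_critical = STANDARD_NORMAL.inv_cdf(1 - alpha)      # about 1.645

z = 1.80  # hypothetical test statistic for H0: mu = 10
p_one_tailed = 1 - STANDARD_NORMAL.cdf(z)             # P-value for Ha: mu > 10
p_two_tailed = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))  # P-value for Ha: mu != 10

print(f"critical values: one-tailed {one_tailed_critical:.3f}, two-tailed {two_tailed_critical:.3f}")
print(f"P-values: one-tailed {p_one_tailed:.4f}, two-tailed {p_two_tailed:.4f}")
```

Here the one-tailed P-value (about 0.036) is below 0.05, but the two-tailed P-value (about 0.072) is not. The same data can therefore lead to different conclusions, which is why the direction of the alternative hypothesis should be chosen before looking at the data.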

Test Your Understanding

Problem 1

A quality control engineer reported results from significance tests on the mean number of defects in finished products. The P-value for one test was 0.02. What can the engineer conclude from this test?

(A) If the null hypothesis were true, we would expect results as extreme as those observed 2 percent of the time.
(B) If the alternative hypothesis were true, we would expect results as extreme as those observed 2 percent of the time.
(C) 2 percent of finished products were defective.
(D) The probability of a Type II error is 0.02.
(E) The probability of a Type II error is 0.98.

Solution

The correct answer is (A). The P-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. So if the null hypothesis were true, results as extreme as these would occur only about 2 percent of the time. A P-value this low would be unexpected under the null hypothesis, and it suggests that the null hypothesis might not be true.
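The "2 percent of the time" interpretation can be checked by simulation. The sketch below makes several hypothetical assumptions. It repeatedly draws samples from a population where the null hypothesis is true: normal data with mean 10 and a known standard deviation of 2, with samples of size 30. It then computes a two-sided z-test P-value for each sample. About 2 percent of those P-values should be 0.02 or lower. None of these settings comes from the engineer's problem.

```python
import random
from math import erfc, sqrt


def simulate_null_p_values(trials=10_000, n=30, mu0=10.0, sigma=2.0, seed=1):
    """Two-sided z-test P-values for samples drawn with H0: mu = mu0 actually true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_values = []
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        p_values.append(erfc(abs(z) / sqrt(2)))  # equals 2 * (1 - Phi(|z|))
    return p_values


p_values = simulate_null_p_values()
share = sum(p <= 0.02 for p in p_values) / len(p_values)
print(f"Share of null-hypothesis samples with P-value <= 0.02: {share:.3f}")
```

The printed share should be close to 0.02. It will not match exactly, because the simulation uses a finite number of random samples.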

Problem 2

Which of the following is a correct set of hypotheses to test whether more than 15 percent of voters in the last election were older than 65?

(A) H₀: p = 0.15 and H_a: p > 0.15 where p is the sample proportion
(B) H₀: P = 0.15 and H_a: P > 0.15 where P is the population proportion
(C) H₀: p = 0.15 and H_a: p ≥ 0.15 where p is the sample proportion
(D) H₀: P = 0.15 and H_a: P ≥ 0.15 where P is the population proportion
(E) None of the above

Solution

The correct answer is (B). The null hypothesis should be stated in the form of an equality about the population parameter, and the alternative hypothesis should be stated in the form of an inequality about the population parameter. Additionally, the two hypotheses must be mutually exclusive. Option B satisfies these requirements. Option A refers to a sample statistic, not a population parameter, so it is incorrect. In Options C and D, the null and alternative hypotheses are not mutually exclusive: a proportion of exactly 0.15 satisfies both H₀ and H_a, so these options are incorrect. (Option C also refers to a sample statistic.)
