Scheffé's Test for Multiple Comparisons
This lesson covers Scheffé's test: what it is, why it is needed, when to use it, and how to implement it.
Prerequisites: This lesson assumes familiarity with comparisons. You should know how to represent a statistical hypothesis mathematically by a comparison. You should be able to compute the sum of squares associated with a comparison. And you should understand how the probability of committing a Type I error is affected by the number of comparisons tested. If you don't know these things, review the following lessons:
- Comparison of Treatment Means. This lesson defines an ordinary comparison. It explains how to represent a statistical hypothesis mathematically by a comparison. And it explains how to compute the sum of squares for a comparison.
- Multiple Comparisons. This lesson describes how the probability of committing a Type I error is affected by the number of comparisons tested.
What is Scheffé's test?
Scheffé's test is a method for testing all pairwise and all non-pairwise comparisons of treatment means. Here's how it works:
- Step 1. Set a significance level (α) for the error rate familywise. (The significance level for Scheffé's test should equal the significance level used for the omnibus ANOVA in Step 3.)
- Step 2. Find the value for each comparison (Li) that you want to test.
- Step 3. Generate an ANOVA table from a standard, omnibus analysis of variance.
- Step 4. Use the following formula to compute a critical value for Scheffé's test of comparison Li:
$$CV_i = \sqrt{(k - 1) \, F(v_1, v_2) \, MSE \, \sum_j \left( c_j^2 / n_j \right)}$$
Note: To find values for the degrees of freedom and the mean squared error, refer to the ANOVA table from Step 3. To find F(v1, v2), use Stat Trek's F Distribution Calculator with the significance level from Step 1.
- Step 5. Compare the value from Step 2 (Li) with the value from Step 4 (CVi). If the absolute value of Li is bigger than CVi, the comparison is statistically significant.
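The five steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names `scheffe_critical_value` and `is_significant` are hypothetical, and the F value is passed in as an argument (looked up from a table or calculator) rather than computed. The inputs at the bottom match the worked example later in this lesson.

```python
import math


def scheffe_critical_value(k, f_crit, mse, coeffs, ns):
    """Critical value CVi for one comparison (Step 4).

    k      -- number of treatment groups
    f_crit -- F(v1, v2) at the chosen significance level, looked up separately
    mse    -- mean squared error from the omnibus ANOVA table
    coeffs -- comparison coefficients c_j, one per group
    ns     -- sample sizes n_j, one per group
    """
    weight = sum(c ** 2 / n for c, n in zip(coeffs, ns))
    return math.sqrt((k - 1) * f_crit * mse * weight)


def is_significant(l_value, cv):
    """Step 5: the comparison is significant if |Li| exceeds CVi."""
    return abs(l_value) > cv


# Three groups of five subjects, MSE = 125, F(2, 12) = 3.89 at alpha = 0.05,
# and a pairwise comparison between Group 1 and Group 2.
cv = scheffe_critical_value(k=3, f_crit=3.89, mse=125, coeffs=[1, -1, 0], ns=[5, 5, 5])
print(round(cv, 1), is_significant(20, cv))
```

Passing the F value in keeps the sketch free of external dependencies; a statistics library could supply it instead.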
Why Do We Need Scheffé's test?
The Scheffé test is used mainly with post hoc comparisons in analysis of variance (ANOVA) experiments. The test is used to determine whether the mean score in one treatment group differs from the mean score in a second treatment group, or whether the mean score for one set of treatment groups differs from the mean score for a second set of treatment groups.
When to Use Scheffé's test
In some situations, Scheffé's test is a good technique for testing the statistical significance of multiple comparisons. In other situations, it is not so good.
Advantages
There are several things to like about the Scheffé test, including the following:
- The Scheffé test can be used to make all possible comparisons among treatment means - pairwise comparisons (comparisons involving only two means) and non-pairwise comparisons (comparisons involving more than two means).
- The Scheffé test sets the error rate familywise equal to a significance level (α) specified by the experimenter.
- The Scheffé test can be used with unequal sample sizes between groups.
- The Scheffé test provides a more sensitive test of non-pairwise comparisons than some other post hoc testing procedures (e.g., Tukey's HSD test).
- When an experiment calls for many planned comparisons, the risk of Type I errors can be unacceptably high. In this situation, the Scheffé test, which controls error rate familywise, may be a good alternative to tests that are normally used for planned comparisons.
For an experimenter who wants to test a lot of comparisons post hoc (particularly non-pairwise comparisons) and still control error rate familywise, the Scheffé test is a good choice.
Disadvantages
There are several things to dislike about the Scheffé test, including the following:
- The Scheffé test has lower statistical power than tests that are designed for planned comparisons.
- For testing pairwise comparisons, the Scheffé test is less sensitive than some other post hoc procedures (e.g., Tukey's HSD test).
Note: A good way to increase the power of the Scheffé test is to use large sample sizes.
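To see why larger samples help, the sketch below recomputes the critical value for a pairwise comparison at several per-group sample sizes. It rests on several assumptions: three equal-sized groups, a fixed MSE of 125 (in practice MSE is estimated from the data and will vary), and a significance level of 0.05. With three groups the numerator degrees of freedom equal 2, and in that special case the F critical value has a closed form, so no statistics library is needed. The function names are hypothetical.

```python
import math


def f_crit_df1_2(alpha, df2):
    """Upper-alpha critical value of F(2, df2).

    Valid only when the numerator degrees of freedom equal 2, where
    P(F > x) = (1 + 2x/df2) ** (-df2/2) can be inverted directly.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


def pairwise_cv(n, mse=125.0, alpha=0.05, k=3):
    """Scheffe critical value for a pairwise comparison, n subjects per group."""
    assert k == 3, "the closed-form F value used here assumes k - 1 = 2"
    df2 = k * (n - 1)                      # error degrees of freedom
    f_crit = f_crit_df1_2(alpha, df2)
    return math.sqrt((k - 1) * f_crit * mse * (1 / n + 1 / n))


for n in (5, 10, 20, 40):
    print(n, round(pairwise_cv(n), 1))
```

The critical value shrinks as n grows, so a smaller observed difference between means can reach significance.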
What Do Statisticians Say?
If you ask a statistician about when to use Scheffé's test, here are some comments you might hear:
- For post hoc testing, it only makes sense to use Scheffé's test after a significant omnibus analysis of variance. If the analysis of variance does not provide evidence of significant differences among means, there is no need to conduct follow-up tests looking for those differences.
- For post hoc testing of many comparisons, it makes sense to use Scheffé's test. For post hoc testing of only a few comparisons, Bonferroni's correction might be the better choice.
- For a priori testing, Scheffé's test can be an acceptable choice when the experiment calls for tests of many comparisons. In that situation, Scheffé's test might be considered a "safe" technique because, compared to other methods, it provides a reasonable balance between control of Type I errors and risk of Type II errors.
A Step-By-Step Example
In this section, we'll work through a simple example to illustrate the planning and analysis required for post hoc testing with Scheffé's test.
Experimental Design
To test the long-term effect of aerobic exercise on resting pulse rate, an investigator conducts a controlled experiment. The experiment uses a completely randomized design, consisting of three treatment groups:
- Control. Subjects do not participate in an exercise program.
- Low-effort. Subjects jog 1 mile on Monday, Wednesday, and Friday.
- High-effort. Subjects jog 2 miles every day, except Sunday.
Five subjects are randomly assigned to each group. After 28 days of treatment, their resting pulse rate is measured on day 29.
A Priori Analysis
To test planned comparisons, the investigator poses the research questions to be answered, states statistical hypotheses implied by each research question, and identifies the analytical technique(s) used to test each statistical hypothesis - all before any data is collected. Then, following data collection, data is analyzed according to plan.
Research Question
For this experiment, the researcher is initially interested in one research question. That question and the associated statistical hypotheses appear below:
- Overall research question. Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group?
H0: μi = μj for every pair of groups i and j
H1: μi ≠ μj for at least one pair of groups i and j
Analytical Techniques
The overall research question asks whether the mean pulse rate in one treatment group differs from the mean pulse rate in any other group. The null hypothesis implied by this research question can be tested by an omnibus analysis of variance.
For this example, assume that the investigator specifies a significance level of 0.05 to test the statistical significance of the main research question.
Experimental Data
Pulse rate measurements for each subject in each treatment group appear below:
Table 1. Pulse Rate for Each Subject in Each Group
Group 1 (control) | Group 2 (low effort) | Group 3 (high effort) |
---|---|---|
80 | 70 | 50 |
85 | 75 | 60 |
90 | 80 | 70 |
95 | 85 | 80 |
100 | 90 | 90 |
ANOVA Results
The overall research question for a priori analysis is: Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group? The statistical hypotheses implied by that question are:
H0: μi = μj for every pair of groups i and j
H1: μi ≠ μj for at least one pair of groups i and j
We can test this null hypothesis with a standard, omnibus analysis of variance. Here is the ANOVA table from that analysis.
Table 2. ANOVA Summary Table
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
BG | 1000 | 2 | 500 | 4.0 | 0.046 |
Error | 1500 | 12 | 125 | | |
Total | 2500 | 14 | | | |
The P value for the between-groups (BG) effect is 0.046, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis of no difference in pulse rates between treatment groups.
Note: We explained how to conduct a one-way analysis of variance in previous lessons. If you're wondering how to produce the ANOVA table shown above, see One-Way Analysis of Variance: Example or One-Way Analysis of Variance With Excel.
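As a cross-check on Table 2, the sums of squares, degrees of freedom, mean squares, and F ratio can be recomputed from the raw scores in Table 1. The sketch below uses only basic Python; the variable names are illustrative.

```python
# Raw pulse rates from Table 1.
groups = {
    "control": [80, 85, 90, 95, 100],
    "low effort": [70, 75, 80, 85, 90],
    "high effort": [50, 60, 70, 80, 90],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups sum of squares: group size times the squared deviation of
# each group mean from the grand mean.
ss_bg = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# Error (within-groups) sum of squares: squared deviations of each score
# from its own group mean.
ss_error = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

df_bg = len(groups) - 1
df_error = len(all_scores) - len(groups)
ms_bg = ss_bg / df_bg
mse = ss_error / df_error
f_ratio = ms_bg / mse

print("BG:   ", ss_bg, df_bg, ms_bg, f_ratio)
print("Error:", ss_error, df_error, mse)
```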
Post Hoc Analysis
Having ascertained through the a priori analysis that a significant difference exists among the mean scores, suppose the experimenter wants to investigate how the means differ.
Post Hoc Research Questions
For this post hoc analysis, the researcher decides to ask four follow-up questions. For each question, there is an implied statistical hypothesis which can be tested by a unique comparison. The questions, hypotheses, and comparisons appear below:
- Follow-up question 1. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the low-effort group (Group 2)?
H0: μ1 = μ2
H1: μ1 ≠ μ2
This statistical hypothesis can be represented mathematically by the comparison L1:
L1 = X1 - X2
- Follow-up question 2. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?
H0: μ1 = μ3
H1: μ1 ≠ μ3
This statistical hypothesis can be represented mathematically by the comparison L2:
L2 = X1 - X3
- Follow-up question 3. Will mean pulse rate of subjects in the low-effort group (Group 2) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?
H0: μ2 = μ3
H1: μ2 ≠ μ3
This statistical hypothesis can be represented mathematically by the comparison L3:
L3 = X2 - X3
- Follow-up question 4. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the two exercise groups (Group 2 and Group 3) combined?
H0: μ1 = (μ2 + μ3) / 2
H1: μ1 ≠ (μ2 + μ3) / 2
This statistical hypothesis can be represented mathematically by the comparison L4:
L4 = X1 - 0.5X2 - 0.5X3
In the equations above, X1, X2, and X3 are mean scores for Groups 1, 2, and 3, respectively.
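One convenient way to hold these four comparisons is as vectors of coefficients, one coefficient per group. The sketch below is an illustration only; the dictionary name `comparisons` is an assumption. It also checks that each set of coefficients sums to zero, which is the defining property of a comparison.

```python
# Coefficients (c1, c2, c3) for Groups 1, 2, and 3 in each comparison.
comparisons = {
    "L1": (1, -1, 0),        # X1 - X2
    "L2": (1, 0, -1),        # X1 - X3
    "L3": (0, 1, -1),        # X2 - X3
    "L4": (1, -0.5, -0.5),   # X1 - 0.5X2 - 0.5X3
}

# The coefficients of a comparison must sum to zero.
for name, coeffs in comparisons.items():
    assert sum(coeffs) == 0, f"{name} is not a valid comparison"

# A pairwise comparison has exactly two non-zero coefficients.
pairwise = [
    name
    for name, coeffs in comparisons.items()
    if sum(1 for c in coeffs if c != 0) == 2
]
print(pairwise)
```

L1 through L3 are pairwise comparisons; L4 is non-pairwise because it involves all three groups.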
Post Hoc Analysis With Scheffé's Test
Each null hypothesis associated with a follow-up question can be represented mathematically by a unique comparison. To determine whether to reject the null hypothesis for a follow-up question, we can test its associated comparison for statistical significance, using Scheffé's test. To illustrate the process, we'll work through Scheffé's test step-by-step.
Step 1. Specify a Significance Level
For post hoc analyses with Scheffé's test, the significance level should equal the significance level used a priori for the omnibus analysis of variance. We used a significance level of 0.05 for the a priori analysis, so we will use a significance level of 0.05 for Scheffé's test.
Step 2. Find Comparison Values
Each comparison is a function of mean scores from treatment groups. Mean pulse rate within each group (computed from raw scores in Table 1) appears below:
Table 3. Mean Pulse Rate in Each Treatment Group
Group 1 (control) | Group 2 (low effort) | Group 3 (high effort) |
---|---|---|
90 | 80 | 70 |
Given the treatment means, it is a simple matter to compute values for each comparison, as shown below:
Table 4. Comparison Values
Comparison | Value |
---|---|
L1 = X1 - X2 | 10 |
L2 = X1 - X3 | 20 |
L3 = X2 - X3 | 10 |
L4 = X1 - 0.5X2 - 0.5X3 | 15 |
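The values in Table 4 are weighted sums of the group means in Table 3. A short sketch of that arithmetic follows; the variable names are illustrative.

```python
# Group means from Table 3, in order: Group 1, Group 2, Group 3.
means = [90, 80, 70]

# Coefficients for each comparison, one per group.
comparisons = {
    "L1": (1, -1, 0),        # X1 - X2
    "L2": (1, 0, -1),        # X1 - X3
    "L3": (0, 1, -1),        # X2 - X3
    "L4": (1, -0.5, -0.5),   # X1 - 0.5X2 - 0.5X3
}

# Each comparison value is the sum of coefficient times group mean.
values = {
    name: sum(c * m for c, m in zip(coeffs, means))
    for name, coeffs in comparisons.items()
}

for name, value in values.items():
    print(name, value)
```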
Step 3. Generate ANOVA Table
The summary table from an omnibus analysis of variance includes two outputs that we can use to test the statistical significance of a comparison. Those outputs are (1) the value of the mean squared error and (2) the degrees of freedom for the mean squared error.
We generated the ANOVA summary table earlier, as part of the a priori analysis. For convenience, here it is again.
Table 2. ANOVA Summary Table
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
BG | 1000 | 2 | 500 | 4.0 | 0.046 |
Error | 1500 | 12 | 125 | | |
Total | 2500 | 14 | | | |
Step 4. Find the Critical Values
The critical value for Scheffé's test of comparison Li can be computed from the following formula:
$$CV_i = \sqrt{(k - 1) \, F(v_1, v_2) \, MSE \, \sum_j \left( c_j^2 / n_j \right)}$$
where CVi is the critical value for comparison Li, (k - 1) is the between groups degrees of freedom, F(v1, v2) is the F value with v1, v2 degrees of freedom, v1 is degrees of freedom for the between groups factor, v2 is degrees of freedom for the mean square error, MSE is the mean square error, cj is a coefficient (weight) for treatment j in comparison Li, and nj is sample size in Group j.
To find values for the degrees of freedom and the mean squared error, refer to the ANOVA table. From the table, we see that v1 equals 2, v2 equals 12, and the mean squared error equals 125.
To find F(v1, v2), use Stat Trek's F Distribution Calculator. In the field for the numerator degrees of freedom, enter 2. In the field for the denominator degrees of freedom, enter 12. And in the field for P(F≤f), enter 1 - α, which is 1 - 0.05 or 0.95. Then, click the Calculate button.
From the calculator, we see that F(2,12) equals about 3.89 when the significance level (α) is 0.05. At last, we have all the values we need to compute a critical value for each comparison:
$$CV_i = \sqrt{(k - 1) \, F(v_1, v_2) \, MSE \, \sum_j \left( c_j^2 / n_j \right)}$$

$$CV_1 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7$$

$$CV_2 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7$$

$$CV_3 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7$$

$$CV_4 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.05 + 0.05)} \approx 17.1$$
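The Step 4 arithmetic can be reproduced with the sketch below. It makes one assumption worth flagging: the F critical value is computed from a closed form that holds only when the numerator degrees of freedom equal 2, as they do in this example. For other designs, the value would come from a table, a calculator, or a library routine such as SciPy's `scipy.stats.f.ppf`. The variable names are illustrative.

```python
import math

alpha = 0.05
k = 3                      # number of treatment groups
df_error = 12              # v2, from the ANOVA table
mse = 125                  # mean squared error, from the ANOVA table
ns = [5, 5, 5]             # sample size in each group

# F(2, 12) critical value. This closed form is valid only for numerator
# degrees of freedom equal to 2, where P(F > x) = (1 + 2x/v2) ** (-v2/2).
f_crit = (df_error / 2) * (alpha ** (-2 / df_error) - 1)

comparisons = {
    "L1": (1, -1, 0),
    "L2": (1, 0, -1),
    "L3": (0, 1, -1),
    "L4": (1, -0.5, -0.5),
}

# CVi = sqrt((k - 1) * F * MSE * sum(cj^2 / nj)) for each comparison.
critical_values = {
    name: math.sqrt(
        (k - 1) * f_crit * mse * sum(c ** 2 / n for c, n in zip(coeffs, ns))
    )
    for name, coeffs in comparisons.items()
}

print("F critical:", round(f_crit, 2))
for name, cv in critical_values.items():
    print(name, round(cv, 1))
```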
Step 5. Test Hypotheses
To test the statistical significance of each comparison, we compare the value of the comparison (Li from Step 2) with the critical value for the comparison (CVi from Step 4). If the absolute value of Li is bigger than CVi, the comparison is statistically significant.
Table 5 shows Scheffé test results for each comparison.
Table 5. Scheffé Test Results
Comparison | Li value | CVi value | Conclusion |
---|---|---|---|
X1 - X2 | 10 | 19.7 | Not significant |
X1 - X3 | 20 | 19.7 | Significant |
X2 - X3 | 10 | 19.7 | Not significant |
X1 - 0.5X2 - 0.5X3 | 15 | 17.1 | Not significant |
The second comparison is statistically significant, since L2 is bigger than CV2. The second comparison measures the difference between resting pulse rate in the control group (Group 1) and resting pulse rate in the high-effort group (Group 3). From this post hoc analysis, we conclude that the high-effort treatment has a significant effect on resting pulse rate.
None of the other comparisons are statistically significant.
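The decision rule from Step 5 can be sketched as a simple loop. The comparison values come from Step 2 and the critical values from Step 4 (rounded to one decimal place); the names used here are illustrative.

```python
# (comparison, Li value, CVi value)
results = [
    ("X1 - X2", 10, 19.7),
    ("X1 - X3", 20, 19.7),
    ("X2 - X3", 10, 19.7),
    ("X1 - 0.5X2 - 0.5X3", 15, 17.1),
]

# A comparison is significant when the absolute value of Li exceeds CVi.
conclusions = {
    name: "Significant" if abs(l_value) > cv else "Not significant"
    for name, l_value, cv in results
}

for name, conclusion in conclusions.items():
    print(f"{name}: {conclusion}")
```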
Note: In a previous lesson, we tested the fourth comparison as part of a planned analysis and found it to be statistically significant. This illustrates the value of deciding in advance which comparisons to test. When the number of hypotheses tested is small, a priori tests (like the F ratio) tend to be more sensitive than post hoc tests (like the Scheffé test).