Stat Trek

Teach yourself statistics

Stat Trek

Teach yourself statistics

How to Test for Normality: Three Simple Tests

Many statistical techniques (regression, ANOVA, t-tests, etc.) rely on the assumption that data is normally distributed. For these techniques, it is good practice to examine the data to confirm that the assumption of normality is tenable.

With that in mind, here are three simple ways to test interval-scale data or ratio-scale data for normality.

Each option is easy to implement with Excel, as long as you have Excel's Analysis ToolPak.

The Analysis ToolPak

To conduct the tests for normality described below, you need a free Microsoft add-in called the Analysis ToolPak, which may or may not be already installed on your copy of Excel.

To determine whether you have the Analysis ToolPak, click the Data tab in the main Excel menu. If you see Data Analysis in the Analysis section, you're good. You have the ToolPak.

Excel Analysis ToolPak
Excel Analysis ToolPak

If you don't have the ToolPak, you need to get it. Go to: How to Install the Data Analysis ToolPak in Excel.

Descriptive Statistics

Perhaps, the easiest way to test for normality is to examine several common descriptive statistics. Here's what to look for:

  • Central tendency. The mean and the median are summary measures used to describe central tendency - the most "typical" value in a set of values. With a normal distribution, the mean is equal to the median.
  • Skewness. Skewness is a measure of the asymmetry of a probability distribution. If observations are equally distributed around the mean, the skewness value is zero; otherwise, the skewness value is positive or negative. As a rule of thumb, skewness between -2 and +2 is consistent with a normal distribution.
  • Kurtosis. Kurtosis is a measure of whether observations cluster around the mean of the distribution or in the tails of the distribution. The normal distribution has a kurtosis value of zero. As a rule of thumb, kurtosis between -2 and +2 is consistent with a normal distribution.

Together, these descriptive measures provide a useful basis for judging whether a data set satisfies the assumption of normality.

Example 1

To see how to compute descriptive statistics in Excel, consider the following data set:

Sample data
1
2
3
3
3.5
4
4
5
5
5
5.5
5.5
6
6
6
6.5
7
7
8
9

Begin by entering data in a column or row of an Excel spreadsheet:

Data in Excel spreadsheet

Next, from the navigation menu in Excel, click Data / Data analysis. That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Descriptive Statistics and click the OK button:

Data Analysis dialog box

Then, in the Descriptive Statistics dialog box, enter the input range, and click the Summary Statistics check box. The dialog box, with entries, should look like this:

Descriptive Statistics dialog box

And finally, to display summary statistics, click the OK button on the Descriptive Statistics dialog box. Among other outputs, you should see the following:

Excel output

The mean is nearly equal to the median. And both skewness and kurtosis are between -2 and +2.

Conclusion: These descriptive statistics are consistent with a normal distribution.

Histogram

Another easy way to test for normality is to plot data in a histogram, and see if the histogram reveals the bell-shaped pattern characteristic of a normal distribution. With Excel, this is a a four-step process:

  • Enter data. This means entering data values in an Excel spreadsheet. The column, row, or range of cells that holds data is the input range.
  • Define bins. In Excel, bins are category ranges. To define a bin, you enter the upper range of each bin in a column, row, or range of cells. The block of cells that holds upper-range entries is called the bin range.
  • Plot the data in a histogram. In Excel, access the histogram function through: Data / Data analysis / Histogram.
  • In the Histogram dialog box, enter the input range and the bin range; and check the Chart Output box. Then, click OK.

If the resulting histogram looks like a bell-shaped curve, your work is done. The data set is normal or nearly normal. If the curve is not bell-shaped, the data may not be normal.

Example 2

To see how to plot data for normality with a histogram in Excel, we'll use the same data set (shown below) that we used in Example 1.

Sample data
1
2
3
3
3.5
4
4
5
5
5
5.5
5.5
6
6
6
6.5
7
7
8
9

Begin by entering data to define an input range and a bin range. Here is what data entry looks like in an Excel spreadsheet:

Data in Excel spreadsheet

Next, from the navigation menu in Excel, click Data / Data analysis. That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Histogram and click the OK button:

Data Analysis dialog box

Then, in the Histogram dialog box, enter the input range, enter the bin range, and click the Chart Output check box. The dialog box, with entries, should look like this:

Histogram dialog box

And finally, to display the histogram, click the OK button on the Histogram dialog box. Here is what you should see:

Excel output

The plot is fairly bell-shaped - an almost-symmetric pattern with one peak in the middle. Given this result, it would be safe to assume that the data were drawn from a normal distribution. On the other hand, if the plot were not bell-shaped, you might suspect the data were not from a normal distribution.

Chi-Square Test

The chi-square test for normality is another good option for determining whether a set of data was sampled from a normal distribution.

Note: All chi-square tests assume that the data under investigation was sampled randomly.

Hypothesis Testing

The chi-square test for normality is an actual hypothesis test, where we examine observed data to choose between two statistical hypotheses:

  • Null hypothesis: Data is sampled from a normal distribution.
  • Alternative hypothesis: Data is not sampled from a normal distribution.

Like many other techniques for testing hypotheses, the chi-square test for normality involves computing a test-statistic and finding the P-value for the test statistic, given degrees of freedom and significance level. If the P-value is bigger than the significance level, we accept the null hypothesis; if it is smaller, we reject the null hypothesis.

How to Conduct the Chi-Square Test

The steps required to conduct a chi-square test of normality are listed below:

  • Specify the significance level.
  • Find the mean, standard deviation, sample size for the sample.
  • Define non-overlapping bins.
  • Count observations in each bin, based on actual dependent variable scores.
  • Find the cumulative probability for each bin endpoint.
  • Find the probability that an observation would land in each bin, assuming a normal distribution.
  • Find the expected number of observations in each bin, assuming a normal distribution.
  • Compute a chi-square statistic.
  • Find the degrees of freedom, based on the number of bins.
  • Find the P-value for the chi-square statistic, based on degrees of freedom.
  • Accept or reject the null hypothesis, based on P-value and significance level.

So you will understand how to accomplish each step, let's work through an example, one step at a time.

Example 3

To demonstrate how to conduct a chi-square test for normality in Excel, we'll use the same data set (shown below) that we've used for the previous two examples. Here it is again:

Sample data
1
2
3
3
3.5
4
4
5
5
5
5.5
5.5
6
6
6
6.5
7
7
8
9

Now, using this data, let's check for normality.

Specify Significance Level

The significance level is the probability of rejecting the null hypothesis when it is true. Researchers often choose 0.05 or 0.01 for a significance level. For the purpose of this exercise, let's choose 0.05.

Find the Mean, Standard Deviation, and Sample Size

To compute a chi-square test statistic, we need to know the mean, standared deviation, and sample size. Excel can provide this information. Here's how:

  • Step 1. Enter data in a column or row of an Excel spreadsheet, like this:
    Excel data
  • Step 2. From the navigation menu in Excel, click Data / Data analysis. That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Descriptive Statistics and click the OK button:
    Data Analysis dialog box
  • Step 3. In the Descriptive Statistics dialog box, enter the input range, and click the Summary Statistics check box. The dialog box, with entries, should look like this:
    Descriptive Statistics dialog box
  • Step 4. To display the mean, standard deviation, and sample size, click the OK button on the Descriptive Statistics dialog box. Among other outputs, you should see the following:
    Excel output

Define Bins

To conduct a chi-square analysis, we need to define bins, based on dependent variable scores. Each bin is defined by a non-overlapping range of values.

For the chi-square test to be valid, each bin should hold at least five observations. With that in mind, we'll define four bins for this example, as shown below:

Bin 1 will hold dependent variable scores that are less than 4; Bin 2, scores between 4 and 5; Bin 3, scores between 5.1 and 6; and and Bin 4, scores greater than 6.

Note: The number of bins is an arbitrary decision made by the experimenter, as long as the experimenter chooses at least four bins and at least five observations per bin. With fewer than four bins, there are not enough degrees of freedom for the analysis. For this example, we chose to define only four bins. Given the small sample, if we used more bins, at least one bin would have fewer than five observations per bin.

Count Observed Data Points in Each Bin

The next step is to count the observed data points in each bin. The figure below shows sample observations allocated to bins, with a frequency count for each bin in the final row.

Three bins showing sample allocation rules

Note: We have five observed data points in each bin - the minimum required for a valid chi-square test of normality.

Find Cumulative Probability

A cumulative probability refers to the probability that a random variable is less than or equal to a specific value. In Excel, the NORMDIST function computes cumulative probabilities from a normal distribution.

Assuming our data follows a normal distribution, we can use the NORMDIST function to find cumulative probabilities for the upper endpoints in each bin. Here is the formula we use:

P j = NORMDIST (MAX j , X, s, TRUE)

where P j is the cumulative probability for the upper endpoint in Bin j , MAX j is the upper endpoint for Bin j , X is the mean of the data set, and s is the standard deviation of the data set.

When we execute the formula in Excel, we get the following results:

P 1 = NORMDIST (4, 5.1, 2.0, TRUE) = 0.29

P 2 = NORMDIST (5, 5.1, 2.0, TRUE) = 0.48

P 3 = NORMDIST (6, 5.1, 2.0, TRUE) = 0.67

P 4 = NORMDIST (999999999, 5.1, 2.0, TRUE) = 1.00

Note: For Bin 4, the upper endpoint is positive infinity (∞), a quantity that is too large to be represented in an Excel function. To estimate cumulative probability for Bin 4 (P4) with excel, you can use a very large number (e.g., 999999999) in place of positive infinity (as shown above). Or you can recognize that the probability that any random variable is less than or equal to positive infinity is 1.00.

Find Bin Probability

Given the cumulative probabilities shown above, it is possible to find the probability that a randomly selected observation would fall in each bin, using the following formulas:

P( Bin = 1 ) = P 1 = 0.29

P( Bin = 2 ) = P 2 - P 1 = 0.48 - 0.29 = 0.19

P( Bin = 3 ) = P 3 - P 2 = 0.67 - 0.48 = 0.19

P( Bin = 4 ) = P 4 - P 3 = 1.000 - 0.67 = 0.33

Find Expected Number of Observations

Assuming a normal distribution, the expected number of observations in each bin can be found by using the following formula:

Exp j = P( Bin = j ) * n

where Exp j is the expected number of observations in Bin j , P( Bin = j ) is the probability that a randomly selected observation would fall in Bin j , and n is the sample size

Applying the above formula to each bin, we get the following:

Exp 1 = P( Bin = 1 ) * 20 = 0.29 * 20 = 5.8

Exp 2 = P( Bin = 2 ) * 20 = 0.19 * 20 = 3.8

Exp 3 = P( Bin = 3 ) * 20 = 0.19 * 20 = 3.8

Exp 3 = P( Bin = 4 ) * 20 = 0.33 * 20 = 6.6

Compute Chi-Square Statistic

Finally, we can compute the chi-square statistic ( χ2 ), using the following formula:

χ2 = Σ [ ( Obs j - Exp j ) 2 / Exp j ]

where Obs j is the observed number of observations in Bin j , and Exp j is the expected number of observations in Bin j .

Therefore,

χ2 =
( 5-5.8 )2

5.8
+
( 5-3.8 )2

3.8
+
( 5-3.8 )2

3.8
+
( 5-6.6 )2

6.6
χ2 = 1.26

Find Degrees of Freedom

Assuming a normal distribution, the degrees of freedom (df) for a chi-square test of normality equals the number of bins (nb) minus the number of estimated parameters (np) minus one. We used four bins, so nb equals four. And to conduct this analysis, we estimated two parameters (the mean and the standard deviation), so np equals two. Therefore,

df = nb - np - 1 = 4 - 2 - 1 = 1

Find P-Value

The P-value is the probability of seeing a chi-square test statistic that is more extreme (bigger) than the observed chi-square statistic. For this problem, we found that the observed chi-square statistic was 1.26. Therefore, we want to know the probability of seeing a chi-square test statistic bigger than 1.26, given one degree of freedom.

Use Stat Trek's Chi-Square Calculator to find that probability. Enter the degrees of freedom (1) and the observed chi-square statistic (1.26) into the calculator; then, click the Calculate button.

Chi-square calculator

From the calculator, we see that P( X2 > 1.26 ) equals 0.26.

Test Null Hypothesis

When the P-Value is bigger than the significance level, we cannot reject the null hypothesis. Here, the P-Value (0.26) is bigger than the significance level (0.05), so we cannot reject the null hypothesis that the data tested follows a normal distribution.