
  • Shapiro-Wilk Test in R

    Null hypothesis of the Shapiro-Wilk test

    The Shapiro-Wilk test checks whether the tested variable deviates from a normal distribution; its null hypothesis is therefore:
    The variable is normally distributed.

    Conducting the Shapiro-Wilk Test in R

    The Shapiro-Wilk test is calculated in R using the shapiro.test() function. Only the variable to be tested for deviation from a normal distribution is required.

    For the variable to be tested, “weight” from the data frame “df”, the call looks like this:

    shapiro.test(df$weight)

    The result is a test statistic W and the p-value:

    	Shapiro-Wilk normality test
    
    data:  df$weight
    W = 0.85935, p-value = 2.38e-05
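
    The data frame “df” itself is not shown in this article. A minimal, self-contained sketch with a hypothetical, clearly right-skewed weight variable could look like this (the exact W and p-value will of course differ from the output above):

    set.seed(123)                                      # for reproducibility
    df <- data.frame(weight = rexp(40, rate = 1/70))   # hypothetical, strongly skewed "weights"
    shapiro.test(df$weight)                            # very small p-value expected: normality rejected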

    Interpreting the Shapiro-Wilk test in R

    The p-value is used to test the null hypothesis. If the p-value is below the predetermined alpha level (usually 5% = 0.05), the null hypothesis is rejected. Recall that the null hypothesis of the Shapiro-Wilk test states that the variable is normally distributed.

    Consequently, rejecting the null hypothesis means that a normal distribution cannot be assumed. This applies to the example calculated above.

    The p-value is very small at p = 2.38e-05. In this scientific notation, e-05 moves the decimal point 5 places to the left: 2.38e-05 = 0.0000238, which is clearly < 0.05. The null hypothesis of a normal distribution must therefore be rejected.
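
    If the scientific notation is confusing, R can display the p-value in decimal form and compare it directly with the alpha level:

    format(2.38e-05, scientific = FALSE)   # "0.0000238"
    2.38e-05 < 0.05                        # TRUE -> reject the null hypothesis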

    When looking at a Q-Q plot and/or histogram (see below), it quickly becomes clear that the Shapiro-Wilk test's verdict is plausible: the deviations from a normal distribution are too large.

    hist(df$weight)      # histogram of the tested variable
    qqnorm(df$weight)    # Q-Q plot against the theoretical normal quantiles
    qqline(df$weight)    # adds the reference line to the Q-Q plot

    A second example

    shapiro.test(df2$x)

    This leads to:

    	Shapiro-Wilk normality test
    
    data:  df2$x
    W = 0.99958, p-value = 0.3661

    With a p-value of p = 0.366, the null hypothesis is not rejected for this variable (intentionally generated in advance as a normally distributed variable) with 5000 observations. The variable can be regarded as normally distributed, which is confirmed when looking at the Q-Q plot and histogram.
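
    How “df2” was created is not shown here. A sketch that generates a comparable variable (5000 draws from a standard normal distribution) and performs the graphical check could look like this; the exact test output will differ:

    set.seed(42)                         # for reproducibility
    df2 <- data.frame(x = rnorm(5000))   # hypothetical, intentionally normal variable
    shapiro.test(df2$x)                  # p-value typically well above 0.05

    hist(df2$x)                          # roughly bell-shaped histogram
    qqnorm(df2$x)                        # points close to the reference line
    qqline(df2$x)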

    Caution when using the Shapiro-Wilk test

    A final example serves as a warning against relying solely on the Shapiro-Wilk test: a variable, analogous to the one above, with 5000 observations, but with a few additional observations at the edges of the distribution around ±3 and ±4.
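
    The construction of this variable is likewise not shown. One way to mimic it (an assumption, not the original code, so the output below will not be reproduced exactly) is to draw from a standard normal distribution and add a handful of values around ±3 and ±4; note that shapiro.test() accepts at most 5000 observations:

    set.seed(7)                                           # for reproducibility
    tails <- rep(c(3, -3, 3.5, -3.5, 4, -4), times = 5)   # 30 extra values in the tails
    df2 <- data.frame(x = c(rnorm(4970), tails))          # 5000 observations in total
    shapiro.test(df2$x)                                   # typically a very small p-value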

    	Shapiro-Wilk normality test
    
    data:  df2$x
    W = 0.99297, p-value = 5.227e-15

    The result here is clear: a rejection of the null hypothesis of a normal distribution. However, the question arises as to whether this is justified when the Q-Q plot and histogram are considered.

    It can be concluded that, although the Shapiro-Wilk test does not indicate a normal distribution here (at any common alpha level), graphically there is hardly any reason to doubt a normal distribution.


    From a practical point of view, the deviations visible at the ends of the distribution in the Q-Q plot are normal in the truest sense of the word. In such cases the typical requirement of parametric tests for, e.g., normally distributed residuals is fulfilled, because an “approximately normal distribution” is sufficient, even though the Shapiro-Wilk test does not attest a normal distribution.

    • The reason for this test result is the large sample size of 5000 observations.
    • This leads to a higher power (= test strength) of the Shapiro-Wilk test.
    • A higher power means that even smaller deviations are more likely to be detected.
    • Small deviations in large samples therefore ultimately lead to the rejection of the null hypothesis (of a normal distribution), as the sketch below illustrates.
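
    The effect of the sample size can be illustrated with the hypothetical df2 from the sketch above: the same slight deviation that the full sample of 5000 flags is usually invisible to the test in a small random subsample.

    set.seed(1)
    small <- sample(df2$x, 100)    # random subsample of 100 observations
    shapiro.test(small)$p.value    # typically > 0.05: the small deviation goes undetected
    shapiro.test(df2$x)$p.value    # typically < 0.05: high power detects even tiny deviations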

    The article “The large sample size fallacy” by Lantz (2013) describes this extensively.
    Summarized from the abstract: “The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article.”

    Literature

    • Lantz, B. (2013). The large sample size fallacy. Scandinavian Journal of Caring Sciences, 27(2), 487-492.