Rank-Based Tests

Author

Dr. Cohen

Wilcoxon Rank Sum Test or Mann-Whitney Test

  • 2 independent samples

Example 1 : Temperature data

Test whether the high temperatures of city 1 tend to be greater than those of city 2 (i.e., the location shift is greater than 0).

temp_city1 = c(83,91,89,89,94,96,91,92,90) # X
temp_city2 = c(78,82,81,77,79,81,80,81) # Y
wilcox.test(temp_city1,
            temp_city2,
            alternative = "g",
            exact=FALSE)

    Wilcoxon rank sum test with continuity correction

data:  temp_city1 and temp_city2
W = 72, p-value = 0.0003033
alternative hypothesis: true location shift is greater than 0
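
As a quick check on where W comes from (a sketch added here, not part of the original notes): R computes W as the sum of the ranks of the first sample in the pooled data minus n1(n1+1)/2. Since every city 1 value exceeds every city 2 value, W attains its maximum of n1*n2 = 9*8 = 72.

all_ranks = rank(c(temp_city1, temp_city2)) # pooled ranks, midranks for ties
n1 = length(temp_city1)
sum(all_ranks[1:n1]) - n1*(n1 + 1)/2        # 72, matching W above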

Example 2

x = c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y = c(1.15, 0.88, 0.90, 0.74, 1.21)
wilcox.test(x, y, alternative = "g")

    Wilcoxon rank sum exact test

data:  x and y
W = 35, p-value = 0.1272
alternative hypothesis: true location shift is greater than 0
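
With no ties, the exact one-sided p-value is P(W >= 35) under the null hypothesis. A hedged check using the base R Wilcoxon rank sum distribution (first sample size m = 10, second n = 5):

pwilcox(35 - 1, m = 10, n = 5, lower.tail = FALSE) # P(W >= 35), should match 0.1272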

Wilcoxon Signed Rank Test

  • 2 dependent samples

Example 1 : Twin data

Test whether the first-born twin is more aggressive than the second-born twin.

FBT = c(86,71,77,68,91,72,77,91,70,71,88,87) # X
SBT = c(88,77,76,64,96,72,65,90,65,80,81,72) # Y
# wilcox.test(SBT, FBT) computes D = SBT - FBT (first argument minus second)
wilcox.test(SBT,FBT,
            paired=TRUE,
            alternative="l",
            exact = FALSE,
            conf.int=TRUE)

    Wilcoxon signed rank test with continuity correction

data:  SBT and FBT
V = 24.5, p-value = 0.2382
alternative hypothesis: true location shift is less than 0
95 percent confidence interval:
     -Inf 2.000018
sample estimates:
(pseudo)median 
     -1.823501 
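
Sketch of where V comes from (an added check, not in the original notes): drop the zero differences, rank the absolute differences, and sum the ranks attached to the positive differences.

D = SBT - FBT             # paired differences, first argument minus second
D = D[D != 0]             # the signed rank test discards zero differences
sum(rank(abs(D))[D > 0])  # 24.5, matching V above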

Example 2 : Mixer data

Cake batter is mixed until it reaches a specified level of consistency. The time required to do so was recorded for two mixers, A and B.

Find a 95% confidence interval for the difference in location between the two mixers.

# Time required 
A=c(7.3,6.9,7.2,7.8,7.2)
B=c(7.4,6.8,6.9,6.7,7.1)

wilcox.test(A,B,alternative = "t",conf.int = TRUE,exact=FALSE)

    Wilcoxon rank sum test with continuity correction

data:  A and B
W = 19.5, p-value = 0.1719
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.1999974  0.8999827
sample estimates:
difference in location 
             0.3000632 
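
The reported "difference in location" is the Hodges-Lehmann estimate, essentially the median of all pairwise differences A_i - B_j (an added check; with the continuity correction R reports 0.3000632 rather than exactly 0.3).

median(outer(A, B, "-")) # 0.3, the median of the 25 pairwise differences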

Kruskal-Wallis Test

  • K independent samples

Example 1 - Corn data

library(agricolae)
data(corn)
head(corn)
  method observation   rx
1      1          83 11.0
2      1          91 23.0
3      1          94 28.5
4      1          89 17.0
5      1          89 17.0
6      1          96 31.5
kruskal.test(observation~method,data=corn)

    Kruskal-Wallis rank sum test

data:  observation by method
Kruskal-Wallis chi-squared = 25.629, df = 3, p-value = 1.141e-05
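
As a sketch of how the statistic is built from ranks (added here; the variable names are mine), the Kruskal-Wallis chi-squared can be reproduced from the pooled ranks together with the tie correction that kruskal.test applies:

obs = corn$observation
grp = factor(corn$method)
r = rank(obs)        # pooled ranks, midranks for ties
N = length(obs)
H = 12/(N*(N + 1)) * sum(tapply(r, grp, sum)^2 / tapply(r, grp, length)) - 3*(N + 1)
ties = table(obs)
H / (1 - sum(ties^3 - ties)/(N^3 - N))  # should reproduce 25.629
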
Median.test(corn$observation,corn$method)

The Median Test for corn$observation ~ corn$method 

Chi Square = 17.54306   DF = 3   P.Value 0.00054637
Median = 89 

  Median  r Min Max   Q25   Q75
1   91.0  9  83  96 89.00 92.00
2   86.0 10  81  91 83.25 89.75
3   95.0  7  91 101 93.50 98.00
4   80.5  8  77  82 78.75 81.00

Post Hoc Analysis

Groups according to probability of treatment differences and alpha level.

Treatments with the same letter are not significantly different.

  corn$observation groups
3             95.0      a
1             91.0      b
2             86.0      b
4             80.5      c
## the KW test rejects H0 --> proceed with post hoc analysis
# install.packages(c("PMCMRplus", "PMCMR")) # install the packages if needed
library(PMCMRplus)
kwAllPairsConoverTest(corn$observation,corn$method,
                             p.adjust.method = "bon")
Warning in kwAllPairsConoverTest.default(corn$observation, corn$method, : Ties
are present. Quantiles were corrected for ties.

    Pairwise comparisons using Conover's all-pairs test

data: corn$observation and corn$method
  1       2       3      
2 0.04257 -       -      
3 0.02382 1.2e-05 -      
4 3.9e-07 0.00058 5.3e-10

P value adjustment method: bonferroni
kwAllPairsConoverTest(corn$observation,
                             corn$method,p.adjust.method = "none")
Warning in kwAllPairsConoverTest.default(corn$observation, corn$method, : Ties
are present. Quantiles were corrected for ties.

    Pairwise comparisons using Conover's all-pairs test

data: corn$observation and corn$method
  1       2       3      
2 0.0071  -       -      
3 0.0040  1.9e-06 -      
4 6.4e-08 9.7e-05 8.8e-11

P value adjustment method: none

Example 2 - Instructors data

3 instructors compared grades they assigned over the past semester:

Grade  I1  I2  I3
A       4  10   6
B      14   6   7
C      17   9   8
D       6   7   6
F       2   6   1

Do some instructors tend to give higher grades than other instructors?

The data here are ordinal (A to F). We can code the grades as A = 5, B = 4, C = 3, D = 2, F = 1.

I1= c(rep(5,4),rep(4,14),rep(3,17),rep(2,6),rep(1,2)) 
I2= c(rep(5,10),rep(4,6),rep(3,9),rep(2,7),rep(1,6))
I3= c(rep(5,6),rep(4,7),rep(3,8),rep(2,6),rep(1,1))

grades = c(I1,I2,I3)
Instr = c(rep("I1",43),rep("I2",38),rep("I3",28))

kruskal.test(grades,Instr)

    Kruskal-Wallis rank sum test

data:  grades and Instr
Kruskal-Wallis chi-squared = 0.32093, df = 2, p-value = 0.8517
  • Since the p-value = 0.8517 is greater than 5%, we fail to reject the null hypothesis. There is no evidence that the grades assigned by the 3 instructors differ significantly from one another.

Friedman Test

  • K dependent samples

Example 1 - Homeowner data - p. 371

12 homeowners were randomly selected to participate in an experiment with a plant nursery. Each homeowner was asked to select four fairly identical areas in their yard and to plant four different types of grass.

At the end of the experiment, each homeowner was asked to rank the grass types in order of preference, with rank 1 assigned to the least preferred.

The null hypothesis is that there is no difference in preference among the grass types.

library(tidyverse)

hw = (1:12) # homeowner id
grass1 = c(4,4,3,3,4,2,1,2,3.5,4,4,3.5) # preference for grass type 1 
grass2 = c(3,2,1.5,1,2,2,3,4,1,1,2,1)# preference for grass type 2 
grass3 = c(2,3,1.5,2,1,2,2,1,2,3,3,2)# preference for grass type 3 
grass4 = c(1,1,4,4,3,4,4,3,3.5,2,1,3.5)# preference for grass type 4 
df = data.frame(hw,grass1,grass2,grass3,grass4)

df1 = df %>% 
  pivot_longer(cols = -hw, names_to =  "GrassType", values_to = "preference") 

head(df1)
# A tibble: 6 × 3
     hw GrassType preference
  <int> <chr>          <dbl>
1     1 grass1             4
2     1 grass2             3
3     1 grass3             2
4     1 grass4             1
5     2 grass1             4
6     2 grass2             2
friedman.test(y = df1$preference, groups = df1$GrassType, blocks = df1$hw)

    Friedman rank sum test

data:  df1$preference, df1$GrassType and df1$hw
Friedman chi-squared = 8.0973, df = 3, p-value = 0.04404

Results: Reject the null hypothesis. There is evidence to suggest that preferences differ among at least two of the grass types.
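
Side note (not in the original notes): friedman.test also accepts a wide numeric matrix whose rows are the blocks and whose columns are the treatments, so the pivot_longer step can be skipped.

friedman.test(as.matrix(df[, -1])) # rows = homeowners, columns = grass types; same chi-squared of 8.0973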

Post hoc analysis
frdcomp = frdAllPairsConoverTest(y=df1$preference,
                                 df1$GrassType,
                                 blocks=df1$hw,
                                 p.adjust.method = "bon")
frdcomp

    Pairwise comparisons using Conover's all-pairs test for a two-way balanced complete block design

data: y, groups and blocks
       grass1 grass2 grass3
grass2 0.15   -      -     
grass3 0.22   1.00   -     
grass4 1.00   0.60   0.81  

P value adjustment method: bonferroni
boxplot(preference~GrassType, data= df1)

Multiple comparisons with the Bonferroni correction show no significant differences. We may conclude that the significant result from the Friedman test may be a false positive.

Correlation coefficient

12 MBA students are studied to measure the strength of the relationship between their GMAT scores and GPAs.

gmat=c(710,610,640,580,545,560,610,530,560,540,570,560)
gpa=c(4,4,3.9,3.8,3.7,3.6,3.5,3.5,3.5,3.3,3.2,3.2)

cor.test(gmat,gpa,method = "s",alternative = "t")
Warning in cor.test.default(gmat, gpa, method = "s", alternative = "t"): Cannot
compute exact p-value with ties

    Spearman's rank correlation rho

data:  gmat and gpa
S = 117.25, p-value = 0.04344
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5900188 
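
A quick check (added here): with midranks for ties, Spearman's rho is the Pearson correlation of the ranked data.

cor(rank(gmat), rank(gpa)) # about 0.59, matching the rho estimate above
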
cor.test(gmat,gpa,method = "k",alternative = "t")
Warning in cor.test.default(gmat, gpa, method = "k", alternative = "t"): Cannot
compute exact p-value with ties

    Kendall's rank correlation tau

data:  gmat and gpa
z = 1.8967, p-value = 0.05787
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.4390389 
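
Similarly, the reported tau is the tie-corrected tau-b, which cor() computes directly.

cor(gmat, gpa, method = "kendall") # about 0.44, matching the tau estimate above
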
cor.test(gmat,gpa,method = "p",alternative = "t")

    Pearson's product-moment correlation

data:  gmat and gpa
t = 2.8004, df = 10, p-value = 0.01878
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1437660 0.8959716
sample estimates:
      cor 
0.6629678