Scenario: A research team is running an experiment to assess whether post-test ratings differ significantly between the control and treatment groups of students. The number of participants in each group was determined through sample size estimation based on power and mean, and participants were randomly assigned to the groups.
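The scenario does not report the inputs to the sample size estimation, so the following is only a sketch of how a per-group sample size could be obtained with base R's power.t.test and how participants could then be randomized. The effect size (delta), standard deviation (sd), power, significance level, and seed are hypothetical placeholders, not the study's actual values.

```r
# Hypothetical planning values; the study's real inputs are not given.
plan <- power.t.test(delta = 2.5, sd = 3.5, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(plan$n)  # round up to whole participants

# Simple randomization: shuffle a balanced vector of group labels.
set.seed(123)  # arbitrary seed, only so the sketch is reproducible
assignment <- sample(rep(c("Cntrl", "Tx"), each = n_per_group))
table(assignment)
```

Shuffling a balanced label vector keeps the two groups the same size, which matches the equal group sizes seen in the data.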
library(xlsx)
data <- read.xlsx("Experiment1.xlsx", sheetName = "Sheet1") # select the sheet by name; sheetIndex expects a number
head(data)
## Pre Post Group
## 1 2.03 17.23 Cntrl
## 2 4.02 16.04 Cntrl
## 3 14.34 19.22 Cntrl
## 4 15.55 19.45 Cntrl
## 5 2.05 18.53 Cntrl
## 6 11.07 21.00 Cntrl
tail(data)
## Pre Post Group
## 69 17.32 19.67 Tx
## 70 8.95 6.86 Tx
## 71 8.34 17.63 Tx
## 72 17.52 18.84 Tx
## 73 5.21 15.11 Tx
## 74 3.73 19.07 Tx
str(data)
## 'data.frame': 74 obs. of 3 variables:
## $ Pre : num 2.03 4.02 14.34 15.55 2.05 ...
## $ Post : num 17.2 16 19.2 19.4 18.5 ...
## $ Group: chr "Cntrl" "Cntrl" "Cntrl" "Cntrl" ...
summary(data)
## Pre Post Group
## Min. : 0.37 Min. : 5.28 Length:74
## 1st Qu.: 5.48 1st Qu.:16.05 Class :character
## Median :11.67 Median :18.20 Mode :character
## Mean :10.93 Mean :16.92
## 3rd Qu.:15.74 3rd Qu.:19.16
## Max. :19.77 Max. :21.04
# Retrieve pretest ratings from the Control group
CntrlPre <- data[data$Group == "Cntrl",]$Pre
# Retrieve pretest ratings from the Treatment group
TxPre <- data[data$Group == "Tx",]$Pre
# Retrieve post-test ratings from the Control group
CntrlPost <- data[data$Group == "Cntrl",]$Post
# Retrieve post-test ratings from the Treatment group
TxPost <- data[data$Group == "Tx",]$Post
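As a side note, the four subsetting calls above can also be written with split(), which returns one vector per group in a named list. This is a minimal sketch on a made-up data frame (toy is hypothetical and only mimics the Pre, Post, and Group columns); it is an alternative, not the approach used in this analysis.

```r
# Made-up rows with the same column layout as the experiment data
toy <- data.frame(
  Pre   = c(2.0, 4.0, 8.3, 17.5),
  Post  = c(17.2, 16.0, 17.6, 18.8),
  Group = c("Cntrl", "Cntrl", "Tx", "Tx")
)

# One list element per group, named after the group label
pre_by_group  <- split(toy$Pre, toy$Group)
post_by_group <- split(toy$Post, toy$Group)

pre_by_group$Cntrl   # pretest ratings of the Control group
post_by_group$Tx     # post-test ratings of the Treatment group
```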
library(nortest)
nordata <- cbind(CntrlPre, CntrlPost, TxPre, TxPost)
apply(nordata, 2, function(x) ad.test(x))
## $CntrlPre
##
## Anderson-Darling normality test
##
## data: x
## A = 0.75109, p-value = 0.04608
##
##
## $CntrlPost
##
## Anderson-Darling normality test
##
## data: x
## A = 2.4165, p-value = 3.078e-06
##
##
## $TxPre
##
## Anderson-Darling normality test
##
## data: x
## A = 0.83981, p-value = 0.02754
##
##
## $TxPost
##
## Anderson-Darling normality test
##
## data: x
## A = 1.9828, p-value = 3.726e-05
par(mfrow=c(2,2))
apply(nordata, 2, function(x) plot(density(x), col = "firebrick"))
## NULL
# All four Anderson-Darling tests reject normality at the 0.05 level, and the density plots agree, so the ratings are treated as not normally distributed.
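The normality check above can be cross-checked with tools that ship with base R. The sketch below uses shapiro.test and a normal Q-Q plot on simulated, left-skewed scores; the simulated vector, its parameters, and the seed are assumptions for illustration and are not the study data.

```r
set.seed(1)  # arbitrary seed for a reproducible sketch
# Simulated left-skewed scores standing in for a post-test column
scores <- 21 - rexp(37, rate = 0.3)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(scores)
sw$p.value

# Q-Q plot: points bending away from the reference line indicate non-normality
qqnorm(scores)
qqline(scores, col = "firebrick")
```

A Q-Q plot often shows departures in the tails more clearly than a density plot, so it can be a useful complement to the density plots used here.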
library(psych)
describeBy(data$Pre, data$Group) # Normality was rejected, so the median is used as the measure of central tendency
##
## Descriptive statistics by group
## group: Cntrl
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 37 11.48 5.92 11.65 11.68 8.35 1.37 19.7 18.33 -0.25 -1.33 0.97
## ------------------------------------------------------------
## group: Tx
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 37 10.37 5.79 11.69 10.35 7.99 0.37 19.77 19.4 -0.05 -1.43 0.95
describeBy(data$Post, data$Group) # Normality was rejected, so the median is used as the measure of central tendency
##
## Descriptive statistics by group
## group: Cntrl
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 37 17.23 3.52 19.05 17.67 1.44 5.28 21.04 15.76 -1.47 1.77 0.58
## ------------------------------------------------------------
## group: Tx
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 37 16.62 3.17 17.63 17.03 1.96 6.86 20.83 13.97 -1.32 1.1 0.52
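Since the median is the chosen measure of central tendency, it may help to report it together with a matching measure of spread such as the interquartile range. A minimal base R sketch follows, using a made-up data frame (toy is hypothetical, not the experiment data):

```r
# Made-up post-test ratings for two groups
toy <- data.frame(
  Post  = c(17.2, 16.0, 19.2, 19.4, 17.6, 18.9, 15.1, 19.1),
  Group = rep(c("Cntrl", "Tx"), each = 4)
)

# Median and interquartile range per group
med    <- tapply(toy$Post, toy$Group, median)
spread <- tapply(toy$Post, toy$Group, IQR)

# Combine into one small summary table
cbind(median = med, IQR = spread)
```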
# Pretest ratings do not differ significantly between the Control and Treatment groups (p = 0.43).
# This is what random assignment is expected to produce: comparable groups at baseline.
wilcox.test(CntrlPre, jitter(TxPre), alternative = "two.sided", paired = FALSE)
##
## Wilcoxon rank sum exact test
##
## data: CntrlPre and jitter(TxPre)
## W = 758, p-value = 0.4323
## alternative hypothesis: true location shift is not equal to 0
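The calls in this analysis wrap one sample in jitter(), presumably to break ties so that an exact p-value can be computed (that motive is an assumption; the source does not state it). Because jitter() adds random noise, the statistic can change from run to run unless a seed is fixed. The sketch below uses made-up vectors to show a seeded jitter next to the alternative of leaving the data unchanged and requesting the normal approximation with exact = FALSE.

```r
# Made-up values: y contains a repeated value, and 4.0 appears in both samples
x <- c(2.0, 4.0, 14.3, 15.6, 2.1, 11.1)
y <- c(8.9, 8.3, 17.5, 5.2, 4.0, 8.9)

# Option 1: jitter one sample, with a fixed seed so the result is reproducible
set.seed(42)  # arbitrary seed
res_jitter <- wilcox.test(x, jitter(y), alternative = "two.sided", paired = FALSE)

# Option 2: keep the raw data and use the tie-corrected normal approximation
res_approx <- wilcox.test(x, y, alternative = "two.sided", paired = FALSE,
                          exact = FALSE)

c(jitter = res_jitter$p.value, approx = res_approx$p.value)
```

With samples of this size the two options usually lead to the same conclusion; the second avoids altering the data at all.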
# Pretest and post-test ratings of the Control group differ significantly, so the increase in post-test ratings is unlikely to be due to chance alone.
# This is notable because the change occurred in the Control group, which received no treatment.
wilcox.test(CntrlPre, jitter(CntrlPost), alternative = "two.sided", paired = TRUE)
##
## Wilcoxon signed rank exact test
##
## data: CntrlPre and jitter(CntrlPost)
## V = 31, p-value = 3.456e-08
## alternative hypothesis: true location shift is not equal to 0
# Pretest and post-test ratings of the Treatment group also differ significantly, which could suggest that the change is related to the treatment rather than chance.
# That reading is questionable, however, because the Control group showed an increase in ratings without any treatment.
wilcox.test(TxPre, jitter(TxPost), alternative = "two.sided", paired = TRUE)
##
## Wilcoxon signed rank exact test
##
## data: TxPre and jitter(TxPost)
## V = 25, p-value = 1.315e-08
## alternative hypothesis: true location shift is not equal to 0
# Post-test ratings do not differ significantly between the Control and Treatment groups at the 0.05 level (p = 0.09).
# This is consistent with the earlier observation that ratings increased in both groups, with or without the treatment.
wilcox.test(CntrlPost, jitter(TxPost), alternative = "two.sided", paired = FALSE)
##
## Wilcoxon rank sum exact test
##
## data: CntrlPost and jitter(TxPost)
## W = 842, p-value = 0.08972
## alternative hypothesis: true location shift is not equal to 0
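A p-value alone does not show how large the group difference might be. As a hedged sketch on simulated data (the sim data frame, its distributions, and the seed are assumptions, not the study data), wilcox.test with conf.int = TRUE also reports an estimate of the location shift and a confidence interval. Applying it to gain scores (Post minus Pre) is one possible way to compare the change between groups directly; it is not part of the original analysis.

```r
set.seed(7)  # arbitrary seed for a reproducible sketch
n <- 37      # per-group size, matching the group sizes in the data

# Simulated ratings with the same column layout as the experiment data
sim <- data.frame(
  Group = rep(c("Cntrl", "Tx"), each = n),
  Pre   = runif(2 * n, min = 0, max = 20),
  Post  = runif(2 * n, min = 5, max = 21)
)
sim$Gain <- sim$Post - sim$Pre  # change from pretest to post-test

# Rank sum test on gain scores, with a location-shift estimate and 95% CI
res <- wilcox.test(Gain ~ Group, data = sim, conf.int = TRUE, exact = FALSE)
res$estimate
res$conf.int
```

An interval that is wide or spans zero would indicate that the data cannot distinguish a small treatment effect from none.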