Treatments: Control, Low, High (invasion intensity)
Variable: Insect abundance (# of insects present)
my_data <- read.csv("insect_abundance_earthworm_invasion.csv",header=TRUE,sep=",")
str(my_data)
## 'data.frame': 30 obs. of 3 variables:
## $ Control: int 74 50 69 45 56 49 35 57 78 80 ...
## $ Low : int 56 67 45 67 78 98 45 57 87 34 ...
## $ High : int 23 32 34 32 34 45 13 42 53 12 ...
summary(my_data)
## Control Low High
## Min. :35.00 Min. :23.00 Min. :10.00
## 1st Qu.:57.00 1st Qu.:34.50 1st Qu.:21.75
## Median :68.00 Median :45.00 Median :32.00
## Mean :68.37 Mean :51.15 Mean :30.46
## 3rd Qu.:83.00 3rd Qu.:65.00 3rd Qu.:39.75
## Max. :97.00 Max. :98.00 Max. :53.00
## NA's :3 NA's :6
var(my_data$Control)
## [1] 274.792
var(my_data$Low, na.rm = TRUE)
## [1] 362.8234
var(my_data$High, na.rm = TRUE)
## [1] 164.9547
sd(my_data$Control)
## [1] 16.57685
sd(my_data$Low, na.rm = TRUE)
## [1] 19.04792
sd(my_data$High, na.rm = TRUE)
## [1] 12.84347
Control: Sample size = 30, Mean = 68.37, Variance = 274.792, SD = 16.57685
Low: Sample size = 27, Mean = 51.15, Variance = 362.8234, SD = 19.04792
High: Sample size = 25, Mean = 30.46, Variance = 164.9547, SD = 12.84347
# creating data table
nGroup <- 3 # number of treatment groups
nName <- c("Control","Low", "High") # names of groups
nSize <- c(30,27,25) # number of observations in each group
nMean <- c(68.37,51.15,30.46) # mean of each group
nSD <- c(16.58,19.05,12.84) # standard deviation of each group
ID <- 1:(sum(nSize)) # id vector for each row
Abundance <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Treatment <- rep(nName,nSize)
ANOdata <- data.frame(ID,Treatment,Abundance)
str(ANOdata)
## 'data.frame': 82 obs. of 3 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Treatment: chr "Control" "Control" "Control" "Control" ...
## $ Abundance: num 100.1 88.7 94.7 103 68.8 ...
print(ANOdata)
## ID Treatment Abundance
## 1 1 Control 100.08390
## 2 2 Control 88.74780
## 3 3 Control 94.71581
## 4 4 Control 103.00641
## 5 5 Control 68.77967
## 6 6 Control 89.55094
## 7 7 Control 57.49989
## 8 8 Control 69.15121
## 9 9 Control 74.32657
## 10 10 Control 45.99579
## 11 11 Control 83.62930
## 12 12 Control 93.95153
## 13 13 Control 42.68466
## 14 14 Control 61.47345
## 15 15 Control 63.60675
## 16 16 Control 72.65832
## 17 17 Control 66.95870
## 18 18 Control 67.54365
## 19 19 Control 67.05332
## 20 20 Control 66.85924
## 21 21 Control 60.33262
## 22 22 Control 48.61222
## 23 23 Control 55.00196
## 24 24 Control 51.42227
## 25 25 Control 64.78489
## 26 26 Control 64.75341
## 27 27 Control 52.08256
## 28 28 Control 105.18089
## 29 29 Control 76.27160
## 30 30 Control 57.23531
## 31 31 Low 68.22492
## 32 32 Low 54.14860
## 33 33 Low 36.92846
## 34 34 Low 50.56000
## 35 35 Low 61.57487
## 36 36 Low 84.27558
## 37 37 Low 25.72168
## 38 38 Low 63.32460
## 39 39 Low 51.33056
## 40 40 Low 56.03900
## 41 41 Low 23.78432
## 42 42 Low 85.94737
## 43 43 Low 70.74214
## 44 44 Low 85.79916
## 45 45 Low 68.63415
## 46 46 Low 78.08207
## 47 47 Low 66.46903
## 48 48 Low 67.08882
## 49 49 Low 56.56022
## 50 50 Low 113.58766
## 51 51 Low 46.20822
## 52 52 Low 55.03833
## 53 53 Low 35.65834
## 54 54 Low 38.26818
## 55 55 Low 26.17867
## 56 56 Low 74.00803
## 57 57 Low 39.08951
## 58 58 High 14.43213
## 59 59 High 36.54602
## 60 60 High 45.10601
## 61 61 High 48.37031
## 62 62 High 35.95813
## 63 63 High 22.64326
## 64 64 High 28.93933
## 65 65 High 25.18287
## 66 66 High 30.06778
## 67 67 High 41.91877
## 68 68 High 35.22264
## 69 69 High 37.16639
## 70 70 High 10.38941
## 71 71 High 16.18895
## 72 72 High 49.06463
## 73 73 High 15.69114
## 74 74 High 31.60111
## 75 75 High 31.53490
## 76 76 High 22.90306
## 77 77 High 34.74800
## 78 78 High 40.72921
## 79 79 High 37.15754
## 80 80 High 33.36239
## 81 81 High 42.53454
## 82 82 High 38.83720
Since I want to compare differences among three different variables, I will be using a one-way ANOVA. The independent variable (treatment) is discrete and the dependent variable (abundance) is continuous.
library(tidyverse)
# Basic ANOVA in R
abundanceAOV <- aov(Abundance~Treatment,data=ANOdata)
print(abundanceAOV)
## Call:
## aov(formula = Abundance ~ Treatment, data = ANOdata)
##
## Terms:
## Treatment Residuals
## Sum of Squares 20469.29 23117.55
## Deg. of Freedom 2 79
##
## Residual standard error: 17.10635
## Estimated effects may be unbalanced
print(summary(abundanceAOV))
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 2 20469 10235 34.98 1.32e-11 ***
## Residuals 79 23118 293
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# graph for ANOVA
ANOPlot <- ggplot(data=ANOdata,aes(x=Treatment,y=Abundance,fill=Treatment)) +
geom_boxplot()+
theme_bw()
print(ANOPlot)
Rerunning the code from questions 3 and 4 provides a different, but similar conclusion every time. There is variability within the dataset and the boxplots in the graph are different size, but there is always a significant difference among treatment.
# creating data table with new/adjusted means
nGroup <- 3 # number of treatment groups
nName <- c("Control","Low", "High") # names of groups
nSize <- c(30,27,25) # number of observations in each group
nMean <- c(50,45,39) # mean of each group
nSD <- c(16.58,19.05,12.84) # standard deviation of each group
ID <- 1:(sum(nSize)) # id vector for each row
Abundance <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Treatment <- rep(nName,nSize)
ANOdata_adjmeans <- data.frame(ID,Treatment,Abundance)
str(ANOdata_adjmeans)
## 'data.frame': 82 obs. of 3 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Treatment: chr "Control" "Control" "Control" "Control" ...
## $ Abundance: num 10.1 55.9 61.8 54.6 61.7 ...
print(ANOdata_adjmeans)
## ID Treatment Abundance
## 1 1 Control 10.126942
## 2 2 Control 55.886228
## 3 3 Control 61.793226
## 4 4 Control 54.556022
## 5 5 Control 61.739301
## 6 6 Control 48.730382
## 7 7 Control 56.598829
## 8 8 Control 87.348901
## 9 9 Control 35.283963
## 10 10 Control 64.971768
## 11 11 Control 55.978959
## 12 12 Control 47.357390
## 13 13 Control 66.314794
## 14 14 Control 47.172185
## 15 15 Control 42.283649
## 16 16 Control 52.038942
## 17 17 Control 67.968300
## 18 18 Control 54.674130
## 19 19 Control 44.870380
## 20 20 Control 60.450968
## 21 21 Control 39.078994
## 22 22 Control 51.170394
## 23 23 Control 30.802062
## 24 24 Control 68.756830
## 25 25 Control 60.881884
## 26 26 Control 48.013707
## 27 27 Control 43.910754
## 28 28 Control 20.219878
## 29 29 Control 65.095964
## 30 30 Control 43.000610
## 31 31 Low 52.407370
## 32 32 Low 41.453432
## 33 33 Low 65.579034
## 34 34 Low 54.979053
## 35 35 Low 24.622450
## 36 36 Low 29.027672
## 37 37 Low 23.243961
## 38 38 Low 94.595412
## 39 39 Low 58.887979
## 40 40 Low 58.720452
## 41 41 Low 52.032390
## 42 42 Low 22.254370
## 43 43 Low 52.958933
## 44 44 Low 43.001060
## 45 45 Low 68.092041
## 46 46 Low 33.660315
## 47 47 Low 56.797848
## 48 48 Low 43.361099
## 49 49 Low 52.535683
## 50 50 Low 55.064336
## 51 51 Low 12.164619
## 52 52 Low 35.781797
## 53 53 Low 41.061489
## 54 54 Low 8.534351
## 55 55 Low 40.166816
## 56 56 Low 53.401127
## 57 57 Low 63.454156
## 58 58 High 26.772947
## 59 59 High 58.560665
## 60 60 High 37.026120
## 61 61 High 35.130926
## 62 62 High 28.467705
## 63 63 High 32.719520
## 64 64 High 37.925062
## 65 65 High 32.433904
## 66 66 High 18.575099
## 67 67 High 38.188226
## 68 68 High 58.215165
## 69 69 High 36.494967
## 70 70 High 39.299961
## 71 71 High 46.056919
## 72 72 High 38.306512
## 73 73 High 44.380383
## 74 74 High 40.277770
## 75 75 High 37.946671
## 76 76 High 46.249274
## 77 77 High 37.583939
## 78 78 High 22.777028
## 79 79 High 37.909697
## 80 80 High 59.122862
## 81 81 High 53.207288
## 82 82 High 58.871139
# ANOVA results for adjusted data
abundanceAOV_adjmeans <- aov(Abundance~Treatment,data=ANOdata_adjmeans)
print(abundanceAOV_adjmeans)
## Call:
## aov(formula = Abundance ~ Treatment, data = ANOdata_adjmeans)
##
## Terms:
## Treatment Residuals
## Sum of Squares 1798.48 18760.03
## Deg. of Freedom 2 79
##
## Residual standard error: 15.41002
## Estimated effects may be unbalanced
print(summary(abundanceAOV_adjmeans))
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 2 1798 899.2 3.787 0.0269 *
## Residuals 79 18760 237.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The original means were: Control: 68.37, Low: 51.15, High: 30.46 (*** significance).
When I adjusted them to be Control: 50, Low: 45, High: 40, the results were not significant. I then adjusted them again at Control: 50, Low: 45, High: 39 and the results showed significance (*significance). Compared to the original means, the new means are much closer together while still being able to detect significance.
# creating new dataset with smaller sample size
nGroup <- 3 # number of treatment groups
nName <- c("Control","Low", "High") # names of groups
nSize <- c(5,5,5) # number of observations in each group
nMean <- c(68.37,51.15,30.46) # mean of each group
nSD <- c(16.58,19.05,12.84) # standard deviation of each group
ID <- 1:(sum(nSize)) # id vector for each row
Abundance <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Treatment <- rep(nName,nSize)
ANOdata_adjss <- data.frame(ID,Treatment,Abundance)
str(ANOdata_adjss)
## 'data.frame': 15 obs. of 3 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Treatment: chr "Control" "Control" "Control" "Control" ...
## $ Abundance: num 62.1 56.3 79.2 64.3 52.7 ...
print(ANOdata_adjss)
## ID Treatment Abundance
## 1 1 Control 62.06732
## 2 2 Control 56.32211
## 3 3 Control 79.24600
## 4 4 Control 64.26977
## 5 5 Control 52.68688
## 6 6 Low 55.53522
## 7 7 Low 78.41318
## 8 8 Low 34.41172
## 9 9 Low 65.51155
## 10 10 Low 52.82815
## 11 11 High 31.91117
## 12 12 High 32.27341
## 13 13 High 26.81920
## 14 14 High 39.74430
## 15 15 High 53.04779
# ANOVA results for adjusted sample size data
abundanceAOV_adjss <- aov(Abundance~Treatment,data=ANOdata_adjss)
print(abundanceAOV_adjss)
## Call:
## aov(formula = Abundance ~ Treatment, data = ANOdata_adjss)
##
## Terms:
## Treatment Residuals
## Sum of Squares 1898.323 1894.167
## Deg. of Freedom 2 12
##
## Residual standard error: 12.56373
## Estimated effects may be unbalanced
print(summary(abundanceAOV_adjss))
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 2 1898 949.2 6.013 0.0155 *
## Residuals 12 1894 157.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First, I made all treatments have equal sample size at n=25 (sig diff detected).
All sample size n=20, still sig diff.
All sample size n=15, still sig diff.
All sample size n=10, still sig diff.
All sample size n=5, still sig diff. this is the smallest sample size to produce a statistical difference
All sample size n=3, no sig diff.
All sample size n=4, no sig diff.