For Loops and Randomization Tests

1. Using a for loop, write a function to calculate the number of zeroes in a numeric vector. Before entering the loop, set up a counter variable counter <- 0. Inside the loop, add 1 to counter each time you have a zero in the vector. Finally, use return(counter) for the output.

z <- rbinom(30, 1, 0.5)
z
##  [1] 1 1 0 1 1 0 0 0 0 1 1 1 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0 0 1 0
counter <- 0
  
for (i in seq_along(z)){
    if(z[i] == 0) {
      counter <- counter + 1 
    }
  }
print(counter)
## [1] 16

2. Use subsetting instead of a loop to rewrite the function as a single line of code.

length(z[z==0])
## [1] 16

3. Write a function that takes as input two integers representing the number of rows and columns in a matrix. The output is a matrix of these dimensions in which each element is the product of the row number x the column number.

matrixfunction <- function(row = 2, col = 3){
    df <- expand.grid(x=1:row, y=1:col)
    print(df)
    mvec <- df$x*df$y
    #print(mvec)
    matrix <- matrix(mvec, nrow=row, ncol=col)
    print(matrix)
}

matrixfunction()
##   x y
## 1 1 1
## 2 2 1
## 3 1 2
## 4 2 2
## 5 1 3
## 6 2 3
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    2    4    6

4. Now let’s practice calling custom functions within a for loops. Use the code from previous lectures on loops and functions to complete the following steps:

I had some trouble with this question!

a. Simulate a dataset with 3 groups of data, each group drawn from a distribution with a different mean. The final data frame should have 1 column for group and 1 column for the response variable.

n <- 15 # number of observations in each variable
var <- c("var1", "var2", "var3") # including three variable groups into the var object

data <- data.frame(matrix(nrow= n, ncol=2)) # make dataframe that will have two columns and 15 rows per variable
var_df <- NULL # empty object to assign temporary dataframe to

for(i in 1:length(var)) {
  data[1] <- rep(var[i], each=n) # repeating var (starting with var1) for first column
  data[2] <- rnorm(n=n, mean=runif(1)) # generating 15 random numbers for second column of dataframe, each with different mean for each var
  var_df <- rbind(var_df,data) # binding data in one dataframe
}

str(var_df)
## 'data.frame':    45 obs. of  2 variables:
##  $ X1: chr  "var1" "var1" "var1" "var1" ...
##  $ X2: num  -0.236 -0.351 1.835 0.724 -1.476 ...

b. Write a custom function that 1) reshuffles the response variable, and 2) calculates the mean of each group in the reshuffled data. Store the means in a vector of length 3.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
reshufflefunc <- function(n=15, v=3) {
  var_df[2] <- sample(var_df$X2, replace = FALSE) # reshuffling column data
  group_mean<- aggregate(x= var_df$X2, by = list(var_df$X1), FUN = mean)
  
  mean_vec <- group_mean$x # converting to vector
print(mean_vec)
    
}

reshufflefunc()
## [1] -0.09219274  0.32947752  0.49237169

c.Use a for loop to repeat the function in b 100 times. Store the results in a data frame that has 1 column indicating the replicate number and 1 column for each new group mean, for a total of 4 columns.

rep <- 1:100
df_rep <- NULL
mean_vec <- NULL

for (i in 1:length(rep)) {
  mean_vec <- reshufflefunc()
  var_df[3] <- mean_vec
  var_df[4] <- i
  df_rep <- rbind(df_rep, var_df)
}
## [1] 0.1116536 0.2875698 0.3304331
## [1]  0.51986138 -0.05956103  0.26935611
## [1] -0.08166348  0.33544294  0.47587701
## [1] 0.331359641 0.391328036 0.006968787
## [1] 0.05127103 0.58578364 0.09260180
## [1] -0.2392034  0.8119848  0.1568750
## [1] 0.04223153 0.02886266 0.65856227
## [1]  0.69200780  0.13399867 -0.09635001
## [1] -0.01536468 -0.01570111  0.76072226
## [1] 0.03857038 0.21018156 0.48090453
## [1] 0.1695067 0.1363974 0.4237524
## [1]  0.14176221  0.63065919 -0.04276494
## [1]  0.5525815 -0.0408387  0.2179137
## [1]  0.3032310 -0.3189771  0.7454026
## [1]  0.4315640  0.4573164 -0.1592239
## [1] 0.61354321 0.04110590 0.07500736
## [1]  0.52038556  0.28912459 -0.07985368
## [1]  0.3774203 -0.1161106  0.4683467
## [1] 0.23881929 0.40755869 0.08327848
## [1]  0.60292706 -0.06068727  0.18741667
## [1] 0.002152842 0.398638763 0.328864859
## [1]  0.24787519  0.54818274 -0.06640147
## [1] 0.03728991 0.39212396 0.30024259
## [1] -0.1229843  0.2285686  0.6240722
## [1]  0.4896989  0.4448241 -0.2048665
## [1] -0.06512083  0.49207108  0.30270622
## [1] -0.05388114  0.44097837  0.34255923
## [1] 0.08504345 0.59980939 0.04480363
## [1] 0.13838398 0.05286335 0.53840913
## [1] 0.2869930 0.1694793 0.2731842
## [1] -0.03249127  0.19392969  0.56821804
## [1] 0.2623161 0.2272536 0.2400868
## [1]  0.76003934 -0.06109176  0.03070889
## [1] 0.3129130 0.3075294 0.1092141
## [1] 0.1237497 0.5000446 0.1058622
## [1] 0.2171381 0.2673493 0.2451691
## [1] -0.2625279  0.7726371  0.2195472
## [1] 0.3353519 0.1981925 0.1961121
## [1] 0.1514938 0.3241195 0.2540431
## [1] 0.04053719 0.52044404 0.16867524
## [1] 0.1878915 0.2436732 0.2980917
## [1] 0.534771184 0.002338733 0.192546547
## [1] 0.30573138 0.37698498 0.04694011
## [1] 0.31120682 0.02658819 0.39186145
## [1] 0.47764847 0.08258478 0.16942322
## [1] 0.1138517 0.4212171 0.1945877
## [1] 0.04255127 0.14693125 0.54017395
## [1]  0.52176348 -0.04121947  0.24911246
## [1] 0.2564847 0.1679310 0.3052407
## [1] 0.31033482 0.07880996 0.34051168
## [1] 0.2418881 0.2389388 0.2488295
## [1] 0.3327691 0.2355976 0.1612898
## [1] 0.48577283 0.06458547 0.17929816
## [1] 0.19308194 0.08857018 0.44800434
## [1] -0.1567055  0.4819412  0.4044208
## [1]  0.5588901 -0.1478622  0.3186286
## [1] 0.49890177 0.07183382 0.15892087
## [1] 0.2801625 0.2330944 0.2163996
## [1] 0.4013151 0.1674772 0.1608642
## [1]  0.06841912  0.66264390 -0.00140655
## [1]  0.2297345 -0.1314019  0.6313238
## [1] 0.1229879 0.2163282 0.3903404
## [1]  0.54083475 -0.04195755  0.23077926
## [1] 0.34786641 0.34771624 0.03407381
## [1] 0.2477494 0.2109797 0.2709274
## [1] -0.02585697  0.31674046  0.43877296
## [1]  0.3737565 -0.2351482  0.5910481
## [1] 0.2644985 0.1125039 0.3526541
## [1]  0.3271736  0.6469217 -0.2444389
## [1]  0.35585633 -0.03773454  0.41153468
## [1]  0.6633506  0.2977035 -0.2313976
## [1] 0.4447216 0.0652575 0.2196773
## [1] 0.2916927 0.2786432 0.1593205
## [1] 0.1154614 0.5086658 0.1055292
## [1] -0.07744774  0.40360343  0.40350078
## [1] 0.2577412 0.2425240 0.2293912
## [1]  0.5766907 -0.1771602  0.3301260
## [1] -0.1520647  0.5665322  0.3151890
## [1] -0.1678920  0.2977986  0.5997499
## [1] -0.03044279  0.37382446  0.38627480
## [1]  0.2437618  0.5905522 -0.1046576
## [1] -0.07394614  0.38724174  0.41636086
## [1]  0.1207899 -0.1886246  0.7974912
## [1] 0.04356014 0.44239731 0.24369901
## [1]  0.6365407  0.2723282 -0.1792124
## [1]  0.68304495  0.10645278 -0.05984127
## [1]  0.6328134  0.3392686 -0.2424256
## [1] 0.11331097 0.57188532 0.04446018
## [1] 0.51560255 0.07798684 0.13606707
## [1] 0.60540631 0.02437835 0.09987180
## [1] 0.1351198 0.2157810 0.3787557
## [1] 0.04453678 0.54907986 0.13603982
## [1] 0.1306433 0.2148154 0.3841977
## [1] 0.44932116 0.02854184 0.25179346
## [1]  0.27655174  0.52313193 -0.07002721
## [1] 0.04538114 0.45347380 0.23080152
## [1] 0.2399294 0.1468821 0.3428450
## [1] 0.25715110 0.06282994 0.40967543
## [1]  0.43921825  0.32611731 -0.03567909
## [1] 0.32426502 0.34859732 0.05679412
str(df_rep)
## 'data.frame':    4500 obs. of  4 variables:
##  $ X1: chr  "var1" "var1" "var1" "var1" ...
##  $ X2: num  -0.236 -0.351 1.835 0.724 -1.476 ...
##  $ V3: num  0.112 0.288 0.33 0.112 0.288 ...
##  $ V4: int  1 1 1 1 1 1 1 1 1 1 ...

d. Use qplot() to create a histogram of the means for each reshuffled group. Or, if you want a challenge, use ggplot() to overlay all 3 histograms in the same figure. How do the distributions of reshuffled means compare to the original means?

This doesnt seem right…

library(ggplot2)


plot <- ggplot(data = df_rep)+
  aes(x=V3, fill=X1)+
  geom_histogram()
plot
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.