Introduction

This lab will cover conditionals, loops, and file systems: two and a half separate and unrelated items.

Conditionals

Conditionals are basic structures in R that operate much like the ifelse() function. Rather than specifying return values, they specify actions to take based on conditions. We build conditionals from three “constructs” (they’re not really functions): if(), else if(), and else. They are best described with a simple example:

d <- 5

if (d > 7) {
  print("yay")
} else if (d < 4) {
  print("boo")
} else {
  print("hiss")
}

## [1] "hiss"
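To make the contrast with ifelse() concrete, here is a minimal sketch. The vector `scores` and the variable names are made up for illustration. ifelse() returns one value per element, while if () checks a single TRUE/FALSE condition and then runs a block of code.

```r
# Hypothetical example data
scores <- c(3, 8, 5)

# ifelse() is vectorized: it returns one value for each element of scores
labels <- ifelse(scores > 4, "high", "low")   # returns "low" "high" "high"

# if () checks one condition and performs an action when it is TRUE
many_scores <- FALSE
if (length(scores) > 2) {
  many_scores <- TRUE
}
```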

The only one of these constructs actually needed is if(), which simply checks a condition and evaluates its subsequent expression if the condition is TRUE:

d <- -5

if (d < 0) {
  d <- abs(d)
}

d

## [1] 5

Using else, we can specify a course of action if the if() condition isn’t satisfied. else doesn’t require a condition and will always evaluate if the if() condition is FALSE (%% is the modulo operator, which returns the remainder after division):

d <- 5

if (d %% 2 == 0) {
  print("d is even")
} else {
  print("d is odd")
}

## [1] "d is odd"
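For readers new to %%, a short sketch of what it returns may help. The variable names here are only for illustration.

```r
r1 <- 7 %% 2     # 7 = 3 * 2 + 1, so the remainder is 1
r2 <- 10 %% 5    # 10 = 2 * 5 + 0, so the remainder is 0

# %% is vectorized, so it can test several values at once
is_even <- c(1, 2, 3, 4) %% 2 == 0   # FALSE TRUE FALSE TRUE
```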

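else if () sits in between these two, serving as a second condition to check if the first if() fails. Note that it does not require an else as a final condition. In the example below, 11 is neither even nor divisible by 5, so neither branch runs and nothing is printed: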

d <- 11

if (d %% 2 == 0) {
  print("d is even")
} else if (d %% 5 == 0) {
  print("d is not even but it is divisible by 5")
}

No exercises here, but we will use conditionals in the next section.

Loops

The concept of a loop is straightforward enough: given a vector of values, we will loop through each one, performing some operation. To construct a loop, you’ll need three parts:

An indexing term, or a variable that will change at each iteration of the loop
A vector of values we wish to loop through (usually something like 1:n)
An expression that evaluates at each iteration of the loop

In the example below, i is the iterating variable (it changes at each iteration) and 1:10 is the set of values we will loop through. The expression to evaluate goes between a pair of curly braces:

library(stringr)
for (i in 1:10) {
  print(str_c("The variable 'i' is now equal to: ", i, collapse = ""))
}

## [1] "The variable 'i' is now equal to: 1"
## [1] "The variable 'i' is now equal to: 2"
## [1] "The variable 'i' is now equal to: 3"
## [1] "The variable 'i' is now equal to: 4"
## [1] "The variable 'i' is now equal to: 5"
## [1] "The variable 'i' is now equal to: 6"
## [1] "The variable 'i' is now equal to: 7"
## [1] "The variable 'i' is now equal to: 8"
## [1] "The variable 'i' is now equal to: 9"
## [1] "The variable 'i' is now equal to: 10"

(Note: if you wish to print the output from inside of a loop, you need to include print())

More commonly, i is used to index the elements of a vector, and instead of a fixed sequence like 1:10, we often request a sequence the length of some vector:

x <- c(2, 4, 6, 8, 10)

## This is safer than 1:length(x), which gives c(1, 0) when x is empty
seq_along(x)

## [1] 1 2 3 4 5

for (i in seq_along(x)) {
  print(x[i]^2)
}

## [1] 4
## [1] 16
## [1] 36
## [1] 64
## [1] 100
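The comment above prefers seq_along(x) to 1:length(x). A minimal sketch of why, assuming we happen to be handed an empty vector (the names `empty`, `n_colon`, and `n_along` are illustrative):

```r
empty <- numeric(0)

# 1:length(empty) is 1:0, which counts DOWN and yields 1, 0
# seq_along(empty) is a zero-length sequence

# Count how many times each loop body runs
n_colon <- 0
for (i in 1:length(empty)) {
  n_colon <- n_colon + 1
}

n_along <- 0
for (i in seq_along(empty)) {
  n_along <- n_along + 1
}

# The 1:length() loop ran twice on an empty vector; the seq_along() loop never ran
c(n_colon, n_along)
```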

Here, we use the [] notation to index our vector x. That is, each iteration of the loop will access x[1], x[2], and so on.

Loops are primarily used when the sequence of operations performed is more complicated or when the object we are indexing is not a vector. In contrast, simple operations on vectors can usually be done without a loop:

## This does the same thing as above just much much faster
x^2

## [1]   4  16  36  64 100

Loops are more useful, for example, if we wish to loop along the columns of a data frame:

df <- data.frame(x = 1:5, 
                 y = 6:10, 
                 z = 11:15)
df

##   x  y  z
## 1 1  6 11
## 2 2  7 12
## 3 3  8 13
## 4 4  9 14
## 5 5 10 15

# mean of each column
for (i in seq_len(ncol(df))) {
  print(mean(df[, i]))
}

## [1] 3
## [1] 8
## [1] 13

Finally, if we wish to save the output of a loop, we can do so by creating a vector that is the same length as the vector we are iterating over and assigning the results to each position. We can create an empty numeric vector of length n with numeric(length = n):

df <- data.frame(x = 1:5, 
                 y = 6:10, 
                 z = 11:15)

# Creates an empty numeric vector
results <- numeric(length = ncol(df))

for (i in seq_len(ncol(df))) {
  # ith value of results is mean of ith column of df
  results[i] <- mean(df[, i])
}

results

## [1]  3  8 13
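As a quick check on the loop, and not part of the original lab, base R's colMeans() should return the same three values (with the column names attached). This is a sketch for verification only; the loop above remains the pattern this lab practices.

```r
df <- data.frame(x = 1:5,
                 y = 6:10,
                 z = 11:15)

# colMeans() computes the mean of every column in a single call
check <- colMeans(df)
check
```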

Extra loop constructs

Two other constructs to know about in loops are next and break: next ends the current iteration and moves to the next one, whereas break ends the looping process entirely. The example below uses next to avoid indexing out of bounds when finding differences between consecutive elements of a vector:

x <- c(2, 5, 3, 9, 10, 13)

## Construct a vector to store results
diff <- numeric(length = length(x))

for (i in seq_along(x)) {
  
  ## If we are at first element, do something different
  if (i == 1) {
    diff[i] <- 0
    next
  }
  
  diff[i] <- x[i] - x[i-1]
}

diff

## [1]  0  3 -2  6  1  3
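The example above uses only next. A minimal sketch of break, assuming we want the position of the first element greater than 8 and have no reason to look any further once we find it (the name `first_big` is illustrative):

```r
x <- c(2, 5, 3, 9, 10, 13)

# Position of the first element greater than 8 (stays NA if none is found)
first_big <- NA

for (i in seq_along(x)) {
  if (x[i] > 8) {
    first_big <- i
    break   # stop looping; the remaining elements are never checked
  }
}

first_big
```

Here the loop stops at i = 4, since x[4] is 9.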

Question 1: What is the difference between seq_len() and seq_along()? Look at both examples above and explain when each might be used.

Question 2: Using the vectors x and y, create a new vector z and use a loop so that each element of z is the sum of the corresponding elements of x and y (that is, z[1] <- x[1] + y[1]).

x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 5, 7, 9)

Question 3: The Fibonacci sequence is a famous sequence in which each element is the sum of the two numbers preceding it. The first few elements of the sequence are 1, 1, 2, 3, 5, 8, 13… Use a loop to construct the first 15 elements of the Fibonacci sequence.

File Systems

Although much of the work we have done so far has involved reading data into our R session and manipulating it in the environment there, we often work with our computer in more direct ways. While R provides very powerful tools for manipulating the underlying file system, we limit our attention today to working directly with files stored locally on our machine.

To begin, we can identify the directory that we are currently working in (the directory in which your Rmd file is saved) with the function getwd() (“get working directory”):

getwd()

## [1] "/home/collin/courses/sta230/s26/labs"

We can see a list of sub-directories with list.dirs() and move to any one of them with the setwd() (“set working directory”) function:

## Ignore this, R Markdown resets directory after each code chunk
setwd("fundir/")

## Recursively lists all the directories
list.dirs()

## [1] "."     "./hwk" "./sim"

## Move into the directory called sim
setwd("sim")

## Now we are in fundir/sim
getwd()

## [1] "/home/collin/courses/sta230/s26/labs/fundir/sim"
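When changing directories in a plain R script (rather than in R Markdown, which resets the directory for us), it is easy to forget where we started. One possible pattern, sketched here with tempdir() standing in for any directory that is guaranteed to exist, is to save the starting directory and return to it afterwards:

```r
# Remember where we started
start_dir <- getwd()

# tempdir() is used only as a stand-in for a directory that exists
setwd(tempdir())
moved_dir <- getwd()

# Return to the starting directory when finished
setwd(start_dir)
end_dir <- getwd()
```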

Once we are in a directory, we can list all of the files in a directory with list.files(). Conveniently, list.files() has a pattern argument allowing us to specify a regular expression to indicate which files we want to list.

## Ignore this, R Markdown resets directory after each code chunk
setwd("fundir/sim")

# List all of the files in fundir/sim
list.files()

##  [1] "sim_1.csv"  "sim_10.csv" "sim_2.csv"  "sim_3.csv"  "sim_4.csv" 
##  [6] "sim_5.csv"  "sim_6.csv"  "sim_7.csv"  "sim_8.csv"  "sim_9.csv" 
## [11] "sim.R"

# List all of the files in fundir/sim with "csv" in the name
list.files(pattern = "csv")

##  [1] "sim_1.csv"  "sim_10.csv" "sim_2.csv"  "sim_3.csv"  "sim_4.csv" 
##  [6] "sim_5.csv"  "sim_6.csv"  "sim_7.csv"  "sim_8.csv"  "sim_9.csv"
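Because pattern is a regular expression, "csv" matches any file whose name contains those three letters, not only files with a .csv extension. The sketch below illustrates the difference in a scratch directory; the directory and file names are hypothetical and exist only for this example.

```r
# Hypothetical scratch directory so the example does not touch real files
tmp <- file.path(tempdir(), "pattern_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("sim_1.csv", "sim.R", "csv_notes.txt"))))

# "csv" matches any file name containing the letters csv
loose <- list.files(tmp, pattern = "csv")

# "\\.csv$" matches only names that end in a literal ".csv"
strict <- list.files(tmp, pattern = "\\.csv$")

loose    # includes csv_notes.txt as well as sim_1.csv
strict   # only sim_1.csv
```

For the simulation folder used in this lab both patterns return the same files, so the simpler "csv" is enough there.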

list.files() returns a character vector, the contents of which can be used as arguments to other functions. For example, we have used read.csv() several times in this course, and now we can do so by passing variables as arguments:

## R Markdown resets directory after each code chunk, so if my Rmd file is in 
## a different place, I must reset it after each chunk
setwd("fundir/sim")

## Get all files with "csv" in name
my_sims <- list.files(pattern = "csv")

## See all the files we found
my_sims

##  [1] "sim_1.csv"  "sim_10.csv" "sim_2.csv"  "sim_3.csv"  "sim_4.csv" 
##  [6] "sim_5.csv"  "sim_6.csv"  "sim_7.csv"  "sim_8.csv"  "sim_9.csv"

## Read the first simulation data in using [] notation
sim1 <- read.csv(my_sims[1])

## I now have a data frame with 10 observations
sim1

##          V1
## 1  -1.36856
## 2   0.46344
## 3   1.87452
## 4  -0.25650
## 5  -1.41324
## 6   0.83684
## 7  -0.15509
## 8   0.15707
## 9   0.19777
## 10 -0.43873

We can combine this with other things we have learned in this lab in order to create a vector with the mean value from each of our simulations:

## Ignore this, R Markdown resets directory after each code chunk
setwd("fundir/sim")

# Create a numeric vector the same length as `my_sims` vector
mean_vec <- numeric(length = length(my_sims))

for (i in seq_along(mean_vec)) {
  
  # First read in the ith simulation
  dat <- read.csv(my_sims[i])
  
  # Compute the mean and save it to the ith spot here
  mean_vec[i] <- mean(dat$V1)
}

## Mean values from each simulation
mean_vec

##  [1] -0.010247  0.509078 -0.508938 -0.145795 -0.498174 -0.345951  0.036010
##  [8] -0.066167  0.328893  0.097149

We are left with a vector that is the same length as the number of simulations; each element of this vector is the mean of one simulation.

An alternative

In the above example, we changed our working directory with setwd() in order to read the file in directly; however, we can also do this by specifying a path when reading in files. Consider this alternative for listing the available simulations:

(sims <- list.files(path = "fundir/sim", pattern = "csv"))

##  [1] "sim_1.csv"  "sim_10.csv" "sim_2.csv"  "sim_3.csv"  "sim_4.csv" 
##  [6] "sim_5.csv"  "sim_6.csv"  "sim_7.csv"  "sim_8.csv"  "sim_9.csv"

This will list the files, but when I try to pass this to read.csv(), the file isn’t found.

read.csv(sims[1])

## Warning in file(file, "rt"): cannot open file 'sim_1.csv': No such file or
## directory

## Error in file(file, "rt"): cannot open the connection

We aren’t able to read this because the simulation files don’t exist within our working directory. We can remedy this by requesting that list.files() include the relative path from our working directory to the CSV files:

## sims now contains the path to the csv files
(sims <- list.files(path = "fundir/sim", pattern = "csv", full.names = TRUE))

##  [1] "fundir/sim/sim_1.csv"  "fundir/sim/sim_10.csv" "fundir/sim/sim_2.csv" 
##  [4] "fundir/sim/sim_3.csv"  "fundir/sim/sim_4.csv"  "fundir/sim/sim_5.csv" 
##  [7] "fundir/sim/sim_6.csv"  "fundir/sim/sim_7.csv"  "fundir/sim/sim_8.csv" 
## [10] "fundir/sim/sim_9.csv"

## We can then read it in from our current working directory without changing anything
read.csv(sims[1])

##          V1
## 1  -1.36856
## 2   0.46344
## 3   1.87452
## 4  -0.25650
## 5  -1.41324
## 6   0.83684
## 7  -0.15509
## 8   0.15707
## 9   0.19777
## 10 -0.43873
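Putting the pieces together, the earlier loop over simulation means can be written without any call to setwd(). The sketch below is self-contained: it first writes three tiny, made-up CSV files to a hypothetical scratch directory, then reads them back using full paths.

```r
# Hypothetical scratch directory holding three small, made-up simulation files
sim_dir <- file.path(tempdir(), "sim_demo")
dir.create(sim_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(data.frame(V1 = c(k, k + 2)),
            file.path(sim_dir, paste0("sim_", k, ".csv")),
            row.names = FALSE)
}

# Full paths, so read.csv() works from any working directory
# ("\\.csv$" keeps only names ending in .csv)
sim_paths <- list.files(sim_dir, pattern = "\\.csv$", full.names = TRUE)

# One slot per file, filled in by the loop
sim_means <- numeric(length = length(sim_paths))
for (i in seq_along(sim_paths)) {
  sim_means[i] <- mean(read.csv(sim_paths[i])$V1)
}

sim_means   # means of (1, 3), (2, 4), (3, 5): 2, 3, 4
```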

If for any reason we wanted to strip the path and retain just the file name, we can do so with basename():

## Path to first csv
sims[1]

## [1] "fundir/sim/sim_1.csv"

## Basename of first csv
basename(sims[1])

## [1] "sim_1.csv"
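The counterpart to basename() is dirname(), which keeps the directory part instead, and file.path() joins pieces back together. A brief sketch using the same path as above (the variable names are illustrative):

```r
path <- "fundir/sim/sim_1.csv"

# dirname() keeps the directory part, basename() keeps the file name
dir_part  <- dirname(path)    # "fundir/sim"
file_part <- basename(path)   # "sim_1.csv"

# file.path() joins the two pieces back together with "/"
rebuilt <- file.path(dir_part, file_part)
```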


Question 4: For this lab, download these files and unzip them in the same directory as your lab Rmd file. Contained within this zipped folder are the results of an experiment that we conducted on 60 people in three different experimental groups. The data files themselves only contain the results from the experiment; information on the subjects themselves is stored as metadata in the file name. For example, the file PatriciaFexpA.csv contains the results for the subject named Patricia, sex Female, in experimental group A. Similarly, RobertMexpC.csv contains data for Robert, sex Male, in experimental group C. Each file contains 5 observations – however, we are only interested in retaining the mean value of these observations for each subject.

Use this information along with the data in the zipped folder to recreate the plot below.

Hint: Just like numerics, character vectors can be created with character(length = n)