This lab will cover conditionals, loops, and file systems: two and a half separate and unrelated items.
Conditionals are basic structures in R that operate very similarly to
the ifelse() function. Rather than specifying return values,
they specify actions to take based on conditions. We construct conditionals with
three “constructs” (they’re not really functions): if(),
else if(), and else. They are best described
with a simple example:
d <- 5
if (d > 7) {
  print("yay")
} else if (d < 4) {
  print("boo")
} else {
  print("hiss")
}
## [1] "hiss"
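To make the contrast with ifelse() concrete, here is a minimal sketch. The vector `v` and the variable `labels` are made up for illustration and are not part of the lab. ifelse() returns one value for each element of a vector, whereas if () takes a single TRUE or FALSE and decides whether a block of code runs.

```r
# Hypothetical example values, not from the lab
v <- c(2, 9, 5)

# ifelse() is vectorized: it returns one value for every element of v
labels <- ifelse(v > 7, "yay", "hiss")
labels

# if () makes one decision from a single TRUE/FALSE, then runs a block of code;
# all() collapses the three comparisons into a single logical value
if (all(v > 0)) {
  message("every element of v is positive")
}
```

Note that in R 4.2.0 and later, giving if () a condition with more than one element is an error, which is why the comparison is wrapped in all() here.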
The only one of these constructs actually needed is if(),
which will simply check a condition and evaluate its subsequent
expression if the condition is true:
d <- -5
if (d < 0) {
  d <- abs(d)
}
d
## [1] 5
Using else, we can specify a course of action if the
if() isn’t satisfied. else doesn’t require a
condition and will always be evaluated if the if() condition is FALSE (%%
is the modulo operator, which returns the remainder after division):
d <- 5
if (d %% 2 == 0) {
  print("d is even")
} else {
  print("d is odd")
}
## [1] "d is odd"
else if () sits in between these two, serving as a
second condition to check if the first if() fails. Note
that it does not require having an else as a final
condition:
d <- 11
if (d %% 2 == 0) {
  print("d is even")
} else if (d %% 5 == 0) {
  print("d is not even but it is divisible by 5")
}
# Nothing is printed because 11 is neither even nor divisible by 5
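One detail worth seeing in action is that conditions are checked in order and only the first one that is TRUE has its expression evaluated. The sketch below uses a made-up value of d that satisfies both conditions, and a hypothetical variable `result` to record which branch ran.

```r
# 10 is both even and divisible by 5 (a value chosen for illustration)
d <- 10
result <- "no condition matched"

if (d %% 2 == 0) {
  # This branch runs, so the else if () below is never checked
  result <- "d is even"
} else if (d %% 5 == 0) {
  result <- "d is not even but it is divisible by 5"
}

result
```

Because the first condition is satisfied, `result` ends up as "d is even" even though the second condition would also have been TRUE.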
No exercises here, but we will use conditionals in the next section.
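The concept of a loop is straightforward enough: given a vector of values, we will loop through each one, performing some operation. To construct a loop, you’ll need three parts: an iterating variable, a set of values to loop through, and an expression to evaluate at each iteration.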
In the example below, i is the iterating variable (it
changes at each iteration) and 1:10 is the set of values we will loop
through. The expression to evaluate sits between the curly braces:
library(stringr)
for (i in 1:10) {
  print(str_c("The variable 'i' is now equal to: ", i, collapse = ""))
}
## [1] "The variable 'i' is now equal to: 1"
## [1] "The variable 'i' is now equal to: 2"
## [1] "The variable 'i' is now equal to: 3"
## [1] "The variable 'i' is now equal to: 4"
## [1] "The variable 'i' is now equal to: 5"
## [1] "The variable 'i' is now equal to: 6"
## [1] "The variable 'i' is now equal to: 7"
## [1] "The variable 'i' is now equal to: 8"
## [1] "The variable 'i' is now equal to: 9"
## [1] "The variable 'i' is now equal to: 10"
(Note: values are not printed automatically inside of a loop, so if you
wish to see output you need to call print() explicitly.)
More commonly, i is used to index elements of a vector,
and instead of a fixed set of values like 1:10, we often request a
sequence as long as some vector:
x <- c(2, 4, 6, 8, 10)
## This is safer than 1:length(x), which counts down (1, 0) when x is empty
seq_along(x)
## [1] 1 2 3 4 5
for (i in seq_along(x)) {
  print(x[i]^2)
}
## [1] 4
## [1] 16
## [1] 36
## [1] 64
## [1] 100
Here, we use the [] notation to index our vector
x. That is, each iteration of the loop will access
x[1], x[2], etc.
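The preference for seq_along(x) over 1:length(x) matters most when a vector turns out to be empty. The following sketch uses a hypothetical empty vector `empty` and a counter `count` to show the difference.

```r
# A hypothetical empty vector, e.g. the result of a filter that matched nothing
empty <- numeric(0)

# 1:length(empty) is 1:0, which counts DOWN and has two elements
1:length(empty)

# seq_along(empty) has no elements at all
seq_along(empty)

# A loop over seq_along() therefore never runs its body
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
count
```

A loop written with 1:length(empty) would instead run twice, with i equal to 1 and then 0, neither of which is a valid position in the vector.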
Loops are primarily used when the sequence of operations performed is more complicated or when the object we are indexing is not a vector. In contrast, simple operations on vectors can usually be done without a loop:
## This does the same thing as above just much much faster
x^2
## [1] 4 16 36 64 100
Loops are more useful, for example, if we wish to loop along the columns of a data frame
df <- data.frame(x = 1:5,
                 y = 6:10,
                 z = 11:15)
df
## x y z
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 4 4 9 14
## 5 5 10 15
# mean of each column
for (i in seq_len(ncol(df))) {
  print(mean(df[, i]))
}
## [1] 3
## [1] 8
## [1] 13
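The set of values we loop through does not have to be numeric. As a sketch of one alternative, the loop below iterates over the column names directly. The iterating variable `col` is a name chosen for illustration, and base R's paste0() is used in place of str_c() so the example does not depend on any package.

```r
df <- data.frame(x = 1:5,
                 y = 6:10,
                 z = 11:15)

# names(df) is the character vector c("x", "y", "z")
for (col in names(df)) {
  # df[[col]] extracts the column whose name is stored in col
  print(paste0("Mean of column ", col, ": ", mean(df[[col]])))
}
```

Looping over names makes the printed output easier to read, while looping over positions (as above) is more convenient when results need to be stored by index.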
Finally, if we wish to save the output of a loop, we can do so by
creating a vector that is the same length as the vector we are iterating
over and assigning the results to each position. We can create a
numeric vector of length n (filled with zeros) with
numeric(length = n):
df <- data.frame(x = 1:5,
                 y = 6:10,
                 z = 11:15)
# Creates a numeric vector of zeros, one entry per column
results <- numeric(length = ncol(df))
for (i in seq_len(ncol(df))) {
  # ith value of results is mean of ith column of df
  results[i] <- mean(df[, i])
}
results
## [1] 3 8 13
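Once results are stored by position, it can be hard to remember which value belongs to which column. One possible follow-up, sketched here, is to label the stored values with names(). This step is an addition for illustration and is not required by the lab.

```r
df <- data.frame(x = 1:5,
                 y = 6:10,
                 z = 11:15)

# Pre-allocate one slot per column; every slot starts at 0
results <- numeric(length = ncol(df))

for (i in seq_len(ncol(df))) {
  results[i] <- mean(df[, i])
}

# Attach the column names so each mean is labelled
names(results) <- names(df)
results

# Named elements can then be looked up by name as well as by position
results["y"]
```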
Two other constructs to know about in loops are next and
break; next will end the current iteration and
move to the next one, whereas break will end the looping
process entirely. Below, next is used to prevent us from indexing out
of bounds (there is no element before the first) when finding differences
between consecutive elements of a vector:
x <- c(2, 5, 3, 9, 10, 13)
## Construct a vector to store results
diff <- numeric(length = length(x))
for (i in seq_along(x)) {
  ## If we are at first element, do something different
  if (i == 1) {
    diff[i] <- 0
    next
  }
  diff[i] <- x[i] - x[i-1]
}
diff
## [1] 0 3 -2 6 1 3
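The example above only demonstrates next, so here is a small sketch of break using the same vector. The threshold of 8 and the variable `first_big` are made up for illustration: the loop skips small values with next and stops entirely, with break, as soon as it finds the first value above the threshold.

```r
x <- c(2, 5, 3, 9, 10, 13)

# Will hold the position of the first value greater than 8 (NA if none found)
first_big <- NA

for (i in seq_along(x)) {
  # Not what we are looking for: skip to the next iteration
  if (x[i] <= 8) {
    next
  }
  # Found it: record the position and stop looping entirely
  first_big <- i
  break
}

first_big
```

The fourth element (9) is the first one above 8, so the loop never looks at positions 5 and 6.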
Question 1: What is the difference between
seq_len() and seq_along()? Look at both
examples above and provide an explanation of when each might be used.
Question 2: Using the vectors x and
y, create a new vector z and use a loop so
that each element of z is the sum of elements of
x and y (that is,
z[1] <- x[1] + y[1])
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 5, 7, 9)
Question 3: The Fibonacci sequence is a famous sequence in which each element is the sum of the two numbers preceding it. The first few elements of the sequence are 1, 1, 2, 3, 5, 8, 13… Use a loop to construct the first 15 elements of the Fibonacci sequence.
Although much of the work we have done so far has involved reading data into our R session and manipulating it in the environment there, we often work with our computer in more intimate ways. While R provides very powerful tools for manipulating the underlying file system, we limit our attention today to working directly with files that we have locally on our machine.
To begin, we can identify the directory that we are currently working
in (the directory in which your Rmd file is saved) with the function
getwd() (“get working directory”):
getwd()
## [1] "/home/collin/courses/sta230/s26/labs"
We can see a list of sub-directories with list.dirs()
and move to any one of them with the setwd() (set working
directory) function
## Ignore this, R Markdown resets directory after each code chunk
setwd("fundir/")
## Recursively lists all the directories
list.dirs()
## [1] "." "./hwk" "./sim"
## Move into the directory called sim
setwd("sim")
## Now we are in fundir/sim
getwd()
## [1] "/home/collin/courses/sta230/s26/labs/fundir/sim"
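To experiment with these functions without touching any real folders, one option is to build a throwaway directory tree inside R's temporary directory. In the sketch below, the directory names (demo_fundir, hwk, sim) and the variables `root`, `old_wd`, and `dirs` are made up to mirror the lab's layout.

```r
# Build a small directory tree under R's per-session temporary directory
root <- file.path(tempdir(), "demo_fundir")
dir.create(file.path(root, "hwk"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "sim"), showWarnings = FALSE)

# Remember where we started so we can return afterwards
old_wd <- getwd()

# Move into the new tree and list its directories (including "." itself)
setwd(root)
dirs <- list.dirs()
dirs

# Move one level deeper; basename() shows just the last piece of the path
setwd("sim")
basename(getwd())

# Return to the original working directory
setwd(old_wd)
```

Saving the original working directory and restoring it at the end is a useful habit whenever setwd() is used inside a script.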
Once we are in a directory, we can list all of the files in a
directory with list.files(). Conveniently,
list.files() has a pattern argument allowing us to specify
a regular expression to indicate which files we want to list.
## Ignore this, R Markdown resets directory after each code chunk
setwd("fundir/sim")
# List all of the files in fundir/sim
list.files()
## [1] "sim_1.csv" "sim_10.csv" "sim_2.csv" "sim_3.csv" "sim_4.csv"
## [6] "sim_5.csv" "sim_6.csv" "sim_7.csv" "sim_8.csv" "sim_9.csv"
## [11] "sim.R"
# List the files in fundir/sim with "csv" in their name
list.files(pattern = "csv")
## [1] "sim_1.csv" "sim_10.csv" "sim_2.csv" "sim_3.csv" "sim_4.csv"
## [6] "sim_5.csv" "sim_6.csv" "sim_7.csv" "sim_8.csv" "sim_9.csv"
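Because pattern is a regular expression, "csv" matches those three letters anywhere in a file name, not only in the extension. The sketch below creates a few empty files with made-up names in a temporary directory to show the difference between a loose pattern and one anchored to the end of the name.

```r
# Create a temporary directory with three empty, hypothetically named files
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("sim_1.csv", "sim.R", "csv_notes.txt"))))

# "csv" anywhere in the name: also picks up csv_notes.txt
loose <- list.files(demo_dir, pattern = "csv")
loose

# A literal ".csv" at the very end of the name: only true csv files
strict <- list.files(demo_dir, pattern = "\\.csv$")
strict
```

In the stricter pattern, `\\.` stands for a literal dot and `$` anchors the match to the end of the file name.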
list.files() returns a character vector, the contents of
which can be used as arguments to other functions. For example, we have
used read.csv() several times in this course, and now we
can do it by passing variables as arguments:
## R Markdown resets directory after each code chunk, so if my Rmd file is in
## a different place, I must reset it after each chunk
setwd("fundir/sim")
## Get all files with "csv" in name
my_sims <- list.files(pattern = "csv")
## See all the files we found
my_sims
## [1] "sim_1.csv" "sim_10.csv" "sim_2.csv" "sim_3.csv" "sim_4.csv"
## [6] "sim_5.csv" "sim_6.csv" "sim_7.csv" "sim_8.csv" "sim_9.csv"
## Read the first simulation data in using [] notation
sim1 <- read.csv(my_sims[1])
## I now have a data frame with 10 observations
sim1
## V1
## 1 -1.36856
## 2 0.46344
## 3 1.87452
## 4 -0.25650
## 5 -1.41324
## 6 0.83684
## 7 -0.15509
## 8 0.15707
## 9 0.19777
## 10 -0.43873
We can combine this with other things we have learned in this lab in order to create a vector with the mean value from each of our simulations:
## Ignore this, R Markdown resets directory after each code chunk
setwd("fundir/sim")
# Create a numeric vector the same length as `my_sims` vector
mean_vec <- numeric(length = length(my_sims))
for (i in seq_along(mean_vec)) {
  # First read in the ith simulation
  dat <- read.csv(my_sims[i])
  # Compute the mean and save it to the ith spot here
  mean_vec[i] <- mean(dat$V1)
}
## Mean values from each simulation
mean_vec
## [1] -0.010247 0.509078 -0.508938 -0.145795 -0.498174 -0.345951 0.036010
## [8] -0.066167 0.328893 0.097149
We are left with a vector that is the same length as the number of simulations. Each element of this vector is the mean of one simulation, in the order the files were listed (so the second element belongs to sim_10.csv, not sim_2.csv).
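The same pattern can be tried end to end without the lab's files. The sketch below writes three small csv files with made-up values to a temporary directory, then lists them, reads each one in a loop, and stores the means. The directory name sim_demo and the values in the files are assumptions for illustration only.

```r
# Write three small, made-up csv files to a temporary directory
demo_dir <- file.path(tempdir(), "sim_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(data.frame(V1 = c(k, k + 1, k + 2)),
            file.path(demo_dir, paste0("sim_", k, ".csv")),
            row.names = FALSE)
}

# Move into the directory, remembering where we came from
old_wd <- getwd()
setwd(demo_dir)

# List the files, then read each one and store its mean
my_sims <- list.files(pattern = "csv")
mean_vec <- numeric(length = length(my_sims))
for (i in seq_along(my_sims)) {
  dat <- read.csv(my_sims[i])
  mean_vec[i] <- mean(dat$V1)
}

# Return to the original working directory
setwd(old_wd)

# Label each mean with the file it came from
names(mean_vec) <- my_sims
mean_vec
```

Naming the result by file makes it easy to check which mean belongs to which simulation, regardless of the order in which list.files() returned them.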
In the above example, we changed our working directory with
setwd() in order to read the file in directly; however, we
can also do this by specifying a path when reading in files. Consider
this alternative for listing the available simulations:
(sims <- list.files(path = "fundir/sim", pattern = "csv"))
## [1] "sim_1.csv" "sim_10.csv" "sim_2.csv" "sim_3.csv" "sim_4.csv"
## [6] "sim_5.csv" "sim_6.csv" "sim_7.csv" "sim_8.csv" "sim_9.csv"
This will list the files, but when I try to pass this to
read.csv(), the file isn’t found.
read.csv(sims[1])
## Warning in file(file, "rt"): cannot open file 'sim_1.csv': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection
We aren’t able to read this because the simulation files don’t exist
within our working directory. We can remedy this by setting
full.names = TRUE, which asks list.files() to include the relative path from our
working directory to the csv files:
## sims now contains the path to the csv files
(sims <- list.files(path = "fundir/sim", pattern = "csv", full.names = TRUE))
## [1] "fundir/sim/sim_1.csv" "fundir/sim/sim_10.csv" "fundir/sim/sim_2.csv"
## [4] "fundir/sim/sim_3.csv" "fundir/sim/sim_4.csv" "fundir/sim/sim_5.csv"
## [7] "fundir/sim/sim_6.csv" "fundir/sim/sim_7.csv" "fundir/sim/sim_8.csv"
## [10] "fundir/sim/sim_9.csv"
## We can then read it in from our current working directory without changing anything
read.csv(sims[1])
## V1
## 1 -1.36856
## 2 0.46344
## 3 1.87452
## 4 -0.25650
## 5 -1.41324
## 6 0.83684
## 7 -0.15509
## 8 0.15707
## 9 0.19777
## 10 -0.43873
If for any reason we want to strip the path and retain just the
file name, we can do so with basename():
## Path to first csv
sims[1]
## [1] "fundir/sim/sim_1.csv"
## Basename of first csv
basename(sims[1])
## [1] "sim_1.csv"
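basename() has a counterpart, dirname(), that keeps the directory part instead, and file.path() glues pieces back together. The short sketch below uses a hypothetical path that mirrors the lab's layout; no files need to exist for these functions to work, since they only manipulate the text of the path.

```r
# Build a path from its pieces (file.path() joins them with "/")
path <- file.path("fundir", "sim", "sim_1.csv")
path

# The file name without its directory
basename(path)

# The directory without the file name
dirname(path)

# Putting the two pieces back together recovers the original path
rebuilt <- file.path(dirname(path), basename(path))
rebuilt
```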
Question 4: For this lab, download these
files and unzip them in the same directory as your lab Rmd file.
Contained within this zipped folder are the results of an experiment
that we conducted on 60 people in three different experimental groups.
The data files themselves only contain the results from the experiment;
information on the subjects themselves is stored as metadata in
the file name. For example, the file PatriciaFexpA.csv
contains the results for the subject named Patricia, sex Female, in
experimental group A. Similarly, RobertMexpC.csv contains
data for Robert, sex Male, in experimental group C. Each file contains 5
observations – however, we are only interested in retaining the mean
value of these observations for each subject.
Use this information along with the data in the zipped folder to recreate the plot below.
Hint: Just like numerics, character vectors can be created with
character(length = n)