Functions, in their simplest definition, can be thought of as pre-packaged snippets of code that perform common or repetitive tasks. While many functions are provided by R or R packages (including nearly everything we have used so far), there are often times when we wish to construct our own. The goal of this lab is to introduce the components of functions in R and to illustrate how to create them ourselves.
Functions in R are similar to functions in other programming languages, with the more important differences being beyond the scope of this class. In this portion of the lab, we will familiarize ourselves with how to create functions and use them in our own work.
Functions in R primarily consist of three components: a name, a set of arguments, and a body.
A function definition in R begins with an assignment with <- to a
name. The right-hand side uses the keyword function() along with any
arguments the function will take (e.g., x and y),
followed by the body of the function enclosed in curly brackets
{}. Here, for example, is a function that computes the sum
of squares of two inputs, x and y:
sum_of_squares <- function(x, y) {
  x^2 + y^2
}
sum_of_squares(x = 2, y = 3)
## [1] 13
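To make the three components concrete, here is a second small sketch. The name hypotenuse and its arguments a and b are hypothetical, chosen only for illustration. It also shows that the body may contain several lines, and that the value of the last evaluated expression is what the function returns.

```r
# 'hypotenuse' is a hypothetical name used only for illustration
hypotenuse <- function(a, b) {
  # Intermediate results can be stored inside the body
  sum_sq <- a^2 + b^2
  # The last evaluated expression is the value the function returns
  sqrt(sum_sq)
}

hypotenuse(a = 3, b = 4)
## [1] 5
```

Here hypotenuse is the name, a and b are the arguments, and everything between the curly brackets is the body.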
We can also write functions whose arguments have default values. For
example, if we call the sum_of_squares() function without
providing both an x and a y, we will get an
error:
sum_of_squares(x = 2)
## Error in sum_of_squares(x = 2): argument "y" is missing, with no default
We can rewrite our function so that y has a default value,
y = 3. Note that the default only applies when an
argument for y is not given; if we do supply an argument to
y, the default will be ignored.
sum_of_squares_default <- function(x, y = 3) {
  x^2 + y^2
}
# Using the default argument
sum_of_squares_default(x = 2)
## [1] 13
# Providing our own
sum_of_squares_default(x = 2, y = 10)
## [1] 104
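The calls above name each argument explicitly. As a small sketch of how defaults interact with argument matching, the example below repeats the same function (redefined so the block runs on its own) and calls it with arguments matched by position as well as by name.

```r
# Redefined here so that this example runs on its own
sum_of_squares_default <- function(x, y = 3) {
  x^2 + y^2
}

# Matched by position: 2 is assigned to x, y falls back to its default of 3
sum_of_squares_default(2)
## [1] 13

# Matched by position for both arguments, so the default is ignored
sum_of_squares_default(2, 10)
## [1] 104

# Named arguments can be supplied in any order
sum_of_squares_default(y = 10, x = 2)
## [1] 104
```

Naming arguments is usually the safer habit, since the call still works if the arguments are written in a different order.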
A helpful tip: we can see the code used inside of a function by simply typing the function name into the console without adding parentheses.
## See the code for sum_of_squares
sum_of_squares
## function (x, y)
## {
## x^2 + y^2
## }
## <bytecode: 0x58361606a9f0>
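Printing the whole function is usually enough, but the individual pieces can also be pulled out separately. The sketch below uses the base R functions formals() and body(); sum_of_squares is redefined so that the block runs on its own.

```r
# Redefined here so that this example runs on its own
sum_of_squares <- function(x, y) {
  x^2 + y^2
}

# The names of the arguments the function accepts
names(formals(sum_of_squares))
## [1] "x" "y"

# The body: the code between the curly brackets
body(sum_of_squares)
```

This can be handy for a function with many arguments, where we may only want a quick reminder of what it accepts.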
Question 1: For this question, we will use two
separate datasets. You may also find the R functions table() and
sort() helpful; see their help pages with ?table and ?sort.
police <- read.csv("https://remiller1450.github.io/data/Police2019.csv")
college <- read.csv("https://collinn.github.io/data/college2019.csv")
Your goal is to write a function called top_table that
takes a character vector and returns the five most frequently occurring
values along with their counts. Then, verify that it works by printing out the top
five states in both the police and college datasets. Your results should
look like this:
top_table(police$state)
## v
## CA TX FL AZ CO
## 825 496 369 259 204
top_table(college$State)
## v
## PA NY CA TX OH
## 85 67 63 60 48
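As a hint for Question 1, here is a minimal sketch of how table() and sort() behave on a small made-up vector. The vector fruits and the object names are hypothetical, and this is not a complete solution.

```r
# A small made-up vector, used only for illustration
fruits <- c("apple", "pear", "apple", "fig", "apple", "pear")

# table() counts how many times each distinct value occurs
counts <- table(fruits)

# sort() orders those counts; decreasing = TRUE puts the largest first
sorted_counts <- sort(counts, decreasing = TRUE)
sorted_counts
```

In the sorted result, "apple" comes first with a count of 3, followed by "pear" and then "fig".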
Question 2: Modify top_table so that it
takes an additional argument n that allows you to specify
how many of the top values in each vector you want to view. Here, for
example, we print the top 10:
top_table(police$state, n = 10)
## v
## CA TX FL AZ CO GA OK NC OH WA
## 825 496 369 259 204 189 170 163 159 156
top_table(college$State, n = 10)
## v
## PA NY CA TX OH IL NC MA MI IN
## 85 67 63 60 48 45 40 36 34 33
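One way to think about an argument like n is that it is simply passed along to another function inside the body. The sketch below is a hypothetical helper, first_values, that hands its n to the base R function head(); it illustrates the idea only and is not a solution to Question 2.

```r
# 'first_values' is a hypothetical helper used only for illustration
first_values <- function(v, n = 3) {
  # head() returns the first n elements of v
  head(v, n)
}

# letters is a built-in vector of the lowercase alphabet
first_values(letters)
## [1] "a" "b" "c"

first_values(letters, n = 5)
## [1] "a" "b" "c" "d" "e"
```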
Question 3: Write a function called
long_square that takes a single numeric vector as its argument
n. If the length of the vector is greater than the square
root of the sum of all the numbers in the vector, the function should
print "long!". Otherwise, it should print
"not long!". Verify that it works for the arguments
x1, x2, and x3.
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(5, 8, 10, 12)
x3 <- c(2, 5, 9, 10, 1, 1, 1)
long_square(n = x1)
## [1] "long!"
long_square(n = x2)
## [1] "not long!"
long_square(n = x3)
## [1] "long!"
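Question 3 requires the function to do different things depending on a condition. As a hint, here is a minimal sketch of if and else inside a function body. The function describe_mean and its cutoff of 10 are hypothetical and unrelated to the condition in the question.

```r
# 'describe_mean' and the cutoff of 10 are hypothetical
describe_mean <- function(v) {
  if (mean(v) > 10) {
    # Runs only when the condition is TRUE
    print("large mean")
  } else {
    # Runs only when the condition is FALSE
    print("small mean")
  }
}

describe_mean(c(20, 30, 40))
## [1] "large mean"

describe_mean(c(1, 2, 3))
## [1] "small mean"
```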
Question 4: Here, we are going to modify the
top_table function from Questions 1 and 2 once more. In
addition to having a second argument n indicating the
number of results, we now want to include a third argument
top which takes either TRUE or
FALSE. When TRUE, the function should return
the n most frequent values; when FALSE, it should return
the n least frequent.
top_table(police$state, n = 5, top = TRUE)
## v
## CA TX FL AZ CO
## 825 496 369 259 204
top_table(college$State, n = 10, top = FALSE)
## v
## AK WY NV DE NM AZ DC ID NH RI
## 1 1 2 3 4 5 5 5 5 5
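A TRUE/FALSE argument such as top works like a switch inside the function. The sketch below shows the pattern with a hypothetical function, extreme_value, whose logical argument largest chooses between max() and min(); it is meant as a hint and not as a solution to Question 4.

```r
# 'extreme_value' and 'largest' are hypothetical names
extreme_value <- function(v, largest = TRUE) {
  if (largest) {
    # largest = TRUE: return the biggest element
    max(v)
  } else {
    # largest = FALSE: return the smallest element
    min(v)
  }
}

extreme_value(c(4, 9, 1))
## [1] 9

extreme_value(c(4, 9, 1), largest = FALSE)
## [1] 1
```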