Introduction

In the simplest terms, functions are pre-packaged snippets of code that perform common or repetitive tasks. While many functions are provided by R or R packages (including nearly everything we have used so far), we will often want to write our own. The goal of this lab is to introduce the components of functions in R and to illustrate how to create them ourselves.

Functions

Functions in R are similar to functions in other programming languages; the more important differences are beyond the scope of this class. In this portion of the lab, we will familiarize ourselves with how to create functions and use them in our own work.

Functions in R primarily consist of three components:

  1. The name of the function
  2. The arguments of the function
  3. The body of the function

A function definition in R assigns a function to a name with <-. The definition uses the keyword function, followed by any arguments in parentheses (e.g., x and y) and then the body of the function enclosed in curly braces {}. Here, for example, is a function that computes the sum of squares of two inputs, x and y:

sum_of_squares <- function(x, y) {
  x^2 + y^2
}

sum_of_squares(x = 2, y = 3)
## [1] 13
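
Note that sum_of_squares() never calls return(): an R function returns the value of the last expression evaluated in its body. As a minimal sketch (the name sum_of_squares_explicit is hypothetical and not part of the lab), the same computation could be written with an explicit return(), and the result can be stored in a variable like any other value:

```r
# Hypothetical variant of sum_of_squares() that uses an explicit return()
sum_of_squares_explicit <- function(x, y) {
  result <- x^2 + y^2
  return(result)
}

# The returned value can be assigned and reused
total <- sum_of_squares_explicit(x = 2, y = 3)
total
## [1] 13
```

Both styles behave the same here; relying on the last expression is simply the shorter of the two.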

We can also write functions whose arguments have default values. As written, sum_of_squares() has no defaults, so if we do not provide both x and y, we get an error:

sum_of_squares(x = 2)
## Error in sum_of_squares(x = 2): argument "y" is missing, with no default

We can rewrite our function so that y has a default value of 3. Note that the default only applies when no value for y is given; if we do supply a value for y, the default will be ignored.

sum_of_squares_default <- function(x, y = 3) {
  x^2 + y^2
}

# Using the default argument
sum_of_squares_default(x = 2)
## [1] 13
# Providing our own
sum_of_squares_default(x = 2, y = 10)
## [1] 104
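
The calls above name every argument, but R also matches unnamed arguments by position, and named arguments may be given in any order. The sketch below illustrates this with a made-up function (scaled_square and its argument scale are hypothetical, chosen only so that swapping the two arguments would change the answer):

```r
# Hypothetical function: multiplies the square of x by a scale factor
scaled_square <- function(x, scale = 2) {
  scale * x^2
}

# Positional matching: 3 is matched to x, and scale uses its default
scaled_square(3)
## [1] 18
# Positional matching for both arguments: x = 3, scale = 10
scaled_square(3, 10)
## [1] 90
# Named arguments can appear in any order
scaled_square(scale = 10, x = 3)
## [1] 90
```

Naming arguments takes a little more typing, but it makes a call easier to read and guards against passing values in the wrong order.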

A helpful tip: we can see the code used inside of a function by simply typing the function name into the console without adding parentheses.

# See the code for sum_of_squares
sum_of_squares
## function (x, y) 
## {
##     x^2 + y^2
## }
## <bytecode: 0x58361606a9f0>
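
The <bytecode: ...> line is specific to the R session that produced it, so it may differ or be absent on your machine. Besides printing the whole function, we can also look at its components individually. The following is a small sketch using the base R functions formals() and body(); sum_of_squares is redefined here only so that the example runs on its own:

```r
# Redefined so the example is self-contained
sum_of_squares <- function(x, y) {
  x^2 + y^2
}

# The names of the arguments
names(formals(sum_of_squares))
## [1] "x" "y"

# The body of the function, i.e., the code between the curly braces
body(sum_of_squares)
```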

Question 1: For this question, we will use two separate datasets. You may find the R functions table() and sort() helpful (see ?table and ?sort).

police <- read.csv("https://remiller1450.github.io/data/Police2019.csv")
college <- read.csv("https://collinn.github.io/data/college2019.csv")

Your goal is to write a function called top_table that takes a character vector and returns a table of its five most frequent values along with their counts. Then, verify that it works by printing the top five states in both the police and college datasets. Your results should look like this:

top_table(police$state)
## v
##  CA  TX  FL  AZ  CO 
## 825 496 369 259 204
top_table(college$State)
## v
## PA NY CA TX OH 
## 85 67 63 60 48
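
As a hint for Question 1, the sketch below shows what table() and sort() each do on their own, using a small made-up vector (fruit is hypothetical and not part of either dataset). How to combine them, and how to keep only the first five entries, is left to you.

```r
# A small made-up character vector
fruit <- c("apple", "pear", "apple", "fig", "apple", "pear")

# table() counts how many times each distinct value occurs
counts <- table(fruit)
counts

# sort() orders values; decreasing = TRUE puts the largest first
sorted <- sort(c(3, 1, 2), decreasing = TRUE)
sorted
## [1] 3 2 1
```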

Question 2: Modify top_table so that it takes an additional argument n that specifies how many of the top values to return from the vector. Here, for example, we print the top 10:

top_table(police$state, n = 10)
## v
##  CA  TX  FL  AZ  CO  GA  OK  NC  OH  WA 
## 825 496 369 259 204 189 170 163 159 156
top_table(college$State, n = 10)
## v
## PA NY CA TX OH IL NC MA MI IN 
## 85 67 63 60 48 45 40 36 34 33

Question 3: Write a function called long_square that takes a single argument n, a numeric vector. If the length of the vector is greater than the square root of the sum of all the numbers in the vector, the function should print "long!". Otherwise, it should print "not long!". Verify that it works for the vectors x1, x2, and x3.

x1 <- c(1, 2, 3, 4, 5)
x2 <- c(5, 8, 10, 12)
x3 <- c(2, 5, 9, 10, 1, 1, 1)

long_square(n = x1)
## [1] "long!"
long_square(n = x2)
## [1] "not long!"
long_square(n = x3)
## [1] "long!"

Question 4: Here, we are going to modify the top_table function from Questions 1 and 2 once more. In addition to the second argument n indicating the number of results, we now want a third argument top that takes either TRUE or FALSE. When TRUE, the function should return the n most frequent values; when FALSE, it should return the n least frequent.

top_table(police$state, n = 5, top = TRUE)
## v
##  CA  TX  FL  AZ  CO 
## 825 496 369 259 204
top_table(college$State, n = 10, top = FALSE)
## v
## AK WY NV DE NM AZ DC ID NH RI 
##  1  1  2  3  4  5  5  5  5  5

Lists? Functionals?