Methods and generics in R allow us to express a generic idea (such as plot, print, summary) and dispatch to an associated function based on an object's class. For example
x <- matrix(rnorm(200), ncol = 2)
y <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
plot(x)
plot(y, which = 1)
We can see the methods available to us with the methods call
class(x); class(y)
## [1] "matrix"
## [1] "lm"
plot
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x55a135d33358>
## <environment: namespace:graphics>
methods(plot)
## [1] plot.acf* plot.data.frame* plot.decomposed.ts*
## [4] plot.default plot.dendrogram* plot.density*
## [7] plot.ecdf plot.factor* plot.formula*
## [10] plot.function plot.hclust* plot.histogram*
## [13] plot.HoltWinters* plot.isoreg* plot.lm*
## [16] plot.medpolish* plot.mlm* plot.ppr*
## [19] plot.prcomp* plot.princomp* plot.profile.nls*
## [22] plot.raster* plot.spec* plot.stepfun
## [25] plot.stl* plot.table* plot.ts
## [28] plot.tskernel* plot.TukeyHSD*
## see '?methods' for accessing help and source code
?plot.default
?plot.lm
This is implemented in a pretty straightforward manner: methods are written in the form generic.class() (which also emphasizes why we should use underscores rather than periods in our own function names)
## The generic; its only job is to dispatch to a method
donothing <- function(x) {
UseMethod("donothing")
}
## In case a particular method isn't found, use default
donothing.default <- function(x) {
print("this is default, you should usually include this")
}
## Class specific methods
donothing.matrix <- function(x) {
print("this is a matrix")
}
donothing.data.frame <- function(x) {
print("maybe consider using data.table instead")
}
donothing.newclass <- function(x) {
print("helpful if you want to extend previous methods with your own class")
}
z <- list(1:10)
donothing(z)
## [1] "this is default, you should usually include this"
x <- matrix(rnorm(10))
donothing(x)
## [1] "this is a matrix"
y <- as.data.frame(x)
donothing(y)
## [1] "maybe consider using data.table instead"
## Say w 'inherits' from y
w <- structure(y, class = c("newclass", class(y)))
donothing(w)
## [1] "helpful if you want to extend previous methods with your own class"
There may be a few things that you wish to do each time you work with R. One of the options available to us is a configuration file known as an .Rprofile. A user can have multiple .Rprofile files: often there is a system-wide file, one in a user's home directory, and one in a project directory (which takes precedence if R is started from that directory). Generally none of these are generated by default, and if one is, it's usually the system-wide version.
We will take a look at an example .Rprofile in RStudio, but for those coming back to this document, here are a few things to keep in mind and some nice things to include (many taken from the linked R-Bloggers page)
## Source all R files in a directory
sourceDir <- function(path, trace = FALSE, ...) {
for(nm in list.files(path, pattern = "\\.[Rr]$")) {
if(trace) cat(nm, ":")
source(file.path(path, nm), ...)
if(trace) cat("\n")
}
}
## This is bad. Why?
options(stringsAsFactors=FALSE)
## This is OK, scientific notation is hard
options(scipen=50)
## Format printing output
options(width=80, digits = 5)
## Especially nice when quitting from command line
q <- function (save="no", ...) {
quit(save=save, ...)
}
## Not sure why this isn't default, allows autocomplete of library names
utils::rc.settings(ipck=TRUE)
## I don't typically use these, but they are an excellent idea
## This timestamps (with directory) all of the interactive R code you run
.First <- function(){
if(interactive()){
library(utils)
timestamp(prefix = paste("##------ [", getwd(), "] ", sep = ""))
}
}
## Then when exiting, it saves your (timestamped) interactive
## history into either a system designated file (R_HISTFILE) or
## into a hidden .Rhistory file in your home directory
.Last <- function(){
if(interactive()){
hist_file <- Sys.getenv("R_HISTFILE")
if(hist_file=="") hist_file <- "~/.RHistory"
savehistory(hist_file)
}
}
# ## Make new environment for startup functions (won't be erased with rm())
# .startup <- new.env()
# .startup$somefunction <- function() do something
# attach(.startup)
# sys.source("startfuns.R", envir = attach(NULL, name = ".startup"))
There are a few places where R can unexpectedly cause difficulty. This comes from the patchwork nature of the language, and as such, it's good to be aware of some of the places where things can go wrong. First and absolutely foremost (unrelated to being careful, but absolutely essential practice), avoid magic numbers at all costs. This means you should never write something like
x <- 1:10
## No
for(i in 1:10) {
x[i] <- x[i]^2
}
What if you have this written 10,000 times in your code and the length of x
changes? Far better to be explicit in what you want, i.e.,
x <- 1:10
## Better, I guess
for(i in 1:length(x)) {
x[i] <- x[i]^2
}
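In fact, for a pointwise operation like squaring, we need no loop at all; the vectorized equivalent of the loop above is simply
x <- 1:10
x <- x^2  ## squares every element at once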
This holds for basically all pointwise vector operations: skip the loop altogether. R is clever enough to know what you want when you mix vectors and scalars in arithmetic, too (using recycling, as we will see below)
x <- rnorm(10)
y <- rnorm(10)
z <- 1
x*y + z*y - exp(x) + x/y
## [1] -2.05282 -1.06310 0.49620 -0.88176 0.27282 -0.73947 1.17943 0.28762
## [9] -0.50503 2.20186
# ## Note: elementwise multiplication is not matrix multiplication
# x*x != t(x) %*% x
Now, despite what was said above for illustrative purposes, we can do one better in anticipating changes to our code. First, let's take a moment to familiarize ourselves with the length-zero numeric vector. It is, as the name suggests, a numeric vector with zero elements, which means that operations on it may not be what you expect
x <- numeric(0L)
x
## numeric(0)
x + 10
## numeric(0)
identical(x, 0L)
## [1] FALSE
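A few more zero-length surprises are worth knowing about (quick checks on the same x):
sum(x)     ## 0, the empty sum
max(x)     ## -Inf, with a warning
length(x)  ## 0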
This can be a particularly sneaky error in our loops
x <- numeric(0L)
## Incorrect
for(i in 1:length(x)) {
print(i)
}
## [1] 1
## [1] 0
Instead, the function seq_along gives us the expected output
## Correct
for(i in seq_along(x)) {
print(i)
}
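## (no output: seq_along(x) is integer(0), so the body never runs)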
Ok, one more place where I lied about what to do. We saw above that x <- numeric(0L) returns a vector of zero length. Does this generalize to other positive integers?
## Neat!
x <- numeric(5)
x
## [1] 0 0 0 0 0
length(x)
## [1] 5
## Less neat
x <- list(5)
x
## [[1]]
## [1] 5
length(x)
## [1] 1
We can remedy this again by being more explicit about what we want
x <- vector("numeric", length = 5L)
y <- vector("list", length = 5L)
x
## [1] 0 0 0 0 0
y
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL
We do still see a difference here, unfortunately, and that is in what R chooses to initialize each of the vectors with. I think it is good practice to initialize values with something like NA to differentiate between a true zero and an element that was never assigned a value. For example, suppose we want to perform a collection of computations that have a non-zero chance of failing, and for which 0 is also a plausible result
## Initialized with 0
y <- vector("numeric", length = 1000L)
set.seed(69)
for(i in seq_along(y)) {
u <- runif(1)
if(u < 0.8) {
y[i] <- rbinom(1, 1, 0.75)
}
}
table(y, useNA = "always")
## y
## 0 1 <NA>
## 403 597 0
## Initialized with NA
y <- NA*vector("numeric", length = 1000L)
set.seed(69)
for(i in seq_along(y)) {
u <- runif(1)
if(u < 0.8) {
y[i] <- rbinom(1, 1, 0.75)
}
}
table(y, useNA = "always")
## y
## 0 1 <NA>
## 242 597 161
One of the simplest fixes for slow code is to look for places where we are making copies in memory. This happens any time we grow an object with c(), rbind(), or cbind(), for example. Better to initialize a vector (or list) of the appropriate size, and then fill it in as necessary. This allows R to preallocate the memory it will need rather than making copies as the need arises
library(rbenchmark)
grow_vec <- function(n) {
x <- c()
for(i in seq_len(n)) {
x <- cbind(x, rnorm(1))
}
}
init_vec <- function(n) {
x <- vector("numeric", n)
for(i in seq_len(n)) {
x[i] <- rnorm(1)
}
}
benchmark(
grow_vec(n = 10000),
init_vec(n = 10000),
replications = 25
)
## test replications elapsed relative user.self sys.self
## 1 grow_vec(n = 10000) 25 2.669 6.574 2.661 0.008
## 2 init_vec(n = 10000) 25 0.406 1.000 0.397 0.008
## user.child sys.child
## 1 0 0
## 2 0 0
We've seen a few examples so far, and almost all of them have used the dreaded for loop, which carries with it accusations of inefficiency and slowness. However, let's consider the following ways in which we might complicate finding the column means of a matrix
set.seed(69)
x <- matrix(rnorm(100000*100), ncol = 100)
## Using apply
f_apply <- function(x) {
res <- apply(x, 2, mean)
}
## Loops
f_loop <- function(x) {
res <- vector("numeric", ncol(x))
for(i in seq_len(ncol(x))) {
res[i] <- mean(x[, i])
}
}
f_loop2 <- function(x) {
res <- vector("numeric", ncol(x))
n <- nrow(x)
for(i in seq_len(ncol(x))) {
res[i] <- sum(x[, i])/n
}
}
## And lapply
f_lapply <- function(x) {
res <- lapply(as.data.frame(x), mean)
unlist(res, use.names = FALSE)
}
## And colMeans
benchmark(f_apply(x),
f_loop(x),
f_loop2(x),
f_lapply(x),
colMeans(x),
replications = 50)
## test replications elapsed relative user.self sys.self user.child
## 5 colMeans(x) 50 0.472 23.6 0.472 0.000 0
## 1 f_apply(x) 50 7.816 390.8 6.261 1.555 0
## 4 f_lapply(x) 50 2.698 134.9 2.598 0.100 0
## 2 f_loop(x) 50 0.024 1.2 0.016 0.008 0
## 3 f_loop2(x) 50 0.020 1.0 0.012 0.008 0
## sys.child
## 5 0
## 1 0
## 4 0
## 2 0
## 3 0
What's perhaps most surprising here is the roughly 20-fold speed increase that our for loop shows over the built-in colMeans function, much less the several-hundred-fold increase associated with apply. Why might this be? We can get some insight if we take a peek at the source code for each, where we see that the R-level implementations carry a lot of additional overhead beyond simply computing column means.
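To see this for yourself, print the functions. Both eventually hand off to compiled internals, but with different amounts of R-level work beforehand (nothing new is defined here; we are just inspecting base R):
## A few dimension checks, then .Internal(colMeans(...))
colMeans
## Handles trim and na.rm first, then .Internal(mean(x)), whose
## compiled code makes an extra pass for numerical accuracy
mean.default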
This is further evidenced by the fact that the loop utilizing sum()/n is slightly faster than simply calling the mean function (though mean will be more numerically accurate in any instance where the values are not all identical). And if we consider apply, we can see that it's actually an even nastier, more complicated loop that has simply been hidden from us
apply
## function (X, MARGIN, FUN, ...)
## {
## FUN <- match.fun(FUN)
## dl <- length(dim(X))
## if (!dl)
## stop("dim(X) must have a positive length")
## if (is.object(X))
## X <- if (dl == 2L)
## as.matrix(X)
## else as.array(X)
## d <- dim(X)
## dn <- dimnames(X)
## ds <- seq_len(dl)
## if (is.character(MARGIN)) {
## if (is.null(dnn <- names(dn)))
## stop("'X' must have named dimnames")
## MARGIN <- match(MARGIN, dnn)
## if (anyNA(MARGIN))
## stop("not all elements of 'MARGIN' are names of dimensions")
## }
## s.call <- ds[-MARGIN]
## s.ans <- ds[MARGIN]
## d.call <- d[-MARGIN]
## d.ans <- d[MARGIN]
## dn.call <- dn[-MARGIN]
## dn.ans <- dn[MARGIN]
## d2 <- prod(d.ans)
## if (d2 == 0L) {
## newX <- array(vector(typeof(X), 1L), dim = c(prod(d.call),
## 1L))
## ans <- forceAndCall(1, FUN, if (length(d.call) < 2L) newX[,
## 1] else array(newX[, 1L], d.call, dn.call), ...)
## return(if (is.null(ans)) ans else if (length(d.ans) <
## 2L) ans[1L][-1L] else array(ans, d.ans, dn.ans))
## }
## newX <- aperm(X, c(s.call, s.ans))
## dim(newX) <- c(prod(d.call), d2)
## ans <- vector("list", d2)
## if (length(d.call) < 2L) {
## if (length(dn.call))
## dimnames(newX) <- c(dn.call, list(NULL))
## for (i in 1L:d2) {
## tmp <- forceAndCall(1, FUN, newX[, i], ...)
## if (!is.null(tmp))
## ans[[i]] <- tmp
## }
## }
## else for (i in 1L:d2) {
## tmp <- forceAndCall(1, FUN, array(newX[, i], d.call,
## dn.call), ...)
## if (!is.null(tmp))
## ans[[i]] <- tmp
## }
## ans.list <- is.recursive(ans[[1L]])
## l.ans <- length(ans[[1L]])
## ans.names <- names(ans[[1L]])
## if (!ans.list)
## ans.list <- any(lengths(ans) != l.ans)
## if (!ans.list && length(ans.names)) {
## all.same <- vapply(ans, function(x) identical(names(x),
## ans.names), NA)
## if (!all(all.same))
## ans.names <- NULL
## }
## len.a <- if (ans.list)
## d2
## else length(ans <- unlist(ans, recursive = FALSE))
## if (length(MARGIN) == 1L && len.a == d2) {
## names(ans) <- if (length(dn.ans[[1L]]))
## dn.ans[[1L]]
## ans
## }
## else if (len.a == d2)
## array(ans, d.ans, dn.ans)
## else if (len.a && len.a%%d2 == 0L) {
## if (is.null(dn.ans))
## dn.ans <- vector(mode = "list", length(d.ans))
## dn1 <- list(ans.names)
## if (length(dn.call) && !is.null(n1 <- names(dn <- dn.call[1])) &&
## nzchar(n1) && length(ans.names) == length(dn[[1]]))
## names(dn1) <- n1
## dn.ans <- c(dn1, dn.ans)
## array(ans, c(len.a%/%d2, d.ans), if (!is.null(names(dn.ans)) ||
## !all(vapply(dn.ans, is.null, NA)))
## dn.ans)
## }
## else ans
## }
## <bytecode: 0x55a1373d8558>
## <environment: namespace:base>
lapply is only slightly better here; though it's still a for loop, it's one written in C. We can compare even further by considering the vapply function, which is similar to lapply but with the additional constraint that one must prespecify the output. Naturally, we might suspect that having prespecified the output would lead to a substantial performance gain:
x <- rpois(n = 500000, lambda = 20)
benchmark(
lapply(x, function(y) sqrt(y) < 5),
vapply(x, function(y) (sqrt(y) < 5), logical(1)),
replications = 25
)
## test replications elapsed
## 1 lapply(x, function(y) sqrt(y) < 5) 25 5.602
## 2 vapply(x, function(y) (sqrt(y) < 5), logical(1)) 25 5.819
## relative user.self sys.self user.child sys.child
## 1 1.000 5.599 0.004 0 0
## 2 1.039 5.819 0.000 0 0
Substantial indeed. However, what we are failing to account for here is two things:
- vapply returns an atomic vector directly, rather than a list (which would still need to be unlisted)
- vapply gives us type-checking, throwing an error when our function fails to return output of the declared type and length
Each of these is a non-trivial benefit of vapply.
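The type-checking is easy to see in action: if our function returns anything other than the declared logical(1), vapply stops immediately. A quick illustration, wrapped in try() so the error prints without halting:
try(vapply(1:3, function(y) "oops", logical(1)))
## errors: the result is type 'character', not the declared 'logical'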
We can verify the saved overhead by making a more appropriate comparison
## Unlist AND do type-checking
f_lapply2 <- function(x) {
res <- lapply(x, function(y) {
tt <- (sqrt(y) > 5)
## Type checking
if(!is.logical(tt) || length(tt) != 1) stop("stop")
tt ## return the checked value
})
## Unlisting
unlist(res, use.names = FALSE)
}
benchmark(
vapply(x, function(y) sqrt(y) > 5, logical(1)),
f_lapply2(x),
replications = 25
)
## test replications elapsed relative
## 2 f_lapply2(x) 25 10.730 1.907
## 1 vapply(x, function(y) sqrt(y) > 5, logical(1)) 25 5.627 1.000
## user.self sys.self user.child sys.child
## 2 10.717 0.007 0 0
## 1 5.627 0.000 0 0
We see that functionals do still have their place, both by performing safety checks and formatting output, and by making code more expressive and easier to read. The main takeaway here is that the performance cost is minor in day-to-day work, but if you do make heavy use of the apply functionals, especially in computationally intensive simulation, it may be worth investigating what can be saved with an explicit for loop (with appropriate type checking, of course).
As hinted at above, if we do run lapply (or any other function that returns a list) and wish to unlist the result, we can see significant performance gains by passing an additional argument to unlist
x <- matrix(rnorm(1e6*2), nrow = 2)
res <- lapply(as.data.frame(x), mean)
## We often don't need the names of the listed elements
benchmark(
unlist(res),
unlist(res, use.names = FALSE)
)
## test replications elapsed relative user.self
## 2 unlist(res, use.names = FALSE) 100 5.203 1.000 5.096
## 1 unlist(res) 100 14.396 2.767 14.346
## sys.self user.child sys.child
## 2 0.104 0 0
## 1 0.048 0 0
One common use of lapply is the generation of simulated observations, reading in files, or some other set of operations resulting in a data.frame. The rbindlist function, from the data.table package, gives us an excellent way to speed up the process of combining the list of data.frames we start with into a single data.frame. As an added bonus, you end up with a data.table instead
library(data.table)
datasets <- lapply(1:1000, function(x) {
data.frame(ID = rep(x, 15),
A = rnorm(15),
B = rnorm(15),
C = rnorm(15))
})
f_loop <- function(x) {
res <- data.frame()
for(i in seq_along(x)) {
res <- rbind(res, x[[i]])
}
}
benchmark(
f_loop(datasets),
Reduce(rbind, datasets),
rbindlist(datasets),
replications = 10
)
## test replications elapsed relative user.self sys.self
## 1 f_loop(datasets) 10 4.540 349.23 4.540 0
## 3 rbindlist(datasets) 10 0.013 1.00 0.013 0
## 2 Reduce(rbind, datasets) 10 5.474 421.08 5.474 0
## user.child sys.child
## 1 0 0
## 3 0 0
## 2 0 0
# ## My typical pattern (the %>% pipe comes from magrittr)
# result <- lapply(datasets, function(x) {
#   # do something
# }) %>% rbindlist()
Here, we introduce a couple of helpful wrapper functions. While these particular ones may or may not be useful to you, they illustrate ways to combine existing functions in R into new tools. One of the simplest I use is built around object.size
x <- matrix(rnorm(1e6*2), nrow = 2)
object.size(x)
## 16000216 bytes
?object.size
size_of <- function(x, units = "Mb") {
format(object.size(x), units = units)
}
size_of(x)
## [1] "15.3 Mb"
The next two are less trivial. First, we consider a function that filters NULL values out of a list
## Function to remove null values from list
compact <- function(x) {
Filter(Negate(is.null), x)
}
x <- vector("list", 5)
for(i in c(1, 3, 5)) {
x[[i]] <- i
}
x
## [[1]]
## [1] 1
##
## [[2]]
## NULL
##
## [[3]]
## [1] 3
##
## [[4]]
## NULL
##
## [[5]]
## [1] 5
(x <- compact(x))
## [[1]]
## [1] 1
##
## [[2]]
## [1] 3
##
## [[3]]
## [1] 5
Now suppose that we are running an analysis in which we fit models on collections of simulated data, some of which are destined to fail
## Simulate model failure
dummy_fit <- function() {
u <- runif(1)
if(u < 0.8) {
res <- lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)
} else {
res <- NULL
}
return(res)
}
set.seed(69)
model_fits <- replicate(20, dummy_fit())
We can then use our compact function to remove the fits that failed while retaining those that are ready for further work
length(model_fits)
## [1] 20
length(compact(model_fits))
## [1] 16
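With the failures removed, the surviving elements are ordinary lm fits, so downstream work proceeds as usual. For instance, collecting coefficients (a hypothetical next step; numeric(3) matches the three coefficients of the model above):
coefs <- t(vapply(compact(model_fits), coef, numeric(3)))
dim(coefs)  ## one row per successful fit, one column per coefficient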
Finally, we consider the where function, painfully absent from base R, though it has a close cousin in which. which, given a vector of TRUE/FALSE values, returns the indices of the values that are TRUE. where, on the other hand, is given a vector and a predicate function, and returns a logical vector. These names probably ought to be flipped; however, which is in base R and was named first, so here we are. Note one benefit below: which can take an expression (anything returning a vector of logicals), whereas where must take a function as an argument
## First which
x <- 1:10
idx <- which(x < 5)
x[idx]
## [1] 1 2 3 4
## Then where
where <- function(x, f, only.true = FALSE) {
res <- vapply(x, f, logical(1))
if(only.true) return(names(res)[res])
res
}
## Subsetting by logicals equivalent to subsetting by position
idx <- where(x, function(y) (y < 5))
x[idx]
## [1] 1 2 3 4
x[which(idx)]
## [1] 1 2 3 4
## Where is the iris dataset numeric?
(vars <- where(iris, is.numeric))
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## TRUE TRUE TRUE TRUE FALSE
## Maybe we only want where it's true
(num_vars <- where(iris, is.numeric, only.true = TRUE))
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
Perhaps the default for only.true makes more sense set to TRUE. Or maybe you think only.true is a stupid name for an argument and you want to change it to something else. Good news! You can; it's your function now, do with it what you will.
One benefit of keeping only the TRUE names is that we can then perform class-specific operations
## Only vars that are numeric
(num_vars <- where(iris, is.numeric, only.true = TRUE))
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
mean_val <- vector("numeric", length(num_vars))
names(mean_val) <- num_vars
for(var in num_vars) {
mean_val[var] <- mean(iris[[var]])
}
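Having just advocated for vapply, note that the same computation fits in one line, equivalent to the loop above:
mean_val <- vapply(iris[num_vars], mean, numeric(1))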
Here are some bonus tips that show off a bit more of what to expect from vectorization as implemented in R
library(data.table)
## Vector recycle
x <- 1:20
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Recycle 2, get evens
x[c(FALSE, TRUE)]
## [1] 2 4 6 8 10 12 14 16 18 20
## Recycle 3, get multiples of 3
x[c(FALSE, FALSE, TRUE)]
## [1] 3 6 9 12 15 18
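Recycling is silent when the shorter length evenly divides the longer; otherwise R still recycles, but warns (a quick check):
1:6 + 1:2  ## silent: 2 divides 6
1:6 + 1:4  ## warns: longer object length is not a multiple of shorter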
## Make column names
(col_names <- paste0("col", seq_len(10)))
## [1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
## [10] "col10"
## Vectorization
x <- seq_len(10)
y <- NA*x
z <- NA*x
## Not vectorized :(
if(x < 5) {
y <- x
} else {
y <- 5
}
## Warning in if (x < 5) {: the condition has length > 1 and only the first element
## will be used
print(y)
## [1] 1 2 3 4 5 6 7 8 9 10
## Vectorized!
(z <- ifelse(x < 5, x, 5))
## [1] 1 2 3 4 5 5 5 5 5 5
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)
## Only inspects the first element (newer versions of R make this an error)
x && y
## [1] TRUE
x || y
## [1] TRUE
## Inspects all elements
x & y
## [1] TRUE FALSE FALSE
x | y
## [1] TRUE TRUE FALSE
## Sure would be nice if this worked...
# USArrests["Murder" > 13.2, ]
usa <- as.data.table(USArrests, keep.rownames = TRUE)
head(usa)
## rn Murder Assault UrbanPop Rape
## 1: Alabama 13.2 236 58 21.2
## 2: Alaska 10.0 263 48 44.5
## 3: Arizona 8.1 294 80 31.0
## 4: Arkansas 8.8 190 50 19.5
## 5: California 9.0 276 91 40.6
## 6: Colorado 7.9 204 78 38.7
## Here we are subsetting by logical vectors
usa[Murder > 13.1 & Assault > 230, ]
## rn Murder Assault UrbanPop Rape
## 1: Alabama 13.2 236 58 21.2
## 2: Florida 15.4 335 80 31.9
## 3: Louisiana 15.4 249 66 22.2
## 4: Mississippi 16.1 259 44 17.1
## 5: South Carolina 14.4 279 48 22.5
## And again, but with a different logical vector
usa[Murder > 13.1 && Assault > 230, ]
## rn Murder Assault UrbanPop Rape
## 1: Alabama 13.2 236 58 21.2
## 2: Alaska 10.0 263 48 44.5
## 3: Arizona 8.1 294 80 31.0
## 4: Arkansas 8.8 190 50 19.5
## 5: California 9.0 276 91 40.6
## 6: Colorado 7.9 204 78 38.7
## 7: Connecticut 3.3 110 77 11.1
## 8: Delaware 5.9 238 72 15.8
## 9: Florida 15.4 335 80 31.9
## 10: Georgia 17.4 211 60 25.8
## 11: Hawaii 5.3 46 83 20.2
## 12: Idaho 2.6 120 54 14.2
## 13: Illinois 10.4 249 83 24.0
## 14: Indiana 7.2 113 65 21.0
## 15: Iowa 2.2 56 57 11.3
## 16: Kansas 6.0 115 66 18.0
## 17: Kentucky 9.7 109 52 16.3
## 18: Louisiana 15.4 249 66 22.2
## 19: Maine 2.1 83 51 7.8
## 20: Maryland 11.3 300 67 27.8
## 21: Massachusetts 4.4 149 85 16.3
## 22: Michigan 12.1 255 74 35.1
## 23: Minnesota 2.7 72 66 14.9
## 24: Mississippi 16.1 259 44 17.1
## 25: Missouri 9.0 178 70 28.2
## 26: Montana 6.0 109 53 16.4
## 27: Nebraska 4.3 102 62 16.5
## 28: Nevada 12.2 252 81 46.0
## 29: New Hampshire 2.1 57 56 9.5
## 30: New Jersey 7.4 159 89 18.8
## 31: New Mexico 11.4 285 70 32.1
## 32: New York 11.1 254 86 26.1
## 33: North Carolina 13.0 337 45 16.1
## 34: North Dakota 0.8 45 44 7.3
## 35: Ohio 7.3 120 75 21.4
## 36: Oklahoma 6.6 151 68 20.0
## 37: Oregon 4.9 159 67 29.3
## 38: Pennsylvania 6.3 106 72 14.9
## 39: Rhode Island 3.4 174 87 8.3
## 40: South Carolina 14.4 279 48 22.5
## 41: South Dakota 3.8 86 45 12.8
## 42: Tennessee 13.2 188 59 26.9
## 43: Texas 12.7 201 80 25.5
## 44: Utah 3.2 120 80 22.9
## 45: Vermont 2.2 48 32 11.2
## 46: Virginia 8.5 156 63 20.7
## 47: Washington 4.0 145 73 26.2
## 48: West Virginia 5.7 81 39 9.3
## 49: Wisconsin 2.6 53 66 10.8
## 50: Wyoming 6.8 161 60 15.6
## rn Murder Assault UrbanPop Rape
## What do those vectors look like?
attach(usa) # <- adds usa's columns to the search path
## Returns appropriate length
Murder > 13.1 & Assault > 230
## [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE
## Recycles length one vector, selecting all columns
Murder > 13.1 && Assault > 230
## [1] TRUE
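Since attach() modifies the search path, it is good practice to undo it once we are done (a cleanup step not shown above):
detach(usa)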
Basically everything in this document came from the ideas or work of others. In particular, excellent references are Advanced R by Hadley Wickham and The R Inferno by Patrick Burns. Other sources include the R documentation, the R Internals manual, and Stack Exchange.