Methods and generics in R allow us to express a generic idea (such as plot, print, summary) and dispatch to an associated function based on an object's class. For example
x <- matrix(rnorm(200), ncol = 2)
y <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
plot(x)
plot(y, which = 1)
We can see the methods available to us with the methods call
class(x); class(y)
## [1] "matrix"
## [1] "lm"
plot
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x55a135d33358>
## <environment: namespace:graphics>
methods(plot)
## [1] plot.acf* plot.data.frame* plot.decomposed.ts*
## [4] plot.default plot.dendrogram* plot.density*
## [7] plot.ecdf plot.factor* plot.formula*
## [10] plot.function plot.hclust* plot.histogram*
## [13] plot.HoltWinters* plot.isoreg* plot.lm*
## [16] plot.medpolish* plot.mlm* plot.ppr*
## [19] plot.prcomp* plot.princomp* plot.profile.nls*
## [22] plot.raster* plot.spec* plot.stepfun
## [25] plot.stl* plot.table* plot.ts
## [28] plot.tskernel* plot.TukeyHSD*
## see '?methods' for accessing help and source code
?plot.default
?plot.lm
This is implemented in a pretty straightforward manner: methods are written in the form generic.class() (which also emphasizes why we should use underscores rather than periods in our own function names)
## The generic; its only job is to dispatch to a method
donothing <- function(x) {
UseMethod("donothing")
}
## In case a particular method isn't found, use default
donothing.default <- function(x) {
print("this is default, you should usually include this")
}
## Class specific methods
donothing.matrix <- function(x) {
print("this is a matrix")
}
donothing.data.frame <- function(x) {
print("maybe consider using data.table instead")
}
donothing.newclass <- function(x) {
print("helpful if you want to extend previous methods with your own class")
}
z <- list(1:10)
donothing(z)
## [1] "this is default, you should usually include this"
x <- matrix(rnorm(10))
donothing(x)
## [1] "this is a matrix"
y <- as.data.frame(x)
donothing(y)
## [1] "maybe consider using data.table instead"
## Say w 'inherits' from y
w <- structure(y, class = c("newclass", class(y)))
donothing(w)
## [1] "helpful if you want to extend previous methods with your own class"
There may be a few things that you wish to do each time you work with R. One of the options available to us is a configuration file known as an .Rprofile. A user can have multiple .Rprofile files: often there is a system-wide file, one in a user's home directory, and one in a project directory (which takes precedence if R is started from that directory). Generally none of these are generated by default, and if one is, it's usually the system-wide version.
We will take a look at an example .Rprofile in RStudio, but for those coming back to this document, here are a few things to keep in mind and some nice things to include (many taken from the linked R-Bloggers page)
## Source all R files in a directory
sourceDir <- function(path, trace = FALSE, ...) {
for(nm in list.files(path, pattern = "\\.[Rr]$")) {
if(trace) cat(nm, ":")
source(file.path(path, nm), ...)
if(trace) cat("\n")
}
}
## This is bad. Why?
options(stringsAsFactors=FALSE)
## This is OK, scientific notation is hard
options(scipen=50)
## Format printing output
options(width=80, digits = 5)
## Especially nice when quitting from command line
q <- function (save="no", ...) {
quit(save=save, ...)
}
## Not sure why this isn't default, allows autocomplete of library names
utils::rc.settings(ipck=TRUE)
## I don't typically use these, but they are an excellent idea
## This timestamps (with directory) all of the interactive R code you run
.First <- function(){
if(interactive()){
library(utils)
timestamp(prefix = paste("##------ [", getwd(), "] ", sep = ""))
}
}
## Then when exiting, it saves your (timestamped) interactive
## history into either a system designated file (R_HISTFILE) or
## into a hidden .Rhistory file in your home directory
.Last <- function(){
if(interactive()){
hist_file <- Sys.getenv("R_HISTFILE")
if(hist_file=="") hist_file <- "~/.RHistory"
savehistory(hist_file)
}
}
# ## Make new environment for startup functions (won't be erased with rm())
# .startup <- new.env()
# .startup$somefunction <- function() do something
# attach(.startup)
# sys.source("startfuns.R", envir = attach(NULL, name = ".startup"))
There are a few places where R can unexpectedly cause difficulty. This comes from the patchwork nature of the language, and as such, it's good to be aware of some of the places where things can go wrong. First and absolutely foremost (unrelated to being careful, but absolutely essential practice), avoid magic numbers at all costs. This means you should never write something like
x <- 1:10
## No
for(i in 1:10) {
x[i] <- x[i]^2
}
What if you have this written 10,000 times in your code and the length of x
changes? Far better to be explicit in what you want, i.e.,
x <- 1:10
## Better, I guess
for(i in 1:length(x)) {
x[i] <- x[i]^2
}
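In fact, for a pointwise operation like squaring, we need no loop at all; the vectorized equivalent of the loop above is simply
x <- 1:10
x <- x^2  ## squares every element at once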
This holds for basically all pointwise vector operations: skip the loop altogether. R is clever enough to know what you want when you mix vectors and scalars in arithmetic, too (using recycling, as we will see below)
x <- rnorm(10)
y <- rnorm(10)
z <- 1
x*y + z*y - exp(x) + x/y
## [1] -2.05282 -1.06310 0.49620 -0.88176 0.27282 -0.73947 1.17943 0.28762
## [9] -0.50503 2.20186
# ## Note: elementwise multiplication is not matrix multiplication
# x*x != t(x) %*% x
Now, despite what was said above for illustrative purposes, we can do one better in anticipating changes to our code. First, let's take a moment to familiarize ourselves with the length-zero numeric vector. It is, as the name suggests, a numeric vector with zero elements, which means that operations on it may not be what you expect
x <- numeric(0L)
x
## numeric(0)
x + 10
## numeric(0)
identical(x, 0L)
## [1] FALSE
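A few more zero-length surprises are worth knowing about (quick checks on the same x):
sum(x)     ## 0, the empty sum
max(x)     ## -Inf, with a warning
length(x)  ## 0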
This can be a particularly sneaky error in our loops
x <- numeric(0L)
## Incorrect
for(i in 1:length(x)) {
print(i)
}
## [1] 1
## [1] 0
Instead, the function seq_along gives us the expected output
## Correct
for(i in seq_along(x)) {
print(i)
}
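## (no output: seq_along(x) is integer(0), so the body never runs)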
Ok, one more place where I lied about what to do. We saw above that x <- numeric(0L) returns a vector of zero length. Does this generalize to other positive integers?
## Neat!
x <- numeric(5)
x
## [1] 0 0 0 0 0
length(x)
## [1] 5
## Less neat
x <- list(5)
x
## [[1]]
## [1] 5
length(x)
## [1] 1
We can remedy this again by being more explicit about what we want
x <- vector("numeric", length = 5L)
y <- vector("list", length = 5L)
x
## [1] 0 0 0 0 0
y
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL
We do still see a difference here, unfortunately, and that is in what R chooses to initialize each of the vectors with. I think it is good practice to initialize values with something like NA to differentiate between a true zero and an element that was never assigned a value. For example, suppose we want to perform a collection of computations that have a non-zero chance of failing, and for which 0 is also a plausible result
## Initialized with 0
y <- vector("numeric", length = 1000L)
set.seed(69)
for(i in seq_along(y)) {
u <- runif(1)
if(u < 0.8) {
y[i] <- rbinom(1, 1, 0.75)
}
}
table(y, useNA = "always")
## y
## 0 1 <NA>
## 403 597 0
## Initialized with NA
y <- NA*vector("numeric", length = 1000L)
set.seed(69)
for(i in seq_along(y)) {
u <- runif(1)
if(u < 0.8) {
y[i] <- rbinom(1, 1, 0.75)
}
}
table(y, useNA = "always")
## y
## 0 1 <NA>
## 242 597 161
One of the simplest fixes for slow code is to look for places where we are making copies in memory. This happens any time we grow an object with c(), rbind(), or cbind(), for example. Better to initialize a vector (or list) of the appropriate size, and then fill it in as necessary. This allows R to preallocate the memory it will need rather than making copies as the need arises
library(rbenchmark)
grow_vec <- function(n) {
x <- c()
for(i in seq_len(n)) {
x <- cbind(x, rnorm(1))
}
}
init_vec <- function(n) {
x <- vector("numeric", n)
for(i in seq_len(n)) {
x[i] <- rnorm(1)
}
}
benchmark(
grow_vec(n = 10000),
init_vec(n = 10000),
replications = 25
)
## test replications elapsed relative user.self sys.self
## 1 grow_vec(n = 10000) 25 2.669 6.574 2.661 0.008
## 2 init_vec(n = 10000) 25 0.406 1.000 0.397 0.008
## user.child sys.child
## 1 0 0
## 2 0 0
We've seen a few examples so far, and almost all of them have used the dreaded for loop, which carries with it accusations of inefficiency and slowness. However, let's consider the following ways in which we might complicate finding the column means of a matrix
set.seed(69)
x <- matrix(rnorm(100000*100), ncol = 100)
## Using apply
f_apply <- function(x) {
res <- apply(x, 2, mean)
}
## Loops
f_loop <- function(x) {
res <- vector("numeric", ncol(x))
for(i in seq_len(ncol(x))) {
res[i] <- mean(x[, i])
}
}
f_loop2 <- function(x) {
res <- vector("numeric", ncol(x))
n <- nrow(x)
for(i in seq_len(ncol(x))) {
res[i] <- sum(x[, i])/n
}
}
## And lapply
f_lapply <- function(x) {
res <- lapply(as.data.frame(x), mean)
unlist(res, use.names = FALSE)
}
## And colMeans
benchmark(f_apply(x),
f_loop(x),
f_loop2(x),
f_lapply(x),
colMeans(x),
replications = 50)
## test replications elapsed relative user.self sys.self user.child
## 5 colMeans(x) 50 0.472 23.6 0.472 0.000 0
## 1 f_apply(x) 50 7.816 390.8 6.261 1.555 0
## 4 f_lapply(x) 50 2.698 134.9 2.598 0.100 0
## 2 f_loop(x) 50 0.024 1.2 0.016 0.008 0
## 3 f_loop2(x) 50 0.020 1.0 0.012 0.008 0
## sys.child
## 5 0
## 1 0
## 4 0
## 2 0
## 3 0
What's perhaps most surprising here is the roughly 20-fold speed increase that our for loop shows over the built-in colMeans function, much less the several-hundred-fold increase associated with apply. Why might this be? We can get some insight if we take a peek at the source code for each, where we see that the R-level implementations carry a lot of additional overhead beyond simply computing column means.
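To see this for yourself, print the functions. Both eventually hand off to compiled internals, but with different amounts of R-level work beforehand (nothing new is defined here; we are just inspecting base R):
## A few dimension checks, then .Internal(colMeans(...))
colMeans
## Handles trim and na.rm first, then .Internal(mean(x)), whose
## compiled code makes an extra pass for numerical accuracy
mean.default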
This is further evidenced by the fact that the loop utilizing sum()/n is slightly faster than simply calling the mean function (though mean will be more numerically accurate in any instance where the values are not all identical). And if we consider apply, we can see that it's actually an even nastier, more complicated loop that has simply been hidden from us
apply
## function (X, MARGIN, FUN, ...)
## {
## FUN <- match.fun(FUN)
## dl <- length(dim(X))
## if (!dl)
## stop("dim(X) must have a positive length")
## if (is.object(X))
## X <- if (dl == 2L)
## as.matrix(X)
## else as.array(X)
## d <- dim(X)
## dn <- dimnames(X)
## ds <- seq_len(dl)
## if (is.character(MARGIN)) {
## if (is.null(dnn <- names(dn)))
## stop("'X' must have named dimnames")
## MARGIN <- match(MARGIN, dnn)
## if (anyNA(MARGIN))
## stop("not all elements of 'MARGIN' are names of dimensions")
## }
## s.call <- ds[-MARGIN]
## s.ans <- ds[MARGIN]
## d.call <- d[-MARGIN]
## d.ans <- d[MARGIN]
## dn.call <- dn[-MARGIN]
## dn.ans <- dn[MARGIN]
## d2 <- prod(d.ans)
## if (d2 == 0L) {
## newX <- array(vector(typeof(X), 1L), dim = c(prod(d.call),
## 1L))
## ans <- forceAndCall(1, FUN, if (length(d.call) < 2L) newX[,
## 1] else array(newX[, 1L], d.call, dn.call), ...)
## return(if (is.null(ans)) ans else if (length(d.ans) <
## 2L) ans[1L][-1L] else array(ans, d.ans, dn.ans))
## }
## newX <- aperm(X, c(s.call, s.ans))
## dim(newX) <- c(prod(d.call), d2)
## ans <- vector("list", d2)
## if (length(d.call) < 2L) {
## if (length(dn.call))
## dimnames(newX) <- c(dn.call, list(NULL))
## for (i in 1L:d2) {
## tmp <- forceAndCall(1, FUN, newX[, i], ...)
## if (!is.null(tmp))
## ans[[i]] <- tmp
## }
## }
## else for (i in 1L:d2) {
## tmp <- forceAndCall(1, FUN, array(newX[, i], d.call,
## dn.call), ...)
## if (!is.null(tmp))
## ans[[i]] <- tmp
## }
## ans.list <- is.recursive(ans[[1L]])
## l.ans <- length(ans[[1L]])
## ans.names <- names(ans[[1L]])
## if (!ans.list)
## ans.list <- any(lengths(ans) != l.ans)
## if (!ans.list && length(ans.names)) {
## all.same <- vapply(ans, function(x) identical(names(x),
## ans.names), NA)
## if (!all(all.same))
## ans.names <- NULL
## }
## len.a <- if (ans.list)
## d2
## else length(ans <- unlist(ans, recursive = FALSE))
## if (length(MARGIN) == 1L && len.a == d2) {
## names(ans) <- if (length(dn.ans[[1L]]))
## dn.ans[[1L]]
## ans
## }
## else if (len.a == d2)
## array(ans, d.ans, dn.ans)
## else if (len.a && len.a%%d2 == 0L) {
## if (is.null(dn.ans))
## dn.ans <- vector(mode = "list", length(d.ans))
## dn1 <- list(ans.names)
## if (length(dn.call) && !is.null(n1 <- names(dn <- dn.call[1])) &&
## nzchar(n1) && length(ans.names) == length(dn[[1]]))
## names(dn1) <- n1
## dn.ans <- c(dn1, dn.ans)
## array(ans, c(len.a%/%d2, d.ans), if (!is.null(names(dn.ans)) ||
## !all(vapply(dn.ans, is.null, NA)))
## dn.ans)
## }
## else ans
## }
## <bytecode: 0x55a1373d8558>
## <environment: namespace:base>
lapply is only slightly better here; though it's still a for loop, it's one written in C. We can compare even further by considering the vapply function, which is similar to lapply but with the additional constraint that one must prespecify the output. Naturally, we might suspect that having prespecified the output would lead to a substantial performance gain:
x <- rpois(n = 500000, lambda = 20)
benchmark(
lapply(x, function(y) sqrt(y) < 5),
vapply(x, function(y) (sqrt(y) < 5), logical(1)),
replications = 25
)
## test replications elapsed
## 1 lapply(x, function(y) sqrt(y) < 5) 25 5.602
## 2 vapply(x, function(y) (sqrt(y) < 5), logical(1)) 25 5.819
## relative user.self sys.self user.child sys.child
## 1 1.000 5.599 0.004 0 0
## 2 1.039 5.819 0.000 0 0
Substantial indeed. However, what we are failing to account for here is two things:
- vapply returns an atomic vector directly, rather than a list (which would still need to be unlisted)
- vapply gives us type-checking, throwing an error when our function fails to return output of the declared type and length
Each of these is a non-trivial benefit of vapply.
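The type-checking is easy to see in action: if our function returns anything other than the declared logical(1), vapply stops immediately. A quick illustration, wrapped in try() so the error prints without halting:
try(vapply(1:3, function(y) "oops", logical(1)))
## errors: the result is type 'character', not the declared 'logical'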
We can verify the saved overhead by making a more appropriate comparison
## Unlist AND do type-checking
f_lapply2 <- function(x) {
res <- lapply(x, function(y) {
tt <- (sqrt(y) > 5)
## Type checking
if(!is.logical(tt) || length(tt) != 1) stop("stop")
tt ## return the checked value
})
## Unlisting
unlist(res, use.names = FALSE)
}
benchmark(
vapply(x, function(y) sqrt(y) > 5, logical(1)),
f_lapply2(x),
replications = 25
)
## test replications elapsed relative
## 2 f_lapply2(x) 25 10.730 1.907
## 1 vapply(x, function(y) sqrt(y) > 5, logical(1)) 25 5.627 1.000
## user.self sys.self user.child sys.child
## 2 10.717 0.007 0 0
## 1 5.627 0.000 0 0
We see that functionals do still have their place, both by performing safety checks and formatting output, and by making code more expressive and easier to read. The main takeaway here is that the performance cost is minor in day-to-day work, but if you do make heavy use of the apply functionals, especially in computationally intensive simulation, it may be worth investigating what can be saved with an explicit for loop (with appropriate type checking, of course).
As hinted at above, if we do run lapply (or any other function that returns a list) and wish to unlist the result, we can see significant performance gains by passing an additional argument to unlist
x <- matrix(rnorm(1e6*2), nrow = 2)
res <- lapply(as.data.frame(x), mean)
## We often don't need the names of the listed elements
benchmark(
unlist(res),
unlist(res, use.names = FALSE)
)
## test replications elapsed relative user.self
## 2 unlist(res, use.names = FALSE) 100 5.203 1.000 5.096
## 1 unlist(res) 100 14.396 2.767 14.346
## sys.self user.child sys.child
## 2 0.104 0 0
## 1 0.048 0 0
One common use of lapply is the generation of simulated observations, reading in files, or some other set of operations resulting in a data.frame. The rbindlist function, from the data.table package, gives us an excellent way to speed up the process of combining the list of data.frames we start with into a single data.frame. As an added bonus, you end up with a data.table instead
library(data.table)
datasets <- lapply(1:1000, function(x) {
data.frame(ID = rep(x, 15),
A = rnorm(15),
B = rnorm(15),
C = rnorm(15))
})
f_loop <- function(x) {
res <- data.frame()
for(i in seq_along(x)) {
res <- rbind(res, x[[i]])
}
}
benchmark(
f_loop(datasets),
Reduce(rbind, datasets),
rbindlist(datasets),
replications = 10
)
## test replications elapsed relative user.self sys.self
## 1 f_loop(datasets) 10 4.540 349.23 4.540 0
## 3 rbindlist(datasets) 10 0.013 1.00 0.013 0
## 2 Reduce(rbind, datasets) 10 5.474 421.08 5.474 0
## user.child sys.child
## 1 0 0
## 3 0 0
## 2 0 0
# ## My typical pattern (the %>% pipe comes from magrittr)
# result <- lapply(datasets, function(x) {
#   # do something
# }) %>% rbindlist()
Here, we introduce a couple of helpful wrapper functions. While these particular ones may or may not be useful to you, they illustrate ways to combine existing functions in R into new tools. One of the simplest I use is built around object.size
x <- matrix(rnorm(1e6*2), nrow = 2)
object.size(x)
## 16000216 bytes
?object.size
size_of <- function(x, units = "Mb") {
format(object.size(x), units = units)
}
size_of(x)
## [1] "15.3 Mb"
The next two are less trivial. First, we consider a function that filters NULL values out of a list
## Function to remove null values from list
compact <- function(x) {
Filter(Negate(is.null), x)
}
x <- vector("list", 5)
for(i in c(1, 3, 5)) {
x[[i]] <- i
}
x
## [[1]]
## [1] 1
##
## [[2]]
## NULL
##
## [[3]]
## [1] 3
##
## [[4]]
## NULL
##
## [[5]]
## [1] 5
(x <- compact(x))
## [[1]]
## [1] 1
##
## [[2]]
## [1] 3
##
## [[3]]
## [1] 5
Now suppose that we are running an analysis in which we fit models on collections of simulated data, some of which are destined to fail
## Simulate model failure
dummy_fit <- function() {
u <- runif(1)
if(u < 0.8) {
res <- lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)
} else {
res <- NULL
}
return(res)
}
set.seed(69)
model_fits <- replicate(20, dummy_fit())
We can then use our compact function to remove the fits that failed while retaining those that are ready for further work
length(model_fits)
## [1] 20
length(compact(model_fits))
## [1] 16
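With the failures removed, the surviving elements are ordinary lm fits, so downstream work proceeds as usual. For instance, collecting coefficients (a hypothetical next step; numeric(3) matches the three coefficients of the model above):
coefs <- t(vapply(compact(model_fits), coef, numeric(3)))
dim(coefs)  ## one row per successful fit, one column per coefficient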
Finally, we consider the where function, painfully absent from base R, though it has a close cousin in which. which, given a vector of TRUE/FALSE values, returns the indices of the values that are TRUE. where, on the other hand, is given a vector and a predicate function, and returns a logical vector. These names probably ought to be flipped; however, which is in base R and was named first, so here we are. Note one benefit below: which can take an expression (anything returning a vector of logicals), whereas where must take a function as an argument
## First which
x <- 1:10
idx <- which(x < 5)
x[idx]
## [1] 1 2 3 4
## Then where
where <- function(x, f, only.true = FALSE) {
res <- vapply(x, f, logical(1))
if(only.true) return(names(res)[res])
res
}
## Subsetting by logicals equivalent to subsetting by position
idx <- where(x, function(y) (y < 5))
x[idx]
## [1] 1 2 3 4
x[which(idx)]
## [1] 1 2 3 4
## Where is the iris dataset numeric?
(vars <- where(iris, is.numeric))
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## TRUE TRUE TRUE TRUE FALSE
## Maybe we only want where it's true
(num_vars <- where(iris, is.numeric, only.true = TRUE))
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
Perhaps the default for only.true makes more sense set to TRUE. Or maybe you think only.true is a stupid name for an argument and you want to change it to something else. Good news! You can; it's your function now, do with it what you will.
One benefit of keeping only the TRUE names is that we can then perform class-specific operations
## Only vars that are numeric
(num_vars <- where(iris, is.numeric, only.true = TRUE))
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
mean_val <- vector("numeric", length(num_vars))
names(mean_val) <- num_vars
for(var in num_vars) {
mean_val[var] <- mean(iris[[var]])
}
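Having just advocated for vapply, note that the same computation fits in one line, equivalent to the loop above:
mean_val <- vapply(iris[num_vars], mean, numeric(1))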
Here are some bonus tips that show off a bit more of what to expect from vectorization as implemented in R
library(data.table)
## Vector recycle
x <- 1:20
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Recycle 2, get evens
x[c(FALSE, TRUE)]
## [1] 2 4 6 8 10 12 14 16 18 20
## Recycle 3, get multiples of 3
x[c(FALSE, FALSE, TRUE)]
## [1] 3 6 9 12 15 18
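Recycling is silent when the shorter length evenly divides the longer; otherwise R still recycles, but warns (a quick check):
1:6 + 1:2  ## silent: 2 divides 6
1:6 + 1:4  ## warns: longer object length is not a multiple of shorter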
## Make column names
(col_names <- paste0("col", seq_len(10)))
## [1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
## [10] "col10"
## Vectorization
x <- seq_len(10)
y <- NA*x
z <- NA*x
## Not vectorized :(
if(x < 5) {
y <- x
} else {
y <- 5
}
## Warning in if (x < 5) {: the condition has length > 1 and only the first element
## will be used
print(y)
## [1] 1 2 3 4 5 6 7 8 9 10
## Vectorized!
(z <- ifelse(x < 5, x, 5))
## [1] 1 2 3 4 5 5 5 5 5 5
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)
## Only inspects the first element (newer versions of R make this an error)
x && y
## [1] TRUE
x || y
## [1] TRUE
## Inspects all elements
x & y
## [1] TRUE FALSE FALSE
x | y
## [1] TRUE TRUE FALSE
## Sure would be nice if this worked...
# USArrests["Murder" > 13.2, ]
usa <- as.data.table(USArrests, keep.rownames = TRUE)
head(usa)
## rn Murder Assault UrbanPop Rape
## 1: Alabama 13.2 236 58 21.2
## 2: Alaska 10.0 263 48 44.5
## 3: Arizona 8.1 294 80 31.0
## 4: Arkansas 8.8 190 50 19.5
## 5: California 9.0 276 91 40.6
## 6: Colorado 7.9 204 78 38.7
## Here we are subsetting by logical vectors
usa[Murder > 13.1 & Assault > 230, ]
## rn Murder Assault UrbanPop Rape
## 1: Alabama 13.2 236 58 21.2
## 2: Florida 15.4 335 80 31.9
## 3: Louisiana 15.4 249 66 22.2
## 4: Mississippi 16.1 259 44 17.1
## 5: South Carolina 14.4 279 48 22.5
## And again, but with a different logical vector
usa[Murder > 13.1 && Assault > 230, ]
## rn Murder Assault UrbanPop Rape
## 1: Alabama 13.2 236 58 21.2
## 2: Alaska 10.0 263 48 44.5
## 3: Arizona 8.1 294 80 31.0
## 4: Arkansas 8.8 190 50 19.5
## 5: California 9.0 276 91 40.6
## 6: Colorado 7.9 204 78 38.7
## 7: Connecticut 3.3 110 77 11.1
## 8: Delaware 5.9 238 72 15.8
## 9: Florida 15.4 335 80 31.9
## 10: Georgia 17.4 211 60 25.8
## 11: Hawaii 5.3 46 83 20.2
## 12: Idaho 2.6 120 54 14.2
## 13: Illinois 10.4 249 83 24.0
## 14: Indiana 7.2 113 65 21.0
## 15: Iowa 2.2 56 57 11.3
## 16: Kansas 6.0 115 66 18.0
## 17: Kentucky 9.7 109 52 16.3
## 18: Louisiana 15.4 249 66 22.2
## 19: Maine 2.1 83 51 7.8
## 20: Maryland 11.3 300 67 27.8
## 21: Massachusetts 4.4 149 85 16.3
## 22: Michigan 12.1 255 74 35.1
## 23: Minnesota 2.7 72 66 14.9
## 24: Mississippi 16.1 259 44 17.1
## 25: Missouri 9.0 178 70 28.2
## 26: Montana 6.0 109 53 16.4
## 27: Nebraska 4.3 102 62 16.5
## 28: Nevada 12.2 252 81 46.0
## 29: New Hampshire 2.1 57 56 9.5
## 30: New Jersey 7.4 159 89 18.8
## 31: New Mexico 11.4 285 70 32.1
## 32: New York 11.1 254 86 26.1
## 33: North Carolina 13.0 337 45 16.1
## 34: North Dakota 0.8 45 44 7.3
## 35: Ohio 7.3 120 75 21.4
## 36: Oklahoma 6.6 151 68 20.0
## 37: Oregon 4.9 159 67 29.3
## 38: Pennsylvania 6.3 106 72 14.9
## 39: Rhode Island 3.4 174 87 8.3
## 40: South Carolina 14.4 279 48 22.5
## 41: South Dakota 3.8 86 45 12.8
## 42: Tennessee 13.2 188 59 26.9
## 43: Texas 12.7 201 80 25.5
## 44: Utah 3.2 120 80 22.9
## 45: Vermont 2.2 48 32 11.2
## 46: Virginia 8.5 156 63 20.7
## 47: Washington 4.0 145 73 26.2
## 48: West Virginia 5.7 81 39 9.3
## 49: Wisconsin 2.6 53 66 10.8
## 50: Wyoming 6.8 161 60 15.6
## rn Murder Assault UrbanPop Rape
## What do those vectors look like?
attach(usa) # <- adds usa's columns to the search path
## Returns appropriate length
Murder > 13.1 & Assault > 230
## [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE
## Recycles length one vector, selecting all columns
Murder > 13.1 && Assault > 230
## [1] TRUE
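Since attach() modifies the search path, it is good practice to undo it once we are done (a cleanup step not shown above):
detach(usa)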
Basically everything in this document came from the ideas or work of others. In particular, excellent references are Advanced R by Hadley Wickham and The R Inferno by Patrick Burns. Other sources include the R documentation, the R Internals manual, and Stack Exchange.