knitr::opts_chunk$set(echo = TRUE, 
                      fig.align = 'center', 
                      fig.width = 4, 
                      fig.height = 4, 
                      message = FALSE, warning = FALSE)

Creating Bar Plots

This first section briefly covers an alternative way to create bar charts in ggplot. By default, geom_bar() counts the number of observations in each category of the x variable (its default is stat = "count"; see ?geom_bar) and uses these counts as the heights of the bars on the y axis.

library(ggplot2)
theme_set(theme_bw(base_size = 16))

set.seed(123)
df <- data.frame(fruit = sample(c("apple", "banana", "blueberry"), size = 10,
                                 replace = TRUE))
df
##        fruit
## 1  blueberry
## 2  blueberry
## 3  blueberry
## 4     banana
## 5  blueberry
## 6     banana
## 7     banana
## 8     banana
## 9  blueberry
## 10     apple
ggplot(df, aes(x = fruit)) +
  geom_bar(color = "black",
           fill = "gray80")

Note here that the only aesthetic we use is x, designating fruit; the totals are tabulated for us. Suppose, however, that the data have already been tabulated, with a column giving the frequency of each fruit:

library(dplyr)

## Summarize the total number
df2 <- group_by(df, fruit) %>% 
  summarize(N = n())

df2
## # A tibble: 3 × 2
##   fruit         N
##   <chr>     <int>
## 1 apple         1
## 2 banana        4
## 3 blueberry     5
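As an aside, dplyr's count() performs the same grouping and tallying in one step. The sketch below rebuilds df so it runs on its own and assumes a dplyr version whose count() accepts the name argument (used here so the column is called N, matching df2). It is an alternative, not a replacement for the approach above.

```r
library(dplyr)

## Rebuild the example data (same seed and call as above)
set.seed(123)
df <- data.frame(fruit = sample(c("apple", "banana", "blueberry"), size = 10,
                                 replace = TRUE))

## Tabulate with group_by() + summarize(), as above
df2 <- group_by(df, fruit) %>%
  summarize(N = n())

## count() groups and tallies in a single call;
## name = "N" sets the name of the frequency column
df3 <- count(df, fruit, name = "N")

## Both approaches give the same fruits and the same frequencies
identical(df2$fruit, df3$fruit) && identical(df2$N, df3$N)
```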

We can modify the behavior of geom_bar by changing the stat argument to use this value instead:

ggplot(df2, aes(x = fruit, y = N)) +
  geom_bar(stat = "identity", # use the values in N as the bar heights
           color = "black",
           fill = "gray80")
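ggplot2 also provides geom_col(), which uses stat = "identity" by default and so behaves like geom_bar(stat = "identity"). The sketch below types the tabulated counts in directly (taken from the df2 output above) and uses layer_data() to check that both layers compute the same bar heights.

```r
library(ggplot2)

## Pre-tabulated counts (values copied from the df2 output above)
df2 <- data.frame(fruit = c("apple", "banana", "blueberry"),
                  N = c(1L, 4L, 5L))

## geom_col() defaults to stat = "identity"
p_col <- ggplot(df2, aes(x = fruit, y = N)) +
  geom_col(color = "black", fill = "gray80")

## The equivalent call with geom_bar()
p_bar <- ggplot(df2, aes(x = fruit, y = N)) +
  geom_bar(stat = "identity", color = "black", fill = "gray80")

## layer_data() returns the data each layer actually draws;
## the bar heights (y) agree
identical(layer_data(p_col)$y, layer_data(p_bar)$y)
```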

Question 1: Using the method above, create a plot showing the PMF of \(Y \sim \text{Bin}(n = 8, \pi = 0.6)\).

Vectorization

R is often saddled with a reputation for being slow, though this is largely a consequence of misunderstanding how R is intended to be used. Whereas other languages frequently use loops to perform a sequence of operations on vectors, R can often accomplish the same thing much more quickly with a technique known as vectorization.

Vectorization, as the name implies, works by applying functions to entire vectors at once. Vectors are the most basic data structure in R: whereas in many other languages \(x = 3\) would be a scalar, in R it is a numeric vector of length one. Compare the two approaches below to see how it works:

x <- 1:10

## When using a loop, always preallocate an output vector of the correct length
y <- vector("numeric", length = length(x))

## Using a loop: take the square root one element at a time
for (i in seq_along(x)) {
  y[i] <- sqrt(x[i])
}

## Using vectorization: sqrt() acts on every element of x at once
z <- sqrt(x)

identical(z, y)
## [1] TRUE
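The claim that a "scalar" is just a length-one vector is easy to check, and the same element-by-element behavior holds for R's arithmetic operators. A short sketch (the names a and v are illustrative only):

```r
## A "scalar" in R is simply a numeric vector of length one
a <- 3
is.vector(a)   # TRUE
length(a)      # 1

## Arithmetic operators are vectorized: they work element by element
v <- 1:5
v * 2          # 2 4 6 8 10
v + v          # 2 4 6 8 10
v^2            # 1 4 9 16 25
```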

Not only is vectorized code typically faster, it is also much easier to read.
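How large the speedup is depends on the function and the machine. A rough way to see it is base R's system.time() on a larger vector; the size 1e6 below is an arbitrary choice, and loop_sqrt() is a hypothetical helper that wraps the loop from above.

```r
x <- 1:1e6

## Hypothetical helper: the loop version from above, wrapped in a function
loop_sqrt <- function(x) {
  y <- vector("numeric", length = length(x))  # preallocate the output
  for (i in seq_along(x)) {
    y[i] <- sqrt(x[i])
  }
  y
}

## Time each approach (elapsed times will vary from machine to machine)
t_loop <- system.time(y <- loop_sqrt(x))
t_vec  <- system.time(z <- sqrt(x))

identical(y, z)      # same result either way
t_loop["elapsed"]
t_vec["elapsed"]
```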

In fact, we saw an example of this in the last lab with dpois(), which evaluates the Poisson PMF at every value of x in a single call:

dpois(x = 0:5, lambda = 2)
## [1] 0.135335 0.270671 0.270671 0.180447 0.090224 0.036089
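Each element above is the Poisson PMF \(P(X = x) = e^{-\lambda}\lambda^x / x!\) evaluated at one value of x. Because exp(), ^, and factorial() are themselves vectorized, the same computation can be written by hand as a check:

```r
x <- 0:5
lambda <- 2

## Poisson PMF written out with vectorized arithmetic
by_hand <- exp(-lambda) * lambda^x / factorial(x)

## Agrees with dpois() up to floating-point tolerance
all.equal(by_hand, dpois(x = x, lambda = lambda))   # TRUE
```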

Not all functions in R are vectorized, but many of them are. When in doubt, try it out.
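One way to "try it out" is to pass a vector and see what comes back. The sketch below uses a hypothetical helper, clip_at_zero(), written with if(), which expects a single TRUE or FALSE. It then shows two fixes: base R's Vectorize(), which calls the function once per element, and pmax(), a built-in function that is already vectorized.

```r
## Hypothetical helper built on if(), which is not vectorized
clip_at_zero <- function(v) if (v < 0) 0 else v

## With a vector, R 4.2.0 and later stop with an error ("the condition has
## length > 1"); older versions warn and use only the first element
res <- tryCatch(clip_at_zero(c(-1, 2, -3)),
                error = function(e) "not vectorized")
res

## Fix 1: Vectorize() wraps the function so it is applied element by element
clip_vec <- Vectorize(clip_at_zero)
clip_vec(c(-1, 2, -3))    # 0 2 0

## Fix 2: use a function that is already vectorized
pmax(c(-1, 2, -3), 0)     # 0 2 0
```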


Question 2: Use vectorization to create a plot of the likelihood function for \(\pi\) when \(Y \sim \text{Bin}(n = 10, \pi)\) and we observe \(Y = 6\). A helpful function for this is seq().
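As a reminder of how seq() works, it builds an evenly spaced grid of values, either from a step size (by) or from a number of points (length.out). The grid below (0 to 1 with 101 points) is just an illustrative choice.

```r
## Grid defined by its step size
seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00

## Grid defined by its number of points
grid <- seq(0, 1, length.out = 101)
length(grid)                # 101
head(grid)
```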

Question 3: What does the function which.max() do? Use this along with what you created in Question 2 to find the MLE for \(\pi\). Does this match what you think it should be?