knitr::opts_chunk$set(echo = TRUE,
                      fig.align = 'center',
                      fig.width = 4,
                      fig.height = 4,
                      message = FALSE, warning = FALSE)
The first section briefly covers an alternative way to create bar charts in ggplot. By default, `geom_bar()` counts the number of observations in each category of the `x` variable (its default is `stat = "count"`; see `?geom_bar`) and uses these counts as the bar heights on the y axis.
library(ggplot2)
theme_set(theme_bw(base_size = 16))
set.seed(123)
df <- data.frame(fruit = sample(c("apple", "banana", "blueberry"), size = 10,
                                replace = TRUE))
df
## fruit
## 1 blueberry
## 2 blueberry
## 3 blueberry
## 4 banana
## 5 blueberry
## 6 banana
## 7 banana
## 8 banana
## 9 blueberry
## 10 apple
ggplot(df, aes(x = fruit)) +
  geom_bar(color = "black",
           fill = "gray80")
Note that the only aesthetic we supply is `x`, designating `fruit`; the totals are tabulated for us. Suppose, however, that the data were already tabulated, with a column giving the frequency of each category:
library(dplyr)
## Count the number of observations of each fruit
df2 <- group_by(df, fruit) %>%
  summarize(N = n())
df2
## # A tibble: 3 × 2
## fruit N
## <chr> <int>
## 1 apple 1
## 2 banana 4
## 3 blueberry 5
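dplyr also provides `count()`, which does the grouping and tallying in a single call. The sketch below assumes a reasonably recent version of dplyr in which `count()` accepts a `name` argument; it rebuilds `df` so it runs on its own and checks that both approaches give the same counts:

```r
library(dplyr)

## Recreate the sample data from above
set.seed(123)
df <- data.frame(fruit = sample(c("apple", "banana", "blueberry"), size = 10,
                                replace = TRUE))

## The two-step approach used in the text
df2 <- group_by(df, fruit) %>%
  summarize(N = n())

## One-step alternative: count() groups by fruit and tallies the rows;
## name = "N" gives the count column the same name as in df2
df3 <- count(df, fruit, name = "N")

## Both should list the same fruits with the same counts
identical(df2$N, df3$N)
```

Note that `df3` keeps the class of its input (a plain data frame here), while `df2` is a tibble; only the columns are compared.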
We can modify the behavior of `geom_bar()` by changing the `stat` argument so that it uses these values as the bar heights instead:
ggplot(df2, aes(x = fruit, y = N)) +
  geom_bar(stat = "identity", # use the values in N as the bar heights
           color = "black",
           fill = "gray80")
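ggplot2 also provides `geom_col()`, which is shorthand for `geom_bar(stat = "identity")`: it takes the bar heights directly from the `y` aesthetic. A minimal sketch, with the counts from `df2` typed in by hand so the block runs on its own:

```r
library(ggplot2)

## Pre-tabulated counts (the same values as df2 above)
df2 <- data.frame(fruit = c("apple", "banana", "blueberry"),
                  N = c(1, 4, 5))

## geom_col() uses the y values as bar heights, so no stat argument is needed
p <- ggplot(df2, aes(x = fruit, y = N)) +
  geom_col(color = "black",
           fill = "gray80")
p
```

Either form works; `geom_col()` simply saves typing when the data are already tabulated.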
Question 1: Using the method above, create a plot showing the PMF of \(Y \sim \text{Bin}(n = 8, \pi = 0.6)\).
R is often saddled with a reputation for being slow, though this is largely a consequence of misunderstanding how R is intended to be used. Whereas loops are frequently used in other languages to perform sequences of operations on vectors, R can often accomplish the same thing much more quickly with a technique known as vectorization.
Vectorization, as the name implies, works by applying functions to entire vectors at once rather than element by element. Vectors are the most basic unit in R: in many other languages \(x = 3\) would be a scalar, whereas in R it is a vector of length one. Compare the two approaches below to see how it works:
x <- 1:10
## Always create a return vector of correct length for loops
y <- vector("numeric", length = length(x))
## Doing this with loops
for (i in seq_along(x)) {
  y[i] <- sqrt(x[i])
}
## Doing this with vectorization
z <- sqrt(x)
identical(z, y)
## [1] TRUE
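The vectorized call works because `sqrt()` operates on every element of its input at once, and even a single number is stored as a vector of length one. A short illustration (the values are arbitrary):

```r
## A "scalar" in R is really a numeric vector of length one
a <- 3
length(a)      # 1
is.vector(a)   # TRUE

## Arithmetic and many functions work element by element
v <- c(1, 4, 9)
v + a          # 4 7 12
sqrt(v)        # 1 2 3
```

In `v + a`, the length-one vector `a` is recycled across all three elements of `v`.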
Not only is the vectorized version typically faster, but it is also much easier to read.
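To see the speed difference for yourself, the sketch below times both approaches on a much longer vector using base R's `system.time()`. The vector length is an arbitrary choice, and timings vary from machine to machine, so no particular numbers are implied:

```r
## A longer input so that the difference is measurable
x <- 1:1e6

## Loop version: preallocate, then fill one element at a time
loop_time <- system.time({
  y_loop <- vector("numeric", length = length(x))
  for (i in seq_along(x)) {
    y_loop[i] <- sqrt(x[i])
  }
})

## Vectorized version: one call on the whole vector
vec_time <- system.time({
  y_vec <- sqrt(x)
})

## Same result either way; compare the elapsed times
identical(y_loop, y_vec)
loop_time["elapsed"]
vec_time["elapsed"]
```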
In fact, we saw an example of vectorization in the last lab with `dpois()`:
dpois(x = 0:5, lambda = 2)
## [1] 0.135335 0.270671 0.270671 0.180447 0.090224 0.036089
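The Poisson PMF, \(P(Y = k) = e^{-\lambda}\lambda^k / k!\), can also be written directly with vectorized arithmetic. As a check, the sketch below computes it by hand for \(k = 0, \ldots, 5\) and compares the result with `dpois()`:

```r
lambda <- 2
k <- 0:5

## exp(), ^, and factorial() all act on every element of k at once
pmf_by_hand <- exp(-lambda) * lambda^k / factorial(k)

## Should agree with dpois() up to floating-point tolerance
all.equal(pmf_by_hand, dpois(x = k, lambda = lambda))
```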
Not all functions in R are vectorized, but many of them are. When in doubt, try it out.
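One way to try it out is to compare calling a function on a whole vector with applying it one element at a time via `sapply()`. The sketch below uses two made-up functions for illustration: `clip_if()`, built on `if`, which only handles one value at a time, and a vectorized version `clip_ifelse()`, built on `ifelse()`:

```r
## Hypothetical helper: caps a single value at 5 using `if`,
## which only inspects one value at a time
clip_if <- function(v) {
  if (v > 5) 5 else v
}

## Hypothetical vectorized version: ifelse() checks every element
clip_ifelse <- function(v) {
  ifelse(v > 5, 5, v)
}

x <- 1:10

## Calling clip_if() on the whole vector does not work as intended:
## in R >= 4.2.0 an `if` condition of length > 1 is an error
## (older versions only warn and use the first element)
result <- tryCatch(clip_if(x), error = function(e) conditionMessage(e))

## Applying clip_if() element by element with sapply() does work...
clip_loop <- sapply(x, clip_if)

## ...and gives the same answer as the vectorized version
clip_vec <- clip_ifelse(x)
all.equal(clip_loop, clip_vec)
```

For this particular task, base R's `pmin(x, 5)` is already vectorized and returns the same values.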
Question 2: Use vectorization to create a plot of the likelihood function for \(\pi\) when \(Y\) is binomially distributed and we observe \(Y = 6\) with \(n = 10\). A helpful function for this is `seq()`.
Question 3: What does the function `which.max()` do? Use it, along with what you created in Question 2, to find the MLE for \(\pi\). Does this match what you think it should be?