Question 1

This problem involves constructing a dataset to organize observations for future study.

Begin by downloading the compressed folder here, which contains images of fungal growth cultured in petri dishes. The images themselves will be processed at a later date; for now, we need to create a record of current observations. This involves constructing a dataset with the following variables:

All of this information is recorded in the file and path names of the images (the image files themselves are empty). Your submission should include the code that generates this dataset, along with a call to head() that shows the first 10 rows. Note that head() returns 6 rows by default, so pass n = 10.
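Because the list of required variables is not reproduced here, the following is only a minimal sketch of the general approach, not the expected solution. It assumes a hypothetical layout in which each image sits at "<species>/<day>_<dish>.jpg" inside the unzipped folder; the function name build_fungi_data and the columns Species, Day, and Dish are illustrative placeholders to be adapted to the real file names.

```r
## Hypothetical sketch: assumes paths look like "<species>/<day>_<dish>.jpg"
## (e.g. "penicillium/3_A.jpg"); adapt the splitting to the real file names
build_fungi_data <- function(dir) {
  # Recursively list every file below the unzipped folder (relative paths)
  paths <- list.files(dir, recursive = TRUE)
  # The innermost directory name holds the (hypothetical) species
  species <- basename(dirname(paths))
  # Drop the extension, then split "3_A" into day and dish
  stem <- tools::file_path_sans_ext(basename(paths))
  parts <- strsplit(stem, "_", fixed = TRUE)
  data.frame(
    Species = species,
    Day = as.integer(vapply(parts, `[`, character(1), 1)),
    Dish = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}
```

Usage would then look like fungi <- build_fungi_data("path/to/unzipped/folder") followed by head(fungi, n = 10). Working from list.files() keeps the dataset tied to whatever files are actually present, so no observation is typed in by hand.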

Question 2

Bootstrapping is a statistical resampling technique used to construct confidence intervals for a given statistic. The general idea of the bootstrap is that, from a given sample, we can construct "new" samples simply by resampling with replacement from the original. The image below demonstrates how a single sample can be used to produce several other bootstrap samples.
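To make the idea concrete, the short sketch below draws three bootstrap samples from a single small vector; the values in x are arbitrary illustrative data. Each resample has the same length as the original, and because sampling is done with replacement, some values can appear more than once while others are left out.

```r
## A small, arbitrary original sample (illustrative values only)
x <- c(4, 8, 15, 16, 23, 42)

set.seed(1)
## Each column is one bootstrap sample: same length as x, drawn with replacement
boot_samples <- replicate(3, sample(x, size = length(x), replace = TRUE))
boot_samples
```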

The general process works like this:

  1. Begin with a vector x. This could be the column of a data.frame (e.g., df$x)
  2. Decide on the statistic (passed as the argument statistic) that we wish to bootstrap. This should be a function of x
  3. Decide on a number of bootstrap samples, B. A typical value is B = 1000, but this should be adjustable
  4. For each bootstrap, draw a sample() with replacement from x that is the same length as x. See ?sample
  5. Apply statistic() to each bootstrap sample (not to the original x) and record the result. We will end up with a vector of B statistics
  6. Return this vector as a data.frame with a single column named Statistic
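The steps above can be sketched as a single function. This is a minimal sketch that assumes the argument names x, statistic, and B and the output column Statistic used in the example call further down, and it assumes statistic returns a single number. The printed values in that example come from the original implementation, so this sketch is not guaranteed to reproduce them exactly.

```r
## Minimal sketch of the bootstrap steps; assumes statistic() returns one number
bootstrap <- function(x, statistic, B = 1000) {
  # Steps 4-5: for each of B replicates, resample x with replacement
  # (same length as x) and apply the statistic to that resample
  stats <- vapply(seq_len(B), function(b) {
    statistic(sample(x, size = length(x), replace = TRUE))
  }, numeric(1))
  # Step 6: return the B statistics as a data.frame column named Statistic
  data.frame(Statistic = stats)
}
```

Because the statistic is passed in as a function, any one-argument summary works, e.g. bootstrap(x, median, B = 2000). Using vapply() with numeric(1) makes the function fail loudly if the statistic returns something other than a single number.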

From this vector of statistics, we should then:

Here is an example of using a bootstrap function to construct a confidence interval for the trimmed mean of a vector:

## Here is my data
set.seed(123)
x <- rnorm(n = 25, mean = 10, sd = 2)

## Here is trimmed mean function
trim_mean <- function(x) {
  mean(x, trim = 0.1)
}

## Call my bootstrap function with 3 arguments; it returns a data.frame
boot <- bootstrap(x = x, statistic = trim_mean, B = 500)

## Find mean and 90% confidence interval
mean(boot$Statistic)
## [1] 9.8558
quantile(boot$Statistic, probs = c(0.05, 0.95))
##      5%     95% 
##  9.2312 10.5353

Returning a data.frame makes it easy to plot the sampling distribution of a statistic directly with ggplot2.

library(ggplot2)
ggplot(boot, aes(Statistic)) + 
  geom_histogram(color = "black", fill = "gray80", bins = 12) +
  ggtitle("Sampling distribution of trimmed mean")

## Load data
rain <- read.csv("https://collinn.github.io/data/grinnell_rain.csv")

## Subset
set.seed(10)
idx <- sample(1:nrow(rain), size = 20)
rainsub <- rain[idx, ]

Question 3

Will we get to lists?