Problem 1

Skittles candy comes in 5 flavors, and we want to know whether they are evenly distributed. We buy 20 packs, each containing 50 pieces, sort the pieces by color, and get the following results:

m <- matrix(c(208, 222, 181, 192, 197), nrow = 1)
colnames(m) <- c("Red", "Purple", "Yellow", "Green", "Orange")
m
##      Red Purple Yellow Green Orange
## [1,] 208    222    181   192    197

We test the null hypothesis \(H_0: p_1 = p_2 = \dots = p_5 = 0.2\).

This gives an expected count for each cell of \(N \times p_i = 1000 \times 0.2 = 200\).

Computing the statistic by hand gives \(\chi^2 = 4.91\). This is less than the critical value of 9.49 for \(\alpha = 0.05\) and df = 4, so we fail to reject \(H_0\).
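
The by-hand calculation can be sketched in R as follows. This is only an illustration of the arithmetic; the variable names (observed, expected, chi_sq, critical, p_value) are made up for this sketch and are not part of the original solution.

```r
# Observed counts from the table above
observed <- c(Red = 208, Purple = 222, Yellow = 181, Green = 192, Orange = 197)

# Expected count per flavor under H0: N * p_i
expected <- sum(observed) * rep(1/5, 5)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Critical value for alpha = 0.05 with df = 5 - 1 = 4
critical <- qchisq(0.95, df = 4)

# Upper-tail p-value for the observed statistic
p_value <- pchisq(chi_sq, df = 4, lower.tail = FALSE)

chi_sq             # 4.91
chi_sq < critical  # TRUE, so we fail to reject H0
```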

chisq.test(m, p = rep(1/5, 5))
## 
##  Chi-squared test for given probabilities
## 
## data:  m
## X-squared = 4.91, df = 4, p-value = 0.3

Problem 2

The results of a clinical trial comparing the side effects of a drug with those of a placebo are given below:

m <- matrix(c(57, 22, 143, 76), nrow = 2)
colnames(m) <- c("Side Effects", "None")
rownames(m) <- c("Drug", "Placebo")
m
##         Side Effects None
## Drug              57  143
## Placebo           22   76

Does treatment appear associated with onset of side effects?


First, we find the row and column margins and use them to compute the expected values:

addmargins(m)
##         Side Effects None Sum
## Drug              57  143 200
## Placebo           22   76  98
## Sum               79  219 298

To find the expected value for Drug with Side Effects, we multiply the corresponding row and column totals and divide by the grand total:

## Expected value for drug and side effect
(200 * 79) / 298
## [1] 53.02
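
The same rule can be applied to every cell at once. The sketch below rebuilds the table so that it runs on its own and uses outer() on the margins; the name expected is chosen here for illustration only.

```r
# Same 2 x 2 table as above
m <- matrix(c(57, 22, 143, 76), nrow = 2,
            dimnames = list(c("Drug", "Placebo"), c("Side Effects", "None")))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(m), colSums(m)) / sum(m)
round(expected, 2)
```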

You can check your expected values against these solutions

chisq.test(m)[["expected"]]
##         Side Effects   None
## Drug           53.02 146.98
## Placebo        25.98  72.02

The resulting \(\chi^2\) statistic is:

## correct = FALSE to get the value as we did in class
chisq.test(m, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  m
## X-squared = 1.24, df = 1, p-value = 0.27

Note: We didn’t discuss this (and you do not need to know it), but by default R applies a “continuity correction” when calculating the \(\chi^2\) statistic for a 2 × 2 table, so its result can differ from what you find by hand. The call above sets correct = FALSE to turn that correction off.
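
For the curious, here is a rough sketch of both versions of the statistic computed directly from the expected values. It assumes the correction in question is Yates' correction, which subtracts 0.5 from each absolute difference before squaring. The names chi_sq and chi_sq_yates are illustrative only.

```r
# Same 2 x 2 table and expected counts as above
m <- matrix(c(57, 22, 143, 76), nrow = 2,
            dimnames = list(c("Drug", "Placebo"), c("Side Effects", "None")))
expected <- outer(rowSums(m), colSums(m)) / sum(m)

# Uncorrected Pearson statistic, as computed by hand in class
chi_sq <- sum((m - expected)^2 / expected)

# Yates' continuity correction: shrink each |observed - expected| by 0.5 first
chi_sq_yates <- sum((abs(m - expected) - 0.5)^2 / expected)

round(chi_sq, 2)  # 1.24, matching chisq.test(m, correct = FALSE)
chi_sq_yates      # smaller than chi_sq; this is what chisq.test(m) reports by default
```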

Problem 3

A test identifies THC from saliva


Null: Sober

Type I error - tests positive for THC when sober

Type II error - tests negative for THC when stoned

You should find the following table (I rounded the decimals to keep things simple):

m <- matrix(c(24, 466, 8, 2), nrow = 2)
colnames(m) <- c("H0 True", "H0 False")
rownames(m) <- c("Reject", "Fail to Reject")
m
##                H0 True H0 False
## Reject              24        8
## Fail to Reject     466        2

Assuming that the test was positive (24+8), what is the probability that the driver is actually inebriated?

8/(24+8) = 25%
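
The same quantity can be read straight off the table in R. The sketch below also computes the two error rates implied by the table; the names ppv, type1 and type2 are illustrative and do not appear in the original solution.

```r
# Same table as above
m <- matrix(c(24, 466, 8, 2), nrow = 2,
            dimnames = list(c("Reject", "Fail to Reject"), c("H0 True", "H0 False")))

# P(inebriated | positive test): true positives over all positives
ppv <- m["Reject", "H0 False"] / sum(m["Reject", ])

# Type I error rate: P(positive | sober) = 24 / 490
type1 <- m["Reject", "H0 True"] / sum(m[, "H0 True"])

# Type II error rate: P(negative | inebriated) = 2 / 10
type2 <- m["Fail to Reject", "H0 False"] / sum(m[, "H0 False"])

ppv  # 0.25
```

Most positives come from sober drivers because, in this table, sober drivers far outnumber inebriated ones, so even a small Type I error rate produces more false positives than true positives.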

Problem 4

The main thing to identify here was: