Skittles come in 5 flavors, and we want to know whether the flavors are evenly distributed. We buy 20 packs, each containing 50 pieces (1000 pieces in total), sort them by color, and get the following results:
m <- matrix(c(208, 222, 181, 192, 197), nrow = 1)
colnames(m) <- c("Red", "Purple", "Yellow", "Green", "Orange")
m
##      Red Purple Yellow Green Orange
## [1,] 208    222    181   192    197
We test the null hypothesis \(H_0: p_1 = p_2 = \dots = p_5 = 0.2\).
This gives an expected count for each cell of \(N \times p_i = 1000 \times 0.2 = 200\).
Computing the statistic by hand gives \(\chi^2 = 4.91\). This is less than the critical value of 9.49 for \(\alpha = 0.05\) and df = 4, so we fail to reject \(H_0\).
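The hand calculation can be sketched in R as follows. This is only an illustration of the arithmetic; the names `observed`, `expected`, `stat`, and `critical` are introduced here and are not part of the original solution.

```r
# Observed counts from the table above
observed <- c(Red = 208, Purple = 222, Yellow = 181, Green = 192, Orange = 197)

# Expected count per flavor under H0: N * p_i
expected <- sum(observed) * rep(1 / 5, 5)

# Pearson statistic: sum of (observed - expected)^2 / expected
stat <- sum((observed - expected)^2 / expected)

# Critical value for alpha = 0.05 with df = 5 - 1 = 4
critical <- qchisq(0.95, df = 4)

stat            # 4.91
stat < critical # TRUE, so we fail to reject H0
```

The statistic should agree with the value reported by chisq.test.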
chisq.test(m, p = rep(1/5, 5))
##
## Chi-squared test for given probabilities
##
## data: m
## X-squared = 4.91, df = 4, p-value = 0.3
The results of a clinical trial exploring the side effects of a drug alongside a placebo are given below:
m <- matrix(c(57, 22, 143, 76), nrow = 2)
colnames(m) <- c("Side Effects", "None")
rownames(m) <- c("Drug", "Placebo")
m
##         Side Effects None
## Drug              57  143
## Placebo           22   76
Does treatment appear associated with onset of side effects?
First, we find the row and column margins and use them to compute the expected values:
addmargins(m)  # base R; equivalent to m %>% addmargins() without needing a pipe package
##         Side Effects None Sum
## Drug              57  143 200
## Placebo           22   76  98
## Sum               79  219 298
To find the expected value for Drug with Side Effects, we take the product of its row and column margins and divide by the grand total:
## Expected value for drug and side effect
(200 * 79) / 298
## [1] 53.02
You can check your expected values against these solutions
chisq.test(m)[["expected"]]
##         Side Effects   None
## Drug           53.02 146.98
## Placebo        25.98  72.02
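The same margin calculation can be applied to every cell at once. The sketch below is one way to do it in base R; the name `expected` is introduced here for illustration, and the table is rebuilt so the block runs on its own.

```r
# Same 2x2 table as above
m <- matrix(c(57, 22, 143, 76), nrow = 2,
            dimnames = list(c("Drug", "Placebo"), c("Side Effects", "None")))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(m), colSums(m)) / sum(m)
round(expected, 2)
```

The rounded values should match the table returned by chisq.test(m)[["expected"]].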
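The test gives the following \(\chi^2\) statistic. With a p-value of 0.27, we fail to reject independence, so the data show no evidence that treatment is associated with side effects.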
## correct = FALSE to get the value as we did in class
chisq.test(m, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: m
## X-squared = 1.24, df = 1, p-value = 0.27
Note: We didn’t discuss this in class, and you do not need to know it. By default, R applies a “continuity correction” when calculating the \(\chi^2\) statistic for a 2 × 2 table, so its result can differ from what you find by hand. The call above sets correct = FALSE to turn the correction off.
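For the curious, the difference between the two versions can be sketched by hand. The correction R applies to 2 × 2 tables is Yates' continuity correction, which shrinks each |observed - expected| by 0.5 before squaring. The names `pearson` and `yates` are introduced here for illustration, and the simple "minus 0.5" form assumes every |observed - expected| is at least 0.5, which holds for this table.

```r
# Same 2x2 table and expected counts as above
m <- matrix(c(57, 22, 143, 76), nrow = 2,
            dimnames = list(c("Drug", "Placebo"), c("Side Effects", "None")))
expected <- outer(rowSums(m), colSums(m)) / sum(m)

# Uncorrected Pearson statistic, as computed by hand in class
pearson <- sum((m - expected)^2 / expected)

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring
yates <- sum((abs(m - expected) - 0.5)^2 / expected)

round(pearson, 2) # 1.24, matching correct = FALSE
yates             # smaller than pearson
```

Here `yates` should agree with the statistic from the default call chisq.test(m).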
A test identifies THC from saliva
Null: Sober
Type I error - tests positive for THC when sober
Type II error - tests negative for THC when stoned
You should find the following table (I rounded the decimals to make it easier):
m <- matrix(c(24, 466, 8, 2), nrow = 2)
colnames(m) <- c("H0 True", "H0 False")
rownames(m) <- c("Reject", "Fail to Reject")
m
##                H0 True H0 False
## Reject              24        8
## Fail to Reject     466        2
Assuming that the test was positive (24+8), what is the probability that the driver is actually inebriated?
8/(24+8) = 25%
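The same conditional probability can be read off the table in R. This is a small sketch; the name `ppv` (for "positive predictive value") is introduced here and does not appear in the original solution.

```r
# Same table as above: rows are test decisions, columns are the truth
m <- matrix(c(24, 466, 8, 2), nrow = 2,
            dimnames = list(c("Reject", "Fail to Reject"), c("H0 True", "H0 False")))

# P(H0 false | test rejects): condition on the "Reject" row only
ppv <- m["Reject", "H0 False"] / sum(m["Reject", ])
ppv # 0.25
```

Conditioning on the "Reject" row means dividing by the 32 positive tests, not by all 500 drivers.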
The main thing to identify here was: