Review Slides Solutions

Problem 1

Skittles candy comes in 5 flavors, and we want to know whether they are evenly distributed. We buy 20 packs, each containing 50 pieces, sort the pieces by color, and get the following results:

m <- matrix(c(208, 222, 181, 192, 197), nrow = 1)
colnames(m) <- c("Red", "Purple", "Yellow", "Green", "Orange")
m

##      Red Purple Yellow Green Orange
## [1,] 208    222    181   192    197

How would we test if these were evenly distributed?
What is our null hypothesis?
What do we find?

Test the null hypothesis \(H_0: p_1 = p_2 = \dots = p_5 = 0.2\) with a \(\chi^2\) goodness-of-fit test.

This gives an expected count for each cell of \(N \times p_i = 1000 \times 0.2 = 200\).

Computing the statistic by hand gives \(\chi^2 = 4.91\). This is less than the critical value of 9.49 for \(\alpha = 0.05\) and df = 4, so we fail to reject \(H_0\): the data give no evidence that the flavors are unevenly distributed.
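The hand calculation can be sketched in R as follows. The variable names (`observed`, `expected`, `chi_sq`, `dof`) are illustrative choices, not part of the original solution; the counts are the ones given in the problem.

```r
# Observed counts from the problem statement
observed <- c(Red = 208, Purple = 222, Yellow = 181, Green = 192, Orange = 197)

# Expected count per flavor under H0: N * 0.2
expected <- rep(sum(observed) * 0.2, length(observed))

# Pearson statistic: sum of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Critical value and p-value for df = (number of categories) - 1
dof <- length(observed) - 1
critical <- qchisq(0.95, df = dof)
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

round(c(statistic = chi_sq, critical = critical, p_value = p_value), 2)
```

Because the statistic falls below the critical value, the p-value is above 0.05, which is the same conclusion reached by comparing against a table.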

chisq.test(m, p = rep(1/5, 5))

## 
##  Chi-squared test for given probabilities
## 
## data:  m
## X-squared = 4.91, df = 4, p-value = 0.3

Problem 2

The results of a clinical trial comparing the side effects of a drug with those of a placebo are given below:

m <- matrix(c(57, 22, 143, 76), nrow = 2)
colnames(m) <- c("Side Effects", "None")
rownames(m) <- c("Drug", "Placebo")
m

##         Side Effects None
## Drug              57  143
## Placebo           22   76

Does treatment appear associated with onset of side effects?

First, we would find the row/column margins and use these to find expected values

library(magrittr)  # provides the %>% pipe
m %>% addmargins()

##         Side Effects None Sum
## Drug              57  143 200
## Placebo           22   76  98
## Sum               79  219 298

To find the expected value for Drug with Side Effects, we take the product of the corresponding row and column margins and divide by the grand total:

## Expected value for drug and side effect
(200 * 79) / 298

## [1] 53.02

You can check your expected values against these solutions

chisq.test(m)[["expected"]]

##         Side Effects   None
## Drug           53.02 146.98
## Placebo        25.98  72.02
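All four expected values follow the same margin rule, so they can be computed in one step. This is a minimal sketch using base R's `outer()`; the names `obs` and `expected` are illustrative, and the table is rebuilt here so the block runs on its own.

```r
# Observed 2x2 table from the problem
obs <- matrix(c(57, 22, 143, 76), nrow = 2,
              dimnames = list(c("Drug", "Placebo"), c("Side Effects", "None")))

# Expected count for cell (i, j) = row total i * column total j / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 2)
```

The result should agree with `chisq.test(m)[["expected"]]`.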

These expected values give the following \(\chi^2\) statistic:

## correct = FALSE to get the value as we did in class
chisq.test(m, correct = FALSE)

## 
##  Pearson's Chi-squared test
## 
## data:  m
## X-squared = 1.24, df = 1, p-value = 0.27

Since the p-value of 0.27 is greater than 0.05, we fail to reject the null hypothesis of independence: treatment does not appear associated with onset of side effects.

Note: We didn’t discuss this, and you do not need to know it. By default, R applies a “continuity correction” when calculating the \(\chi^2\) statistic for a 2 × 2 table, which can make its result differ from what you find by hand. Setting correct = FALSE in the call above turns that correction off.
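For the curious, the difference between the two versions can be sketched by hand. This assumes the correction R applies here is the Yates correction, which subtracts 0.5 from each \(|O - E|\) before squaring; the variable names are illustrative.

```r
# Observed table and expected counts from the margins
obs <- matrix(c(57, 22, 143, 76), nrow = 2)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Uncorrected Pearson statistic, as computed by hand in class
chi_sq_plain <- sum((obs - expected)^2 / expected)

# Yates continuity correction: shrink each |O - E| by 0.5 before squaring
chi_sq_yates <- sum((abs(obs - expected) - 0.5)^2 / expected)

round(c(plain = chi_sq_plain, yates = chi_sq_yates), 2)
```

The corrected statistic is always the smaller of the two, so the corrected test is slightly more conservative.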

Problem 3

A test identifies THC from saliva.

What is the null hypothesis?
What are the Type I and Type II errors?
Suppose a Type I error rate of 5% and a Type II error rate of 20%. Of 500 drivers, 2% are intoxicated. Find the probability that a driver is intoxicated given a positive test.

Null: Sober

Type I error - tests positive for THC when sober
Type II error - tests negative for THC when stoned

You should find the following table (I rounded the decimals to keep it simple: for example, the 490 × 0.05 = 24.5 expected false positives become 24):

m <- matrix(c(24, 466, 8, 2), nrow = 2)
colnames(m) <- c("H0 True", "H0 False")
rownames(m) <- c("Reject", "Fail to Reject")
m

##                H0 True H0 False
## Reject              24        8
## Fail to Reject     466        2

Assuming that the test was positive (24+8), what is the probability that the driver is actually inebriated?

8/(24+8) = 25%
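The table and the final probability can be rebuilt from the stated rates. This is a sketch with illustrative variable names; it keeps the unrounded 24.5 false positives, so the result is slightly below the 25% obtained from the rounded table.

```r
n_drivers  <- 500
prevalence <- 0.02  # share of drivers who are intoxicated
alpha      <- 0.05  # Type I error rate: P(positive | sober)
beta       <- 0.20  # Type II error rate: P(negative | intoxicated)

n_intox <- n_drivers * prevalence   # intoxicated drivers (H0 false)
n_sober <- n_drivers - n_intox      # sober drivers (H0 true)

false_pos <- n_sober * alpha        # sober but test positive
true_pos  <- n_intox * (1 - beta)   # intoxicated and test positive

# P(intoxicated | positive) = true positives / all positives
ppv <- true_pos / (true_pos + false_pos)
ppv
```

Even with a fairly accurate test, most positives come from sober drivers because so few drivers are intoxicated to begin with.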

Problem 4

The main things to identify here were:

Species and petal width (right-side plot) are heavily associated, so adding petal width to a model that already includes species will likely produce a smaller increase in \(R^2\).
Within species, as sepal width increases, so does sepal length, indicating that sepal width is well associated with the residuals from a model in which species is the only predictor.
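Both points can be checked numerically. The sketch below assumes the problem uses R's built-in `iris` data with `Sepal.Length` as the response; the problem statement is not reproduced here, so treat the data set, the model formulas, and the object names as assumptions.

```r
# Assumption: built-in iris data, Sepal.Length as the response
base_fit  <- lm(Sepal.Length ~ Species, data = iris)
petal_fit <- lm(Sepal.Length ~ Species + Petal.Width, data = iris)
sepal_fit <- lm(Sepal.Length ~ Species + Sepal.Width, data = iris)

# R^2 for the species-only model and for each one-variable extension
fits <- list(species_only = base_fit,
             plus_petal_width = petal_fit,
             plus_sepal_width = sepal_fit)
r2 <- sapply(fits, function(fit) summary(fit)$r.squared)
r2

# How strongly sepal width tracks what species alone leaves unexplained
resid_cor <- cor(residuals(base_fit), iris$Sepal.Width)
resid_cor
```

Comparing the two extended models against `species_only` shows which added variable buys the larger gain in \(R^2\).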


2024-12-15
