This lab will help us explore and better understand the motivation for, and the calculations behind, Fisher’s Exact Test. We’ll begin with a quick overview of some R functions that will be useful for the questions that follow. Throughout this lab, do not use any external R packages (e.g., epitools).
By default, matrices are filled (and stored) in column order. Elements are placed starting at the top left and moving down the first column. Once a column is full, filling continues at the top of the next column:
matrix(1:9, nrow = 3)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
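Because matrices are stored in column order, a single (linear) index into a matrix counts down the columns. The short sketch below illustrates this; the object name m is only for illustration.

```r
# Build the same 3 x 3 matrix; it is filled column by column
m <- matrix(1:9, nrow = 3)

# A single (linear) index counts down each column in turn,
# so the 4th stored element is the top entry of column 2
m[4]
m[1, 2]

# Dropping the dimensions returns the elements in storage (column) order
as.vector(m)
```

Both m[4] and m[1, 2] return 4, and as.vector(m) returns 1 through 9 in order.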
The byrow option fills the matrix in row order instead:
matrix(1:9, nrow = 3, byrow = TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
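Filling by row gives the transpose of the column-filled matrix. This offers a quick way to check which orientation you entered. A minimal check follows; the names by_col and by_row are only illustrative.

```r
by_col <- matrix(1:9, nrow = 3)               # default: column order
by_row <- matrix(1:9, nrow = 3, byrow = TRUE) # row order

# Row-filling gives the same result as transposing the column-filled matrix
identical(by_row, t(by_col))
```

This matters when typing in a contingency table. With byrow = TRUE, the counts can be entered row by row, so the code mirrors the table as it appears on the page.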
The choose() function computes binomial coefficients:
## 5 choose 2
choose(5, 2)
## [1] 10
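choose(n, k) agrees with the factorial formula \(\binom{n}{k} = \frac{n!}{k!\,(n-k)!}\). It is also vectorized over k, which is convenient when you need a coefficient for each of several values at once. A short illustration:

```r
# Factorial formula for 5 choose 2: 5! / (2! * 3!) = 120 / 12 = 10
factorial(5) / (factorial(2) * factorial(3))

# choose() is vectorized: every coefficient for n = 5 in one call
choose(5, 0:5)
```

The vectorized call returns 1, 5, 10, 10, 5, 1.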
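See ?fisher.test, in particular the alternative argument. It selects either a two-sided test ("two.sided", the default) or a one-sided test ("greater" or "less").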
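The following example shows the alternative argument applied to a made-up 2x2 table. The counts are hypothetical, not the lab data, and the variable names are only illustrative.

```r
# Hypothetical 2 x 2 table of counts, entered row by row
tab <- matrix(c(8, 1,
                2, 5), nrow = 2, byrow = TRUE)

# Two-sided test (the default) and the two one-sided versions
p_two     <- fisher.test(tab)$p.value
p_greater <- fisher.test(tab, alternative = "greater")$p.value  # H_A: odds ratio > 1
p_less    <- fisher.test(tab, alternative = "less")$p.value     # H_A: odds ratio < 1
c(two.sided = p_two, greater = p_greater, less = p_less)
```

fisher.test(tab)$estimate reports a conditional maximum likelihood estimate of the odds ratio. It therefore generally differs from the simple cross-product ratio of the cell counts.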
The 2x2 contingency table below summarizes a small study. Participants were given either vitamin C tablets or a placebo, and the study recorded whether or not each participant got sick in February.
|  | Sick: Yes | Sick: No |
|---|---|---|
| Placebo | 3 | 2 |
| Drug | 1 | 3 |
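Below is one possible way to enter this table in R. It uses byrow = TRUE so the code mirrors the layout above, and dimnames to label the rows and columns. The object name vitc is only illustrative.

```r
# Vitamin C study counts, entered row by row as they appear in the table
vitc <- matrix(c(3, 2,
                 1, 3),
               nrow = 2, byrow = TRUE,
               dimnames = list(Treatment = c("Placebo", "Drug"),
                               Sick = c("Yes", "No")))
vitc

rowSums(vitc)  # row (treatment) totals
colSums(vitc)  # column (outcome) totals
sum(vitc)      # total number of participants, n
```

With this orientation, vitc[1, 1] is the Placebo/Yes count. It therefore plays the role of \(n_{11}\) in the questions that follow.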
Use the textbook to answer the following questions:
Question 0: List the names of the people you are working with on this lab.
Question 1: For a 2x2 table, if only \(n\), the total number of participants, is fixed, what distribution do the cell counts \(n_{ij}\) follow? What if both the row and column totals are fixed? In that case, how many degrees of freedom do we have?
Question 2: Assuming that the row and column totals are fixed, what are all of the possible values that \(n_{11}\) can take for the table provided? On a separate piece of paper (or in your R Markdown document), write out the table that corresponds to each of these values.
Question 3: For 2x2 tables, independence corresponds to an odds ratio of \(\theta = 1\), which gives the null hypothesis \(H_0: \theta = 1\). Write out the formula for the sample odds ratio \(\hat{\theta}\) in terms of the cell counts \(n_{ij}\).
Question 4: Under \(H_0\), find the probability \(P(n_{11} = 3)\). (It may be useful to write a function for the PMF.)
Question 5: The p-value is defined as the probability, under the null hypothesis, of observing data at least as extreme as the data actually observed. Find directly (that is, without using fisher.test) the p-value associated with the table above for \(H_A: \theta > 1\). Once you have found it, confirm that it is correct by comparing it with the result from fisher.test using the appropriate alternative hypothesis.
Question 6: Find the p-value directly again, this time for \(H_A: \theta \neq 1\). Explain how you found it and how this differed from your approach in Question 5. Confirm that it is correct using fisher.test.