Suppose that an investigator sets out to test 200 null hypotheses where exactly half of them are true and half of them are not. Additionally, suppose the tests have a Type I error rate of 5% and a Type II error rate of 20%.
Out of the 200 hypothesis tests carried out, how many should we expect to be Type I errors?
How many would be Type II errors?
Of the 200 tests, how many times would the investigator correctly fail to reject the null hypothesis?
Out of all of the tests in which the null hypothesis was rejected, for what percentage was the null hypothesis actually true?
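The bookkeeping behind these four questions can be sketched in R. This is only an illustration of the expected counts, treating the stated error rates as exact long-run proportions; the variable names are made up for this sketch.

```r
# Setup taken from the exercise statement
n_tests <- 200
n_true  <- n_tests / 2   # null hypotheses that are actually true
n_false <- n_tests / 2   # null hypotheses that are actually false
alpha   <- 0.05          # Type I error rate
beta    <- 0.20          # Type II error rate

type1_errors   <- n_true * alpha         # true nulls that get rejected
type2_errors   <- n_false * beta         # false nulls that are not rejected
correct_retain <- n_true * (1 - alpha)   # true nulls that are not rejected
correct_reject <- n_false * (1 - beta)   # false nulls that get rejected

# Share of all rejections in which the null hypothesis was actually true
false_discovery_share <- type1_errors / (type1_errors + correct_reject)

round(c(type1 = type1_errors,
        type2 = type2_errors,
        correct_retain = correct_retain,
        false_discovery_share = false_discovery_share), 3)
```

Note that the last quantity is conditional on rejection, so its denominator is the number of rejected tests, not the 200 tests overall.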
Determine if the following statements are true or false. If they are false, state how they could be corrected.
If a given test statistic is within a 95% confidence interval, it is also within a 99% confidence interval.
Decreasing the value of \(\alpha\) will increase the probability of a Type I error.
Suppose the null hypothesis for a proportion is \(H_0: p = 0.5\) and we fail to reject. In this case, the true population proportion is equal to 0.5.
With large sample sizes, even small differences between the null and observed values can be identified as statistically significant.
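The last statement can be explored numerically. The sketch below assumes a hypothetical one-sample z-test for a proportion with \(H_0: p = 0.5\) and a fixed observed proportion of 0.51; these values are chosen only for illustration and do not come from the exercise.

```r
# Hypothetical setting: H0: p = 0.5, observed sample proportion 0.51
p0    <- 0.5
p_hat <- 0.51
n     <- c(100, 1000, 10000, 100000)   # increasing sample sizes

se      <- sqrt(p0 * (1 - p0) / n)     # standard error under H0
z       <- (p_hat - p0) / se           # z statistic for each sample size
p_value <- 2 * pnorm(-abs(z))          # two-sided p-value

data.frame(n = n, z = round(z, 2), p_value = signif(p_value, 3))
```

The observed difference stays the same in every row; only the standard error shrinks as \(n\) grows, which is what drives the p-value down.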
A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.
Write the null hypothesis in words.
What is a Type I error in this context?
What is a Type II error in this context?
Which error type is more problematic for the restaurant owner? Why?
Which error is more problematic for diners? Why?
As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.
Diarrhea is a major public health concern in many underdeveloped countries, in particular for babies, of whom millions die each year from dehydration. The following data come from a controlled double-blind study of the use of bismuth salicylate (the active ingredient in Pepto Bismol) as therapy for Peruvian infants with diarrhea, with 85 babies receiving bismuth salicylate and 84 receiving placebo. To control for body size, the outcome variable is the ratio of the volume of stool output per kilogram of body weight (ml/kg).
diarrhea <- read.csv("https://github.com/IowaBiostat/data-sets/raw/main/diarrhea/diarrhea.txt", sep = "\t")
Using ggplot, create a box plot demonstrating the distribution of outcomes for each of our two groups.
Conduct a t-test against the null hypothesis that there is no difference in outcome between treatment and placebo groups.
Determine a 95% confidence interval for the true difference in output between babies in the control and treatment groups. Based on this, what conclusions would you draw regarding the use of bismuth salicylate as treatment for infant diarrhea? Explain.
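The general workflow for these three steps might look like the sketch below. It runs on simulated stand-in data, not on the study data: the column names `group` and `output` and all simulated values are assumptions made for illustration and may not match the columns in `diarrhea`.

```r
library(ggplot2)

# Simulated stand-in data; `group` and `output` are assumed column names,
# and the means and standard deviations are arbitrary
set.seed(1)
toy <- data.frame(
  group  = rep(c("Placebo", "Treatment"), times = c(84, 85)),
  output = c(rnorm(84, mean = 50, sd = 10), rnorm(85, mean = 45, sd = 10))
)

# Side-by-side box plots of the outcome for each group
ggplot(toy, aes(x = group, y = output)) +
  geom_boxplot()

# Welch two-sample t-test of H0: no difference in mean outcome
res <- t.test(output ~ group, data = toy)

res$p.value    # p-value for the test
res$conf.int   # 95% CI for mean(Placebo) - mean(Treatment)
```

With the formula interface, `t.test()` reports the interval for the first factor level minus the second, so the sign of the interval depends on how the groups are ordered.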
The following data include the results of two interventions and a control for cows treated with growth hormones. Included in these data are pre- and post-treatment weights for 72 different cows, each in one of three groups: a Control group, a group treated with testosterone, or a group treated with bST. We are interested in determining whether there is evidence of a difference in clinical outcome (the difference between post-weight and pre-weight) between the treatment groups.
cow <- read.csv("https://collinn.github.io/data/cowgrowth.csv")
Use mutate to construct a new variable, Diff, that represents the difference between post-weight and pre-weight observations.
In comparing two groups (e.g., difference for bST and difference in Control), what is our null hypothesis?
For each of the three pairwise differences (e.g., bST vs Control), do the following:
Use filter to create a subset of the data excluding the condition that is not in the pair (for “bST vs Control”, you would exclude “Testosterone”).
Conduct a hypothesis test comparing the two groups on the Diff variable created in (1). Perform each test at the \(\alpha = 0.05\) level.
In the course of this study, we conducted three separate hypothesis tests, each at the \(\alpha = 0.05\) level. Conduct the necessary Bonferroni adjustment to control the Family-Wise Error Rate at level \(\alpha = 0.05\). How does that impact the conclusions you made in (3)?
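One way the mutate, filter, test, and adjust steps could fit together is sketched below on simulated stand-in data. The column names `treatment`, `pre`, and `post`, the equal group sizes, and all simulated values are assumptions for illustration and may not match `cow`; a Welch t-test is used here as one reasonable choice of test.

```r
library(dplyr)

# Simulated stand-in data; `treatment`, `pre`, and `post` are assumed
# column names, and equal group sizes of 24 are also an assumption
set.seed(1)
toy <- data.frame(
  treatment = rep(c("Control", "Testosterone", "bST"), each = 24),
  pre       = rnorm(72, mean = 500, sd = 30)
)
toy$post <- toy$pre + rnorm(72, mean = 50, sd = 15)

# Difference between post-weight and pre-weight
toy <- toy %>% mutate(Diff = post - pre)

# For one pair: drop the excluded condition, then run a Welch t-test on Diff
p_for_pair <- function(excluded) {
  pair <- toy %>% filter(treatment != excluded)
  t.test(Diff ~ treatment, data = pair)$p.value
}

p_values <- c(
  bST_vs_Control          = p_for_pair("Testosterone"),
  Testosterone_vs_Control = p_for_pair("bST"),
  bST_vs_Testosterone     = p_for_pair("Control")
)

# Bonferroni: compare each p-value to alpha / (number of tests) ...
alpha <- 0.05
p_values < alpha / length(p_values)

# ... or, equivalently, compare Bonferroni-adjusted p-values to alpha
p.adjust(p_values, method = "bonferroni") < alpha
```

The two Bonferroni formulations give the same decisions: `p.adjust()` multiplies each p-value by the number of tests (capped at 1) instead of dividing \(\alpha\).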
In professional basketball games during the 2009-2010 season, when
Kobe Bryant of the Los Angeles Lakers shot a pair of free throws, 8
times he missed both, 152 times he made both, 33 times he made only the
first shot, and 37 times he made only the second. Is it possible that
the successive free throws are independent, or is there evidence to
suggest a “hot streak” effect? The data are tabulated in the
freethrow data frame below:
| | Make 2nd | Miss 2nd |
|---|---|---|
| Make 1st | 152 | 33 |
| Miss 1st | 37 | 8 |
## Code for table
freethrow <- matrix(c(152,33,37,8), nrow = 2, byrow = TRUE)
rownames(freethrow) <- c("Make 1st", "Miss 1st")
colnames(freethrow) <- c("Make 2nd", "Miss 2nd")
chisq.test() (use the
argument correct = FALSE to ensure you find the same \(\chi^2\) statistic as you did in (3))
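As a rough check on how `correct = FALSE` lines up with a hand calculation, the sketch below computes the expected counts and the Pearson \(\chi^2\) statistic directly and compares them with `chisq.test()`. It assumes the earlier part of the exercise computed the uncorrected statistic by hand, which the text here does not show; the names `expected`, `stat_by_hand`, and `p_by_hand` are made up for this sketch.

```r
# Observed counts, as tabulated above
freethrow <- matrix(c(152, 33, 37, 8), nrow = 2, byrow = TRUE,
                    dimnames = list(c("Make 1st", "Miss 1st"),
                                    c("Make 2nd", "Miss 2nd")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(freethrow), colSums(freethrow)) / sum(freethrow)

# Pearson chi-square statistic and its p-value on 1 degree of freedom
stat_by_hand <- sum((freethrow - expected)^2 / expected)
p_by_hand    <- pchisq(stat_by_hand, df = 1, lower.tail = FALSE)

# The same test without the continuity correction
res <- chisq.test(freethrow, correct = FALSE)

c(by_hand = stat_by_hand, chisq_test = unname(res$statistic))
c(by_hand = p_by_hand, chisq_test = res$p.value)
```

For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default, which is why the default call would not reproduce the uncorrected statistic.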