Question 1

Suppose that an investigator sets out to test 200 null hypotheses, where exactly half of them are true and half of them are not. Additionally, suppose the tests have a Type I error rate of 5% and a Type II error rate of 20%.

  1. Out of the 200 hypothesis tests carried out, how many should we expect to be Type I errors?

  2. How many would be Type II errors?

  3. Of the 200 tests, how many times would the investigator correctly fail to reject the null hypothesis?

  4. Out of all of the tests in which the null hypothesis was rejected, for what percentage was the null hypothesis actually true?
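
The bookkeeping behind these four parts can be laid out in a few lines of R. This is only a sketch of the arithmetic implied by the setup above (100 true nulls, 100 false nulls, and the stated error rates); the variable names are illustrative and not part of the assignment.

```r
# Setup taken from the question: 200 tests, half with a true null
n_tests      <- 200
n_true_null  <- n_tests / 2   # tests where the null hypothesis is true
n_false_null <- n_tests / 2   # tests where the null hypothesis is false
alpha <- 0.05                 # Type I error rate
beta  <- 0.20                 # Type II error rate

# Expected counts in each cell of the decision table
type1          <- n_true_null  * alpha        # true nulls that get rejected
correct_fail   <- n_true_null  * (1 - alpha)  # true nulls not rejected
type2          <- n_false_null * beta         # false nulls not rejected
correct_reject <- n_false_null * (1 - beta)   # false nulls that get rejected

# Among all rejections, the share for which the null was actually true
prop_rejected_true <- type1 / (type1 + correct_reject)

c(type1 = type1, type2 = type2, correct_fail = correct_fail,
  prop_rejected_true = prop_rejected_true)
```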

Question 2

Determine if the following statements are true or false. If they are false, state how they could be corrected.

  1. If a given test statistic is within a 95% confidence interval, it is also within a 99% confidence interval

  2. Decreasing the value of \(\alpha\) will increase the probability of a Type I error

  3. Suppose the null hypothesis for a proportion is \(H_0: p = 0.5\) and we fail to reject. In this case, the true population proportion is equal to 0.5

  4. With large sample sizes, even small differences between the null and observed values can be identified as statistically significant.
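
One way to explore statement 4 is to hold the observed difference fixed and let the sample size grow. The sketch below assumes a hypothetical observed proportion of 0.51 against \(H_0: p = 0.5\) and uses the usual normal approximation for a one-sample proportion test; the numbers are chosen only for illustration.

```r
p0    <- 0.5    # null value
p_hat <- 0.51   # hypothetical observed proportion (an assumption for illustration)
n     <- c(100, 10000, 1000000)

# Standard error under the null shrinks as n grows
se <- sqrt(p0 * (1 - p0) / n)

# Same observed difference, increasingly large test statistic
z <- (p_hat - p0) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

data.frame(n = n, z = z, p_value = p_value)
```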

Question 3

A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.

  1. Write in words the null hypothesis

  2. What is a Type I error in this context?

  3. What is a Type II error in this context?

  4. Which error type is more problematic for the restaurant owner? Why?

  5. Which error is more problematic for diners? Why?

  6. As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.

Question 4

Diarrhea is a major public health concern in many underdeveloped countries, in particular for babies, of whom millions die each year from dehydration. The following data come from a controlled double-blind study of the use of bismuth salicylate (the active ingredient in Pepto Bismol) as therapy for Peruvian infants with diarrhea, with 85 babies receiving bismuth salicylate and 84 receiving placebo. To control for body size, the outcome variable is the ratio of the volume of stool output to body weight (ml/kg).

diarrhea <- read.csv("https://github.com/IowaBiostat/data-sets/raw/main/diarrhea/diarrhea.txt", sep = "\t")

  1. Using ggplot, create a box plot demonstrating the distribution of outcomes for each of our two groups.

  2. Conduct a t-test against the null hypothesis that there is no difference in outcome between treatment and placebo groups.

  3. Determine a 95% confidence interval for the true difference in output between babies in the control and treatment groups. Based on this, what conclusions would you draw regarding the use of bismuth salicylate as treatment for infant diarrhea? Explain.
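
The mechanics of parts 2 and 3 can be sketched with `t.test()` and its formula interface. The example below deliberately does not use the real data: it builds a simulated placeholder data frame with hypothetical column names `Group` and `Stool` (the actual column names in `diarrhea` may differ), and the simulated values are arbitrary and say nothing about the real study. Only the group sizes (84 and 85) come from the description above.

```r
# Simulated placeholder data; NOT the study data
set.seed(1)
toy <- data.frame(
  Group = rep(c("Placebo", "Treatment"), times = c(84, 85)),  # hypothetical column name
  Stool = c(rnorm(84, mean = 250, sd = 100),                  # arbitrary values
            rnorm(85, mean = 200, sd = 100))
)

# Two-sample t-test (Welch's version by default) of the outcome by group
result <- t.test(Stool ~ Group, data = toy)

result$statistic  # t statistic
result$p.value    # p-value for the null of no difference in means
result$conf.int   # 95% CI for mean(Placebo) - mean(Treatment)
```

With the real data, the same call would apply once `Stool` and `Group` are replaced by the corresponding column names in `diarrhea`.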

Question 5

The following data include the results of two interventions and a control for cows treated with growth hormones. Included in this data are pre and post weights for 72 different cows, each in one of three groups: a Control group, a group treated with testosterone, or a group treated with bST. We are interested in determining if there is evidence of a difference in clinical outcome (pre and post-weight difference) between each pair of treatment groups.

cow <- read.csv("https://collinn.github.io/data/cowgrowth.csv")

  1. Use mutate to construct a new variable, Diff, that represents the difference between post-weight and pre-weight observations

  2. In comparing two groups (e.g., difference for bST and difference in Control), what is our null hypothesis?

  3. For each of the three pairwise differences (e.g., bST vs Control), do the following:

    1. Use filter to create a subset of the data excluding the condition that is not in the pair (for “bST vs Control”, you would exclude “Testosterone”)
    2. Perform a two-sample t-test looking at the Diff variable created in (1). Perform each test at the \(\alpha = 0.05\) level
    3. Record whether you would reject or fail to reject the null, along with the associated p-value for the test
  4. In the course of this study, we conducted three separate hypothesis tests, but tested each at the \(\alpha = 0.05\) level. Conduct the necessary Bonferroni adjustment to control the Family-Wise Error Rate at level \(\alpha = 0.05\). How does that impact the conclusions you made in (3)?
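
The Bonferroni adjustment in part 4 can be applied in two equivalent ways: compare each raw p-value to \(\alpha / m\), or multiply each p-value by \(m\) (capped at 1) and compare to \(\alpha\). The sketch below uses made-up p-values purely to show the mechanics; they are hypothetical and are not results from the cow data.

```r
# Hypothetical p-values for the three pairwise tests (illustration only)
p_raw <- c(bST_vs_Control          = 0.010,
           Testosterone_vs_Control = 0.030,
           bST_vs_Testosterone     = 0.200)

alpha <- 0.05
m     <- length(p_raw)   # number of tests in the family

# Option 1: compare raw p-values to the adjusted threshold alpha / m
reject_threshold <- p_raw < alpha / m

# Option 2: adjust the p-values and compare to the original alpha
p_adj           <- p.adjust(p_raw, method = "bonferroni")
reject_adjusted <- p_adj < alpha

data.frame(p_raw, p_adj, reject = reject_adjusted)
```

Both options lead to the same reject / fail-to-reject decisions; `p.adjust()` is convenient when adjusted p-values are to be reported.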

Question 6

In professional basketball games during the 2009-2010 season, when Kobe Bryant of the Los Angeles Lakers shot a pair of free throws, 8 times he missed both, 152 times he made both, 33 times he made only the first shot, and 37 times he made only the second. Is it possible that the successive free throws are independent, or is there evidence to suggest a “hot streak” effect? The data are tabulated in the freethrow matrix below:

            Make 2nd   Miss 2nd
Make 1st         152         33
Miss 1st          37          8

## Code for table
freethrow <- matrix(c(152,33,37,8), nrow = 2, byrow = TRUE)
rownames(freethrow) <- c("Make 1st", "Miss 1st")
colnames(freethrow) <- c("Make 2nd", "Miss 2nd")

  1. What is the null hypothesis of this experiment?
  2. Using the table provided, find a table of expected values for each cell
  3. Using your table of observed and expected values, find the \(\chi^2\) statistic associated with this table along with the degrees of freedom
  4. Using your critical value sheet, if we were to test this hypothesis at level \(\alpha = 0.05\), what conclusion would we come to regarding the independence of the first and second free throw?
  5. Confirm your decision with chisq.test() (use the argument correct = FALSE to ensure you find the same \(\chi^2\) statistic as you did in (3))
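
Parts 2 through 5 can be checked against one another in R. The sketch below rebuilds the table from the code above, computes expected counts as (row total × column total) / grand total, and compares the hand-computed statistic with `chisq.test()`. Variable names other than `freethrow` are illustrative, and `qchisq()` is used here as a stand-in for the critical value sheet.

```r
# Observed table, as given in the question
freethrow <- matrix(c(152, 33, 37, 8), nrow = 2, byrow = TRUE)
rownames(freethrow) <- c("Make 1st", "Miss 1st")
colnames(freethrow) <- c("Make 2nd", "Miss 2nd")

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(freethrow), colSums(freethrow)) / sum(freethrow)

# Chi-squared statistic and degrees of freedom
chi_sq <- sum((freethrow - expected)^2 / expected)
df     <- (nrow(freethrow) - 1) * (ncol(freethrow) - 1)

# Critical value at alpha = 0.05 (in place of the critical value sheet)
crit <- qchisq(0.95, df = df)

# Built-in test without the continuity correction, for comparison
check <- chisq.test(freethrow, correct = FALSE)

expected
c(chi_sq = chi_sq, df = df, critical_value = crit)
check
```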