Question 1

Suppose that an investigator sets out to test 200 null hypotheses, where exactly half of them are true and half of them are not. Additionally, suppose the tests have a Type I error rate of 5% and a Type II error rate of 20%.

  1. Out of the 200 hypothesis tests carried out, how many should we expect to be Type I errors?

  2. How many would be Type II errors?

  3. Of the 200 tests, how many times would the investigator correctly fail to reject the null hypothesis?

  4. Out of all of the tests in which the null hypothesis was rejected, for what percentage was the null hypothesis actually true?
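
The bookkeeping behind these four parts can be sketched in R. The helper `expected_outcomes()` is a hypothetical name introduced here for illustration, and the inputs in the example call are made-up values, deliberately not those of Question 1.

```r
# Expected counts of each outcome for a batch of hypothesis tests.
# `expected_outcomes` is a hypothetical helper, not part of any package.
expected_outcomes <- function(n_tests, prop_null_true, alpha, beta) {
  n_true  <- n_tests * prop_null_true    # tests where H0 is actually true
  n_false <- n_tests - n_true            # tests where H0 is actually false

  type1        <- alpha * n_true         # H0 true, but rejected
  correct_keep <- (1 - alpha) * n_true   # H0 true, correctly not rejected
  type2        <- beta * n_false         # H0 false, but not rejected
  correct_rej  <- (1 - beta) * n_false   # H0 false, correctly rejected

  c(type1 = type1,
    type2 = type2,
    correct_fail_to_reject = correct_keep,
    correct_reject = correct_rej,
    # share of all rejections for which H0 was actually true
    prop_rejections_null_true = type1 / (type1 + correct_rej))
}

# Illustration values only (assumed, not taken from the question)
res <- expected_outcomes(n_tests = 1000, prop_null_true = 0.8,
                         alpha = 0.10, beta = 0.30)
print(res)
```

Note that the Type I error rate applies only to the tests where the null hypothesis is true, and the Type II error rate only to those where it is false.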

Question 2

Diarrhea is a major public health concern in many underdeveloped countries, in particular for babies, of whom millions die each year from dehydration. The following data come from a controlled double-blind study of the use of bismuth salicylate (the active ingredient in Pepto Bismol) as therapy for Peruvian infants with diarrhea, with 85 babies receiving bismuth salicylate and 84 receiving placebo. To control for body size, the outcome variable is the volume of stool output per kilogram of body weight (ml/kg).

diarrhea <- read.csv("https://github.com/IowaBiostat/data-sets/raw/main/diarrhea/diarrhea.txt", sep = "\t")
  1. Using ggplot, create a box plot demonstrating the distribution of outcomes for each of our two groups.

  2. Conduct a t-test against the null hypothesis that there is no difference in outcome between treatment and placebo groups.

  3. Determine a 95% confidence interval for the true difference in output between babies in the control and treatment groups. Based on this, what conclusions would you draw regarding the use of bismuth salicylate as treatment for infant diarrhea? Explain.
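
As a rough sketch of the workflow, the code below draws a box plot and runs a two-sample t-test on simulated stand-in data. The column names `group` and `outcome` are assumptions made for this example; check `names(diarrhea)` for the actual ones before adapting it.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in data; `group` and `outcome` are hypothetical column names
sim <- data.frame(
  group   = rep(c("Placebo", "Treatment"), each = 40),
  outcome = c(rnorm(40, mean = 250, sd = 80),
              rnorm(40, mean = 200, sd = 80))
)

# One box per group, outcome on the vertical axis
p <- ggplot(sim, aes(x = group, y = outcome)) +
  geom_boxplot() +
  labs(x = "Group", y = "Outcome")
print(p)

# Two-sample t-test (Welch's version is the default in t.test)
tt <- t.test(outcome ~ group, data = sim)
tt$p.value   # p-value for H0: no difference in group means
tt$conf.int  # 95% CI for mean(Placebo) - mean(Treatment)
```

The same `t.test()` call returns both the p-value and the confidence interval, so parts 2 and 3 can be read off a single result. The interval is for the first factor level minus the second, in alphabetical order unless the levels are set explicitly.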

Question 3

The following data include the results of two interventions and a control for young female anorexia patients. Included in these data are pre and post weights for 29 individuals in Cognitive Behavioral Therapy ("CBT"), Family Treatment ("FT"), and Control ("Cont"). Although these data are paired, rather than considering the efficacy within each group, we will be interested in assessing the difference in differences between them.

anorexia <- read.csv("https://collinn.github.io/data/anorexia.txt")
  1. Use mutate to add a new variable called Diff to the data set, defined as the difference between the post weight and pre weight observations.
  2. For each of the three pairwise combinations of study groups (i.e., “CBT and Control”, “CBT and FT”, and “FT and Control”):
     1. Use filter to create a subset of the original data, excluding the study type that is not in the pair (i.e., for “CBT and Control”, you will exclude “FT”).
     2. Perform a two-sample t-test on the Diff value you created in (1), comparing it between Treatment types at the \(\alpha = 0.05\) level.
     3. Record whether you would Reject or Fail to Reject the null hypothesis and the associated p-value.
  3. In the course of our study, we tested three separate hypotheses, but decided each of them individually at level \(\alpha = 0.05\). Conduct the necessary adjustment to control the Family-Wise Error Rate at \(\alpha = 0.05\). How does this impact the conclusions you made in (2)?
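
The steps above might look something like the following sketch, run on simulated data rather than the anorexia file. The column names `Treat`, `Prewt`, and `Postwt` and the group labels are assumptions for illustration. The Bonferroni correction is used here as one common way to control the Family-Wise Error Rate; other methods such as Holm are also available through `p.adjust()`.

```r
library(dplyr)

set.seed(2)
# Simulated stand-in data; column names and group labels are hypothetical
sim <- data.frame(
  Treat = rep(c("A", "B", "C"), each = 20),
  Prewt = rnorm(60, mean = 82, sd = 5)
)
sim$Postwt <- sim$Prewt + rnorm(60, mean = rep(c(0, 3, 6), each = 20), sd = 7)

# Step 1: change in weight for each individual
sim <- sim %>% mutate(Diff = Postwt - Prewt)

# Step 2: one two-sample t-test per pair of groups
pairs <- list(c("A", "B"), c("A", "C"), c("B", "C"))
p_raw <- sapply(pairs, function(pr) {
  sub <- sim %>% filter(Treat %in% pr)   # drop the group not in this pair
  t.test(Diff ~ Treat, data = sub)$p.value
})
names(p_raw) <- sapply(pairs, paste, collapse = " vs ")

# Step 3: Bonferroni adjustment (each p-value times 3, capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
data.frame(p_raw, p_bonf, reject_adjusted = p_bonf < 0.05)
```

Comparing the adjusted p-values to 0.05 is equivalent to comparing the raw p-values to 0.05 / 3.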
