STA-290 HW 7

library(ggplot2)
library(dplyr)

# Prettier graphs
theme_set(theme_bw())

Question 1

Reconsider the anorexia data that we investigated in Homework 6:

anorexia <- read.csv("https://collinn.github.io/data/anorexia.txt")

Part A: Use the mutate function to again create a variable called Diff that records the difference between the pre and post weights.
Part B: State the null hypothesis for testing the difference in pre and post weights across the groups considered in the dataset.
Part C: Perform an ANOVA for the hypothesis stated in Part B. What do you conclude?
Part D: Use post-hoc testing to determine if there are any pairwise differences between these groups. How do your findings here compare with the conclusions you had in Homework 6?
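
As a reminder of the general workflow for Parts C and D, here is a minimal sketch on made-up data. The data frame `toy` and its columns `group` and `diff` are hypothetical and are not the anorexia data. Tukey's HSD is used as one common post-hoc procedure; if a different procedure was covered in class, use that one instead.

```r
# Hypothetical toy data (NOT the anorexia data): three groups, one numeric outcome
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  diff  = c(rnorm(10, mean = 0), rnorm(10, mean = 1), rnorm(10, mean = 3))
)

# One-way ANOVA: tests the null hypothesis that all group means are equal
fit <- aov(diff ~ group, data = toy)
summary(fit)

# Tukey's HSD: all pairwise comparisons, with p-values adjusted for multiple testing
tukey <- TukeyHSD(fit)
tukey
```

With three groups, the Tukey output has one row per pair of groups (three rows here), each with an estimated difference, a confidence interval, and an adjusted p-value.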

Question 2

In response to several outbreaks of pertussis among newborns (in whom it is very serious and occasionally fatal), the CDC now recommends that pregnant women receive the Tdap vaccine during pregnancy. The purpose of this study was to investigate whether this new recommendation has any unintended side effects with respect to the risk of preterm birth. The average gestational period is approximately 40 weeks.

tdap <- read.delim("https://github.com/IowaBiostat/data-sets/raw/main/tdap/tdap.txt")

This dataset contains three variables: Delivery, the time of delivery in weeks since conception; Vac, an indicator for whether or not a woman received the Tdap vaccine during pregnancy; and tVac, the number of weeks into the pregnancy at which the woman received the vaccine.

Part A: First, conduct a t-test at level \(\alpha = 0.05\) to determine if there is a statistically significant difference in gestational times between women who did and did not receive the Tdap vaccine.
Part B: Now use the filter function from dplyr to create a subset of this data containing only women who received the Tdap vaccine (Vac = 1). Using the subset, create a linear regression model investigating the relationship between when the woman received the Tdap vaccine and the delivery time in weeks. What impact does vaccine time appear to have on delivery time?
Part C: Find and interpret the Multiple \(R^2\) value from your summary output in Part B. Does when a woman received a Tdap vaccine appear to be a good predictor of when her baby was delivered? Give your answer in terms of variance explained.
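
The three steps in this question (t-test, subset plus regression, reading off \(R^2\)) can be sketched on made-up data as follows. The data frame `toy` and the columns `treated`, `timing`, and `outcome` are hypothetical stand-ins, not the tdap variables, and the numbers are simulated, so the results say nothing about the actual study.

```r
library(dplyr)

# Hypothetical toy data (NOT the tdap data)
set.seed(2)
toy <- data.frame(
  treated = rep(c(0, 1), each = 20),
  timing  = c(rep(NA, 20), runif(20, 10, 35))
)
toy$outcome <- rnorm(40, mean = 39, sd = 1.5)

# Two-sample t-test comparing the outcome between the two groups
# (t.test uses the Welch version by default)
tt <- t.test(outcome ~ treated, data = toy)
tt$p.value

# Keep only the treated rows; inside filter(), comparison uses ==, not =
treated_only <- filter(toy, treated == 1)

# Simple linear regression of the outcome on timing
fit <- lm(outcome ~ timing, data = treated_only)
summary(fit)

# Multiple R-squared: the proportion of variance in the outcome explained by the model
r2 <- summary(fit)$r.squared
r2
```

The same value appears as "Multiple R-squared" near the bottom of the printed `summary(fit)` output.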

Question 3

This question will again consider the mtcars dataset built into R.

data(mtcars)

We will be investigating the relationship between the weight of a car (independent variable) and its miles per gallon (dependent variable). In addition to this, we will also be using the number of carburetors as a second independent variable.

Part A: Create a linear model predicting mpg with the covariates wt and carb. Based on the results, does it appear that the number of carburetors has a relationship with fuel economy (mpg)?
Part B: By default, carb is stored in the dataset as an integer value. Use the mutate function to create a new variable in the mtcars dataset called carb_factor, defined as carb_factor = factor(carb). This turns the new variable into a categorical one instead of an integer.
Part C: Create a new linear model, this time predicting mpg with wt and carb_factor. What has changed this time? Specifically, what do the covariates in the new model represent, and how is this different from what we saw in Part A? (Hint: how do the estimates for carb_factor change as the number of carburetors increases?)
Part D: Based on your assessment in Part C, which of these two models do you think is more appropriate for predicting miles per gallon? In other words, does the number of carburetors appear to make more sense as a continuous variable or a categorical one?
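
To see how R treats a numeric predictor differently from a factor, here is a small sketch using a different built-in dataset, ToothGrowth, where `dose` takes the values 0.5, 1, and 2. The variable name `dose_factor` is introduced here purely for illustration; the same mechanics apply to Parts B through D.

```r
library(dplyr)

data(ToothGrowth)

# Create a categorical copy of the numeric dose variable
tg <- mutate(ToothGrowth, dose_factor = factor(dose))

# Numeric predictor: one slope, assuming the same change in len per unit of dose
fit_num <- lm(len ~ dose, data = tg)

# Factor predictor: one coefficient per non-baseline level,
# each measured relative to the baseline level (dose = 0.5)
fit_fac <- lm(len ~ dose_factor, data = tg)

coef(fit_num)
coef(fit_fac)
```

The numeric model has two coefficients (intercept and slope), while the factor model has three (the intercept for the baseline level plus an offset for each of the other two levels). The factor version does not force the effect to be linear in the number of units, at the cost of estimating more parameters.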