Homework 3

Problem 1

Researchers recruited 451 patients with a high level of cardiovascular risk. They split these patients into two groups: a treatment group that received stents (small mesh tubes placed inside vulnerable arteries) together with medical management (medications, lifestyle coaching, etc.), and a control group that received medical management only. Of the 224 patients in the treatment group, 45 suffered a stroke within the first year of the study, while 28 of the 227 patients in the control group had a stroke during this time.

Part A Using the given information, find the odds of having a stroke for each group.
Part B Find the odds ratio comparing the odds of stroke in the treatment (stent) group with the odds of stroke in the control group. State your findings.
Part C Now give the odds of not having a stroke for each group. What is the odds ratio comparing the odds of not having a stroke in the control group with the odds of not having a stroke in the treatment group? How does this compare with what you found in Part B? Explain.
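
As a reminder of the definitions used in this problem: the odds of an event are the number of subjects who had the event divided by the number who did not, and an odds ratio is the ratio of two such odds. The sketch below illustrates the arithmetic in R. The helper `odds()` is hypothetical, and the counts (30 of 100 and 20 of 100) are made up for illustration; they are not the study's numbers.

```r
# Hypothetical helper: odds of an event given the event count and group size
odds <- function(events, n) {
  events / (n - events)
}

# Made-up counts for illustration only (not the stent study data)
odds_a <- odds(30, 100)  # 30 events vs. 70 non-events
odds_b <- odds(20, 100)  # 20 events vs. 80 non-events

# Odds ratio comparing group A with group B
odds_ratio <- odds_a / odds_b
odds_ratio
```

An odds ratio above 1 means the event has higher odds in the first group than in the second.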

Problem 2

The dataset below includes information on 31 black cherry trees felled in the Allegheny National Forest, Pennsylvania. For each tree, it records three variables: diameter (in), height (ft), and volume (cubic ft).

## Cherry tree data
cherry <- read.csv("https://collinn.github.io/data/cherry.csv")

Part A Create two scatterplots of the data comparing diameter with volume and height with volume, in each case letting volume be the response variable. Based on these plots, which variable do you think would be a better predictor of volume?
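
One possible way to lay out the two plots is sketched below in base R graphics. It uses R's built-in `trees` dataset as a stand-in, on the assumption that it holds the same 31 black cherry trees; there the diameter column is named `Girth`. The column names in `cherry.csv` may differ, so check `names(cherry)` before adapting it. If the course uses ggplot2, the same two plots can be drawn there instead.

```r
# Built-in stand-in for the cherry tree data; Girth holds the diameter (in)
data(trees)

# Two panels side by side, with volume on the vertical axis in both
old_par <- par(mfrow = c(1, 2))
plot(Volume ~ Girth, data = trees,
     xlab = "Diameter (in)", ylab = "Volume (cubic ft)")
plot(Volume ~ Height, data = trees,
     xlab = "Height (ft)", ylab = "Volume (cubic ft)")
par(old_par)  # restore the previous plotting layout
```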

Part B Create two linear models, one for each of the plots created in Part A (that is, with volume as the response variable in both models). Based on the summary output, which of these models has a higher \(R^2\) value? Is this consistent with what you decided in Part A?
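
A minimal sketch of fitting both models and pulling \(R^2\) out of the summary output, again assuming the built-in `trees` dataset as a stand-in for `cherry` (the names `fit_diameter`, `fit_height`, and `r2` are illustrative):

```r
# Built-in stand-in for the cherry tree data; Girth holds the diameter (in)
data(trees)

# One simple linear model per explanatory variable, volume as the response
fit_diameter <- lm(Volume ~ Girth, data = trees)
fit_height   <- lm(Volume ~ Height, data = trees)

# summary() stores R-squared in the r.squared element
r2 <- c(
  diameter = summary(fit_diameter)$r.squared,
  height   = summary(fit_height)$r.squared
)
r2
```

Calling `summary(fit_diameter)` on its own prints the full table, where the same value appears as "Multiple R-squared".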

Part C Using the model with the highest \(R^2\) in Part B, write the linear equation for predicting a tree’s volume. Interpret both the slope and the intercept. Is the intercept meaningful in this case?

Problem 3

This problem uses a dataset of 144 cats. Each observation records the sex of the cat, its body weight (kg), and its heart weight (g).

## Read in cat data
cats <- read.csv("https://collinn.github.io/data/cats.csv")

Part A Using dplyr operations, create a summary of the average body and heart weight for male and female cats, as well as the number of observations for each sex.
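
The grouped summary can be sketched with `group_by()` and `summarize()`. The example assumes the `cats` data from the MASS package (columns `Sex`, `Bwt`, `Hwt`) as a stand-in for `cats.csv`. The column names in the course file may differ, and the output names `mean_body_kg` and `mean_heart_g` are illustrative.

```r
library(dplyr)

# Stand-in data: 144 cats with Sex, Bwt (body weight, kg), Hwt (heart weight, g)
cats_builtin <- MASS::cats

# One row per sex: mean body weight, mean heart weight, and group size
cat_summary <- cats_builtin %>%
  group_by(Sex) %>%
  summarize(
    mean_body_kg = mean(Bwt),
    mean_heart_g = mean(Hwt),
    n = n()
  )
cat_summary
```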

Part B Create a linear model in R predicting the weight of a cat’s heart using body weight as an explanatory variable. Write the formula for the regression line and interpret the slope and the intercept. Is the intercept meaningful in this case?
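
As a rough sketch, the fitted coefficients can be read off with `coef()` and assembled into the regression equation. This again assumes `MASS::cats` as a stand-in for the course file; substitute the actual column names if they differ.

```r
# Stand-in data: Bwt is body weight (kg), Hwt is heart weight (g)
cats_builtin <- MASS::cats

# Simple linear regression of heart weight on body weight
fit_body <- lm(Hwt ~ Bwt, data = cats_builtin)

# Named vector: "(Intercept)" and the slope for Bwt
b <- coef(fit_body)

# Print the fitted line as an equation
cat(sprintf("Hwt = %.3f + %.3f * Bwt\n", b["(Intercept)"], b["Bwt"]))
```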

Part C Create a second linear model, this time including the cat’s sex in addition to body weight. Is the intercept the same here as in Part B? Why has it changed, and how should it be interpreted in the updated model?
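
When a categorical variable such as sex enters a model, R encodes it as an indicator (dummy) column relative to a reference level. The sketch below, assuming `MASS::cats` as a stand-in for the course file, shows one way to inspect that encoding before interpreting the coefficients.

```r
# Stand-in data with Sex stored as a factor
cats_builtin <- MASS::cats

# Heart weight modeled on body weight and sex
fit_both <- lm(Hwt ~ Bwt + Sex, data = cats_builtin)

# The first factor level is the reference group
levels(cats_builtin$Sex)

# Coefficients: intercept, Bwt slope, and an indicator for the non-reference sex
coef(fit_both)

# First rows of the design matrix, showing the 0/1 indicator column
head(model.matrix(fit_both))
```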

Part D Based on what you found in Part C, what heart weight would you predict for:

A male cat with a body weight of 3.2 kg
A female cat with a body weight of 2.4 kg
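
Predictions for new observations can be obtained with `predict()` and a `newdata` data frame whose column names match the model's explanatory variables. This is a sketch assuming `MASS::cats` as a stand-in for the course file, where sex is coded `"F"`/`"M"`; the coding in `cats.csv` may differ.

```r
# Stand-in data and the two-variable model from Part C
cats_builtin <- MASS::cats
fit_both <- lm(Hwt ~ Bwt + Sex, data = cats_builtin)

# New cats to predict for; Sex uses the same factor levels as the fitted data
new_cats <- data.frame(
  Bwt = c(3.2, 2.4),
  Sex = factor(c("M", "F"), levels = levels(cats_builtin$Sex))
)

# Predicted heart weights (g), one per row of new_cats
preds <- predict(fit_both, newdata = new_cats)
preds
```

The same numbers can be checked by hand by plugging each cat's body weight and sex indicator into the fitted equation.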