Researchers recruited 451 patients with a high level of cardiovascular risk and split them into two groups: a treatment group that received stents (small mesh tubes placed inside vulnerable arteries) in addition to medical management (medications, lifestyle coaching, etc.), and a control group that received medical management only. Of the 224 patients in the treatment group, 45 suffered a stroke within the first year of the study, compared with only 28 of the 227 patients in the control group.
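The counts above translate into first-year stroke rates as shown below. This is a minimal R sketch of the arithmetic only. The control-group size of 227 is derived as 451 - 224 rather than stated directly.

```r
# Counts from the stent study description (control size derived as 451 - 224)
n_total     <- 451
n_treatment <- 224
n_control   <- n_total - n_treatment  # 227 patients

strokes_treatment <- 45
strokes_control   <- 28

# Proportion of each group with a stroke in the first year
p_treatment <- strokes_treatment / n_treatment
p_control   <- strokes_control / n_control

# Rounded rates and the difference in rates (treatment minus control)
round(c(treatment  = p_treatment,
        control    = p_control,
        difference = p_treatment - p_control), 3)
```

Under these counts, roughly 20.1% of the stent group and 12.3% of the control group had a stroke within the first year.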
This problem uses a dataset of 144 cats. Each observation records the cat's sex, body weight (kg), and heart weight (g).
## Read in cat data
cats <- read.csv("https://collinn.github.io/data/cats.csv")
Part A Using dplyr operations, create a summary of the average body and heart weight for male and female cats, as well as the number of observations for each sex.
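Part A asks for dplyr, but a base-R version can be handy for cross-checking the result. The sketch below runs on a small hypothetical data frame, `toy_cats`, whose values are invented; only its column names (`Sex`, `Bwt`, `Hwt`) mirror the cat data.

```r
# Hypothetical toy data: invented values, same column names as the cat data
toy_cats <- data.frame(
  Sex = c("F", "F", "M", "M", "M"),
  Bwt = c(2.0, 2.4, 3.0, 3.2, 2.8),    # body weight (kg)
  Hwt = c(7.0, 8.0, 11.0, 12.5, 10.0)  # heart weight (g)
)

# Mean body weight and heart weight within each sex
summary_by_sex <- aggregate(cbind(Bwt, Hwt) ~ Sex, data = toy_cats, FUN = mean)

# Add the number of observations for each sex, matched by sex label
counts <- table(toy_cats$Sex)
summary_by_sex$n <- as.vector(counts[as.character(summary_by_sex$Sex)])

summary_by_sex
```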
Part B Consider a linear model predicting heart weight from body weight, which has the following coefficients:
lm(Hwt ~ Bwt, cats)
##
## Call:
## lm(formula = Hwt ~ Bwt, data = cats)
##
## Coefficients:
## (Intercept)          Bwt
##      -0.357        4.034
Write the formula for the regression line and interpret the slope and intercept. Is the intercept meaningful in this case?
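Plugging body weights into a fitted line is one way to make the slope concrete. In this sketch the body weights 2, 3, and 4 kg are arbitrary illustration values, not taken from the data, and the coefficients are the rounded values printed above, so the predictions are approximate.

```r
# Coefficients copied from the lm(Hwt ~ Bwt, cats) output above
b0 <- -0.357  # intercept (g)
b1 <-  4.034  # slope (g of heart weight per kg of body weight)

# Predicted heart weight (g) at a few body weights (kg) chosen for illustration
bwt      <- c(2, 3, 4)
pred_hwt <- b0 + b1 * bwt
pred_hwt

# Each additional kg of body weight adds the slope to the prediction
diff(pred_hwt)
```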
Part C Now consider the linear model which adds sex to the collection of predictors for heart weight. Is the intercept the same here as in Part B? Why has it changed, and how should it be interpreted in the updated model?
lm(Hwt ~ Bwt + Sex, cats)
##
## Call:
## lm(formula = Hwt ~ Bwt + Sex, data = cats)
##
## Coefficients:
## (Intercept)          Bwt         SexM
##     -0.4150       4.0758      -0.0821
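A minimal sketch of how the two-predictor model turns its coefficients into predictions, which may also help with Part D. The helper name `predict_hwt` and the 3 kg body weight are hypothetical choices for illustration; calling `predict()` on the fitted model with a `newdata` data frame would give unrounded values.

```r
# Coefficients copied from the lm(Hwt ~ Bwt + Sex, cats) output above
b0    <- -0.4150  # (Intercept)
b_bwt <-  4.0758  # Bwt: change in predicted heart weight (g) per kg, sex held fixed
b_m   <- -0.0821  # SexM: added only when the cat is male

# Hypothetical helper: predicted heart weight (g) from body weight (kg) and sex
predict_hwt <- function(bwt, sex) {
  b0 + b_bwt * bwt + b_m * (sex == "M")
}

# Same body weight, different sex: the predictions differ by the SexM coefficient
female_3kg <- predict_hwt(3, "F")
male_3kg   <- predict_hwt(3, "M")
c(female = female_3kg, male = male_3kg, difference = male_3kg - female_3kg)
```

Because `Sex` enters only as a shift, the fitted lines for male and female cats in this model are parallel: at any body weight the gap between them is the `SexM` coefficient.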
Part D Based on what you found in Part C, what heart weight would you predict for:
This problem involves fitting two linear regression models to data collected at a picnic, where different sandwiches were laid out and the number of ants on each was recorded after an hour. In particular, the sandwiches were differentiated by two variables:
Butter indicates whether or not the sandwich had butter, with two categories: "yes" and "no".
Bread indicates the type of bread used, with values "Multigrain", "Rye", "White", and "Wholemeal".
sandwich <- read.csv("https://collinn.github.io/data/sandwich.csv")
Model 1:
The first model used whether or not butter was included as an explanatory variable and the number of ants as the response variable:
lm(Ants ~ Butter, sandwich)
##
## Call:
## lm(formula = Ants ~ Butter, data = sandwich)
##
## Coefficients:
## (Intercept)    Butteryes
##        38.0         11.3
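To see how `lm()` encodes a two-level predictor, here is a sketch on invented data (`toy` is hypothetical, not the sandwich file). By default R converts a character column to a factor with alphabetically sorted levels, and under the default treatment coding the first level becomes the reference group that the intercept describes.

```r
# Hypothetical toy data (invented values, not the sandwich data)
toy <- data.frame(
  Butter = c("no", "no", "no", "yes", "yes", "yes"),
  Ants   = c(30, 35, 40, 45, 50, 55)
)

# "no" sorts first, so it is the reference level
fit <- lm(Ants ~ Butter, data = toy)
coef(fit)

# Observed group means for comparison with the coefficients
group_means <- tapply(toy$Ants, toy$Butter, mean)
group_means
```

Here the intercept equals the mean of the "no" group, and intercept plus `Butteryes` equals the mean of the "yes" group.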
Model 2:
The second model included the type of bread on the sandwich in addition to butter:
lm(Ants ~ Butter + Bread, sandwich)
##
## Call:
## lm(formula = Ants ~ Butter + Bread, data = sandwich)
##
## Coefficients:
##    (Intercept)       Butteryes        BreadRye      BreadWhite  BreadWholemeal
##         36.889          11.333          -0.167          -1.500           6.111
Use these to answer the following questions.
Part A In Model 1, which value of Butter serves as the reference level? Based on this, what is the average number of ants on a sandwich for each butter status? Confirm these averages using dplyr summaries on the sandwich data.frame.
Part B Rewrite the equation for Model 1 so that the reference level has changed. What are the new coefficient values?
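In R, changing the reference level amounts to reordering the factor levels, for example with `relevel()`. The sketch below uses hypothetical toy data (invented values, not the sandwich data) to compare the default coding with one where "yes" is the reference.

```r
# Hypothetical toy data, defined here so this block runs on its own
toy <- data.frame(
  Butter = c("no", "no", "no", "yes", "yes", "yes"),
  Ants   = c(30, 35, 40, 45, 50, 55)
)

# Default coding: "no" is the reference level
coef(lm(Ants ~ Butter, data = toy))

# Make "yes" the reference level instead
toy$Butter <- relevel(factor(toy$Butter), ref = "yes")
fit_yes_ref <- lm(Ants ~ Butter, data = toy)
coef(fit_yes_ref)
```

The intercept moves to the new reference group's mean and the butter coefficient flips sign, while the fitted group means themselves stay the same.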
Part C In Model 2, which combination of butter status and bread type serves as the reference? Interpret the value of the intercept term.
Part D Again using dplyr summaries, along with group_by, find the average number of ants for each combination of bread type and butter status.
Part E Does the intercept in Model 2 (the reference combination from Part C) match the average you found for the same combination in Part D? In your own words, explain why this is or is not the case.
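A sketch for exploring Part E on invented data: `toy2` is hypothetical, with two sandwiches per butter/bread combination, and its values were chosen so that butter's effect differs between the two breads. It computes the observed combination means with base R's `aggregate()` (a stand-in for the group_by summary in Part D) and compares them with the intercept of an additive model like Model 2.

```r
# Hypothetical toy data (invented values): 2 sandwiches per butter/bread combination
toy2 <- data.frame(
  Butter = rep(c("no", "no", "yes", "yes"), each = 2),
  Bread  = rep(c("Rye", "White", "Rye", "White"), each = 2),
  Ants   = c(9, 11, 19, 21, 29, 31, 59, 61)
)

# Observed mean for each butter/bread combination
cell_means <- aggregate(Ants ~ Butter + Bread, data = toy2, FUN = mean)
cell_means

# Additive model (no Butter x Bread interaction), like Model 2
fit_add <- lm(Ants ~ Butter + Bread, data = toy2)
coef(fit_add)

# Compare the intercept with the observed mean of the reference combination (no butter, Rye)
ref_cell <- cell_means$Ants[cell_means$Butter == "no" & cell_means$Bread == "Rye"]
c(intercept = unname(coef(fit_add)[1]), reference_cell_mean = ref_cell)
```

With these invented values the intercept (5) and the observed reference-combination mean (10) differ.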