The first type of simple linear regression (SLR) uses a single quantitative predictor. It always has the form
\[ Y = \beta_0 + \beta_1X \]
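Where \(\beta_0\) is the intercept (the predicted value of \(Y\) when \(X = 0\)) and \(\beta_1\) is the slope (the predicted change in \(Y\) for a one-unit increase in \(X\)).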
Here we construct a model predicting penguin body mass from bill length:
penguin <- read.csv("https://collinn.github.io/data/penguins.csv")
lm(body_mass_g ~ bill_length_mm, penguin)
##
## Call:
## lm(formula = body_mass_g ~ bill_length_mm, data = penguin)
##
## Coefficients:
## (Intercept)  bill_length_mm
##       388.8            86.8
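The sketch below illustrates how fitted coefficients plug into \(Y = \beta_0 + \beta_1X\). It uses R's built-in `cars` data (stopping distance versus speed) as a stand-in for the penguin data, and the names `by_hand`, `via_predict`, and `change_2` are illustrative only.

```r
# Stand-in data: R's built-in `cars` (not the penguin data)
fit <- lm(dist ~ speed, data = cars)

# Named coefficient vector: "(Intercept)" is beta_0, "speed" is beta_1
b <- coef(fit)

# Prediction by hand at X = 10: beta_0 + beta_1 * 10
by_hand <- unname(b["(Intercept)"] + b["speed"] * 10)

# The same prediction using predict()
via_predict <- unname(predict(fit, newdata = data.frame(speed = 10)))

# A 2-unit increase in X changes the prediction by 2 * beta_1
change_2 <- unname(2 * b["speed"])

print(c(by_hand = by_hand, via_predict = via_predict, change_2 = change_2))
```

The hand calculation and `predict()` should agree, since `predict()` evaluates the same fitted equation.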
Question 0: Write the equation for this model based on the output above
Question 1: Based on this, does there appear to be a positive or negative relationship between bill length and body mass?
Question 2: If a penguin’s bill length increased by 2mm, what would be the projected increase in body mass?
Question 3: What is the predicted body mass of a penguin that has a bill length of 44mm?
Question 4: What does it mean if the \(R^2\) value for this model is \(R^2 = 0.347\)?
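As a reminder of where an \(R^2\) value comes from, the sketch below computes it three ways for a model with one quantitative predictor. It again assumes R's built-in `cars` data as a stand-in, not the penguin data, and the variable names are illustrative.

```r
# Stand-in data: R's built-in `cars` (not the penguin data)
fit <- lm(dist ~ speed, data = cars)
y <- cars$dist

# R^2 = 1 - SSE / SST
sse <- sum(residuals(fit)^2)        # variation left over after the model
sst <- sum((y - mean(y))^2)         # total variation in the response
r2_by_hand <- 1 - sse / sst

# The value reported by summary()
r2_summary <- summary(fit)$r.squared

# With a single quantitative predictor, R^2 is the squared correlation
r2_cor <- cor(cars$speed, cars$dist)^2

print(c(by_hand = r2_by_hand, summary = r2_summary, cor_squared = r2_cor))
```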
This is just like SLR with a quantitative variable, except now our explanatory variable is categorical. It will always be of the form (e.g., for a variable with three categories A, B, and C)
\[ Y = \beta_0 + \beta_1 \mathbb{1}_{B} + \beta_2 \mathbb{1}_{C} \]
Where \(\mathbb{1}_B\) and \(\mathbb{1}_C\) are indicator variables, for example
\[ \mathbb{1}_B = \begin{cases} 1 & \text{in category B} \\ 0 & \text{not in category B} \end{cases} \]
Remember: any category can serve as the reference category
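To see the indicator variables concretely, the sketch below uses `model.matrix()` on a small made-up factor (the variable `group` and its levels A, B, C are hypothetical) and then switches the reference category with `relevel()`.

```r
# Hypothetical three-level categorical variable
group <- factor(c("A", "B", "C", "B", "A", "C"))

# model.matrix() shows the columns lm() would build from this factor.
# The first level (A) is the reference, so it gets no indicator column.
X <- model.matrix(~ group)
print(colnames(X))   # "(Intercept)" "groupB" "groupC"
print(X)

# Any category can be the reference: make C the reference instead
group_c <- relevel(group, ref = "C")
X_c <- model.matrix(~ group_c)
print(colnames(X_c)) # "(Intercept)" "group_cA" "group_cB"
```

Each indicator column is 1 for observations in that category and 0 otherwise; observations in the reference category have 0 in every indicator column.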
First consider the values for the variables species and sex:
with(penguin, table(species, sex))
##            sex
## species     female male
##   Adelie        73   73
##   Chinstrap     34   34
##   Gentoo        58   61
Additionally, consider two separate linear models
Model 1:
lm(body_mass_g ~ species, penguin)
##
## Call:
## lm(formula = body_mass_g ~ species, data = penguin)
##
## Coefficients:
##      (Intercept)  speciesChinstrap     speciesGentoo
##           3706.2              26.9            1386.3
Model 2:
lm(body_mass_g ~ species + sex, penguin)
##
## Call:
## lm(formula = body_mass_g ~ species + sex, data = penguin)
##
## Coefficients:
##      (Intercept)  speciesChinstrap     speciesGentoo           sexmale
##           3372.4              26.9            1377.9             667.6
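For a model with a single categorical predictor (as in Model 1), the coefficients relate directly to group means. The sketch below checks this on R's built-in `PlantGrowth` data, used here as a stand-in for the penguin data; its `group` variable has levels `ctrl`, `trt1`, and `trt2`, with `ctrl` as the reference.

```r
# Stand-in data: R's built-in `PlantGrowth` (not the penguin data)
fit <- lm(weight ~ group, data = PlantGrowth)
b <- coef(fit)

# Mean response within each category, as a plain named vector
group_means <- with(PlantGrowth, sapply(split(weight, group), mean))

# Intercept = mean of the reference category (ctrl)
intercept_gap <- unname(b["(Intercept)"]) - unname(group_means["ctrl"])

# Each other coefficient = that category's mean minus the reference mean
trt1_gap <- unname(b["grouptrt1"]) -
  unname(group_means["trt1"] - group_means["ctrl"])

# Both gaps should be zero up to floating-point rounding
print(c(intercept_gap = intercept_gap, trt1_gap = trt1_gap))
```

This correspondence holds for a single categorical predictor; once a second variable such as sex is added, the coefficients are adjusted for that variable and need not match the raw group means.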
Question 1: What is the reference category in Model 1?
Question 2: From Model 1, which two species appear to be the closest in mass? How can you tell?
Question 3: Rewrite the equation for Model 1 so that Chinstrap penguins are the reference category
Question 4: What are the reference categories in Model 2?
Question 5: Based on Model 2, what is the predicted difference in mass between a male chinstrap penguin and a female gentoo penguin?
Question 6: The \(R^2\) value for Model 1 is \(R^2 = 0.67\), while for Model 2 it is \(R^2 = 0.84\). What does this suggest about adding sex as a variable to Model 2?