Table 4.2
## Read in grouped marijuana data
mm <- read.table("http://www.stat.ufl.edu/~aa/cat/data/Marijuana.dat",
header = TRUE)
mm
## race gender yes no
## 1 white female 420 620
## 2 white male 483 579
## 3 other female 25 55
## 4 other male 32 62
fit <- glm(yes / (yes + no) ~ gender + race, weights = yes + no,
family = binomial, data = mm)
summary(fit)
##
## Call:
## glm(formula = yes/(yes + no) ~ gender + race, family = binomial,
## data = mm, weights = yes + no)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8303 0.1685 -4.93 0.00000084 ***
## gendermale 0.2026 0.0852 2.38 0.0174 *
## racewhite 0.4437 0.1677 2.65 0.0081 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12.752784 on 3 degrees of freedom
## Residual deviance: 0.057982 on 1 degrees of freedom
## AIC: 30.41
##
## Number of Fisher Scoring iterations: 3
Note the use of pivot_wider()
library(tidyr)
ucb <- as.data.frame(UCBAdmissions)
ucb <- pivot_wider(ucb, names_from ="Admit",
values_from = "Freq")
print(ucb)
## # A tibble: 12 × 4
## Gender Dept Admitted Rejected
## <fct> <fct> <dbl> <dbl>
## 1 Male A 512 313
## 2 Female A 89 19
## 3 Male B 353 207
## 4 Female B 17 8
## 5 Male C 120 205
## 6 Female C 202 391
## 7 Male D 138 279
## 8 Female D 131 244
## 9 Male E 53 138
## 10 Female E 94 299
## 11 Male F 22 351
## 12 Female F 24 317
The first model is the homogeneous association model, namely that odds of admission for each department are the same given gender
(fit1 <- glm(Admitted / (Admitted + Rejected) ~ Gender + Dept, family = binomial,
weights = Admitted + Rejected, data = ucb)) %>% summary()
##
## Call:
## glm(formula = Admitted/(Admitted + Rejected) ~ Gender + Dept,
## family = binomial, data = ucb, weights = Admitted + Rejected)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.5821 0.0690 8.44 <0.0000000000000002 ***
## GenderFemale 0.0999 0.0808 1.24 0.22
## DeptB -0.0434 0.1098 -0.40 0.69
## DeptC -1.2626 0.1066 -11.84 <0.0000000000000002 ***
## DeptD -1.2946 0.1058 -12.23 <0.0000000000000002 ***
## DeptE -1.7393 0.1261 -13.79 <0.0000000000000002 ***
## DeptF -3.3065 0.1700 -19.45 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.056 on 11 degrees of freedom
## Residual deviance: 20.204 on 5 degrees of freedom
## AIC: 103.1
##
## Number of Fisher Scoring iterations: 4
Two marginal models on department and gender
(fit2 <- glm(Admitted / (Admitted + Rejected) ~ Dept, family = binomial,
weights = Admitted + Rejected, data = ucb)) %>% summary()
##
## Call:
## glm(formula = Admitted/(Admitted + Rejected) ~ Dept, family = binomial,
## data = ucb, weights = Admitted + Rejected)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.5935 0.0684 8.68 <0.0000000000000002 ***
## DeptB -0.0506 0.1097 -0.46 0.64
## DeptC -1.2091 0.0973 -12.43 <0.0000000000000002 ***
## DeptD -1.2583 0.1015 -12.40 <0.0000000000000002 ***
## DeptE -1.6830 0.1173 -14.34 <0.0000000000000002 ***
## DeptF -3.2691 0.1671 -19.57 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.056 on 11 degrees of freedom
## Residual deviance: 21.736 on 6 degrees of freedom
## AIC: 102.7
##
## Number of Fisher Scoring iterations: 4
(fit3 <- glm(Admitted / (Admitted + Rejected) ~ Gender, family = binomial,
weights = Admitted + Rejected, data = ucb)) %>% summary()
##
## Call:
## glm(formula = Admitted/(Admitted + Rejected) ~ Gender, family = binomial,
## data = ucb, weights = Admitted + Rejected)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.2201 0.0388 -5.68 0.000000014 ***
## GenderFemale -0.6104 0.0639 -9.55 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.06 on 11 degrees of freedom
## Residual deviance: 783.61 on 10 degrees of freedom
## AIC: 856.5
##
## Number of Fisher Scoring iterations: 4
anova(fit1, fit2) # indicates conditional independence of gender
## Analysis of Deviance Table
##
## Model 1: Admitted/(Admitted + Rejected) ~ Gender + Dept
## Model 2: Admitted/(Admitted + Rejected) ~ Dept
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 5 20.2
## 2 6 21.7 -1 -1.53 0.22
anova(fit1, fit3)
## Analysis of Deviance Table
##
## Model 1: Admitted/(Admitted + Rejected) ~ Gender + Dept
## Model 2: Admitted/(Admitted + Rejected) ~ Gender
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 5 20
## 2 10 784 -5 -763 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coffee <- read.csv("https://collinn.github.io/data/coffee.csv")
coffee <- pivot_wider(coffee, names_from = "cancer",
values_from = "Freq")
coffee
## # A tibble: 4 × 4
## coffee smoking yes no
## <chr> <chr> <int> <int>
## 1 yes yes 16 4
## 2 no yes 4 1
## 3 yes no 12 108
## 4 no no 15 135
Part 1 Begin by fitting a logistic regression model to cancer with only coffee as a covariate. Why does it make sense to consider this a marginal independence model?
Looking at the Wald statistic for the coefficient, does it seem that marginal independence holds?
Part 2 Based on your model for Part 1, what is the
odds ratio comparing the odds of cancer for smokers to non-smokers?
Using confint()
or the Wald statistic, is there strong
evidence that marginal independence does not hold?
Part 3 Now construct a logistic regression model with indicators for both coffee and smoking. Looking at the deviance, is there evidence for homogenous association between our covariates?
Part 4 Consider again the Wald statistic for the coffee covariate as you did in Part B. What can you say about the conditional independence of coffee and cancer, given smoking status?