library(dplyr)
library(ggplot2)
library(pROC)
theme_set(theme_bw(base_size = 16))
crab <- read.table("http://www.stat.ufl.edu/~aa/cat/data/Crabs.dat",
header = TRUE)
## Make vector of factor
crab$colorf <- factor(crab$color)
fit <- glm(y ~ width, family = binomial, data = crab)
summary(fit)
##
## Call:
## glm(formula = y ~ width, family = binomial, data = crab)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.351 2.629 -4.70 0.0000026 ***
## width 0.497 0.102 4.89 0.0000010 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 225.76 on 172 degrees of freedom
## Residual deviance: 194.45 on 171 degrees of freedom
## AIC: 198.5
##
## Number of Fisher Scoring iterations: 4
exp(coef(fit))
## (Intercept) width
## 0.0000043262 1.6441615976
## Make vector of factor
crab$colorf <- factor(crab$color)
fit <- glm(y ~ width + colorf, family = binomial, data = crab)
## Sample proportion crabs with sat
(prop <- sum(crab$y) / nrow(crab))
## [1] 0.64162
predval <- as.numeric(fitted(fit) > prop)
xtabs(~ crab$y + predval)
## predval
## crab$y 0 1
## 0 43 19
## 1 36 75
## Try again with pi0 = 50
predval50 <- as.numeric(fitted(fit) > 0.5)
xtabs(~ crab$y + predval50)
## predval50
## crab$y 0 1
## 0 31 31
## 1 15 96
In most cases, we could also just as well use table
table(crab$y, predval50)
## predval50
## 0 1
## 0 31 31
## 1 15 96
rocplot <- roc(y ~ fitted(fit), data = crab)
with(rocplot, cbind(thresholds, specificities, sensitivities)) %>% head()
## thresholds specificities sensitivities
## [1,] -Inf 0.000000 1.00000
## [2,] 0.076877 0.016129 1.00000
## [3,] 0.123589 0.016129 0.99099
## [4,] 0.149141 0.032258 0.99099
## [5,] 0.158344 0.048387 0.99099
## [6,] 0.188201 0.080645 0.99099
with(rocplot, cbind(thresholds, specificities, sensitivities)) %>% tail()
## thresholds specificities sensitivities
## [104,] 0.94295 1 0.045045
## [105,] 0.94844 1 0.036036
## [106,] 0.96101 1 0.027027
## [107,] 0.97254 1 0.018018
## [108,] 0.98061 1 0.009009
## [109,] Inf 1 0.000000
plot.roc(rocplot, legacy.axes = TRUE)
auc(rocplot)
## Area under the curve: 0.771
We can also use the plotROC
package which creates
slightly nicer plots with estimates for \(\pi_0\) annotated on the chart. Note the
aesthetics m
for \(\hat{y}\) and d
for \(y\).
library(plotROC)
## Put fitted values back in data.frame
crab$fitval <- fitted(fit)
ggplot(crab) + geom_roc(aes(d = y, m = fitval))
ggplot(crab, aes(color = colorf)) + geom_roc(aes(d = y, m = fitval)) +
scale_color_brewer(palette = "Set1")
This package also has its own suite of additional functions. To use them, we must assign the ggplot to a new variable
m1 <- ggplot(crab) + geom_roc(aes(d = y, m = fitval))
m2 <- ggplot(crab, aes(color = colorf)) + geom_roc(aes(d = y, m = fitval))
calc_auc(m1)
## PANEL group AUC
## 1 1 -1 0.77136
calc_auc(m2) # AUC for each level of factor
## PANEL group colorf AUC
## 1 1 1 1 0.51852
## 2 1 2 2 0.71014
## 3 1 3 3 0.83654
## 4 1 4 4 0.59048
An inconvenient note: when using the plotROC
package,
there is a bug in calc_auc
if you specify any aesthetics in
geom_roc
except for d
and m
. For
example, this code would not work because colorf
is
specified in geom_roc
instead of ggplot()
:
m2 <- ggplot(crab) + geom_roc(aes(d = y, m = fitval, color = colorf))
calc_auc(m2)
## Error in `rlang::quo_get_expr()`:
## ! `quo` must be a quosure
The code below loads game data from the Golden State Warriors’ 2015-2016 season. The outcome of the game will be our response.
gsw <- read.csv("https://remiller1450.github.io/data/GSWarriors.csv")
gsw$win_binary <- ifelse(gsw$Win == "W", 1, 0) # response var
Part A Fit a logistic regression model using only
FG3A
as a predictor. This variable represents the number of
three-point shots attempted in a game. Interpret the fitted
coefficient
Part B Fit a logistic model again, this time using
both FG3A
(attempts) and FG3
where
FG3
represents the number of attempts actually
made. Examine the coefficients and explain why the sign of
FG3A
has changed. (Recall: coefficients are interpreted in
isolation, assuming all other values are held constant)
Part C Create ROC curves associated with each of the models created.