Simple Linear Regression (Residuals)

Consider a fitted line on simulated data.

We can rotate this plot so that the fitted line is nearly horizontal.

We can then compare this with what we would find on a residual plot.
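The original simulation and plots are not reproduced here, but a minimal sketch of this kind of figure (the data-generating values below are assumptions chosen purely for illustration) might look like:

library(ggplot2)

# Simulate data and fit a simple linear model (intercept/slope chosen for illustration)
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
fit <- lm(y ~ x)

# Scatter plot with the fitted line
ggplot(data.frame(x, y), aes(x, y)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE)

# Residual plot: the "rotated" view, with the fitted line now flat at zero
ggplot(data.frame(x, res = resid(fit)), aes(x, res)) + 
  geom_point() + 
  geom_hline(yintercept = 0)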

Multivariate Linear Regression

Let’s begin by assuming that we have an outcome \(y\) with two covariates, \(X_1\) and \(X_2\), their relationship being

\[ y = X_1 \beta_1 + X_2 \beta_2 + \epsilon \] where \(\beta_1 = 1\) and \(\beta_2 = -1\).

# Simulating our data
set.seed(122)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- x1 - x2 + rnorm(100, sd = 0.1)

df <- data.frame(x1, x2, y)

We can visualize the relationship of these three variables together using plotly along with type = "scatter3d".
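The plotting code is not shown above; a minimal plotly sketch (marker styling left at its defaults) would be:

library(plotly)

# 3D scatter of the two covariates and the outcome
plot_ly(df, x = ~x1, y = ~x2, z = ~y, type = "scatter3d", mode = "markers")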

Using marginal plots, we can also visualize the univariate relationships between the outcome and a particular covariate. As anticipated, \(y\) appears to have a positive association with \(X_1\) and a negative association with \(X_2\).

Suppose we start by fitting a model with only a single covariate, \(X_1\). Note that

\[ y = X_1 \beta_1 \qquad \Leftrightarrow \qquad y = X_1 \beta_1 + X_2 \times 0 \]

In other words, we can fit a line relating \(X_1\) to \(y\) and plot it in 3D by setting the \(X_2\) coordinate to zero.
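One way to do this (the grid of \(X_1\) values and the line styling are assumptions) is to predict from lm(y ~ x1) along a sequence of x1 values and add those predictions as a 3D line with the x2 coordinate set to zero:

fit1 <- lm(y ~ x1, data = df)

# Predictions along a grid of x1 values, with the x2 coordinate fixed at zero
x1_grid <- seq(min(df$x1), max(df$x1), length.out = 100)
line_df <- data.frame(x1 = x1_grid, 
                      x2 = 0, 
                      y = predict(fit1, newdata = data.frame(x1 = x1_grid)))

plot_ly(df, x = ~x1, y = ~x2, z = ~y, type = "scatter3d", mode = "markers") %>% 
  add_trace(data = line_df, x = ~x1, y = ~x2, z = ~y, type = "scatter3d", mode = "lines", 
            line = list(color = "red", width = 8), inherit = FALSE)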

Playing with the plotly graph, we see that the line demonstrates a positive association between \(X_1\) and our outcome. Before moving on to consider residuals, let’s first see if we can identify any direct association with \(X_2\) using just the plot we have.

Viewing the marginal plot of \(X_1\) and \(y\), we see that larger residuals (those above the fitted line) are shaded darker, corresponding to smaller values of \(X_2\). Conversely, smaller residuals, those below the fitted line, are shaded lighter, corresponding to larger values of \(X_2\).
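A sketch of this kind of shaded marginal plot (the continuous color scale here is a ggplot default, not necessarily the shading used above):

# Marginal plot of x1 and y, shading the points by the value of x2
ggplot(df, aes(x1, y, color = x2)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE, color = "black")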

We can see this same relationship if we consider the marginal plot of \(X_2\), this time coloring our points by the size of their residual:

This time the points are colored according to the magnitude of their residuals. Rotating the plot and considering the relationship between \(X_2\) and the color of the points (rather than their position on the y-axis), we again see a roughly linear relationship. Indeed, this is the same plot we would get if we simply plotted the values of \(X_2\) against the residuals of the model lm(y ~ x1).
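To see this directly, we can plot the residuals from that single-covariate model against x2 (a sketch; the added trend line is just to emphasize the relationship):

fit1 <- lm(y ~ x1, data = df)

# x2 against the residuals from the model that ignores it
ggplot(data.frame(x2 = df$x2, res = resid(fit1)), aes(x2, res)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE)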

Note on fixing x2 = 0

Logistic Regression

Logistic regression is unique in that our outcomes are binary categories, while our model predicts the probability of belonging to one of those categories. As a result, there are two things that make interpreting logistic regression results less straightforward than linear regression results:

  1. The results are in terms of odds and probabilities rather than direct inference on the outcome \(y\)
  2. Logistic regression models are non-linear

We will examine each of these a bit more closely.

Odds and Probabilities

Logistic regression is performed in terms of odds, which are distinct from probabilities but directly related to them by the equation

\[ odds = \frac{\pi}{1 - \pi} \] where \(\pi\) is our probability of success, i.e., \(P(Y = 1) = \pi\). If our probability of success were 60%, we could compute the odds directly (using fractions for simplification):

\[ \frac{0.6}{1 - 0.6} = \frac{6/10}{1 - 6/10} = \frac{6/10}{4/10} = \frac{6}{4} \] Here, we say the odds are “six to four”, or “six successes for every four losses”. Of course, we could also start with odds and convert them to probability of success.
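For instance, solving the odds equation for \(\pi\) gives \(\pi = odds / (1 + odds)\), so odds of six to four take us back to our 60% probability of success:

\[ \pi = \frac{odds}{1 + odds} = \frac{6/4}{1 + 6/4} = \frac{6/4}{10/4} = \frac{6}{10} = 0.6 \]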

An interesting thing occurs when we multiply our odds by a constant. In the case above, suppose that we are interested in multiplying our odds by 2, so instead of being 6:4 or “six to four”, it is now 12:4 or “twelve to four”. Converting this back to a probability, we get

\[ \frac{12}{4} = \frac{12 / (12+4)}{4 / (12+4)} = \frac{12/16}{4/16} = \frac{0.75}{0.25} \] We see that doubling the odds changed the probability of success from 60% to 75%, an increase of 15 percentage points.

Now suppose we double the odds again, this time to be 24:4

\[ \frac{24}{4} = \frac{24 / (24+4)}{4 / (24+4)} = \frac{24/28}{4/28} = \frac{0.857}{0.143} \] This time, the probability of success went from 75% to about 86%, a change of roughly 11 percentage points rather than the 15 we saw previously.

When we perform logistic regression, we are finding coefficients whose exponentials act as constant multiplicative factors on our odds.
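A small sketch of this back-and-forth in R (the helper function names here are just for illustration):

# Convert between probabilities and odds (illustrative helper names)
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

odds <- prob_to_odds(0.6)   # 1.5, i.e., 6:4
odds_to_prob(2 * odds)      # doubling the odds gives 0.75
odds_to_prob(4 * odds)      # doubling again gives roughly 0.857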

Example

Logistic regression is modeled in terms of log odds, so that our covariates are written

\[ \log \left( \frac{\pi}{1-\pi} \right) = \beta_0 + X_1\beta_1 + X_2\beta_2 + \dots \] Taking the exponential on both sides will remove the log from our variable

\[ e^{\log \left( \frac{\pi}{1-\pi} \right)} = e^{\beta_0 + X_1\beta_1 + X_2\beta_2 + \dots} \] Additionally, because \(e^{x+y} = e^xe^y\), this reduces to (where \(\exp(x)\) denotes \(e^x\))

\[ \frac{\pi}{1-\pi} = \exp(\beta_0) \cdot \exp(X_1 \beta_1) \cdot \exp(X_2 \beta_2) \dots \] We now have our odds expressed in terms of the exponential of our covariates.

Now let’s suppose that we have a simplified model with only one covariate and no intercept so that our odds looks like this:

\[ \frac{\pi}{1-\pi} = \exp(X_1 \beta_1) \] What role does \(\beta_1\) play here?

When we interpret regression results, we treat the value of \(\beta_1\) as a measure of the change associated with a one-unit change in \(X_1\). Using the fact that

\[ e^{(X_1 + 1) \beta_1} = e^{X_1 \beta_1 + \beta_1} = e^{X_1\beta_1} e^{\beta_1} \] We see that

\[ e^{(X_1 + 1) \beta_1} = e^{X_1\beta_1} e^{\beta_1} = \left(\frac{\pi}{1-\pi} \right) e^{\beta_1} \] It follows that \(e^{\beta_1}\) is the constant factor that dictates how our odds change in response to a one-unit change in \(X_1\). If \(e^{\beta_1} = 2\), then we are precisely in the situation above where we doubled our odds. What does this say about changes in \(X\)? Is the difference between \(X\) and \(X+1\) the same as the difference between \(X+1\) and \(X+2\)?
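We can check this numerically: the odds are multiplied by the same factor \(e^{\beta_1}\) for every one-unit increase in \(X_1\), no matter where we start, while the corresponding change in probability is not constant (the value of \(\beta_1\) below is an arbitrary choice for illustration):

beta1 <- log(2)                        # so exp(beta1) = 2
odds <- function(x) exp(x * beta1)     # odds as a function of X1
odds(1) / odds(0)                      # 2
odds(5) / odds(4)                      # still 2

prob <- function(x) odds(x) / (1 + odds(x))   # convert odds back to probability
prob(1) - prob(0)                      # about 0.17
prob(5) - prob(4)                      # a much smaller change, about 0.03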

Example

For this example, we will use the problem from the homework, looking at credit card defaults with scaled balance and income.

# Packages used below (ggplot2 for plotting, dplyr for mutate)
library(ggplot2)
library(dplyr)

Default <- read.csv("https://collinn.github.io/data/default.csv")
Default$default_num <- ifelse(Default$default == "Yes", 1, 0)

ggplot(Default, aes(balance, income, color = default)) + geom_point(alpha = 0.25)

Default <- mutate(Default, 
                  balance.z = as.numeric(scale(balance)), 
                  income.z = as.numeric(scale(income)))

fits <- glm(default_num ~ balance.z + income.z, Default, family = binomial) 

First, let’s consider a plot showing the predicted probability of default as a function of the standardized credit card balance.
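That plot isn’t reproduced here, but a sketch of how it might be generated (holding income.z at zero, i.e., at an average income, is an assumption about how the plot was made):

# Predicted probability of default across balance.z, holding income.z at 0
pred_df <- data.frame(balance.z = seq(min(Default$balance.z), max(Default$balance.z), 
                                      length.out = 200), 
                      income.z = 0)
pred_df$p_default <- predict(fits, pred_df, type = "response")

ggplot(pred_df, aes(balance.z, p_default)) + 
  geom_line() + 
  labs(x = "Standardized balance", y = "Predicted probability of default")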

How does the probability of default change if we increase our balance by one standard deviation?

Looking at a summary of the results, we see that both variables are highly significant, but the coefficients themselves are otherwise difficult to interpret directly:

summary(fits)
## 
## Call:
## glm(formula = default_num ~ balance.z + income.z, family = binomial, 
##     data = Default)
## 
## Coefficients:
##             Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)  -6.1256     0.1876  -32.66 < 0.0000000000000002 ***
## balance.z     2.7316     0.1100   24.84 < 0.0000000000000002 ***
## income.z      0.2775     0.0665    4.17              0.00003 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2920.6  on 9999  degrees of freedom
## Residual deviance: 1579.0  on 9997  degrees of freedom
## AIC: 1585
## 
## Number of Fisher Scoring iterations: 8

Taking the exponential of the parameters will give us the constants associated with each of the covariates, telling us their multiplicative impact on the odds of default.

exp(coef(fits))
## (Intercept)   balance.z    income.z 
##   0.0021863  15.3572599   1.3198549

Here, we see that increasing the balance by one standard deviation is associated with roughly a 15-fold increase in the odds of default! By itself, though, this tells us nothing about the probability of default. To get at probabilities, we can use the inverse logit function.

The inverse logit function is derived by solving for \(\pi\) in the expression of the odds. Doing so yields

\[ \pi = \frac{\exp(X \beta)}{1 + \exp(X\beta)} \] We can write this in R as

invlogit <- function(x) { exp(x) / (1 + exp(x)) }

Since we are using exp() inside the function, we don’t need to exponentiate our coefficients first. Going back to our model, we have \(X\beta\) values of

coef(fits)
## (Intercept)   balance.z    income.z 
##    -6.12557     2.73159     0.27752

Consider an individual who has an average income and an average credit card balance (so that balance.z = 0 and income.z = 0). Their probability of default is computed as

invlogit(-6.1255 + 0 * 2.73159 + 0 * 0.27752)
## [1] 0.0021816

As we see, the probability of default is about 0.2%. If a person with average income were to increase their balance by one standard deviation ($483 in our dataset), their probability of default would be

invlogit(-6.1255 + 1 * 2.73159 + 0 * 0.27752)
## [1] 0.032486

Compare the difference between these two with the plot above.

As could also be anticipated from the plot, the change in probability from balance.z = 1 to 2, and again from 2 to 3, is even sharper.

invlogit(-6.1255 + 2 * 2.73159 + 0 * 0.27752)
## [1] 0.34022
invlogit(-6.1255 + 3 * 2.73159 + 0 * 0.27752)
## [1] 0.88788

Note on fixing income.z

library(plotly)
library(dplyr)

Default <- read.csv("https://collinn.github.io/data/default.csv")
Default$default_num <- ifelse(Default$default == "Yes", 1, 0)

Default <- mutate(Default, 
                  balance.z = as.numeric(scale(balance)), 
                  income.z = as.numeric(scale(income)))

# Subsample the data to keep the 3D plot responsive
Default <- Default[sample(1:nrow(Default), size = 5000), ]

fits <- glm(default_num ~ balance.z + income.z, Default, family = binomial) 

# Grids of covariate values over which to evaluate the fitted surface
x1 <- seq(min(Default$balance.z), max(Default$balance.z), length.out = 100)
x2 <- seq(min(Default$income.z), max(Default$income.z), length.out = 100)

gg <- expand.grid(balance.z = x1, 
                  income.z = x2)

# Predicted probabilities, reshaped so rows correspond to income.z and columns to balance.z
yhat <- predict(fits, gg, type = "response")
yhat <- matrix(yhat, nrow = 100, ncol = 100, byrow = TRUE)

# Row of the surface closest to income.z = 0, used to highlight the curve in red below
idx <- which.min(abs(x2))

plot_ly(data = Default, type = "scatter3d", mode = "markers", 
        x = ~balance.z, y = ~income.z, z = ~default_num,  color = ~factor(default_num)) %>% 
  add_surface(x = x1, y = x2, z = yhat, colorscale = "Blues") %>% 
  add_trace(x = x1, y = rep(x2[idx], 100), z = yhat[idx, ], mode = "lines", 
            line = list(color = "red", width = 8), inherit = FALSE) %>% 
  layout(scene = list(aspectmode = "manual", 
                      aspectratio = list(x = 1, y = 1, z = 1)), 
         title = "Inverse Logistic when income.z = 0")

Summary

We summarize with a few points:

  1. Odds and probabilities are directly related, and given one you can find the other.
  2. We increase or decrease odds by multiplication rather than addition or subtraction. The result is a non-linear change in probability.
  3. Although log odds ratios are not intuitive, they are the natural product of logistic regression. Exponentiating them gives odds ratios.
  4. Odds ratios give a sense of magnitude and direction:
     - Values greater than 1 increase the probability of success
     - Values less than 1 decrease the probability of success
  5. The results of logistic regression are best seen with a plot, though this is often difficult when multiple variables are included in the model.