A researcher (you) is interested in investigating the effect of mental health diagnosis during pregnancy on the prescription of opioids post-partum. It is expected that opioid prescription post-partum will vary substantially by delivery type – vaginal birth versus a Cesarean section.
In order to investigate the issue, we need to analyze insurance
claims data. To this end, we have been given access to two files: a
membership file (member.csv) and a claims file
(claims.csv). Each file contains information on a set of
insured women across calendar years 2017-2018.
The membership file is in long format with five variables: member id, year (2017, 2018), month (1-12), date of birth, and insurance plan type (HMO vs PPO). Each woman will have one line for each month/year during which she was insured.
The claims file is also in long format with four variables: member id, date of claim service, a CPT service code and an ICD diagnosis code.
Instructions to prepare our dataset are as follows:
Case inclusion for analysis
Our study is restricted to only those women who were at least 18 years old and at most 40 at the time of delivery. Additionally, to be sure that we have adequate information regarding prior mental health and subsequent opioid prescription, we require that a woman must be insured in the month of her delivery, for 3 months following her delivery, and for 9 calendar months prior to her delivery.
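One way to read the insurance requirement is as a run of 13 consecutive enrolled months: the 9 months before delivery, the delivery month, and the 3 months after. The sketch below checks this on toy data. The column names `year` and `month`, the helper names `month_index` and `covered`, and this reading of the rule are all assumptions, not something confirmed by the provided files.

```r
# Toy membership rows for one woman (hypothetical column names)
enrolled <- data.frame(year  = rep(2017:2018, each = 12),
                       month = rep(1:12, times = 2))
enrolled <- enrolled[-5, ]  # drop May 2017 to simulate a gap in coverage

# Map (year, month) to a running index: Jan 2017 = 1, ..., Dec 2018 = 24
month_index <- function(year, month) month + 12 * (year == 2018)

# TRUE if every month from 9 before to 3 after the delivery month is present
covered <- function(year, month, delivery_date) {
  d <- as.Date(delivery_date)
  del_idx <- month_index(as.integer(format(d, "%Y")),
                         as.integer(format(d, "%m")))
  needed <- (del_idx - 9):(del_idx + 3)
  all(needed %in% month_index(year, month))
}

covered(enrolled$year, enrolled$month, "2018-03-15")  # TRUE: months 6 to 18 all present
covered(enrolled$year, enrolled$month, "2017-12-15")  # FALSE: month 5 is missing
```

Under this reading, a delivery in the first 9 or last 3 months of the 2017-2018 window can never qualify, because the required months fall outside the data.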
Primary outcome: opioid prescription
The outcome of interest is the prescription of an opioid within 90
days post-partum (a binary variable). All entries in the claims data
have been adjusted to only include post-partum prescriptions so we do
not need to worry about dates. Opioid prescriptions are identified with
the ICD code J0745.
Delivery Date and Type
Deliveries are identified with the ICD codes O80 and
O82 (letter “O”) and are stored in the claims file. These
entries serve two purposes: the claim date gives the delivery date,
which determines the mother’s age at the time of birth, and the code
itself indicates whether the delivery was a vaginal birth (O80) or a
Cesarean section (O82).
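As a rough sketch of how these two pieces could be pulled out, the example below filters toy claims to the delivery codes, flags Cesarean deliveries, and computes age in completed years with lubridate. The column names `date`, `icd`, and `dob` and the toy values are assumptions; check the names in the real files before adapting it.

```r
library(lubridate)

# Toy claims and birth dates (hypothetical column names and values)
claims_toy <- data.frame(
  id   = c("P1", "P1", "P2", "P2"),
  date = c("2017-11-02", "2018-01-15", "2018-06-30", "2018-07-04"),
  icd  = c("Z34", "O80", "O82", "J0745")
)
dob_toy <- data.frame(id  = c("P1", "P2"),
                      dob = c("1990-01-16", "1976-03-10"))

# Keep only delivery claims; the claim date is the delivery date
deliveries <- claims_toy[claims_toy$icd %in% c("O80", "O82"), ]
deliveries$cesarean <- as.integer(deliveries$icd == "O82")

# Attach date of birth and compute age in completed years at delivery
deliveries <- merge(deliveries, dob_toy, by = "id")
deliveries$age <- interval(ymd(deliveries$dob), ymd(deliveries$date)) %/% years(1)

# Apply the 18 to 40 (inclusive) age restriction
eligible <- deliveries[deliveries$age >= 18 & deliveries$age <= 40, ]
```

Here P1 is 27 at delivery and is kept, while P2 is 42 and is dropped. Treating "at most 40" as 40 completed years is one interpretation of the rule.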
Mental Health Disorder
The definition used to determine a mental health disorder is given by
the ICD codes F41.8 and F32.3. The first of
these is related to anxiety disorders, while the second is associated
with depression.
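A per-woman indicator like this can be built by checking whether any of her claims carries one of the listed codes. The sketch below does so on toy data and ignores claim dates. The `icd` column name and the helper `has_code` are hypothetical.

```r
# Toy claims (hypothetical column names and values)
claims_toy <- data.frame(
  id  = c("P1", "P1", "P2", "P3", "P3"),
  icd = c("O80", "F41.8", "O82", "F32.3", "G43.4")
)

# 1 if the woman has at least one claim with any of the given codes, else 0
has_code <- function(ids, claims, codes) {
  flagged <- unique(claims$id[claims$icd %in% codes])
  as.integer(ids %in% flagged)
}

women <- data.frame(id = unique(claims_toy$id))
women$mental_health <- has_code(women$id, claims_toy, c("F41.8", "F32.3"))
```

In this toy example P1 and P3 are flagged and P2 is not. The same pattern with a single code would work for any other code-based indicator.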
Migraine headache
Patients with a history of migraines are identified with the ICD code
G43.4. An indicator for this variable (i.e.,
marking which patients had previously filed a claim related to
migraines) will be used to control for opioids that were potentially
dispensed for migraines rather than in relation to delivery.
Exploratory Plots
Using the data that you have modified, you should create 2 or 3 plots with ggplot that investigate the relationships between our outcome and our newly constructed covariates or between the covariates themselves.
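For instance, a stacked proportion bar chart is one way to compare the outcome across delivery types. The sketch below uses a small made-up data frame; the column names `delivery` and `opioid` are assumptions standing in for whatever you construct.

```r
library(ggplot2)

# Made-up analysis data: one row per woman
toy <- data.frame(
  delivery = c("Vaginal", "Vaginal", "Vaginal", "Cesarean", "Cesarean", "Cesarean"),
  opioid   = c(0, 0, 1, 1, 1, 0)
)

# position = "fill" rescales each bar to proportions within delivery type
p <- ggplot(toy, aes(x = delivery, fill = factor(opioid))) +
  geom_bar(position = "fill") +
  labs(x = "Delivery type", y = "Proportion of women", fill = "Opioid prescribed")
p
```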
Statistical Model
We must develop a statistical model where the outcome of interest is an opioid prescription, as defined above. Specifically, we are interested in determining which characteristics are associated with increased odds of an opioid prescription. Possible covariates to consider including are:
I will provide details soon on how to construct a simple model for evaluating this.
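In the meantime, a common starting point for a binary outcome is logistic regression with glm(). The sketch below fits one to simulated data. The simulated values, the column names, and the choice of logistic regression itself are assumptions, not necessarily the model the final instructions will ask for.

```r
set.seed(1)
n <- 200

# Simulated covariates and outcome (not the real data)
toy <- data.frame(
  cesarean      = rbinom(n, 1, 0.3),
  mental_health = rbinom(n, 1, 0.2)
)
toy$opioid <- rbinom(n, 1, plogis(-1 + 1.2 * toy$cesarean + 0.5 * toy$mental_health))

# Logistic regression: models the log-odds of an opioid prescription
fit <- glm(opioid ~ cesarean + mental_health, data = toy, family = binomial)

# Exponentiated coefficients are odds ratios
exp(coef(fit))
```

An odds ratio above 1 for a covariate indicates higher estimated odds of an opioid prescription when that covariate is present, holding the others fixed.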
Report
Once we have selected a final model, write a short summary of your findings. What variables were included in your model? Does your model agree with the exploratory plots that you created? Does mental health diagnosis appear to have an effect on the prescription of opioids post-partum?
Here are the two data files provided:
member <- read.csv("https://collinn.github.io/data/member.csv")
claims <- read.csv("https://collinn.github.io/data/claims.csv")
Along with a list of ICD codes for reference:
| ICD Code | Description |
|---|---|
| F32.3 | Depression |
| F41.8 | Anxiety |
| G43.4 | Migraine |
| J0745 | Opioid Prescription |
| Z34 | Pregnancy Supervision (unused) |
| O80 | Vaginal Birth |
| O82 | Cesarean section |
In no particular order:
This entire project can be done using only packages used in class
Many of these covariates can be made in any order – if you are getting stuck preparing one, try moving on to a different one
If you are running into problems, try working with a smaller subset of your data to verify that you are creating variables correctly. Here is a good candidate for doing so:
member_practice <- member[member$id == "PID240296", ]
claims_practice <- claims[claims$id == "PID240296", ]
Once you have processed all the variables you need, remove the
columns in the dataset that you don’t need and use unique() to
be sure that you only have one row per subject. Your final dataset
should have approximately 4240 total subjects. (Always use
>= or <= instead of > or
< when checking inequalities with dates.)
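As a small illustration of that last step, the sketch below collapses a toy long-format data frame to one row per woman. The column names are hypothetical.

```r
# Toy long-format data: one row per woman per month, covariates already attached
long <- data.frame(
  id       = c("P1", "P1", "P1", "P2", "P2"),
  month    = c(1, 2, 3, 1, 2),
  cesarean = c(0, 0, 0, 1, 1),
  opioid   = c(1, 1, 1, 0, 0)
)

# Drop the monthly column, then remove the now-duplicated rows
final <- unique(long[, c("id", "cesarean", "opioid")])

# Sanity check: one row per subject
nrow(final) == length(unique(final$id))
```

If the check is FALSE, some column still varies within a woman and needs to be summarized or dropped before calling unique().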
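Here is an example of a function that might be useful. If you understand how it works, you could modify it to do something different.
library(lubridate)
# y, m: enrollment year and month columns; d: an event date; n: a number of months
happen_after_n <- function(y, m, d, n) {
  # Running month index: Jan 2017 = 1, ..., Dec 2018 = 24
  m <- m + 12*(y == 2018)
  event <- month(d) + 12*(year(d) == 2018)
  # TRUE if the last enrolled month is at least n months after the event
  (max(m) - max(event)) >= n
}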
# How might you use that function with this?
df <- data.frame(month = rep(1:12, times = 2),
                 year = rep(2017:2018, each = 12),
                 date1 = "2017-10-10",
                 date2 = "2018-10-10")
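One possible answer, offered as a sketch rather than the intended solution: pass the year and month columns along with one of the date columns, and ask whether coverage extends at least 3 months past that date. The choice of n = 3 and the names `after1` and `after2` are assumptions made for illustration.

```r
library(lubridate)

# Copy of the helper so this block runs on its own
happen_after_n <- function(y, m, d, n) {
  m <- m + 12 * (y == 2018)
  event <- month(d) + 12 * (year(d) == 2018)
  (max(m) - max(event)) >= n
}

df <- data.frame(month = rep(1:12, times = 2),
                 year  = rep(2017:2018, each = 12),
                 date1 = "2017-10-10",
                 date2 = "2018-10-10")

# Coverage runs through month 24 (Dec 2018)
after1 <- happen_after_n(df$year, df$month, df$date1, 3)  # event in month 10: 24 - 10 >= 3
after2 <- happen_after_n(df$year, df$month, df$date2, 3)  # event in month 22: 24 - 22 < 3
```

Here `after1` is TRUE and `after2` is FALSE. Note that the function only compares the last enrolled month with the event month, so it does not detect gaps in coverage in between.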