The goal of this assignment is to use at least one of the four methods from the end of the course for making predictions. These include:
Our dataset for this assignment is the Pima Indian dataset included in the
MASS package. It contains 532 records of Pima Indian women living near
Phoenix, Arizona, collected by the US National Institute of Diabetes and
Digestive and Kidney Diseases. The outcome of interest is a categorical
variable, `type` (`Yes`/`No`), indicating whether or not each woman has
diabetes. The data are already split into a training set (200 records) and
a testing set (332 records); see `?Pima.tr` or `?Pima.te`.
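Before modelling, it can help to confirm the split and look at the class balance of the outcome. The sketch below uses only MASS; the predictor names (`npreg`, `glu`, `bp`, `skin`, `bmi`, `ped`, `age`) are those documented in `?Pima.tr`, and the variable names `n_train` and `n_test` are just illustrative.

```r
library(MASS)  # Pima.tr and Pima.te ship with MASS

data(Pima.tr)
data(Pima.te)

# Row counts of the two pre-split sets
n_train <- nrow(Pima.tr)
n_test  <- nrow(Pima.te)
n_train + n_test  # total number of records

# Seven numeric predictors plus the factor outcome `type`
str(Pima.tr)

# Class balance of the outcome in each set
table(Pima.tr$type)
table(Pima.te$type)
```

Both sets are imbalanced toward `No`, so the test accuracy of any model is worth comparing against the accuracy of always predicting the majority class.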
The final write-up should be no more than 1-2 pages and should include the
method used, with a brief description of it; a model fitted to the training
set, with an assessment of its fit; and a measure of predictive accuracy
obtained by evaluating your final model on the testing data, `Pima.te`.
You should also aim to include at least one visual summary.
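One possible fit, assess, and evaluate workflow is sketched here with linear discriminant analysis via `lda()` from MASS. This assumes LDA counts among the course methods (the package list pairs MASS with `lda`); any of the other methods would follow the same pattern. Leave-one-out cross-validation (`CV = TRUE`) on the training set stands in for the "assessment of fit", and test-set accuracy plus a confusion table give the measure of predictive accuracy.

```r
library(MASS)  # lda() and the Pima data

# Fit linear discriminant analysis on the training set, using all
# seven predictors to classify `type`
fit_lda <- lda(type ~ ., data = Pima.tr)

# Assessment of fit: leave-one-out cross-validated accuracy on the
# training set (CV = TRUE returns LOO class predictions, not a model)
loo    <- lda(type ~ ., data = Pima.tr, CV = TRUE)
cv_acc <- mean(loo$class == Pima.tr$type)

# Predictive accuracy: evaluate the fitted model on the held-out test set
test_pred <- predict(fit_lda, newdata = Pima.te)$class
test_acc  <- mean(test_pred == Pima.te$type)

# Confusion matrix on the test set (rows = predicted, columns = observed)
conf_mat <- table(Predicted = test_pred, Observed = Pima.te$type)
print(conf_mat)
c(cv_accuracy = cv_acc, test_accuracy = test_acc)

# Visual summary: histograms of the linear discriminant scores by class
plot(fit_lda)
```

If caret is loaded, `confusionMatrix(test_pred, Pima.te$type)` reports the same table together with sensitivity, specificity, and related statistics.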
```r
## Useful for all
library(MASS)         # lda() and the Pima.tr / Pima.te datasets
library(caret)        # varImp() and confusionMatrix()
## Trees and forests
library(rpart)        # single classification tree
library(rpart.plot)   # plotting rpart trees
library(randomForest) # random forests
## Penalized regression
library(glmnet)       # lasso / ridge / elastic net
## GAM and ROC
library(gam)          # generalized additive models
library(pROC)         # ROC curves and AUC
library(plotROC)      # ROC curves with ggplot2
```
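As an example from the trees-and-forests group, the sketch below grows a single classification tree with `rpart`, prunes it to the complexity parameter with the smallest cross-validated error, and evaluates it on `Pima.te`. The seed value and the pruning rule (minimum `xerror` rather than, say, the one-standard-error rule) are illustrative choices, not requirements of the assignment. Only base graphics are used for the plot, so the block runs without `rpart.plot`.

```r
library(MASS)   # Pima.tr / Pima.te
library(rpart)  # classification trees

set.seed(1)  # rpart's internal cross-validation assigns folds at random

# Grow a classification tree on the training data
fit_tree <- rpart(type ~ ., data = Pima.tr, method = "class")

# Cross-validated error (xerror) for each complexity parameter (cp)
printcp(fit_tree)

# Prune back to the cp with the smallest cross-validated error
cp_table <- fit_tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned   <- prune(fit_tree, cp = best_cp)

# Test-set accuracy of the pruned tree
tree_pred <- predict(pruned, newdata = Pima.te, type = "class")
tree_acc  <- mean(tree_pred == Pima.te$type)
table(Predicted = tree_pred, Observed = Pima.te$type)

# Visual summary with base graphics (skipped if pruning left only the root);
# rpart.plot::rpart.plot(pruned) draws a more readable version
if (nrow(pruned$frame) > 1) {
  plot(pruned, uniform = TRUE, margin = 0.1)
  text(pruned, use.n = TRUE)
}
```

Because the folds are random, the selected `cp` (and hence the pruned tree) can change with the seed; reporting the `printcp()` table alongside the final tree makes that choice transparent.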