Introduction

The goal of this assignment is to use at least one of the four methods from the end of the course for making predictions. These include:

Our dataset will be the Pima Indian dataset included in the MASS package. It contains 532 records of Pima Indian women living near Phoenix, Arizona, collected by the US National Institute of Diabetes and Digestive and Kidney Diseases. The outcome of interest is categorical, indicating whether or not each woman has diabetes. The data are already split into a training set and a testing set; see ?Pima.tr or ?Pima.te.
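
As a quick orientation, the two data frames can be inspected before any modeling. This is only a minimal sketch; it assumes nothing beyond the MASS package, and the outcome column name `type` comes from the ?Pima.tr help page rather than from this handout.

```r
library(MASS)  # provides Pima.tr (training) and Pima.te (testing)

# Size of each part of the predefined split
dim(Pima.tr)
dim(Pima.te)

# The two parts together make up the 532 records mentioned above
nrow(Pima.tr) + nrow(Pima.te)

# Outcome variable: a factor indicating diabetes status
levels(Pima.tr$type)
table(Pima.tr$type)

# Predictor names and types
str(Pima.tr)
```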

The final writeup need not be more than 1-2 pages. It should state the method used and briefly describe it, present a model fitted on the training set with an assessment of fit, and report a measure of predictive accuracy obtained by evaluating your final model on the testing data Pima.te. You should aim to include at least one visual summary as well.

## Useful for all
library(MASS) # contains lda and Pima dataset
library(caret) # contains varImp() and confusionMatrix()

## Trees and forests
library(rpart) # single tree
library(rpart.plot) # plotting for rpart trees
library(randomForest) # random forest

## Penalized regression
library(glmnet)

## GAM and ROC
library(gam) # generalized additive models
library(pROC) # ROC curves and AUC
library(plotROC) # ROC curves with ggplot2
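
The workflow the writeup asks for (fit on Pima.tr, assess the fit, evaluate on Pima.te, include one plot) might look roughly like the sketch below. A single classification tree from rpart is used purely as an illustration; it is an assumed choice, not a recommended or required method, and any of the other approaches could be substituted. The object names (`fit`, `test_pred`, `conf_tab`, `test_acc`) are hypothetical.

```r
library(MASS)   # Pima.tr and Pima.te
library(rpart)  # single classification tree (illustrative choice only)

set.seed(1)  # rpart's internal cross-validation is random

# Fit on the training set, using all predictors
fit <- rpart(type ~ ., data = Pima.tr, method = "class")

# Assessment of fit: complexity table with cross-validated error,
# plus accuracy on the training data itself
printcp(fit)
train_pred <- predict(fit, newdata = Pima.tr, type = "class")
train_acc  <- mean(train_pred == Pima.tr$type)

# Predictive accuracy on the held-out testing set
test_pred <- predict(fit, newdata = Pima.te, type = "class")
conf_tab  <- table(Predicted = test_pred, Actual = Pima.te$type)
test_acc  <- sum(diag(conf_tab)) / sum(conf_tab)

conf_tab
c(train = train_acc, test = test_acc)

# Visual summary: the fitted tree with class counts at each leaf
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)
```

Training accuracy is usually optimistic, so the number to report is the one computed on Pima.te. The hand-built table could be replaced with confusionMatrix() from caret, and the base plot with rpart.plot(), if those packages are installed.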