# An example of Cubist for prediction in R

#### *Kirk Mettler*

#### *July 13, 2015*

I was impressed by how quickly Cubist ran and how good the results were.

Below is the code I used, built on the caret and Cubist packages in R.

First, I loaded the caret and Cubist packages:

`require(caret)`

```
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
```

`require(Cubist)`

`## Loading required package: Cubist`

Then I read in the data set. Here are the dimensions of the predictors as well:

```
# Read the predictors and drop the first column
predictors <- read.csv("trainPredictors.csv")
predictors <- predictors[, -1]
# Read the outcomes and drop the first column
outcomes <- read.csv("trainOutcomes.csv")
outcomes <- outcomes[, -1]
dim(predictors)
```

`## [1] 5000 254`

I used caret to split the data into a training set and a test set, choosing an 80/20 split. I also kept the outcomes separate from the predictors in both sets:

```
# Sample 80% of the row indices, based on the outcome values
inTrain <- createDataPartition(y = outcomes, p = .80)
inTrain <- unlist(inTrain)
# Training set: predictors and outcomes
trainpredictors <- predictors[inTrain, ]
trainoutcomes <- outcomes[inTrain]
# Test set: the remaining 20% of rows
testpredictors <- predictors[-inTrain, ]
testoutcomes <- outcomes[-inTrain]
```
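Because `createDataPartition` samples at random, the split changes on every run. Here is a minimal sketch of a reproducible version, assuming a fixed seed (the value 42 is arbitrary) and synthetic stand-in data. The names `toy_x` and `toy_y` are hypothetical and are not the data set from this post:

```r
library(caret)

# Synthetic stand-in data (hypothetical; not the data set used in this post)
set.seed(42)  # arbitrary seed, fixed so the data and the split are repeatable
toy_x <- data.frame(a = rnorm(100), b = rnorm(100))
toy_y <- 2 * toy_x$a - toy_x$b + rnorm(100, sd = 0.1)

# list = FALSE returns a one-column matrix; [, 1] makes it a plain index vector
train_idx <- createDataPartition(y = toy_y, p = 0.80, list = FALSE)[, 1]

train_x <- toy_x[train_idx, ]
train_y <- toy_y[train_idx]
test_x  <- toy_x[-train_idx, ]
test_y  <- toy_y[-train_idx]

# Row counts for each set; roughly 80% should land in training
c(train = nrow(train_x), test = nrow(test_x))
```

Using `list = FALSE` is an alternative to the `unlist()` step, and calling `set.seed()` first means rerunning the script gives the same training and test rows.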

Then I simply ran the model. Notice how quickly it ran:

`modelTree <- cubist(x = trainpredictors, y = trainoutcomes)`
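To put a number on "how quickly", the fit can be wrapped in `system.time()`. This is a rough sketch on synthetic data, so the `toy_*` names and the data sizes are hypothetical and the timing will not match the 5000 x 254 data set from this post. It also shows the `committees` argument of `cubist()`, which the call in this post leaves at its default of 1:

```r
library(Cubist)

# Synthetic stand-in data (hypothetical; not the data set used in this post)
set.seed(1)
n <- 500
toy_x <- data.frame(matrix(rnorm(n * 20), nrow = n))  # columns X1 ... X20
toy_y <- 3 * toy_x$X1 +
  ifelse(toy_x$X2 > 0, toy_x$X3, -toy_x$X3) +
  rnorm(n, sd = 0.2)

# Time a single-model fit, matching the default used in the post
timing <- system.time(
  toy_fit <- cubist(x = toy_x, y = toy_y)
)
timing["elapsed"]  # wall-clock seconds; varies by machine

# A committee of 5 models (a boosting-like ensemble); usually slower to fit
toy_fit5 <- cubist(x = toy_x, y = toy_y, committees = 5)
```

Calling `summary(toy_fit)` prints the rules and the linear model attached to each one, which is a good way to see what Cubist actually learned.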

Next, I used that model to predict on the test set:

`mtPred <- predict(modelTree, testpredictors)`

Finally, I computed R^2 (the squared correlation between the predictions and the actual outcomes) to see how it did:

`cor(mtPred, testoutcomes)^2`

`## [1] 0.840342`
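Squared correlation measures how well the predictions track the outcomes linearly, but it does not penalize a constant offset. Here is a small base-R sketch with made-up numbers comparing it with RMSE and with 1 - SSE/SST. The vectors `obs` and `pred` are hypothetical and are not values from the model above:

```r
# Made-up values for illustration: every prediction is off by exactly +1
obs  <- c(3, 5, 7, 9, 11)
pred <- obs + 1

# Squared Pearson correlation, the measure used in this post
r2_cor <- cor(pred, obs)^2

# Root mean squared error, in the same units as the outcome
rmse <- sqrt(mean((pred - obs)^2))

# Coefficient of determination: 1 - SSE / SST
r2_det <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

c(r2_cor = r2_cor, rmse = rmse, r2_det = r2_det)
```

Here the squared correlation is 1 even though every prediction misses by 1, while RMSE (1) and 1 - SSE/SST (0.875) both reflect the offset. The same three lines can be applied to `mtPred` and `testoutcomes` to check a real model.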

This is a great result for not much effort!