An example of Cubist for prediction in R
Kirk Mettler
July 13, 2015
I was impressed by how quickly Cubist ran and how good the results were.
Below is the code I used, built on the caret and Cubist packages in R.
First, I loaded the caret and Cubist packages:
require(caret)
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
require(Cubist)
## Loading required package: Cubist
Then I read in the data set and checked its dimensions:
predictors <- read.csv("trainPredictors.csv")
predictors <- predictors[, -1]  # drop the first column
outcomes <- read.csv("trainOutcomes.csv")
outcomes <- outcomes[, -1]      # drop the first column
dim(predictors)
## [1] 5000 254
I used caret to split the data into a training set and a test set, using an 80/20 split. I also separated the outcomes from the predictors in both sets:
inTrain <- createDataPartition(y = outcomes, p = 0.80)
inTrain <- unlist(inTrain)  # createDataPartition returns a list by default
trainpredictors <- predictors[inTrain, ]
trainoutcomes <- outcomes[inTrain]
testpredictors <- predictors[-inTrain, ]
testoutcomes <- outcomes[-inTrain]
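Because createDataPartition samples at random, the split changes from run to run unless a seed is set. Here is a minimal sketch of a reproducible split with a quick sanity check on its size. The seed value and the simulated vector `y` are placeholders of mine, not part of the original analysis.

```r
library(caret)

set.seed(42)        # hypothetical seed; any fixed integer makes the split repeatable
y <- rnorm(1000)    # simulated stand-in for the outcomes vector

# list = FALSE returns a one-column matrix of row indices instead of a list
idx <- as.vector(createDataPartition(y = y, p = 0.80, list = FALSE))
test_idx <- setdiff(seq_along(y), idx)

length(idx) / length(y)            # close to 0.80
length(idx) + length(test_idx)     # every row lands in exactly one of the two sets
```

Passing `list = FALSE` is an alternative to calling `unlist()` on the result, as done above.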
Then I simply ran the model. Notice how quickly it ran:
modelTree <- cubist(x = trainpredictors, y = trainoutcomes)
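To put a number on the run time, the fit can be wrapped in system.time(). The sketch below uses the built-in mtcars data as a stand-in for my training set, so the timing it reports says nothing about the 5000 x 254 data above. It also shows the `committees` argument of cubist(), which I did not use here; the value 5 is an arbitrary choice for illustration.

```r
library(Cubist)

# mtcars ships with R; it stands in for the training data here
x <- mtcars[, -1]   # every column except mpg
y <- mtcars$mpg

# Time a single rule-based model, the same kind of call as above
timing <- system.time(fit_single <- cubist(x = x, y = y))
timing["elapsed"]

# Optional variation: an ensemble of 5 committee models (5 is arbitrary)
fit_committee <- cubist(x = x, y = y, committees = 5)

summary(fit_single)  # prints the rules and the linear model attached to each
```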
Next, I used that model to predict on the test set:
mtPred <- predict(modelTree, testpredictors)
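The predict method for Cubist models also accepts a `neighbors` argument (0 to 9) that adjusts each rule-based prediction using the nearest training cases. I did not use it for the result in this post; the sketch below only shows the call, again on mtcars as a stand-in, and predicts on the training rows purely to keep the example short.

```r
library(Cubist)

x <- mtcars[, -1]
y <- mtcars$mpg
fit <- cubist(x = x, y = y)

pred_rules    <- predict(fit, x)                 # rule-based predictions only
pred_adjusted <- predict(fit, x, neighbors = 5)  # adjusted with 5 nearest training cases

head(cbind(pred_rules, pred_adjusted))
```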
Finally, I computed R^2 on the test set, as the squared correlation between predicted and observed values, to see how the model did:
cor(mtPred, testoutcomes)^2
## [1] 0.840342
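One caveat with this measure: squared correlation is blind to a constant offset or a scaling error in the predictions, so it is worth looking at RMSE as well. The toy numbers below are made up to show the difference and are not from my data.

```r
obs  <- c(10, 12, 15, 19, 24)   # made-up observed values
pred <- obs + 5                 # made-up predictions, all 5 units too high

r2   <- cor(pred, obs)^2            # 1 (up to floating point): the offset goes unnoticed
rmse <- sqrt(mean((pred - obs)^2))  # 5: RMSE exposes the offset

c(R2 = r2, RMSE = rmse)
```

With caret loaded, `postResample(pred, obs)` reports RMSE and R-squared in one call.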
This is a great result for not much effort!