Big Computing: An example of using Cubist in R for prediction

Tuesday, July 14, 2015

An example of using Cubist in R for prediction

An example of Cubist for prediction in R
Recently I was given a moderately sized data set and asked to do a quick prediction on it. I did not have a lot of time. When I pulled up the data set, it had 5000 rows and 254 predictors. 250 of the predictors were continuous and 4 were categorical. Each column had hundreds of NAs, and I really did not feel like going through and imputing them. The outcome variable was continuous. I decided to use Quinlan's Cubist, which is available in R as the Cubist package maintained by Max Kuhn.
I was impressed by how quickly it ran and how good the results were.
Below is the code showing how I did the work using the caret and Cubist packages in R.
First, I loaded the caret package and the Cubist package:
require(caret)
## Loading required package: caret
## Loading required package: ggplot2
require(Cubist)
## Loading required package: Cubist
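Then I read in the data set and dropped the first column. Here are the dimensions of the predictors as well:
predictors <- read.csv("trainPredictors.csv")
# The original post omits the line that loads the outcomes; the file name below is an assumption
outcomes <- read.csv("trainOutcomes.csv")
predictors <- predictors[, -1]
outcomes <- outcomes[, -1]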
dim(predictors)
## [1] 5000  254
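Since each column had hundreds of missing values, a quick count of NAs per column is a useful sanity check before modeling. The sketch below uses a small made-up data frame (`toy` and its columns are hypothetical, not the data from this post); the same calls should apply to `predictors`.

```r
# Toy data frame standing in for `predictors` (hypothetical values)
toy <- data.frame(
  x1 = c(1.2, NA, 3.4, 4.1, NA),
  x2 = c(NA, 2.0, 2.5, NA, NA),
  g  = factor(c("a", "b", NA, "a", "b"))
)

# Number of missing values in each column
naCounts <- colSums(is.na(toy))
naCounts

# Share of rows with at least one missing value
mean(!complete.cases(toy))
```

If nearly every row has at least one NA, dropping incomplete rows is not an option, which is one reason a method that tolerates missing values is attractive here.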
I used caret to make a training and test set of the data. I chose an 80/20 split. I also split out the outcomes from the predictors in both the training and test sets.
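# createDataPartition returns a list of row indices; unlist turns it into a vector
inTrain <- createDataPartition(y = outcomes, p = 0.80)
inTrain <- unlist(inTrain)
# Rows in inTrain form the training set; the remaining rows form the test set
trainpredictors <- predictors[inTrain, ]
trainoutcomes <- outcomes[inTrain]
testpredictors <- predictors[-inTrain, ]
testoutcomes <- outcomes[-inTrain]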
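As a variation on the split above, `createDataPartition` can return the row indices directly when called with `list = FALSE`, which makes the `unlist` step unnecessary. This is a minimal sketch on a simulated outcome (`toyOutcome`, `idx`, `trainY`, and `testY` are hypothetical names), with a seed set so the split can be reproduced.

```r
library(caret)

set.seed(123)                      # make the random split reproducible
toyOutcome <- rnorm(1000)          # simulated continuous outcome (hypothetical)

# list = FALSE returns a one-column matrix of row indices instead of a list
idx <- createDataPartition(y = toyOutcome, p = 0.80, list = FALSE)

trainY <- toyOutcome[idx]
testY  <- toyOutcome[-idx]

# Roughly 80% of the rows should land in the training set
length(trainY) / length(toyOutcome)
```

Setting a seed was not part of the original workflow, so the R^2 reported in this post would not be reproduced exactly by rerunning the split.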
Then I simply ran the model using the default settings. Note how quickly it ran.
modelTree<- cubist(x = trainpredictors,y = trainoutcomes)
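To put a number on how quickly the model fits, the call can be wrapped in `system.time`. The sketch below does this on simulated data with some missing values (`toyX`, `toyY`, and `toyModel` are hypothetical and much smaller than the data in this post), so the timing it reports says nothing about the 5000 by 254 case.

```r
library(Cubist)

set.seed(42)
n <- 500
# Simulated predictors with some missing values (hypothetical data)
toyX <- data.frame(a = rnorm(n), b = rnorm(n), c = runif(n))
toyX$a[sample(n, 50)] <- NA
toyY <- 3 * toyX$b - 2 * toyX$c + rnorm(n, sd = 0.5)

# system.time() reports user, system, and elapsed seconds for the fit
fitTime <- system.time(
  toyModel <- cubist(x = toyX, y = toyY)
)
fitTime["elapsed"]

# Print the rules and linear models that Cubist built
summary(toyModel)
```

The same pattern, `system.time(modelTree <- cubist(x = trainpredictors, y = trainoutcomes))`, should work for the model in this post.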
Next, I used that model to make predictions on the test set:
mtPred<-predict(modelTree,testpredictors)
Finally, I computed R^2, here the squared correlation between the predicted and observed test outcomes, to see how it did:
cor(mtPred,testoutcomes)^2
## [1] 0.840342
This is a great result for not much effort!
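One caveat about the measure used above: a squared correlation captures how well predictions track the observed values linearly, but it does not penalize predictions that are systematically shifted or scaled. Reporting RMSE alongside it covers that gap. The sketch below illustrates the difference with made-up vectors (`obs`, `predGood`, `predShift`, and the `rmse` helper are hypothetical and not taken from the data in this post).

```r
# Made-up observed values and two sets of predictions (hypothetical)
obs       <- c(10, 12, 15, 18, 20)
predGood  <- c(10.5, 11.5, 15.2, 17.6, 20.3)
predShift <- obs * 2 + 5            # perfectly correlated, but badly biased

# Root mean squared error between predictions and observations
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Squared correlation is high for both sets of predictions...
cor(predGood, obs)^2
cor(predShift, obs)^2

# ...but RMSE exposes the biased predictions
rmse(predGood, obs)
rmse(predShift, obs)
```

For the model in this post, caret's `postResample(pred = mtPred, obs = testoutcomes)` should report RMSE and R^2 together.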