Big Computing: An example of using Cubist in R for prediction

Tuesday, July 14, 2015

An example of using Cubist in R for prediction

Recently I was given a moderate-sized data set and asked to do a quick prediction on it. I did not have a lot of time. When I pulled up the data set, it had 5000 rows and 254 predictors: 250 of the predictors were continuous and 4 were categorical. Each column had hundreds of NAs, and I really did not feel like going through and using imputation. The outcome variable was continuous. I decided to use Quinlan's Cubist, which is available in R as a package maintained by Max Kuhn.
I was impressed by how quickly it ran and how good the results were.
Below is the code showing how I did the work using the caret and Cubist packages in R.
First, I loaded the caret package and the Cubist package:
require(caret)
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
require(Cubist)
## Loading required package: Cubist
Then I read in the data set and checked its dimensions:
predictors <- read.csv("trainPredictors.csv")
predictors <- predictors[, -1]  # drop the first column
outcomes <- read.csv("trainOutcomes.csv")
outcomes <- outcomes[, -1]      # drop the first column, leaving the outcome values
dim(predictors)
## [1] 5000  254
I used caret to split the data into a training set and a test set. I chose an 80/20 split. I also kept the outcomes separate from the predictors in both the training and test sets:
inTrain <- createDataPartition(y = outcomes, p = 0.80)
inTrain <- unlist(inTrain)  # createDataPartition returns a list; flatten it to row indices
trainpredictors <- predictors[inTrain, ]
trainoutcomes <- outcomes[inTrain]
testpredictors <- predictors[-inTrain, ]
testoutcomes <- outcomes[-inTrain]
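One thing worth noting about the split: createDataPartition samples at random, so the rows in each set change from run to run unless a seed is set. Here is a minimal sketch of a reproducible split. The toy data frame, the variable names, and the seed values are all made up for illustration and are not part of the original data set.

```r
library(caret)

# Synthetic stand-in for the post's data (hypothetical values)
set.seed(42)
toy_predictors <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy_outcomes <- 2 * toy_predictors$x1 + rnorm(200)

# Fix the seed so the partition is the same on every run
set.seed(123)
in_train <- createDataPartition(y = toy_outcomes, p = 0.80, list = FALSE)[, 1]

train_x <- toy_predictors[in_train, ]
test_x  <- toy_predictors[-in_train, ]

# Repeating the call with the same seed gives identical row indices
set.seed(123)
in_train_again <- createDataPartition(y = toy_outcomes, p = 0.80, list = FALSE)[, 1]
identical(in_train, in_train_again)
```

Using list = FALSE returns the indices as a one-column matrix, which avoids the separate unlist step.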
Then I simply ran the model with the default settings, leaving the NAs in place. It ran quickly:
modelTree<- cubist(x = trainpredictors,y = trainoutcomes)
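To show the same idea on something anyone can run, here is a small sketch that fits Cubist to synthetic data with missing values left in, with no imputation step. The data, the variable names, and the choice of committees = 5 are assumptions made for illustration only; the model in this post used the defaults.

```r
library(Cubist)

# Synthetic stand-in for the real data (hypothetical values, not the post's data set)
set.seed(1)
n <- 300
toy_x <- data.frame(
  a = rnorm(n),
  b = runif(n),
  g = factor(sample(c("u", "v", "w"), n, replace = TRUE))
)
toy_y <- 3 * toy_x$a + ifelse(toy_x$g == "u", 2, 0) + rnorm(n, sd = 0.5)

# Blank out 30 values of one continuous predictor to mimic the NAs
toy_x$b[sample(n, 30)] <- NA

# Fit without imputing; committees = 5 is an illustrative choice, the default is 1
toy_fit <- cubist(x = toy_x, y = toy_y, committees = 5)

# Predict on the same rows, including the ones that contain NA
toy_pred <- predict(toy_fit, toy_x)
sum(is.na(toy_pred))
```

Calling summary(toy_fit) prints the rules and the linear model attached to each rule, which is a quick way to see what the model learned.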
Next, I used that model to make predictions on the test set:
mtPred<-predict(modelTree,testpredictors)
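The predict method for Cubist models also takes a neighbors argument (0 to 9), which adjusts each rule-based prediction using the nearest training cases. I did not use it above, but the sketch below shows how the two kinds of prediction could be compared. The synthetic data, the 240/60 split, and neighbors = 5 are hypothetical choices for illustration.

```r
library(Cubist)

# Synthetic data (hypothetical values, for illustration only)
set.seed(2)
n <- 300
toy_x <- data.frame(a = rnorm(n), b = rnorm(n))
toy_y <- sin(toy_x$a) + 0.5 * toy_x$b + rnorm(n, sd = 0.2)

# Simple fixed split: first 240 rows for training, last 60 for testing
train_rows <- 1:240
toy_fit <- cubist(x = toy_x[train_rows, ], y = toy_y[train_rows])

# Rule-based predictions only (the default, neighbors = 0)
pred_rules <- predict(toy_fit, toy_x[-train_rows, ])

# Same model, with predictions adjusted using the 5 nearest training cases
pred_nn <- predict(toy_fit, toy_x[-train_rows, ], neighbors = 5)

# Compare test-set R^2 for the two variants
c(rules = cor(pred_rules, toy_y[-train_rows])^2,
  neighbors = cor(pred_nn, toy_y[-train_rows])^2)
```

Whether the neighbor adjustment helps depends on the data, so it is worth checking both on a held-out set.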
Finally, I computed R^2 on the test set, taken here as the squared correlation between the predicted and observed outcomes, to see how it did:
cor(mtPred,testoutcomes)^2
## [1] 0.840342
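The squared correlation measures how well the predictions track the observed values, but it does not penalize predictions that are consistently too high or too low. Reporting RMSE alongside it covers that gap. The sketch below computes both by hand and with caret's postResample; the five predicted and observed values are invented for illustration and are not results from this data set.

```r
library(caret)

# Hypothetical predicted and observed values, for illustration only
obs  <- c(10.2, 11.5, 9.8, 12.1, 10.9)
pred <- c(10.0, 11.9, 9.5, 12.4, 11.0)

# Squared correlation, as used in this post
r2_manual <- cor(pred, obs)^2

# Root mean squared error, in the units of the outcome
rmse_manual <- sqrt(mean((pred - obs)^2))

# caret reports RMSE and R^2 together in one call
postResample(pred = pred, obs = obs)
```

With the objects from this post, the equivalent call would be postResample(pred = mtPred, obs = testoutcomes).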
This is a great result for not much effort!