Big Computing: An example of using Random Forest in Caret with R.

Tuesday, October 28, 2014

An example of using Random Forest in Caret with R.

Here is an example of using Random Forest in the Caret package with R.
First, load the required packages.
require(caret)
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
require(ggplot2)
require(randomForest)
## Loading required package: randomForest
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
Read in the training and test sets.
training_URL<-"http://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
test_URL<-"http://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
training<-read.csv(training_URL,na.strings=c("NA",""))
test<-read.csv(test_URL,na.strings=c("NA",""))
Then I got rid of the columns that are simply an index, timestamp, or username.
training<-training[,7:160]
test<-test[,7:160]
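Remove the columns that are mostly NAs. Note that the filter below is strict: it keeps only columns with more than 19621 non-NA values, which, with 19622 rows, means columns with no missing values at all. The dropped columns could be useful in the model, but it is easier to cut the data.frame down and see if it gives good results.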
mostly_data<-apply(!is.na(training),2,sum)>19621
training<-training[,mostly_data]
test<-test[,mostly_data]
dim(training)
## [1] 19622    54
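The column filter above can also be written in terms of the fraction of missing values per column, which makes the threshold explicit. This is a minimal sketch on a made-up data frame (`toy` and the 50% cutoff are assumptions for illustration, not part of the original analysis):

```r
# Toy data frame (hypothetical) with one complete column and two columns containing NAs.
toy <- data.frame(a = 1:10,
                  b = c(NA, 2:10),
                  c = c(rep(NA, 9), 10))

# Fraction of missing values in each column.
na_fraction <- colMeans(is.na(toy))

# Same rule as the post's filter: keep only columns with no missing values.
complete_cols <- na_fraction == 0

# A looser variant: keep columns with at most 50% missing values (assumed cutoff).
mostly_complete <- na_fraction <= 0.5

names(toy)[complete_cols]
names(toy)[mostly_complete]
```

With the strict rule only column `a` survives; the looser rule also keeps `b`, which has a single missing value.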
I partitioned the training set into a smaller set called training1 (30% of the rows), mainly to speed up the running of the model.
InTrain<-createDataPartition(y=training$classe,p=0.3,list=FALSE)
training1<-training[InTrain,]
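A stratified split like this samples within each class, so the class proportions are preserved, and the rows that are not selected remain available as a hold-out set. The sketch below imitates the idea in base R on made-up data; it is not caret's implementation, and `toy`, `in_train`, `toy_train`, and `toy_holdout` are hypothetical names:

```r
# Toy data standing in for the real training set (hypothetical values).
set.seed(42)
toy <- data.frame(x = rnorm(100),
                  classe = factor(rep(c("A", "B", "C", "D", "E"), each = 20)))

# Stratified 30% sample: draw 30% of the row indices within each class.
rows_by_class <- split(seq_len(nrow(toy)), toy$classe)
in_train <- unlist(lapply(rows_by_class,
                          function(idx) sample(idx, size = ceiling(0.3 * length(idx)))),
                   use.names = FALSE)

toy_train   <- toy[in_train, ]   # analogous to training1
toy_holdout <- toy[-in_train, ]  # rows never used for fitting

table(toy_train$classe)
```

In the post's setting, the analogous hold-out rows would be `training[-InTrain,]`, which could be used to check the model on data it has not seen.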
I used caret with random forest as my model, with 5-fold cross-validation.
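# prox=TRUE and allowParallel=TRUE are forwarded to randomForest() through "...";
# allowParallel is normally a trainControl() argument (see the Call output below).
rf_model<-train(classe~.,data=training1,method="rf",
                trControl=trainControl(method="cv",number=5),
                prox=TRUE,allowParallel=TRUE)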
print(rf_model)
## Random Forest 
## 
## 5889 samples
##   53 predictor
##    5 classes: 'A', 'B', 'C', 'D', 'E' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## 
## Summary of sample sizes: 4711, 4712, 4710, 4711, 4712 
## 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy  Kappa  Accuracy SD  Kappa SD
##    2    1         1      0.006        0.008   
##   27    1         1      0.005        0.006   
##   53    1         1      0.006        0.007   
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 27.
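The three mtry values in the table above (2, 27, 53) look like an evenly spaced grid between 2 and the number of predictors. The sketch below reproduces them in base R under that assumption; the spacing rule is inferred from the printed values, not taken from the post:

```r
p <- 53  # number of predictors reported in the output above

# Assumed rule: three evenly spaced values from 2 to p, rounded down.
mtry_grid <- floor(seq(2, p, length.out = 3))
mtry_grid

# For comparison, randomForest's own defaults when mtry is not tuned:
floor(sqrt(p))        # classification
max(floor(p / 3), 1)  # regression
```

The selected value, 27, is much larger than the classification default of randomForest for 53 predictors, so tuning made a real difference to the forest that was fitted.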
print(rf_model$finalModel)
## 
## Call:
##  randomForest(x = x, y = y, mtry = param$mtry, proximity = TRUE,      allowParallel = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 27
## 
##         OOB estimate of  error rate: 0.88%
## Confusion matrix:
##      A    B    C   D    E class.error
## A 1674    0    0   0    0     0.00000
## B   11 1119    9   1    0     0.01842
## C    0   11 1015   1    0     0.01168
## D    0    2   10 952    1     0.01347
## E    0    1    0   5 1077     0.00554
That is a pretty amazingly good model! The out-of-bag error rate of 0.88% corresponds to roughly .991 accuracy. I usually hope for something above .7 in my real work.
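The out-of-bag figures can be checked by hand from the confusion matrix printed above. A small sketch in base R, with the counts copied from that output (the variable names are made up for illustration):

```r
# Confusion matrix copied from the printed finalModel output (rows = actual class).
conf <- matrix(c(1674,    0,    0,   0,    0,
                   11, 1119,    9,   1,    0,
                    0,   11, 1015,   1,    0,
                    0,    2,   10, 952,    1,
                    0,    1,    0,   5, 1077),
               nrow = 5, byrow = TRUE,
               dimnames = list(actual = LETTERS[1:5], predicted = LETTERS[1:5]))

oob_accuracy <- sum(diag(conf)) / sum(conf)    # correctly classified / all samples
oob_error    <- 1 - oob_accuracy
class_error  <- 1 - diag(conf) / rowSums(conf) # per-class error, as in class.error

round(oob_accuracy, 4)
round(100 * oob_error, 2)
round(class_error, 5)
```

The matrix sums to 5889 samples, of which 5837 lie on the diagonal, matching the 0.88% out-of-bag error rate in the output.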

83 comments:

  1. Hi, what is the result for your test set?

  2. Using caret for random forests is so slow on my laptop, compared to using the random forest package.
    I tried to find some information on running R in parallel. I installed the multicore package and ran the following before train():

    library(doMC)
    registerDoMC(5)

    That seems to help.

  3. Nate, you are correct: you need to add a "do" package, otherwise there is no parallel backend. Usually those libraries come across as dependencies when you load the caret package. Remember that caret is doing a lot of other work besides just running the random forest, depending on your actual call. Also try the ranger random forest package in R. It is much faster than Andy's package.

  4. Hi NPHard,
    I tried the ranger package, but some functions were not visible, such as train and createDataPartition.
    What are their substitutes in ranger?

    Thanks,

  5. @Tita you can continue using caret with method="ranger" to build the model using ranger.

  6. Very helpful! But I still don't really understand what mtry is doing. Is it the number of trees we are building?

    Replies
    1. It's the number of variables tried at each node. The standard value is n/3 for regression and sqrt(n) for classification (n is the total number of variables).

  12. Thank you for the informative post! Is there any way to visualize a random forest like those for CART? Thank you!

  14. Your conclusion that the model is amazing is likely false, as the model seems to be overfitting. The assessment of a model should never be based on training data but on a separate validation set. Since the training data was used to create the model, it is a given that it fits well on the same data.
