Cubist is a machine learning algorithm for continuous outcomes. It is a rule-based model, built on decision trees, that handles missing values automatically. That makes Cubist ideal for baselining the predictive value of a data set: if the data are messy, with a lot of missing values, you do not have to clean them up first. Cubist has become my first-try model for all continuous-outcome data sets.
Cubist was developed by Quinlan, and the R package for Cubist is maintained by Max Kuhn, who also maintains the caret package.
The code for calling a Cubist model is fairly standard for most predictive models in R.
cubist(x = trainingPredictors, y = trainingOutcome)
There are some other elements that help improve the basic Cubist model’s performance, but let’s start with the simple model and go from there. For this example, we are going to use the BostonHousing data set that is contained in the mlbench package. The data come from a 1978 paper by Harrison and Rubinfeld (“Hedonic Prices and the Demand for Clean Air,” Journal of Environmental Economics and Management, vol. 5, 1978, pp. 81-102). It is a very well-known data set with 506 rows and 14 variables. Let’s look at that data set before we move on to creating and evaluating a predictive model in R.
require(mlbench)
## Loading required package: mlbench
require(caret)
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
require(Cubist)
## Loading required package: Cubist
data(BostonHousing)
dim(BostonHousing)
## [1] 506 14
str(BostonHousing)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : num 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ b : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
As you can see, it is a data set with 506 rows and 14 columns: 13 numeric variables plus chas, which is stored as a factor. We are going to try to predict the last column (medv), which is the median value of owner-occupied homes in USD 1000’s. Here is a description of the data in each of the 14 columns.
crim: crime rate of town
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town
chas: Charles River dummy variable (= 1 if tract bounds Charles River, = 0 if not)
nox: nitric oxides concentration in parts per 10 million
rm: average number of rooms per dwelling
age: proportion of owner-occupied units built before 1940
dis: weighted distances to five Boston employment centers
rad: index of accessibility to radial highways
tax: full-value property tax per USD 10,000
ptratio: pupil to teacher ratio per town
b: 1000(B-0.63)^2 where B is the proportion of African Americans in the town
lstat: percentage of lower status of the population
medv: median value of owner-occupied homes in USD 1000’s
Normally when you build a predictive model, you break the data into two or three sets: training, test, and sometimes a hold-out set. That may differ slightly if you are using cross-validation, but in general I make a training and a test set. Here I will use an 80/20 split. I am also going to make a small modification to the chas variable, converting it from a factor to a numeric 0/1 indicator.
BostonHousing$chas <- as.numeric(BostonHousing$chas) - 1
set.seed(1)
inTrain <- sample(1:nrow(BostonHousing), floor(.8*nrow(BostonHousing)))
trainingPredictors <- BostonHousing[ inTrain, -14]
testPredictors <- BostonHousing[-inTrain, -14]
trainingOutcome <- BostonHousing$medv[ inTrain]
testOutcome <- BostonHousing$medv[-inTrain]
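The same index-based split can be tried on a toy data frame to see what each piece does. This is only a sketch in base R; the names toy, n_train, in_train, train_set and test_set are made up for illustration and are not part of the original code.

```r
# A minimal sketch of an 80/20 split on a made-up data frame.
set.seed(1)
toy <- data.frame(x = 1:10, y = (1:10)^2)

# Number of training rows: 80% of the data, rounded down.
n_train <- floor(0.8 * nrow(toy))

# Draw row indices for training without replacement.
in_train <- sample(seq_len(nrow(toy)), n_train)

# Positive indices keep the sampled rows, negative indices drop them.
train_set <- toy[in_train, ]
test_set <- toy[-in_train, ]

nrow(train_set)
nrow(test_set)
```

With 506 rows, floor(.8 * 506) gives 404 training rows, which leaves 102 rows for testing.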
Now all we have to do is fit the model, make a prediction, and then evaluate the prediction. Since we are predicting a continuous variable here, we will use Root Mean Squared Error (RMSE).
So fit the model
modelTree <- cubist(x = trainingPredictors, y = trainingOutcome)
modelTree
##
## Call:
## cubist.default(x = trainingPredictors, y = trainingOutcome)
##
## Number of samples: 404
## Number of predictors: 13
##
## Number of committees: 1
## Number of rules: 4
Look at the model
summary(modelTree)
##
## Call:
## cubist.default(x = trainingPredictors, y = trainingOutcome)
##
##
## Cubist [Release 2.07 GPL Edition] Wed Feb 17 21:19:55 2016
## ---------------------------------
##
## Target attribute `outcome'
##
## Read 404 cases (14 attributes) from undefined.data
##
## Model:
##
## Rule 1: [88 cases, mean 13.81, range 5 to 27.5, est err 2.10]
##
## if
## nox > 0.668
## then
## outcome = 2.07 + 3.14 dis - 0.35 lstat + 18.8 nox + 0.007 b
## - 0.12 ptratio - 0.008 age - 0.02 crim
##
## Rule 2: [153 cases, mean 19.54, range 8.1 to 31, est err 2.16]
##
## if
## nox <= 0.668
## lstat > 9.59
## then
## outcome = 34.81 - 1 dis - 0.72 ptratio - 0.056 age - 0.19 lstat + 1.5 rm
## - 0.11 indus + 0.004 b
##
## Rule 3: [39 cases, mean 24.10, range 11.9 to 50, est err 2.73]
##
## if
## rm <= 6.23
## lstat <= 9.59
## then
## outcome = 11.89 + 3.69 crim - 1.25 lstat + 3.9 rm - 0.0045 tax
## - 0.16 ptratio
##
## Rule 4: [128 cases, mean 31.31, range 16.5 to 50, est err 2.95]
##
## if
## rm > 6.23
## lstat <= 9.59
## then
## outcome = -1.13 + 1.6 crim - 0.93 lstat + 8.6 rm - 0.0141 tax
## - 0.83 ptratio - 0.47 dis - 0.019 age - 1.1 nox
##
##
## Evaluation on training data (404 cases):
##
## Average |error| 2.27
## Relative |error| 0.34
## Correlation coefficient 0.94
##
##
## Attribute usage:
## Conds Model
##
## 78% 100% lstat
## 59% 53% nox
## 41% 78% rm
## 100% ptratio
## 90% age
## 90% dis
## 62% crim
## 59% b
## 41% tax
## 38% indus
##
##
## Time: 0.0 secs
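To make the summary concrete, here is a rough sketch of how a single rule turns a row of data into a prediction. The coefficients are copied from Rule 1 in the output above; the function name rule1 and the tract values are made up for illustration. This is not how the package computes predictions internally: the printed coefficients are rounded, and Cubist can also smooth predictions and average across rules when more than one applies.

```r
# Rule 1 from the summary: applies only when nox > 0.668.
rule1 <- function(row) {
  if (row$nox <= 0.668) {
    return(NA_real_)  # the rule's condition is not met
  }
  # Linear model attached to the rule (coefficients as printed, so rounded).
  2.07 + 3.14 * row$dis - 0.35 * row$lstat + 18.8 * row$nox +
    0.007 * row$b - 0.12 * row$ptratio - 0.008 * row$age - 0.02 * row$crim
}

# A hypothetical tract; these values are invented, not taken from the data.
tract <- list(nox = 0.7, dis = 1.5, lstat = 20, b = 390,
              ptratio = 20.2, age = 95, crim = 10)

rule1(tract)
```

The idea is that the "if" part picks which linear model to use and the "then" part is an ordinary regression equation on the predictors.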
Make a prediction
mtPred <- predict(modelTree, testPredictors)
Get the RMSE
sqrt(mean((mtPred - testOutcome)^2))
## [1] 3.337924
That is not bad, but we can do better using committees and neighbors.
Model committees are created by generating a sequence of rule-based models, similar to boosting: each later model is fit to outcomes adjusted for the errors of the one before it, and the committee predictions are averaged. The number of committees can range from 1 to 100.
Let’s fit a committee Cubist model with committees set to 100
set.seed(1)
committeeModel <- cubist(x = trainingPredictors, y = trainingOutcome, committees = 100)
## Predict with the committee model
comPred <- predict(committeeModel, testPredictors)
## RMSE on the test set
sqrt(mean((comPred - testOutcome)^2))
## [1] 2.779002
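Since the same RMSE expression keeps coming back, it could be wrapped in a small helper. This is just a convenience sketch; the name rmse is my own and the numbers in the example are made up.

```r
# Root mean squared error between predictions and observed values.
rmse <- function(predicted, observed) {
  stopifnot(length(predicted) == length(observed))
  sqrt(mean((predicted - observed)^2))
}

# Toy example: errors are 1, 0 and -2, so the mean squared error is 5/3.
rmse(c(2, 4, 6), c(1, 4, 8))
```

With that in place, rmse(comPred, testOutcome) would give the same number as the longer expression above.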
Now let’s add neighbors to the committees, which adjusts each rule-based prediction using the most similar cases in the training set. The neighbors argument can range from 0 to 9.
instancePred <- predict(committeeModel, testPredictors, neighbors = 4)
sqrt(mean((instancePred - testOutcome)^2))
## [1] 2.566348
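The idea behind the neighbors correction can be sketched in a few lines. This is a simplified, equal-weight illustration and not the package's implementation: as I understand it, Cubist weights neighbors by their distance to the new case. The function name adjust_with_neighbors and all numbers are made up.

```r
# Simplified instance-based correction with equal weights.
# pred_new:          rule-based prediction for the new case
# neighbor_outcomes: observed outcomes of the nearest training cases
# neighbor_preds:    rule-based predictions for those same training cases
adjust_with_neighbors <- function(pred_new, neighbor_outcomes, neighbor_preds) {
  # Each neighbor contributes its observed outcome, shifted by how far the
  # model places the new case from that neighbor.
  mean(neighbor_outcomes + (pred_new - neighbor_preds))
}

# Invented values for 4 neighbors.
pred_new <- 22.0
neighbor_outcomes <- c(24.1, 23.5, 25.0, 22.8)
neighbor_preds <- c(23.0, 22.9, 24.2, 22.1)

adjust_with_neighbors(pred_new, neighbor_outcomes, neighbor_preds)
```

In this toy case the model under-predicts all four neighbors, so the prediction for the new case gets nudged upward by the average of those misses.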
So now the question is, what combination of committees and neighbors yields the best prediction? We can answer that by creating a vector of possible committees and a vector of possible neighbors, then seeing where the RMSE is lowest under cross-validation.
set.seed(1)
cTune <- train(x = trainingPredictors, y = trainingOutcome,
               method = "cubist",
               tuneGrid = expand.grid(committees = c(1, 10, 50, 100),
                                      neighbors = c(0, 1, 5, 9)),
               trControl = trainControl(method = "cv"))
cTune
## Cubist
##
## 404 samples
## 13 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 363, 363, 363, 363, 362, 365, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared RMSE SD Rsquared SD
## 1 0 4.081800 0.7916640 1.3007653 0.15005686
## 1 1 4.111955 0.7950087 1.2540113 0.13995896
## 1 5 3.943515 0.8054412 1.2727587 0.14070680
## 1 9 3.959522 0.8022459 1.3305391 0.14884521
## 10 0 3.371765 0.8597818 0.9354412 0.08111265
## 10 1 3.370218 0.8681521 0.8462733 0.07253983
## 10 5 3.168392 0.8767757 0.9409569 0.07777561
## 10 9 3.207153 0.8725973 0.9499315 0.07980860
## 50 0 3.238911 0.8704658 0.9819922 0.08369843
## 50 1 3.257555 0.8741483 0.9284914 0.08006349
## 50 5 3.035711 0.8845178 1.0167411 0.08284853
## 50 9 3.071004 0.8810091 1.0233749 0.08444221
## 100 0 3.211165 0.8713608 1.0185290 0.08500905
## 100 1 3.254918 0.8739276 0.9853192 0.08458200
## 100 5 3.005851 0.8855715 1.0492541 0.08529563
## 100 9 3.044205 0.8820627 1.0572761 0.08671512
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 100 and neighbors
## = 5.
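The selection rule in the last lines of that output is simply "take the row with the smallest RMSE". Here is a base R sketch of that step, using the RMSE values copied from the table above; the names results and best are my own. On the fitted object itself, the chosen values are stored in cTune$bestTune.

```r
# All 16 combinations, in the same order as the printed table
# (neighbors varies fastest within each committees value).
results <- expand.grid(neighbors = c(0, 1, 5, 9),
                       committees = c(1, 10, 50, 100))

# Cross-validated RMSE values copied from the caret output above.
results$RMSE <- c(4.081800, 4.111955, 3.943515, 3.959522,
                  3.371765, 3.370218, 3.168392, 3.207153,
                  3.238911, 3.257555, 3.035711, 3.071004,
                  3.211165, 3.254918, 3.005851, 3.044205)

# Pick the combination with the lowest RMSE.
best <- results[which.min(results$RMSE), ]
best
```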
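As you can see, Cubist does really well as a predictive model. On the held-out test set, RMSE fell from 3.34 for the single rule-based model to 2.78 with 100 committees and to 2.57 once 4 neighbors were added, and cross-validation picked committees = 100 with neighbors = 5 (RMSE of about 3.01).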