Monday, June 30, 2014

Introduction to using Random Forests for the Kaggle Titanic Data Set

During the summer, a number of members of the Connecticut R User Group decided to work on a Kaggle competition data set to improve our R programming skills. The first data set we tried was the Titanic data set. It is a fairly simple data set, and the goal is to predict which passengers survived and which did not. The team's first task was simply to load the data set and the randomForest package and run a basic model. We completed that task last week. The R code to do that is below:

## load the randomForest package
# install.packages("randomForest")   # run once if the package is not installed yet
library(randomForest)
## Read in Titanic Data training and test set
train.data<-read.csv("~/titanic/train(2).csv")
test.data<-read.csv("~/titanic/test.csv")
## Convert data into a simpler dataframe
train <- data.frame(survived=train.data$Survived,
                    age=train.data$Age,
                    fare=train.data$Fare,
                    pclass=train.data$Pclass,
                    sex=as.integer(factor(train.data$Sex))
                    )
test <- data.frame( age=test.data$Age,
                    fare=test.data$Fare,
                    pclass=test.data$Pclass,
                    sex=as.integer(factor(test.data$Sex))
                    )
## now we need to get rid of the NAs: missing fares become 0 and missing ages become 30
train$fare[ is.na( train$fare) ]   <- 0
train$age[ is.na( train$age) ]     <- 30
test$fare[ is.na( test$fare) ]   <- 0
test$age[ is.na( test$age) ]     <- 30
labels <- as.factor(train[,1])
train <- train[,-1]
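## fit a random forest on the training data and predict the test set
rf <- randomForest(train, labels, xtest=test, ntree=100, do.trace=TRUE)
## rf$test$predicted is a factor; map it back to the original 0/1 labels
predictions <- levels(labels)[rf$test$predicted]
results <- predictions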
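The code above builds the training and test frames with two nearly identical blocks. One possible way to keep the two in step is a small helper, sketched here. The name prepare_features and its arguments are my own invention for illustration, not part of the group's code, and the toy rows are made up. The fill values (30 for age, 0 for fare) are the same ones used above. Fixing the factor levels means "female" and "male" get the same integer codes even when one data set happens to contain only one of them.

```r
# Hypothetical helper (not in the original post): build the model features
# from a raw Titanic data frame, using the same fill values as above.
prepare_features <- function(raw, sex_levels = c("female", "male"),
                             age_fill = 30, fare_fill = 0) {
  features <- data.frame(
    age    = raw$Age,
    fare   = raw$Fare,
    pclass = raw$Pclass,
    # fixed levels keep the integer codes identical across data sets
    sex    = as.integer(factor(raw$Sex, levels = sex_levels))
  )
  # replace missing values with the chosen constants
  features$age[is.na(features$age)]   <- age_fill
  features$fare[is.na(features$fare)] <- fare_fill
  features
}

# Toy rows standing in for train.data / test.data (made-up values)
toy_train <- data.frame(Age = c(22, NA), Fare = c(7.25, 71.28),
                        Pclass = c(3, 1), Sex = c("male", "female"))
toy_test  <- data.frame(Age = c(NA, 47), Fare = c(NA, 7),
                        Pclass = c(3, 3), Sex = c("male", "male"))

train_features <- prepare_features(toy_train)
test_features  <- prepare_features(toy_test)
```

With the real data, the same helper would be called once on train.data and once on test.data before fitting the forest.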


As we do more work on the model and try other approaches, I will post updates on my blog.

Tuesday, June 10, 2014

Taco Bell's Waffle Taco: a novelty that needs to go the way of the pet rock


This morning I decided to try the Waffle Taco at Taco Bell. What a mistake! It was terrible: a tasteless frozen waffle with rubbery pseudo-food inside. Do not even bother trying this, because you will just regret it like I do. No wonder they use old people in their commercials for this item. Their taste buds are already shot.



Sunday, June 8, 2014

Reading a large number of files into R

I know this is a fairly basic topic, but it is one that caused me problems lately. Normally I only have to read in one data file at a time or I read in a few tables separately.

If I am reading in a single file, I would do the following:

>read.table("file")

or, if it is online,

>read.table("url")

If it is a csv file

read.csv("file")

Now the problem arose because I needed to read in 400 files from a directory, but the files were not numerically indexed. So to solve this problem I used the functions list.files and paste.

>names<-list.files("~/directory/")
>complete_names<-paste("~/directory/", names, sep="")
>monitors<-data.frame()
>for (i in seq_along(names)){
         monitors<-rbind(monitors, read.csv(complete_names[i]))
 }

It was slow, but got the job done.
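Much of the slowness likely comes from calling rbind once per file, which copies the growing data frame each time. A common alternative, sketched below, reads all the files into a list first and binds them in one step. The helper name read_all_csv and the two demo files are made up for illustration, and the sketch assumes every file has the same columns.

```r
# Hypothetical helper (not from the original post): read every CSV in a
# directory and stack them into one data frame.
read_all_csv <- function(dir) {
  # full.names = TRUE returns complete paths, so no paste() is needed
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(paths, read.csv)
  # a single rbind over the whole list instead of one rbind per file
  do.call(rbind, tables)
}

# Small demonstration with two made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "monitors_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(0.5, 1.5)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(2.5, 3.5)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

monitors <- read_all_csv(demo_dir)
```

For the 400 files, the call would simply be read_all_csv("~/directory/").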





Thursday, July 25, 2013

Everything old is new again in Video Games

They say that in fashion everything comes back in style about every 20 years. I am beginning to think the same applies to video games. The online games on Facebook and mobile games have the same look and feel as early PC games. The one thing I am glad about is that command-line games like Zork or Galtrader did not make a comeback, because I was happy to see them go. Now I am hearing more and more about table video games returning to bars. It is like the 80s have come back in fashion.

In the 80s video games had their heyday in bars across America. Games like Pong, Pac-Man and Donkey Kong gained fame and fortune not on home gaming screens, but on table video game units in bars. I remember walking through these smoke-filled bars (yes, people still smoked in bars then), past table after table of video games being played with beers and quarters stacked on top of them. By the 1990s they had all disappeared to make way for things like karaoke.


Now it seems that video games and beer will once again be joined together in a happy partnership of something that requires fine motor skills and something that impairs them. Recently I saw a video of a new video game for Grasshopper NYC. The game and its developer make no bones about its purpose. The game is called Don't F**K Up: A video game for drunk people. Here is a link to the video.

Wednesday, July 3, 2013

ROIA - A great new Restaurant in New Haven

I met Avi, the owner of ROIA, about a year ago as he was working on getting his restaurant open. He is a very passionate man and great to talk to. I was invited to the opening. The place and the food were outstanding! I remember thinking someone needs to write about this place to get the word out. I have eaten there a number of times since, and every time I keep thinking someone needs to write an article about this place because it is awesome. Well, the New Haven Register finally did. Go to ROIA; you will love the place and the food.

Register review of ROIA

Monday, May 20, 2013

Hate Map

Time ran an article on a hate map of tweets in the US. The article is interesting, and I think that heat maps are a good way to visualize data, especially if there is an easy way to drill down in the representation to small areas and to the actual data itself. I am always concerned about people mining tweets for keywords, because the language used on Twitter is different from normal language, and it is rare that the tools mining these tweets have any contextual awareness. I know that picking up things like sarcasm is hard, if not impossible, to do, but a tweet like "how not to raise a racist child" (with a link) might be counted as a hate tweet even though that clearly was not the intent of the tweeter. Anyway, I have attached the link.

Time Article on Hate Map

Tuesday, January 8, 2013

Connecticut R User Group starts 2013

I cannot believe the Connecticut R User Group is already in its second year. I started the group mostly because I did not want to have to go all the way into New York or Boston just to connect with other R users. The group has been more successful than I had hoped in its first year. We have had a host of great talks, and the group has grown to over 70 members. If you are in Connecticut, especially the New Haven area, I hope to see you at our next meeting.