Big Computing: April 2012

Tuesday, April 24, 2012

Red Sox off to worst April in over a decade

2012 continues to be a year of struggles for the Boston Red Sox as they show more wear and tear than the 100-year-old Fenway that they play in. This is quite a change from the first decade of the new century.

The 2000s were years when the mighty Red Sox roared out of spring training with strong starters, a powerful closer and mighty batters. Their April record inspired talk of winning that first longed-for World Series, or another one this year. It seems so long ago.

Now the Red Sox lumber into the new season with an unsure rotation, an opening at closer and bats that still have not emerged from a winter in storage. It is hard to get depressed in spring, but a 5-10 Red Sox team will do it for me.

I believe we are seeing the Red Sox move away from the things that made them successful in recent years. Leaders like Theo Epstein and Terry Francona have moved on. Players like Wakefield, Manny and Varitek have aged and retired. The Red Sox need to return to what made them successful: a strong front office driven by sabermetrics. I hope it happens soon.

Tuesday, April 17, 2012

Next Connecticut R Users Group Meetup is May 15th. #rstats

We have confirmed the room at the Yale Statistics Department on Hillhouse Avenue for May 15th for our next R Users Meetup. The meeting will start with two short lightning talks and be followed by an R-Lab similar to the very successful Stats-Lab that Dr. Emerson has done at Yale for a number of years now.

The first presentation will be by Rajarshi Guha with a lightning talk exploring Twitter data with R plus basic sentiment analysis. Twitter feeds are a great way to try out different types of analysis, and packages like Jeff Gentry's twitteR package make it relatively easy to do.

The second presentation will be by Illya Mowerman with an example of using logistic regression in R to predict leads and converters in marketing. Regression really forms the foundation of a lot of predictive analytics, so this should be interesting.

If you have a data set you want the group to play with, or some code that you could use help with, bring it to the meeting and we will work through it as a group in the R-Lab portion of the meetup.

Friday, April 13, 2012

Johnny Damon signs with the Cleveland Indians

This morning it was announced that Johnny Damon signed a one-year contract with the Cleveland Indians. I have always been a Johnny Damon fan, and I appreciate the skill with which the Indians have been run as a sabermetrics team with the help of Keith Woolner for many years. However, I cannot see how this makes sense from any perspective except that Damon has found a team with such a large void that he will get a chance to play.

I am also surprised that Damon is coming to the Indians as a left fielder. He has not regularly played in the field since 2009, and it was my understanding that other teams were really looking at him as a designated hitter. Damon can still hit: he batted .261 with a .743 OPS last year with the Rays. There is no doubt he can still run, with 19 successful steals out of 25 attempts last year. However, he is 38 years old, and playing the outfield is not a short sprint to the next base but constant and continual movement over the course of an inning.

Maybe the Indians are just trying to fill the hole they have in left field any way they can, or they see something in Damon that no one else does. The Indians are really good at seeing things other people miss. The Johnny Damon fan in me hopes it is more than that. I wish Demon Damon a great year.

Thursday, April 12, 2012

A great start to the Connecticut R Users Group

Tuesday night was our first meetup, and it went off exceptionally well. Jay is a great leader for a discussion, and the ten plus people who came to our first meeting really got a treat.

Rather than give a formal presentation, Jay just put two projects that he was working on up live on the screen. There is no better way to show the power of R than in exploratory data analysis. In minutes Jay was able to read in a data set from the web, clean up that data and play with it. I do not believe any other language can do this type of work with the speed and ease of R.

This format went so well that we will continue to use it for the Connecticut R Users Meetup with a little modification. The basic format will be one or two lightning talks by members about what they are working on in R, followed by a bring-in-your-problem/code session. The second part is similar to what Jay has been doing with his Statistics Lab at Yale for years with great success. The idea is that someone brings in some code and/or a data set that they are working on and having trouble with. The group then works on that problem collectively to develop approaches and implementations to extract information from that problem.

I am looking forward to our next meeting in May.

Monday, April 9, 2012

Red Sox headed for a worse start than 1945

The Bobby Valentine era at Fenway Park has started with a fizzle. The Red Sox have entered the 2012 season the same way they began the 2011 campaign, on a losing streak. In 2011 the Red Sox started 0-6 before beating the New York Yankees and ending their quest to have the worst start in Red Sox history.

This could be the year that the Red Sox finally do away with a record that has stood for over sixty years. In 1945 the Red Sox started the season with eight straight losses. That streak ended when they beat the Philadelphia Athletics on April 28, 1945. The 2011 losing streak ended with a victory over the Yankees. There is a chance the Red Sox could end this current streak by beating the Yankees on April 20th at Fenway. The Yankees are currently off to a winless start as well.

For the last five years the Red Sox have had a poor first two weeks of the season starting every year with a losing record. It is an odd development that could just be the result of chance.
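
As a rough check on the "result of chance" idea, here is a minimal sketch in Python. It rests on assumptions that are mine and not from any real data: the first two weeks are taken to be about 12 games, the team is treated as a true .500 club, and every game is an independent coin flip. The function name and the constants are made up for illustration.

```python
from math import comb


def prob_losing_start(games: int, win_prob: float) -> float:
    """Probability of having more losses than wins after `games`
    independent games, each won with probability `win_prob`."""
    return sum(
        comb(games, wins) * win_prob**wins * (1 - win_prob) ** (games - wins)
        for wins in range(games + 1)
        if wins < games - wins  # strictly more losses than wins
    )


GAMES = 12      # assumed number of games in the first two weeks
WIN_PROB = 0.5  # assumed: a true .500 team
YEARS = 5       # five straight seasons, as in the post

p_one_year = prob_losing_start(GAMES, WIN_PROB)
p_all_years = p_one_year**YEARS  # seasons treated as independent

print(f"Chance of a losing start in one season: {p_one_year:.3f}")
print(f"Chance of a losing start in {YEARS} straight seasons: {p_all_years:.4f}")
```

Under these assumptions a five-year run of losing starts is unlikely for any one team, and a team that is better than .500 would make it less likely still. That said, with 30 teams and many seasons to look at, a streak like this turning up somewhere by chance is not so strange.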

Given that the worst start ever by a Major League Baseball team is 21 straight losses by the 1988 Baltimore Orioles, the Red Sox all-time worst start of 8 straight losses is fairly unimpressive.

Wednesday, April 4, 2012

Update on 1940 Census release.....Servers overwhelmed

I guess I was not the only one waiting for the release of the 1940 US Census documents in digital format on April 2. The traffic was so high that it overwhelmed the servers. Here is a link to the USA Today article on the server issue.

I have not done any serious work with the data. I have just had fun with it so far. I looked up my grandparents, and the report included both my mother and father which was cool. I did notice how slow the website was, but I thought it was on my end and not related to traffic on their end. I guess I was wrong.