Big Computing: It is a Sabermetrics Weekend so today's post is Sabermetrics

Friday, May 20, 2011

It is a Sabermetrics Weekend so today's post is Sabermetrics

This weekend I am going to the Sabermetrics Seminar in Boston. Some might think it strange that I am excited about this, given that I never played baseball and do not watch many games. However, the analytics being done in baseball is developing and expanding at such a rapid pace that there is no way you can enjoy analytics and not be interested. Recently a friend of mine ran into Prof. Bertsimas and asked if he could have a copy of the now famous paper he wrote predicting that the Red Sox would win 100 games this year. Prof. Bertsimas asked, "Are you a fan of baseball?" to which my friend responded, "No, I am a fan of statistics."

Over the last 30 years, the development of Sabermetrics has consisted of looking at existing data and trying to build predictive models from it. It was a good first step and produced some good results. This work revealed that some of the historical statistics, like ERA, were not good predictors of anything, so Sabermetricians created statistics that were better predictors. This is all great, and it has taken Sabermetrics to where it is today.

The problem with the data that has been used in baseball to date is that it is all result-based data. The pitcher threw a strike or a ball, the batter got on base, etc. That is all changing. Welcome to the world of physical data in baseball. This post on Beyond the Boxscore is a good example. It traces the improvement in a player's performance back to the physical location of his pitches: it does not just note that more of his pitches resulted in ground balls, but attempts to answer why, based on data rather than opinion. The technology exists to track a baseball not only as it crosses the plate but throughout the entire ballpark. First, this is going to create an unbelievable amount of data that, because of its sheer volume, needs to be studied in ways not currently used in baseball. Second, this data is collected in real time, which means the models could be updated in real time. Billy Beane may have had his 3x5 note card in front of him, but the manager of the future may be holding his iPad with feedback on everything up to the last pitch, along with suggested options and the predicted results of those options.

One of the companies doing this physical data collection in baseball is Trackman. They also recently posted an opening for an R developer. I cannot wait to see what is coming!