Big Computing: Cleveland Indians are better than Sabermetricians Predicted

Wednesday, May 25, 2011

Cleveland Indians are better than Sabermetricians Predicted

When I was at the Sabermetric Seminar in Boston, the Indians' success in the first quarter of the year was a topic of discussion. The explanation given by Tom Tippett was that the Indians were overperforming against the model and would, over the course of the season, return to their expectation. In support of that, an expected run chart was put up showing the Indians with the greatest positive actual run differential versus expected run differential. The Red Sox were underperforming with respect to this measure.

While I understand there will always be statistical anomalies and periodic straying from the mean, I am not so sure that this is the case here. Modelers have a tendency to explain away differences between reality and their models as variation. While that may and will be the case sometimes, for a three standard deviation outlier we are talking about a roughly 3 in 1,000 chance. Rather than take that bet, I would check to see if my model failed to take something into account. In the case of the Indians' improvement, I would be more likely to look for shortcomings in my model because the Indians are a Sabermetric-driven team and the guy who runs their analytics is very talented. Teams do not share their models, so there is no way of knowing if the various models are similar or even what input data they use. A general impression from the Sabermetric conference is that Sabermetricians do a lot of regression to the league mean, which will smooth out the data but may also underemphasize relevant data.
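The 3 in 1,000 figure for a three standard deviation outlier can be checked with a few lines of Python. This is only a sketch, and it assumes the model's errors are roughly normally distributed, which may not hold for run differentials. The function name is mine, not something from the seminar.

```python
import math

def two_sided_tail(z: float) -> float:
    """Chance that a standard normal value lands more than z standard
    deviations from the mean, in either direction."""
    return math.erfc(z / math.sqrt(2))

def one_sided_tail(z: float) -> float:
    """Chance that a standard normal value lands more than z standard
    deviations above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumption: normally distributed errors.
print(f"two-sided, 3 SD: {two_sided_tail(3.0):.4f}")
print(f"one-sided, 3 SD: {one_sided_tail(3.0):.4f}")
```

The two-sided value is the one that matches the "3 in 1,000" rule of thumb. If you only care about teams beating the model, the one-sided value is half of that.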

I believe even a quick look at high-level data for the Indians suggests their performance is not a wandering away from the mean but a shift in the mean. Most of the difference in 2011 can be attributed to the 233 runs scored in 46 games, or about 5 runs per game, compared to 4 runs per game in 2010. This can be explained by the fact that most sabermetric models fail to incorporate injuries, which were a factor for the Indians in 2010 and would negatively affect their run prediction for 2011. A lack of injury prediction and weighting due to past injuries is a major gap in Sabermetric models and needs to be addressed. The healthcare industry has made great strides in this area in recent history with the use of ensemble methods.
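The runs-per-game arithmetic is easy to reproduce, and the same numbers can be turned into a rough test of "straying from the mean" versus "shift in the mean". The sketch below uses the 233 runs, 46 games, and 4 runs per game from this post. The per-game standard deviation is a hypothetical placeholder, not a measured value, so treat the z-score as an illustration of the method rather than a result.

```python
import math

RUNS_2011 = 233    # from this post
GAMES_2011 = 46    # from this post
RPG_2010 = 4.0     # rounded 2010 figure from this post

def runs_per_game(runs: int, games: int) -> float:
    """Average runs scored per game."""
    return runs / games

def z_score(observed_rpg: float, baseline_rpg: float,
            games: int, per_game_sd: float) -> float:
    """How many standard errors the observed average sits above the
    baseline, treating games as independent draws."""
    standard_error = per_game_sd / math.sqrt(games)
    return (observed_rpg - baseline_rpg) / standard_error

rpg_2011 = runs_per_game(RUNS_2011, GAMES_2011)
print(f"2011 runs per game: {rpg_2011:.2f}")

# Hypothetical value: replace with the SD of runs scored per game
# computed from real game logs before drawing any conclusion.
ASSUMED_PER_GAME_SD = 3.0
z = z_score(rpg_2011, RPG_2010, GAMES_2011, ASSUMED_PER_GAME_SD)
print(f"z-score versus 2010 baseline: {z:.2f}")
```

A large z-score against the 2010 baseline would argue for a shifted mean, while a small one would be consistent with ordinary variation. The answer depends heavily on the standard deviation you plug in.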