Big Computing: Analytics, Sabermetrics, Data Mining...Why can't we all just get along?

Tuesday, April 12, 2011

Analytics, Sabermetrics, Data Mining...Why can't we all just get along?

Sabermetrics is a term coined by Bill James to describe the analysis of baseball through objective evidence. Saber, or more accurately SABR, stands for the Society for American Baseball Research. With Bill James as its advocate, Sabermetrics has changed the way baseball is played. That is no easy task in a sport so encumbered by tradition. Baseball probably collects more data during a game than any other sport, and each team plays at least 162 games a year. That is rich data territory compared to the 16 regular season games played in the NFL. Sabermetrics has taken a hard look at the core beliefs about which statistics make a good baseball player or team and run them against the cold judgement of analytics. The results showed that some previously treasured statistics, like batting average, were not as important as once thought, while others, like on base percentage, were better indicators. This is predictive analytics at its best. So it is time to call Sabermetrics what it is: analytics.
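To make the batting average versus on base percentage point concrete, here is a small sketch using the standard definitions of the two statistics. The two players and their stat lines are made up purely for illustration; they are not real data.

```python
def batting_average(hits, at_bats):
    """Batting average: hits divided by at bats."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """On base percentage: times on base divided by qualifying plate appearances."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


# Hypothetical season lines (invented for this example, not real players).
players = {
    "Player A": dict(hits=150, at_bats=500, walks=20, hit_by_pitch=0, sac_flies=5),
    "Player B": dict(hits=135, at_bats=500, walks=90, hit_by_pitch=5, sac_flies=5),
}

for name, s in players.items():
    ba = batting_average(s["hits"], s["at_bats"])
    obp = on_base_percentage(
        s["hits"], s["walks"], s["hit_by_pitch"], s["at_bats"], s["sac_flies"]
    )
    print(f"{name}: BA {ba:.3f}, OBP {obp:.3f}")
```

Player A wins on batting average, but Player B reaches base far more often once walks are counted, which is the kind of difference batting average alone hides.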

It is funny: for all the impact Sabermetrics has had on baseball, I believe it is still limited by the traditions of baseball. Let me give you some examples.

The blog Sabermetric Research talks about Buck Showalter changing the way his base runners play to gain 5 runs per year, which he claims is worth $10 million. That makes sense if the data he is using is good, but the key here is that the decision is claimed to be made solely on the numbers.

Pitching is another story. In baseball a starting pitcher must pitch five full innings in order to earn a win. Many talk about the difference between the ERAs of starting and relief pitchers. The data clearly shows that pitchers, even when they are the same person, have an overall ERA at least .50 lower in relief than as starters. Tango on Baseball touches on the subject in this article. My question is this: if relief pitchers have a better ERA than starting pitchers, and starters are generally accepted to be better pitchers than relievers, why aren't starters being used like relievers? The impact would be huge! A quick pass says this .50 ERA reduction for starting pitchers would result in 40 fewer runs allowed by a team over the course of a season! Using Showalter math, that is $80 million. I believe the reason this is not looked at as a solution is tradition. If starting pitchers were used like relievers, they would never pitch 5 innings, and therefore would never earn a win. This would be a fundamental change in the way baseball is played.
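For anyone who wants to check the quick pass, here is a rough sketch of the arithmetic. ERA is earned runs per nine innings, so an ERA drop turns into runs saved once you multiply by innings pitched and divide by nine. The post never states an innings total, so the 720 starter innings below is my assumption, picked because it reproduces the 40-run figure. The dollar value per run is backed out of the Showalter claim of 5 runs for $10 million.

```python
def runs_saved(era_drop, innings):
    """Runs saved by lowering ERA by `era_drop` over `innings` innings.

    ERA is earned runs allowed per nine innings, hence the division by 9.
    """
    return era_drop * innings / 9


def dollars_per_run(total_value, runs):
    """Implied dollar value of a single run."""
    return total_value / runs


# Showalter math from the post: 5 runs per year is claimed to be worth $10 million.
value_per_run = dollars_per_run(10_000_000, 5)

# ASSUMPTION: 720 innings thrown by starters in a season. This number is not
# in the post; it is chosen so the result matches the quoted 40 runs.
assumed_starter_innings = 720

saved = runs_saved(0.50, assumed_starter_innings)
print(f"Runs saved: {saved:.0f}")                          # Runs saved: 40
print(f"Implied value: ${saved * value_per_run:,.0f}")     # Implied value: $80,000,000
```

Change the assumed innings and the answer moves in proportion, so the $80 million is only as good as that innings guess and the $2 million per run behind it.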

In defense of Sabermetricians, there has been some discussion that ERA, like BA, is not a very useful statistic. This would mean that conclusions drawn from those statistics may not be as useful as they appear. I have not seen anything on starters versus relievers in terms of CERA, dERA, DICE or DIPS.