Sabermetrics is a term coined by Bill James to describe the analysis of baseball through objective evidence. Saber, or more accurately SABR, stands for the Society for American Baseball Research. With Bill James as its advocate, Sabermetrics has changed the way baseball is played, no easy task in a sport so encumbered by tradition. Baseball probably collects more data during a game than any other sport, and each team plays at least 162 games a year. That is rich data territory compared to the 16 regular season games played in the NFL. Sabermetrics has taken a hard look at the core beliefs about which statistics make a good baseball player or team and run them against the cold judgement of analytics. The results showed that some previously treasured statistics, like batting average, were not as important as once thought, while others, like on-base percentage, were better indicators. This is predictive analytics at its best. So it is time to call Sabermetrics what it is: analytics.
It is funny: for all the impact Sabermetrics has had on baseball, I believe it is still limited by the traditions of baseball. Let me give you some examples.
The blog Sabermetric Research talks about Buck Showalter changing the way his base runners play to gain 5 runs per year, which he claims is worth $10 million. That makes sense if the data he is using is good, but the key here is that the decision is claimed to be made solely on the numbers.
Pitching is another story. In baseball a starting pitcher must pitch five full innings in order to be credited with a win. Many talk about the difference between the ERAs of starting and relief pitchers. The data clearly shows that pitchers, even when they are the same person, have an overall ERA at least .50 lower in relief than as starters. Tango on Baseball touches on the subject in this article. My question is this: if relief pitchers have a better ERA than starting pitchers, and starters are generally accepted to be better pitchers than relievers, why aren't starters being used like relievers? The impact would be huge! A quick pass says this .50 ERA reduction for starting pitchers would result in 40 fewer runs allowed by a team over the course of a season! Using Showalter math, that is $80 million. I believe the reason this is not looked at as a solution is tradition. If starting pitchers were used like relievers they would never pitch 5 innings, and therefore would never be credited with a win. This would be a fundamental change in the way baseball is played.
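The quick pass above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 5 runs, $10 million, .50 ERA gap, and 40 runs are the figures quoted in the post, ERA is taken as earned runs per nine innings, and the function names are made up for illustration. The post does not say how many innings starters throw, so the code backs out the workload that the 40-run figure implies rather than assuming one.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Function names are hypothetical; the numbers come from the post itself.

RUNS_GAINED = 5              # Showalter's base-running gain, runs per year
VALUE_OF_GAIN = 10_000_000   # claimed dollar value of those 5 runs
ERA_GAP = 0.50               # relief ERA advantage quoted in the post
RUNS_SAVED = 40              # the post's "quick pass" estimate


def dollars_per_run(runs, dollars):
    """Dollar value of a single run implied by the Showalter claim."""
    return dollars / runs


def runs_saved(era_gap, innings):
    """Runs saved when ERA (runs per 9 innings) drops by era_gap over innings."""
    return era_gap * innings / 9


def innings_implied(runs, era_gap):
    """Innings of work needed for an ERA drop of era_gap to save `runs` runs."""
    return runs * 9 / era_gap


if __name__ == "__main__":
    per_run = dollars_per_run(RUNS_GAINED, VALUE_OF_GAIN)
    print(f"Value per run: ${per_run:,.0f}")
    print(f"Value of {RUNS_SAVED} runs: ${RUNS_SAVED * per_run:,.0f}")
    print(f"Innings implied by the estimate: {innings_implied(RUNS_SAVED, ERA_GAP):.0f}")
```

At $2 million per run the 40 runs do come to $80 million, and the 40-run estimate corresponds to a .50 ERA drop spread over 720 innings of starter work. A team whose starters throw more innings than that would save proportionally more.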
In defense of Sabermetricians, there has been some discussion that ERA, like BA, is not a very useful statistic. This would mean that conclusions drawn from those statistics may not be as useful as they appear. I have not seen anything on starters versus relievers in terms of CERA, dERA, DICE or DIPS.
Tuesday, April 12, 2011
Friday, March 18, 2011
How does beer relate to Scraping Data?
Sometimes I find myself involved in projects as a result of having just one too many rounds of beer after a meetup. Such is the case with the beer predictor. In an overt attempt to ingratiate ourselves with the local craft brewers, we thought it would be a good idea to write some analytics on craft beers to get noticed by those brewers so that they would in turn supply us with free beer. At least we had a reasonable goal in mind. I will update that project as we go along. The first part of the project was to scrape the data from the Beer Advocate. I had not started doing this because it was St. Paddy's Day, and I felt the proper way to study beer on that day was consumption. Others wrote code and finished that portion of the project. I did come across this blog post on scraping data in R with XML versus in Python with Beautiful Soup, which I thought was interesting.
http://thelogcabin.wordpress.com/2010/08/31/using-xml-package-vs-beautifulsoup/
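For a feel of what the scraping step involves, here is a minimal sketch in Python. It is not the project's code and not the approach from the linked post: it uses the standard library's html.parser instead of Beautiful Soup or the R XML package so that it runs with no extra installs. The HTML snippet, the class names "beer" and "score", and the beer names are all invented for illustration and do not reflect the Beer Advocate's actual page structure.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = """
<table>
  <tr><td class="beer">Example Pale Ale</td><td class="score">4.1</td></tr>
  <tr><td class="beer">Example Stout</td><td class="score">3.8</td></tr>
</table>
"""


class BeerTableParser(HTMLParser):
    """Collects (name, score) pairs from cells tagged with assumed classes."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, score) tuples
        self._field = None   # which cell we are inside, if any
        self._current = {}   # cell text gathered for the current row

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            css_class = dict(attrs).get("class")
            if css_class in ("beer", "score"):
                self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a cell we care about.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr":
            # Emit the row only when both cells were found.
            if "beer" in self._current and "score" in self._current:
                name = self._current["beer"].strip()
                self.rows.append((name, float(self._current["score"])))
            self._current = {}


def scrape(html):
    """Parse an HTML string and return a list of (name, score) tuples."""
    parser = BeerTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for name, score in scrape(SAMPLE_HTML):
        print(name, score)
```

A real scraper would fetch the page first and would need selectors matched to the site's real markup, which is where libraries like Beautiful Soup or R's XML package save most of the effort.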