I always knew that Professor Andrew Gelman of Columbia was a well-known statistician and social scientist, but now that he has written an article in Baseball Prospectus, he is famous. It is always good to see a statistician write about a sabermetrician. Although the two fields are really the same, they seem to try to separate themselves from each other.
It was interesting to me that Gelman wrote about James as a baseball outsider, not too different from what James was to baseball in 1984 when he wrote the "Inside-out Perspective" article. Baseball may always need the outsider's perspective to push it along because its traditions and beliefs run so deep.
I thought one interesting issue that Gelman touched upon was how little of the real work in sabermetrics gets published. When Gelman works on a topic, he publishes a paper that discusses his approach and provides an example along with the code to run the example yourself. Not so in Sabermetrics. I find little detail in the published articles and very little code. This results in people like James moving away from positions and theories without explanation. I think this hurts the development of Sabermetrics in some ways. My view of how science develops follows the path by which gravity was discovered: a series of theories that were accepted and then rejected. First there was "nature abhors a vacuum," then there was "nature abhors a vacuum up to 32 feet," and then finally there was gravity at 32 ft/sec².
It was a fun article to read in preparation for my attendance at the Sabermetrics Seminar at Harvard, May 21-22.
I blog about the world of Data Science with Visualization, Big Data, Analytics, Sabermetrics, Predictive HealthCare, Quant Finance, and Marketing Analytics using the R language.
Saturday, May 7, 2011
Tuesday, April 12, 2011
Analytics, Sabermetrics, Data Mining...Why can't we all just get along?
Sabermetrics is a term coined by Bill James to describe the analysis of baseball through objective evidence. Saber, or more accurately SABR, stands for the Society for American Baseball Research. With Bill James as its advocate, Sabermetrics has changed the way baseball is played, which is no easy task in a sport so encumbered by tradition. Baseball probably collects more data during a game than any other sport, and each team plays at least 162 games a year. That is rich data territory compared with the 16 regular-season games played in the NFL. Sabermetrics has taken a hard look at the core beliefs about which statistics make a good baseball player or team and run them against the cold judgment of analytics. The results showed that some previously treasured statistics, like batting average, were not as important as once thought, while others, like on-base percentage, were better indicators. This is predictive analytics at its best. So it is time to call Sabermetrics what it is: analytics.
It is funny: for all the impact Sabermetrics has had on baseball, I believe it is still limited by the traditions of baseball. Let me give you some examples.
The blog Sabermetric Research talks about Buck Showalter changing the way his base runners play to gain 5 runs per year, which he claims is worth $10 million. That makes sense if the data he is using is good, but the key here is that the decision is claimed to be made solely on the numbers.
Pitching is another story. In baseball a starting pitcher must pitch five full innings in order to earn a decision (win/loss). Many talk about the difference between the ERAs of starting and relief pitchers. The data clearly shows that pitchers working in relief, even when they are the same person, have an overall ERA at least 0.50 lower than when starting. Tango on Baseball touches on the subject in this article. My question is this: if relief pitchers have a better ERA than starting pitchers, and starters are generally accepted to be better pitchers than relievers, why aren't starters being used like relievers? The impact would be huge! A quick pass says this 0.50 ERA reduction for starting pitchers would result in 40 fewer runs allowed by a team over the course of a season! Using Showalter math, that is $80 million. I believe the reason this is not looked at as a solution is tradition. If starting pitchers were used like relievers, they would never pitch 5 innings and therefore would never get a decision. This would be a fundamental change in the way baseball is played.
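The quick pass above can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not a model: it relies on ERA being earned runs per 9 innings, takes the 0.50 ERA drop, the 40 runs, and the "5 runs is worth $10 million" rate straight from the text, and backs out how many starter innings the 40-run figure implies. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the ERA arithmetic in the text.
# ERA is earned runs allowed per 9 innings, so an ERA drop converts
# to runs saved via innings / 9.

ERA_DROP = 0.50                      # ERA improvement quoted in the text
RUNS_CLAIMED = 40                    # runs saved per season quoted in the text
DOLLARS_PER_RUN = 10_000_000 / 5     # "Showalter math": 5 runs ~ $10 million


def runs_saved(era_drop, innings):
    """Runs saved over `innings` innings for a given drop in ERA."""
    return era_drop * innings / 9


def implied_innings(era_drop, runs):
    """Innings needed for an ERA drop to add up to `runs` runs saved."""
    return runs * 9 / era_drop


# Starter innings per season implied by the 40-run figure (an inference,
# not a number stated in the text).
innings = implied_innings(ERA_DROP, RUNS_CLAIMED)
value = RUNS_CLAIMED * DOLLARS_PER_RUN

print(f"Implied starter innings per season: {innings:.0f}")
print(f"Runs saved over those innings: {runs_saved(ERA_DROP, innings):.0f}")
print(f"Value at the Showalter rate: ${value:,.0f}")
```

Under these assumptions the 40-run figure corresponds to 720 starter innings a season, and the dollar value comes out to the $80 million quoted above. A different innings total would scale the runs and dollars in proportion.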
In defense of Sabermetricians, there has been some discussion that ERA, like BA, is not a very useful statistic. That would mean conclusions drawn from those statistics may not be as useful as they appear. I have not seen anything on starters versus relievers in terms of CERA, dERA, DICE or DIPS.
Monday, April 4, 2011
Why are the Red Sox better today? Sabermetrics or Construction?
I saw an article this morning from an MIT professor that predicted the Red Sox would win 100 games this year. That is a pretty bold statement, since the Red Sox have won 100 games in a season only three times (1912, 1915 and 1946). However, it got me wondering how the Red Sox have become so good in recent history. I often hear the claim that it is the payroll or the genius of Theo Epstein. Whenever I am with statistics guys, it is the hiring of Bill James and the use of Sabermetrics that made the difference. I have a third theory to put forth as the major reason for the improvement of the Red Sox in recent history: construction at Fenway. Oddly, this started the same year that Bill James was hired by the Red Sox, 2003.
From 1995 to 2002 the Red Sox had a combined record of 695-582, winning 54.42% of their games. From 2003 to 2010 they had a combined record of 749-547, winning 57.79% of their games.
| Year | W | L | Winning % | Year | W | L | Winning % |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2010 | 89 | 73 | 54.94% | 2002 | 93 | 69 | 57.41% |
| 2009 | 95 | 67 | 58.64% | 2001 | 82 | 79 | 50.93% |
| 2008 | 95 | 67 | 58.64% | 2000 | 85 | 77 | 52.47% |
| 2007 | 96 | 66 | 59.26% | 1999 | 94 | 68 | 58.02% |
| 2006 | 86 | 76 | 53.09% | 1998 | 92 | 70 | 56.79% |
| 2005 | 95 | 67 | 58.64% | 1997 | 78 | 84 | 48.15% |
| 2004 | 98 | 64 | 60.49% | 1996 | 85 | 77 | 52.47% |
| 2003 | 95 | 67 | 58.64% | 1995 | 86 | 58 | 59.72% |
| Total | 749 | 547 | 57.79% | Total | 695 | 582 | 54.42% |
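For anyone who wants to check the combined records, here is a minimal Python sketch that pools the season lines from the table and recomputes each period's winning percentage. The season wins and losses are copied from the table; the dictionary and function names are just illustrative choices.

```python
# Season records (wins, losses) copied from the table above.
RECORDS = {
    1995: (86, 58), 1996: (85, 77), 1997: (78, 84), 1998: (92, 70),
    1999: (94, 68), 2000: (85, 77), 2001: (82, 79), 2002: (93, 69),
    2003: (95, 67), 2004: (98, 64), 2005: (95, 67), 2006: (86, 76),
    2007: (96, 66), 2008: (95, 67), 2009: (95, 67), 2010: (89, 73),
}


def combined_record(first_year, last_year):
    """Return (wins, losses, winning_pct) pooled over an inclusive year range."""
    seasons = [rec for year, rec in RECORDS.items()
               if first_year <= year <= last_year]
    wins = sum(w for w, _ in seasons)
    losses = sum(l for _, l in seasons)
    return wins, losses, wins / (wins + losses)


for first, last in [(1995, 2002), (2003, 2010)]:
    w, l, pct = combined_record(first, last)
    print(f"{first}-{last}: {w}-{l}, {pct:.2%}")
```

The pooled percentage divides total wins by total games, so short seasons such as 1995 (144 games) count for slightly less than a full 162-game season. The same function could be pointed at home-only or away-only records to make the split comparison.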
So they got better after 2003, Theo is a genius, and Sabermetrics rules baseball. I am not so sure, and I think those numbers reflect a Simpson's paradox. Let me explain. If Sabermetrics had been the driving reason for the improvement of the Red Sox, they would have gotten better not only at home but away as well. They did not. In fact, the Red Sox improved massively at home but got worse on the road. So what is the factor that explains this? In 2003, the same year Bill James was hired by the Red Sox, additional seating was added to Fenway Park for the first time since 1946. It was always known that Fenway was helpful to certain types of hitters and pitchers, and Red Sox teams have always emphasized those players. I believe the construction made the park even more biased than it was before.
From 1995 to 2002 the Red Sox had a better away record than they did from 2003 to 2010.
[Table: Red Sox away records, 1995-2002 vs. 2003-2010]
For home games it is a very different story:
[Table: Red Sox home records, 1995-2002 vs. 2003-2010]
MIT economist says Red Sox will win 100 games in 2011
Labels: analytics, baseball, bill james, mit, red sox, sabermetrics, sabr, statistics