Big Computing: bill james
Showing posts with label bill james. Show all posts

Saturday, May 7, 2011

Gelman writes about Bill James

I always knew that Professor Andrew Gelman of Columbia was a well-known statistician and social scientist, but now that he has written an article in Baseball Prospectus he is famous. It is always good to see a statistician write about a sabermetrician. Although the two fields are really the same, they seem to try to separate themselves from each other.

It was interesting to me that Gelman wrote about James as a baseball outsider, not too different from what James was to baseball in 1984 when he wrote the "Inside-out Perspective" article. Baseball may always need the outsider's perspective to push it along because its traditions and beliefs run so deep.

I thought one interesting issue that Gelman touched upon was how little of the real work in Sabermetrics gets published. When Gelman works on a topic he publishes a paper that discusses his approach and provides an example along with the code to run the example yourself. Not so in Sabermetrics. I find little detail in the published articles and very little code. This results in people like James moving away from positions and theories without explanation. I think this hurts the development of Sabermetrics in some ways. My view of how science develops follows the path by which gravity was discovered: through a series of theories that were accepted and then rejected. First there was "nature abhors a vacuum", then there was "nature abhors a vacuum up to 32 feet", and then finally there was gravity at 32 feet per second per second.

It was a fun article to read in preparation for my attendance at the Sabermetrics Seminar at Harvard on May 21-22.

Tuesday, April 12, 2011

Analytics, Sabermetrics, Data Mining...Why can't we all just get along?

Sabermetrics is a term coined by Bill James to describe the analysis of baseball through objective evidence. Saber, or more accurately SABR, stands for the Society for American Baseball Research. With Bill James as its advocate, Sabermetrics has changed the way baseball is played, no easy task in a sport so encumbered by tradition. Baseball probably collects more data during a game than any other sport, and each team plays at least 162 games a year. That is rich data territory compared to the 16 regular season games played in the NFL. Sabermetrics has taken a hard look at the core beliefs about which statistics make a good baseball player or team and run them against the cold judgement of analytics. The results showed that some previously treasured statistics like batting average were not as important as once thought, while others like on-base percentage were better indicators. This is predictive analytics at its best. So it is time to call Sabermetrics what it is: analytics.

It is funny that, for all the impact Sabermetrics has had on baseball, I believe it is still limited by the traditions of baseball. Let me give you some examples.

The blog Sabermetric Research talks about Buck Showalter changing the way his base runners play to gain 5 runs per year, which he claims is worth $10 million. That makes sense if the data he is using is good, but the key here is that the decision is claimed to be made solely on the numbers.

Pitching is another story. In baseball a starting pitcher must pitch five full innings in order to earn a win. Many talk about the difference between the ERAs of starting and relief pitchers. The data clearly shows that pitchers, even the same person, have an overall ERA at least .50 lower in relief than as starters. Tango on Baseball touches on the subject in this article. My question is this: if relief pitchers have a better ERA than starting pitchers, and starters are generally accepted to be better pitchers than relievers, why aren't starters being used like relievers? The impact would be huge! A quick pass says this .50 ERA reduction for starting pitchers would result in 40 fewer runs allowed by a team over the course of a season! Using Showalter math that is $80 million. I believe the reason this is not looked at as a solution is tradition. If starting pitchers were used like relievers they would never pitch 5 innings, and therefore would never get a win. This would be a fundamental change in the way baseball is played.
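The arithmetic behind that quick pass can be sketched in a few lines of Python. This is only a rough check, not the calculation from the post: ERA is earned runs per nine innings, so the runs saved depend on how many innings the lower ERA applies to. The post does not give that number, so the 720 innings below is an assumption, chosen because it is the figure at which a .50 drop equals 40 runs. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the reliever-usage arithmetic in the post.
# The innings figure is a hypothetical input, not a number from the post.

def runs_saved(era_drop: float, innings: float) -> float:
    """ERA is earned runs per 9 innings, so runs = ERA change * innings / 9."""
    return era_drop * innings / 9.0

# "Showalter math" from the post: 5 runs are claimed to be worth $10 million.
DOLLARS_PER_RUN = 10_000_000 / 5

def dollar_value(runs: float) -> float:
    """Convert runs to dollars at the claimed rate of $2 million per run."""
    return runs * DOLLARS_PER_RUN

if __name__ == "__main__":
    innings = 720  # assumed: the count at which a .50 drop equals 40 runs
    runs = runs_saved(0.50, innings)
    print(f"{runs:.0f} runs saved, worth ${dollar_value(runs) / 1e6:.0f} million")
```

A full 162-game season is about 1,458 innings (162 x 9), so the answer is sensitive to how many of those innings are assumed to be affected.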

In defense of Sabermetricians, there has been some discussion that ERA, like BA, is not a very useful statistic. This would mean that conclusions drawn from those statistics may not be as useful as they appear. I have not seen anything on starters versus relievers in terms of CERA, dERA, DICE or DIPS.

Monday, April 4, 2011

Why are the Red Sox better today? Sabermetrics or Construction?

I saw an article this morning from an MIT professor that predicted the Red Sox would win 100 games this year. That is a pretty bold statement since the Red Sox have only won 100 games in a season three times (1912, 1915 and 1946). However, it got me wondering how the Red Sox have become so good in recent history. I often hear claims that it is the payroll or the genius of Theo Epstein. Whenever I am with statistics guys, it is the hiring of Bill James and the use of Sabermetrics that made the difference. I have a third theory to put forth as the major reason for the improvement of the Red Sox in recent history: construction at Fenway. Oddly, this started in 2003, the same year that Bill James was hired by the Red Sox.

From 1995 to 2002 the Red Sox had a combined record of 695-582, winning 54.42% of their games. From 2003 to 2010 they had a combined record of 749-547, winning 57.79% of their games.


Year     W    L   Winning %   Year     W    L   Winning %
2010    89   73   54.94%      2002    93   69   57.41%
2009    95   67   58.64%      2001    82   79   50.93%
2008    95   67   58.64%      2000    85   77   52.47%
2007    96   66   59.26%      1999    94   68   58.02%
2006    86   76   53.09%      1998    92   70   56.79%
2005    95   67   58.64%      1997    78   84   48.15%
2004    98   64   60.49%      1996    85   77   52.47%
2003    95   67   58.64%      1995    86   58   59.72%
Total  749  547   57.79%      Total  695  582   54.42%
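For anyone who wants to check the totals, here is a minimal Python sketch that recomputes the two combined records from the season lines in the table. The wins and losses are copied from the table; the `RECORDS` and `combined` names are made up for illustration.

```python
# Season records copied from the table in the post: year -> (wins, losses).
RECORDS = {
    1995: (86, 58), 1996: (85, 77), 1997: (78, 84), 1998: (92, 70),
    1999: (94, 68), 2000: (85, 77), 2001: (82, 79), 2002: (93, 69),
    2003: (95, 67), 2004: (98, 64), 2005: (95, 67), 2006: (86, 76),
    2007: (96, 66), 2008: (95, 67), 2009: (95, 67), 2010: (89, 73),
}

def combined(first_year, last_year):
    """Sum wins and losses over an inclusive range of seasons."""
    years = range(first_year, last_year + 1)
    wins = sum(RECORDS[y][0] for y in years)
    losses = sum(RECORDS[y][1] for y in years)
    return wins, losses, 100.0 * wins / (wins + losses)

for span in [(1995, 2002), (2003, 2010)]:
    w, l, pct = combined(*span)
    print(f"{span[0]}-{span[1]}: {w}-{l} ({pct:.2f}%)")
```

Run as written, this reproduces the 695-582 (54.42%) and 749-547 (57.79%) totals.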


So they got better after 2003, Theo is a genius, and Sabermetrics rules baseball. I am not so sure, and I think we reach those numbers through something like a Simpson's paradox, where the combined record hides what is happening in the home and away splits. Let me explain. If Sabermetrics had been the driving reason for the improvement of the Red Sox, they would have gotten better not only at home but away as well. They did not. In fact the Red Sox improved massively at home, but got worse on the road. So what is the factor that explains this? In 2003, the same year Bill James was hired by the Red Sox, additional seating was added to Fenway Park for the first time since 1946. It was always known that Fenway was helpful to certain types of hitters and pitchers, and Red Sox teams have always emphasized those players. I believe that construction made the park even more biased than it was before.



During the period 1995 to 2002 the Red Sox had a better away record than they did from 2003-2010.



Away Record
Year     W    L   %           Year     W    L   %
2010    40   41   49.38%      2002    51   30   62.96%
2009    39   42   48.15%      2001    41   39   51.25%
2008    39   42   48.15%      2000    43   38   53.09%
2007    40   41   49.38%      1999    45   36   55.56%
2006    35   46   43.21%      1998    41   40   50.62%
2005    41   40   50.62%      1997    39   42   48.15%
2004    43   38   53.09%      1996    38   43   46.91%
2003    42   39   51.85%      1995    43   28   60.56%
Total  319  329   49.23%      Total  341  296   53.53%

For home games it is a very different story:



Home Record
Year     W    L   %           Year     W    L   %
2010    49   32   60.49%      2002    42   39   51.85%
2009    56   25   69.14%      2001    41   40   50.62%
2008    56   25   69.14%      2000    42   39   51.85%
2007    56   25   69.14%      1999    49   32   60.49%
2006    51   30   62.96%      1998    51   30   62.96%
2005    54   27   66.67%      1997    39   42   48.15%
2004    55   26   67.90%      1996    47   34   58.02%
2003    53   28   65.43%      1995    43   30   58.90%
Total  430  218   66.36%      Total  354  286   55.31%
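To put the home and away splits side by side, here is a small Python sketch that computes the change in winning percentage between the two periods for each split. The totals are copied from the home and away tables; the `SPLITS`, `win_pct` and `change_in_points` names are made up for illustration. It only restates the numbers in the tables and is not a test of whether construction caused the change.

```python
# Period totals copied from the home and away tables: (wins, losses).
SPLITS = {
    "home": {"1995-2002": (354, 286), "2003-2010": (430, 218)},
    "away": {"1995-2002": (341, 296), "2003-2010": (319, 329)},
}

def win_pct(wins, losses):
    """Winning percentage on a 0-100 scale."""
    return 100.0 * wins / (wins + losses)

def change_in_points(split):
    """Change in winning percentage (percentage points) between periods."""
    before = win_pct(*SPLITS[split]["1995-2002"])
    after = win_pct(*SPLITS[split]["2003-2010"])
    return after - before

for split in SPLITS:
    print(f"{split}: {change_in_points(split):+.2f} points")
```

On these totals the home record improves by roughly 11 percentage points while the away record drops by about 4, which is the pattern the combined record hides.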

MIT economist says Red Sox will win 100 games in 2011