Big Computing: Sabermetrics to predict pitching injury

Wednesday, June 22, 2011

Sabermetrics to predict pitching injury

I just finished The Extra 2% by Jonah Keri, on the rise of Sabermetrics at the Tampa Bay Rays and the World Series appearance that came out of it. It is a quick read and a great introduction to the business of baseball and how MLB teams have incorporated Sabermetrics into their management in recent years.

I met Keri at the Sabermetrics seminar at Harvard, which was a fundraiser for the Dana-Farber Cancer Institute. I believe they are going to do it again next year, and I strongly encourage anyone interested in Sabermetrics to go. It was my first time talking to people about sports statistics instead of running models and playing with data. If it happens again next year, go!

In the last year at Bigcomputing we have done a great deal of work on predicting people's health for hospitals and healthcare companies, with great success. There are a number of predictive analytics competitions that are trying to develop models for things like predicting whether a patient will be hospitalized within the week, month, or year. The best known of these competitions is the Heritage Health Prize, worth $3 million, which is being hosted by Kaggle. Obviously, with prizes of that size there is real potential to predict things like injury and disease.

Tom Tippett, head analyst of the Boston Red Sox, talked at the seminar about what the Red Sox look at when they evaluate a player for a contract. After his talk, I asked him if there was a predictive component for injury in their sabermetric models for predicting a player's future performance. He said there was not. That surprised me, because I thought it could be done with the vast amount of data that is collected on the various players.

In Jonah Keri's book, the Rays hire a guy who was able to predict pitcher injuries within a short time frame based on Pitch F/X data. That was Josh Kalk, who published an article in The Hardball Times called "The Injury Zone". He was later hired by the Rays, where he has continued his work on Pitch F/X data among a mountain of other things. Injury prediction based on the physical and results data openly available in baseball is possible, and maybe even more so with the confidential information the teams have access to, like Trackman data and medical and scouting reports. The key is being able to incorporate all of this into a model built specifically to predict injury. This type of risk analysis has a lot of value when you are talking about players who make an average of around $4M a year. I also love that Kalk's boss at the Rays was James Click. Click and Kalk should always work together.
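To give a feel for what an injury signal pulled out of pitch data might look like, here is a minimal sketch in Python. It is not Kalk's method and not anything a team is known to use. It only assumes that a sustained drop in fastball velocity relative to a pitcher's own baseline is worth a second look. The function names, the threshold of two standard deviations, and the velocity numbers are all made up for illustration.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """How many baseline standard deviations the recent mean sits
    above (positive) or below (negative) the baseline mean."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has no variation")
    return (mean(recent) - mean(baseline)) / spread


def flag_pitcher(baseline, recent, threshold=-2.0):
    """Flag a pitcher whose recent velocity has dropped by at least
    `threshold` baseline standard deviations (hypothetical cutoff)."""
    return drift_score(baseline, recent) <= threshold


# Made-up fastball velocities in mph, for illustration only.
baseline_mph = [93.1, 92.8, 93.4, 93.0, 92.9, 93.2, 93.3, 92.7]
recent_mph = [91.5, 91.8, 91.2, 91.6]

if __name__ == "__main__":
    print("drift score:", round(drift_score(baseline_mph, recent_mph), 2))
    print("flagged:", flag_pitcher(baseline_mph, recent_mph))
```

A real model would combine many more inputs (movement, release point, workload, medical history) and would need to be checked against actual injury records. That is exactly where the confidential team data comes in.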


  1. If you're able to mesh your previous injury prediction modeling with the baseball side, let me know. Fascinating topic, and obviously I'd love to learn more. Thanks!

  2. I have done some work on it, but the problem I run into is the confidential nature of the data. I think if we end up doing more work on it, it will be with a team, because that is the only way I can get the data I think I need and the right to use it. The Kaggle competition is still going to happen. I really enjoyed your book. Good stuff!