For someone just starting out in either R or sabermetrics, Pitch F/X is a great way to get into the hobby. This post has some simple code that produces interesting images of relevant data. My first introduction to R and sabermetrics was the Baseball Hacks book, and it spurred my interest in both. I also think the Pitch F/X data is some of the more interesting data publicly available, because it is performance-based rather than results-based like most baseball data. I do wish they would make the Hit F/X and Trackman data available as well.
I am working with Anthony Goldbloom and others on putting together a Kaggle competition with a sabermetrics theme. If anyone out there has a suggestion for what would be good to predict, I would love to hear from you. For the dataset, I have looked at using the Retrosheet or MLBAM data, or anything that includes the Pitch F/X data. I would also appreciate any other recommendations on where to get data for the contest.
Here is the article from R-bloggers by Millsy with Josh Weinstock.
Sounds fantastic. This may get me to put some time into a Kaggle competition, as I've only been able to do these in my free time for extra R practice.
I really think using F/X data would be a good road for this. For the most fun, it may be interesting to predict whether a batter makes contact with a single pitch, given the game state, pitch type, count, the opposing batter's abilities, the pitcher's ability, velocity, location, etc. Depending on the data needs, I may be able to help out with my current database.
I posted a link at my website, as I'm sure there will be a lot of people that come by there who will want to take part. This sounds like a lot of fun. Not that I need more things to take up my time...I'm already trying my hand at the StatDNA soccer analytics competition...