Big Computing: Predictive Analytics for March Madness 2012

Monday, March 12, 2012


For a couple of years now, Danny Tarlow and Commissioner Lee have hosted a predictive analytics competition for March Madness. It even got some press last year:"

Software to predict 'March Madness' basketball winner

MacGregor Campbell, consultant
Fine, computers, you can beat us at chess and Jeopardy!, just please let us keep March Madness. With the US National Collegiate Athletic Association's basketball tournament starting today, contestants in the second annual March Madness Predictive Analytics Challenge are attempting to build software that can pick winning teams better than humans.
The contest pits machine against machine to find out which algorithm can correctly predict the outcome of the 64-team contest. Tournament brackets must be chosen entirely by computer algorithm, and no specific team-based rules, such as "always pick Duke over North Carolina", are allowed. All contestants are restricted to using the same data set - team and player statistics from the 2006 season until last month.
Contest organiser Danny Tarlow's own entry started out as a movie recommendation engine similar to those used on sites like Netflix. He says that predicting what movie a particular person would like to see is similar to predicting how well a basketball team's attack will do against their opponent's defence: both interactions are driven by unknown rules.
To predict the result of a basketball game, his algorithm chews through loads of regular season data and uses probabilities to find equations that fit the outcomes of each game. It then uses these equations to pick which teams will win in tournament match-ups. "The algorithm knows nothing about basketball or details about any team. It just sees the outcome of each game in the season, and it tries to discover latent characteristics that best explain the outcomes," he says.
Other entries range from using genetic algorithms to evolve equations that can pick winners to more straightforward attempts to boil down a team's strengths and weaknesses to a single number, then pick the team with the higher number in each match-up.
Last year's contest had 10 entries, including a "pace" bracket that simply picked the higher-seeded team in each matchup. Six of the entries did better than this baseline, one even predicting underdog Butler University's surprising ascent to the final four.
Tarlow hopes for a better performance this year, but is well aware of the difficulty of predicting the outcome of an entire basketball tournament. "There's clearly a lot of luck that goes into having a successful bracket."
We'll know how the software programs fare soon - the round of 64 begins today."
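The description of Tarlow's entry (a recommendation-engine model that "tries to discover latent characteristics that best explain the outcomes") maps naturally onto matrix factorization. The sketch below illustrates that general idea; it is not Tarlow's code. It assumes each team gets an offense vector and a defense vector plus scalar biases, models the points a team scores as the dot product of its offense vector with the opponent's defense vector, and fits everything by stochastic gradient descent on game scores. The function names, the dimension `k`, the learning rate, the regularization strength and the toy scores are all illustrative assumptions.

```python
import random


def fit_latent_model(games, k=2, lr=0.01, reg=0.01, epochs=300, seed=0):
    """Fit a toy latent-factor model of game scores by SGD (illustrative only).

    games: list of (team_a, team_b, points_a, points_b).
    Points scored by `att` against `dfn` are modelled as
        mu + off_bias[att] + def_bias[dfn] + dot(off_vec[att], def_vec[dfn]).
    Returns a predict(att, dfn) function giving expected points for `att`.
    """
    rng = random.Random(seed)
    # Each game yields two observations: each side's offense against the other's defense.
    obs = [(a, b, pa) for a, b, pa, _ in games] + [(b, a, pb) for a, b, _, pb in games]
    teams = sorted({t for att, dfn, _ in obs for t in (att, dfn)})
    mu = sum(pts for _, _, pts in obs) / len(obs)
    off_bias = {t: 0.0 for t in teams}
    def_bias = {t: 0.0 for t in teams}
    off_vec = {t: [rng.gauss(0, 0.1) for _ in range(k)] for t in teams}
    def_vec = {t: [rng.gauss(0, 0.1) for _ in range(k)] for t in teams}

    def predict(att, dfn):
        latent = sum(x * y for x, y in zip(off_vec[att], def_vec[dfn]))
        return mu + off_bias[att] + def_bias[dfn] + latent

    for _ in range(epochs):
        rng.shuffle(obs)
        for att, dfn, pts in obs:
            err = pts - predict(att, dfn)
            # Gradient step on squared error with L2 regularization.
            off_bias[att] += lr * (err - reg * off_bias[att])
            def_bias[dfn] += lr * (err - reg * def_bias[dfn])
            u, v = off_vec[att], def_vec[dfn]
            for i in range(k):
                ui, vi = u[i], v[i]
                u[i] += lr * (err * vi - reg * ui)
                v[i] += lr * (err * ui - reg * vi)
    return predict


def pick_winner(predict, a, b):
    """Pick whichever team the model expects to outscore the other."""
    return a if predict(a, b) > predict(b, a) else b


# Toy usage with made-up scores (not real data).
games = [("A", "B", 75, 65), ("A", "C", 80, 60), ("B", "C", 75, 65)]
predict = fit_latent_model(games)
print(pick_winner(predict, "A", "C"))
```

As in the quote, the model never sees anything about basketball itself: only which team scored how many points against which opponent.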

I often think that the world of predictive analytics competitions is made up solely of Kaggle competitions, but there are lots of others out there. These two guys have run a good contest for a while now, so I encourage everyone to give it a try, especially if you are an R user rather than a Python guy.
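For a first entry, the two simplest approaches mentioned in the article make a reasonable starting point: a single number per team, and the higher-seed baseline that entries were compared against. The sketch below assumes Massey-style least-squares ratings on point margins as that single number (one choice among many; the article does not say which ratings the entries used), fills a bracket with any pick function, and scores it with a hypothetical scheme where a correct pick in round r earns 2**r points. That scoring rule, the function names, and the teams, seeds and scores in the usage are assumptions, not the contest's official rules or data.

```python
def massey_ratings(games):
    """Least-squares ratings: find r so that r[a] - r[b] ~ margin of each game.

    games: list of (team_a, team_b, points_a, points_b). Assumes every team is
    connected to every other through the schedule. Returns {team: rating}.
    """
    teams = sorted({t for a, b, _, _ in games for t in (a, b)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Build the normal equations M r = p of the least-squares problem.
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for a, b, pa, pb in games:
        i, j = idx[a], idx[b]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += pa - pb
        p[j] += pb - pa
    # M is singular (ratings are only defined up to a constant), so replace
    # the last equation with "ratings sum to zero".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        p[col], p[piv] = p[piv], p[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for c in range(col, n):
                M[row][c] -= f * M[col][c]
            p[row] -= f * p[col]
    r = [0.0] * n
    for i in range(n - 1, -1, -1):
        r[i] = (p[i] - sum(M[i][c] * r[c] for c in range(i + 1, n))) / M[i][i]
    return dict(zip(teams, r))


def fill_bracket(field, pick):
    """field: teams in bracket order (length a power of two).
    pick(a, b) returns the predicted winner. Returns the winners of each round."""
    rounds, current = [], list(field)
    while len(current) > 1:
        current = [pick(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        rounds.append(current)
    return rounds


def score_bracket(predicted, actual):
    """Hypothetical scoring: a correct pick in round r (0-based) earns 2**r points."""
    return sum(2 ** r * sum(p == a for p, a in zip(pred, act))
               for r, (pred, act) in enumerate(zip(predicted, actual)))


# Toy usage with made-up teams, seeds and scores (not real data).
games = [("A", "B", 75, 65), ("A", "C", 80, 60), ("B", "C", 75, 65),
         ("C", "D", 72, 62), ("B", "D", 80, 60)]
ratings = massey_ratings(games)
seeds = {"A": 1, "D": 8, "B": 5, "C": 4}
field = ["A", "D", "B", "C"]


def seed_pick(a, b):
    return a if seeds[a] <= seeds[b] else b   # higher seed (lower number) wins


def rating_pick(a, b):
    return max((a, b), key=ratings.get)       # higher single-number rating wins


baseline = fill_bracket(field, seed_pick)
by_rating = fill_bracket(field, rating_pick)
actual = [["A", "B"], ["B"]]                  # invented results for illustration
print(score_bracket(baseline, actual), score_bracket(by_rating, actual))
```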

I played with some models to do this, but none of them were ever outstanding. One I liked had a factor for streaky teams. I found that teams with long runs of consecutive wins tended to do better than teams with similar records that did not have such runs. When I further tuned this by weighting things like streaks late in the season and the level of competition, it seemed to do better than anything else I tried. If you have the time, don't just fill out a bracket: predict one.
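A hypothetical version of such a streak factor is sketched below. It is not the model described above: the minimum run length, the exponential recency decay, and the `0.5 + opp_strength` weighting for level of competition are all assumptions chosen for illustration, and `opp_strength` stands in for whatever opponent-quality measure one prefers.

```python
def streak_score(results, min_run=3, decay=0.97):
    """Hypothetical streak factor for one team's season.

    results: chronological list of (won, opp_strength) pairs, where
    opp_strength is an assumed 0-1 measure of the opponent (e.g. its win %).
    Every win inside a run of at least `min_run` straight wins adds
        decay ** (games played after it) * (0.5 + opp_strength),
    so late-season streaks against strong opponents count the most.
    """
    n = len(results)
    score = 0.0
    i = 0
    while i < n:
        if not results[i][0]:
            i += 1
            continue
        # Advance j to the end of the current run of wins.
        j = i
        while j < n and results[j][0]:
            j += 1
        if j - i >= min_run:
            for g in range(i, j):
                recency = decay ** (n - 1 - g)
                score += recency * (0.5 + results[g][1])
        i = j
    return score


# Three made-up 6-4 seasons against equally strong opponents.
W, L = True, False
late_streak = [(L, 0.5)] * 4 + [(W, 0.5)] * 6
early_streak = [(W, 0.5)] * 6 + [(L, 0.5)] * 4
alternating = [(W, 0.5), (L, 0.5)] * 4 + [(W, 0.5)] * 2   # no run of 3 wins
print(streak_score(late_streak), streak_score(early_streak), streak_score(alternating))
```

Teams with identical records get different scores depending on when, and against whom, their winning runs came. Such a score would then be one input alongside an ordinary team rating, with its weight tuned on past seasons.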