The trend continued until 2013. Everything was about more complex models and about combining them with ensemble methods to increase predictive power. Think about it: the Netflix Prize was won by a team that essentially stitched together 107 models into one massive hairball. Its predictive power was better than anything else, but it was completely opaque as to how it worked, and it took forever to run. Netflix says it never implemented the winning model because the engineering effort required did not justify the accuracy improvement over a simpler algorithm.

Here is the write-up from their blog:

> In 2006 we announced the Netflix Prize, a machine learning and data mining competition for movie rating prediction. We offered $1 million to whoever improved the accuracy of our existing system, called *Cinematch*, by 10%. We conducted this competition to find new ways to improve the recommendations we provide to our members, which is a key part of our business. However, we had to come up with a proxy question that was easier to evaluate and quantify: the *root mean squared error* (RMSE) of the predicted rating. The race was on to beat our RMSE of 0.9525 with the finish line of reducing it to 0.8572 or less.
>
> A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: *Matrix Factorization* (which the community generally called SVD, *Singular Value Decomposition*) and *Restricted Boltzmann Machines* (RBM). SVD by itself provided a 0.8914 RMSE, while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.
>
> If you followed the Prize competition, you might be wondering what happened with the final Grand Prize ensemble that won the $1M two years later. This is a truly impressive compilation and culmination of years of work, blending hundreds of predictive models to finally cross the finish line. We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then. In the remainder of this post we will explain how and why it has shifted.
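The "linear blend" Netflix describes is simple: take a weighted average of two models' predicted ratings and score the result with RMSE, the Prize's evaluation metric. Here is a minimal sketch of the idea; the ratings, the noise levels standing in for SVD and RBM, and the blend weight are all illustrative assumptions, not Netflix's actual numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground truth: 1,000 ratings on a 1-5 star scale.
true_ratings = rng.integers(1, 6, size=1000).astype(float)

# Two imperfect predictors (stand-ins for SVD and RBM),
# simulated as the truth plus independent noise.
pred_svd = true_ratings + rng.normal(0, 0.9, size=1000)
pred_rbm = true_ratings + rng.normal(0, 1.0, size=1000)

def rmse(pred, actual):
    """Root mean squared error -- the Prize's proxy metric."""
    return np.sqrt(np.mean((pred - actual) ** 2))

# A linear blend: weighted average of the two predictions.
# The weight is arbitrary here; in practice it is fit on held-out data.
w = 0.6
blended = w * pred_svd + (1 - w) * pred_rbm

print("SVD alone:", rmse(pred_svd, true_ratings))
print("RBM alone:", rmse(pred_rbm, true_ratings))
print("Blend:    ", rmse(blended, true_ratings))
```

Because the two models' errors are partly independent, the blend's RMSE comes out lower than either model's alone, which is exactly why ensembles dominated the leaderboard.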

That type of response signaled the beginning of a shift away from the "best" model toward simpler models that were easier to implement and easier to understand. That process continues today. The added side benefit is that modeling can now be done by a much larger group of people. This change has helped address both the growth in the size of the data and the shortage of data scientists available to do the work.
