From their Rules page:
We're quite curious, really. To the tune of one million dollars. Sounds like fun. I wonder how much time I'll end up wasting on this one.
We've developed our world-class movie recommendation system: Cinematch. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer's unique tastes. And while Cinematch is doing pretty well, it can always be made better.
Now there are a lot of interesting alternative approaches to how Cinematch works that we haven't tried. Some are described in the literature, some aren't. We're curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business.
So, we thought we'd make a contest out of finding the answer. It's "easy" really. We provide you with a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.)
If you develop a system that we judge most beats that bar on the qualifying test set we provide, you get serious money and the bragging rights. But (and you knew there would be a catch, right?) only if you share your method with us and describe to the world how you did it and why it works.
Serious money demands a serious bar. We suspect the 10% improvement is pretty tough, but we also think there is a good chance it can be achieved. It may take months; it might take years.
There is no cost to enter, no purchase required, and you need not be a Netflix subscriber. So if you know (or want to learn) something about machine learning and recommendation systems, give it a shot. We could make it really worth your while.
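To make the "10% better" bar concrete, here is a minimal sketch of the check, assuming accuracy is scored as root mean squared error (RMSE) between predicted and actual ratings. The quoted text above only says "accuracy", so treat the metric as an assumption on my part. The function names and the tiny ratings lists are made up for illustration and are not Netflix data or Netflix code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def beats_bar(candidate_rmse, baseline_rmse, required_improvement=0.10):
    """True if the candidate's error is at least 10% lower than the baseline's.

    Lower RMSE is better, so the bar is baseline_rmse * (1 - 0.10).
    """
    return candidate_rmse <= baseline_rmse * (1.0 - required_improvement)


# Hypothetical toy data: four actual ratings and two sets of predictions.
actual = [4, 3, 5, 1]
baseline_pred = [3.0, 3.0, 3.0, 3.0]   # stand-in for the existing system
candidate_pred = [3.5, 3.0, 4.5, 2.0]  # stand-in for a contest entry

baseline_rmse = rmse(baseline_pred, actual)
candidate_rmse = rmse(candidate_pred, actual)
print(baseline_rmse, candidate_rmse, beats_bar(candidate_rmse, baseline_rmse))
```

On the real data the gap between systems would be far smaller than in this toy example, which is part of why a 10% improvement is a serious bar.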
If you are thinking of entering the contest, you might be interested to know that much of the Internet Movie Database (IMDb) is available for download. Another good source for movie content is Amazon Web Services.
[Contest found via Pete Abilla and John Krystynak]
Update: I should explicitly point out that this Netflix data is by far the largest ratings data set available to the research community. Most work on recommender systems outside of companies like Amazon or Netflix has had to make do with the relatively small 1M-rating MovieLens data set or the 3M-rating EachMovie data set. This Netflix data set has 100M ratings. It will be enormously useful for recommender system research.
Update: The comments on this post are starting to get pretty interesting.
Update: On the idea of using external movie data, Ilya Grigorik published data linking the Netflix movie ids to features extracted from IMDb data.