Eight months into the first year, there has been impressive progress on the Netflix prize.
The current leader has a 7.7% improvement. While each step in closing the remaining 2.3% gap to win the $1M prize will be harder and harder, it is remarkable how far the entrants have come.
I have to admit that I have spent a fair amount of time playing with the contest and the Netflix data. It is quite a bit of fun.
Other than some initial attempts at very simple things like predicting averages or modified averages to get a baseline, most of my attempts were either using variants of traditional collaborative filtering (finding similar users) or item-to-item collaborative filtering (finding similar items). I did get modest improvements, but nothing like the performance of the current leaders. I also played with some simple clustering; my results on that were quite poor.
As it turns out, Netflix currently uses a type of item-to-item collaborative filtering for their recommendations, as Netflix's VP of Recommendation Systems Jim Bennet described in a Sept 2006 talk (PDF).
I suspect substantial improvements, then, will require something quite a bit different than what Netflix is doing. This likely rules out item-to-item collaborative filtering. Experimenting with other techniques, I suspect, will be more likely to bear fruit.
I also suspect that additional data may be useful on this problem. While the creators of the Netflix contest apparently did not expect this much progress this fast, it may be the case that it simply is impossible to get the full 10% improvement using the ratings data alone. In my analyses, data simply seemed too sparse in some areas to make any predictions, and supplementing with another data set seemed like the most promising way to fill in the gaps.
It was very fun working on the contest. As much as I would like to keep plugging away, I am starting to feel constrained by my available hardware (4+ year old Linux boxes), and dropping cash on servers just to keep playing seems excessive. Moreover, if I am going to spend more time and money on this kind of thing, it really should be on Findory. Too bad, it is a cool data set.
By the way, I do want to mention one thing about the structure of the contest. I think the problem statement is a little off given the business needs of Netflix. In particular, the contest requires a recommender system which can predict movies you will hate or be lukewarm to, not just the movies you will like.
A system that predicts only what you will like, often referred to as TopN recommendations, is what most businesses want from recommendations. They want to surface interesting products to customers, helping customers discover products they have never seen. In Netflix's case, they mostly should want to help you discover movies to add to your rental queue.
You might think that a system that is the best at the Netflix contest problem would be the best at the TopN problem, but that is not the case. To see that, take a system that perfectly predicts the TopN next items you will want, but makes mistakes when you are lukewarm to a product. The RMSE of that recommender would be poor in the Netflix contest (because of inaccuracies in predicting ~3 star items), despite the obvious value of the recommender.
See also by Joe Weisenthal's recent post on TechDirt, "Netflix Experiment In Outsourced Innovation Showing Good Results".
See also my previous posts (   ) on the Netflix recommendations contest.