The paper is a great read, not only because Yehuda is part of the team currently winning the Netflix Prize, but also because it has some surprising conclusions about how to deal with changing preferences and interests over time.
In particular, it is common in recommender systems to favor recent activity, such as more recent ratings by a user, either by only using the last N data points or by weighting more recent data more heavily. But Yehuda found that ineffective on the Netflix data:
"The consistent finding was that prediction quality improves as we moderate ... time decay, reaching [the] best quality when there is no decay at all. This is despite the fact that users do change their taste and rating scale over the years."
As in some of Yehuda's past work, he combines two models: a latent factor model and an item-item approach. Extended to represent temporal effects, the models yielded "the best results published so far" on the Netflix data set. Those effects include stronger relationships between items rated close together in time, a tendency for people to give higher ratings to older movies (if they bother to rate them at all), shifts in a person's average rating up or down over time, and a tendency to use the same rating for multiple items rated in a short timeframe.
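To make the idea of a shifting personal rating scale concrete, here is a minimal sketch of a baseline predictor whose user bias drifts over time. The linear drift term and every name and number below are assumptions chosen for illustration; this is not the parameterization used in the paper.

```python
def user_bias(base_bias, drift_per_day, day, mean_rating_day):
    """User's rating offset on a given day.

    Hypothetical linear drift: the offset moves by `drift_per_day`
    for each day away from the user's mean rating date.
    """
    return base_bias + drift_per_day * (day - mean_rating_day)


def baseline(global_mean, item_bias, base_bias, drift_per_day, day, mean_rating_day):
    """Baseline prediction: global mean + item offset + time-dependent user offset."""
    return global_mean + item_bias + user_bias(
        base_bias, drift_per_day, day, mean_rating_day
    )


# Made-up numbers: a user who has become more generous over time.
early = baseline(3.6, 0.3, -0.2, 0.001, day=0, mean_rating_day=500)
late = baseline(3.6, 0.3, -0.2, 0.001, day=1000, mean_rating_day=500)
print(round(early, 2), round(late, 2))
```

The same user and movie get a different baseline depending on when the rating was made, so the drift is modeled explicitly rather than being treated as noise or handled by discarding old ratings.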
"Underweighting past action loses too much signal along with the lost noise, which is detrimental given the scarcity of data per user .... We require an accurate modeling of each point in the past, which will allow us to distinguish between persistent signal that should be captured and noise that should be isolated .... for understanding the customer ... [and] modeling other customers."
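For readers unfamiliar with the time-decay approach being argued against, a minimal sketch follows. It assumes an exponential decay with a hypothetical half-life parameter; the function names and the example ratings are invented for illustration and do not come from the paper.

```python
def decay_weights(ages_in_days, half_life_days=None):
    """Weight for each rating given its age in days.

    half_life_days=None means no decay: every rating counts equally.
    Otherwise a rating's weight halves every `half_life_days` days.
    """
    if half_life_days is None:
        return [1.0 for _ in ages_in_days]
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up ratings from one user, oldest first, with their ages in days.
ratings = [5, 4, 2]
ages = [730, 365, 0]

uniform = weighted_mean(ratings, decay_weights(ages))
decayed = weighted_mean(ratings, decay_weights(ages, half_life_days=365))
print(round(uniform, 2), round(decayed, 2))
```

With decay, the two older ratings together carry less weight than the single newest one, which shows how quickly this scheme throws away information when a user has only a handful of ratings.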
The paper is full of other cute tidbits too. For example, they tried to detect day-of-the-week effects -- do people rate lower on Mondays? -- but could not. They also discovered an unusual jump in the average rating in the data in 2004, which they hypothesize was due to features launched on the Netflix.com site that started showing people more movies they liked. Definitely worth a read.