The paper starts with what should by now be a familiar-sounding motivation for personalized search:
Interacting with search engines has traditionally been an impersonal affair, with the returned results a function only of the query entered. Unfortunately the average query length is consistently reported to be around two, so many queries are too short to disambiguate the user's information need. Moreover, users often view only the first page of results, which makes precision critically important.
These limitations have motivated researchers to look beyond the query and consider how a search's context can provide further evidence about the user's information need.
To determine the potential for personalized search, the researchers analyzed "six months of query logs from the Yahoo! search engine" that "contained about 1.35 million cookies, 26 million searches, and 20 million clicks." Their goal was to determine "the extent of short and long term history available" and the "consistency and convergence rate" of users' interests.
Right at the beginning, the authors distinguish between using a searcher's short-term history to change search results, which they call "adjustment", and modifying search results using a profile built from their long-term history, which they refer to as "personalization".
Frequent readers of this weblog will know that I would call the first personalization and the second "probably not worth doing". But this paper does a good job of quantifying the potential impact of both the short-term and long-term approaches to personalized search.
In particular, the authors looked at the number of searchers who had enough information for profiles built from long-term history. In their analysis, 50% of queries to Yahoo Search came from "users who performed at least 100 queries over the 6 month period." That seems promising.
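A share of queries is not the same thing as a share of users, because query volume is heavily skewed toward a few very active searchers. The sketch below shows that calculation over a query log, assuming each log entry is just an anonymized cookie id; the function name, the toy log, and the threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter

def coverage(query_log, min_queries):
    """Return (fraction of queries, fraction of users) contributed by users
    who issued at least `min_queries` queries in the log."""
    if not query_log:
        return 0.0, 0.0
    per_user = Counter(query_log)  # cookie id -> number of queries issued
    heavy = {u for u, n in per_user.items() if n >= min_queries}
    heavy_queries = sum(per_user[u] for u in heavy)
    return heavy_queries / len(query_log), len(heavy) / len(per_user)

# Hypothetical toy log: one heavy user (135 queries) and nine light users (15 each).
log = ["heavy"] * 135 + [f"light{i}" for i in range(9) for _ in range(15)]
query_share, user_share = coverage(log, min_queries=100)
print(f"{query_share:.0%} of queries, {user_share:.0%} of users")
```

In this toy log, half of all queries come from a user with enough history, yet that is only one user in ten.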
However, later in the paper, they analyze the number of queries necessary for a user's interests to clearly converge and become distinct from the population as a whole. They determined it required "a few hundred queries". Less than 25% of queries and less than 3% of users appeared to have that much data.
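One way to picture what "converge and become distinct from the population" might mean: represent a user's history as a distribution over topic categories, then track how far the profile built from the first n queries is from the profile built from the full history, and how far that full profile is from the population's. The sketch below uses total variation distance and made-up category labels; both are assumptions for illustration, and the paper's own measure may well differ.

```python
from collections import Counter

def distribution(labels):
    """Normalize a list of category labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two distributions stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def convergence_curve(user_labels, checkpoints):
    """Distance between the profile from the first n queries and the
    profile from the user's full history, for each checkpoint n."""
    final = distribution(user_labels)
    return [(n, total_variation(distribution(user_labels[:n]), final))
            for n in checkpoints if 0 < n <= len(user_labels)]

# Hypothetical toy data: the topic category of each query, in order.
population = distribution(["news"] * 40 + ["sports"] * 30 + ["shopping"] * 30)
user = ["sports", "news"] * 5 + ["sports"] * 90  # 100 queries in total

for n, dist in convergence_curve(user, [10, 50, 100]):
    print(n, round(dist, 3))
print("distance from population:",
      round(total_variation(distribution(user), population), 3))
```

With only ten queries, this toy profile still looks far from where it ends up; by fifty it has mostly settled. Real interests are noisier, which is why the number of queries needed can run into the hundreds.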
This does not mean that a long-term, profile-based approach to personalization is not worth doing, but it does mean that it would only impact a minority of the queries and users.
The short-term approach, which they call "adjustment", appears to have the potential to influence many queries. In the last part of the paper, the researchers discuss some promising techniques for it, including focusing on less common clickthroughs, clickthroughs that users tend to return to, and related clickthroughs. They claim that "with short-term adjustment, a single click ... could dramatically improve results for the rest of your search, even without any prior user history."
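As a rough illustration of short-term adjustment, here is a sketch that re-ranks results using only the clicks from the current session. It boosts results that share a topic with something already clicked, and gives rarer clicks more weight, on the intuition that an uncommon click says more about this particular searcher. The `Result` fields, the topic labels, the popularity numbers, and the log-based weighting are all hypothetical; the paper does not specify this algorithm, and the sketch covers only the "less common clickthroughs" and "related clickthroughs" ideas.

```python
import math
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    base_score: float  # score from the non-personalized ranker
    topics: set        # hypothetical topic labels for the page
    popularity: float  # hypothetical fraction of searchers who click it, in (0, 1]

def adjust(results, session_clicks, weight=1.0):
    """Re-rank results using only clicks from the current session."""
    topic_boost = {}
    for clicked in session_clicks:
        # -log(popularity) is 0 for a click everyone makes and grows for rare clicks.
        w = -math.log(clicked.popularity)
        for t in clicked.topics:
            topic_boost[t] = topic_boost.get(t, 0.0) + w

    def score(r):
        return r.base_score + weight * sum(topic_boost.get(t, 0.0) for t in r.topics)

    return sorted(results, key=score, reverse=True)

# Hypothetical ambiguous query "jaguar", after one click on a rarely clicked big-cat page.
jaguar_car = Result("cars.example/jaguar", 2.0, {"cars"}, 0.5)
jaguar_cat = Result("zoo.example/jaguar", 1.5, {"animals"}, 0.5)
clicked = Result("zoo.example/big-cats", 1.0, {"animals"}, 0.05)
ranked = adjust([jaguar_car, jaguar_cat], [clicked])
print([r.url for r in ranked])
```

Without any session clicks, the car page stays on top; a single uncommon click on an animal page is enough to flip the order, which is the flavor of effect the quoted claim describes.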
In the end, it is probably worth doing both approaches, but this paper is useful for understanding some of the limitations of each. Well worth reading.
For more on personalized web search, please also see some of my previous posts: "Beyond the commons: Personalized web search", "Google Personalized Search and Bigtable", "More on Google personalized search", and "New personalized web search at Findory".
By the way, if you like this post, you may also be interested in my post, "Recommending advertisements", on another of Omid Madani's papers.
Update: If you have trouble downloading the paper from Yahoo Research, you can also get it from the ACM.