Thursday, July 19, 2007

The many paths of personalization

Here I go, quoting Gord Hotchkiss again. From an article today:
We're trying to paint personalization into a corner based on Google's current implementation of it. And that's absolutely the wrong thing to do.

Personalization is not a currently implemented algorithm ... [It] is an area of development.

Personalization, in its simplest form, is simply knowing more about you as an individual and using that knowledge to better connect you to content and functionality on the Web.

There are many paths you can take to that same end goal .... To win, Google doesn't have to do it perfectly. It just has to do it better than everyone else.
Right now, relevance rank must rank according to the average need. But different people have different interpretations of what is or is not relevant. It is getting harder and harder to find improvements while still serving the average need.

At some point, the only way to further improve the quality of search results will be to show different people different search results based on what they think is relevant.

At that point, we have personalized search. Showing different results to different people based on what you know of their interests is personalized search.

There are many approaches to do personalized search. To the extent that Google's current algorithm is based on the Kaltix work, it is a coarse-grained approach, building a long-term profile which then subtly influences your future results. I tend to prefer a fine-grained approach that focuses on short-term history to help searchers with what they are doing right now.
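To make the coarse-grained versus fine-grained distinction concrete, here is a minimal sketch. It is not Google's, Kaltix's, or Findory's algorithm; the topic labels, the `window` parameter, and the `weight` value are all assumptions chosen for illustration. The only difference between the two profiles is how much click history feeds them.

```python
from collections import Counter

def build_profile(clicked_topics, window=None):
    """Topic weights from click history (hypothetical representation).

    window=None uses the whole history (long-term, coarse-grained);
    a small positive window uses only the most recent clicks (short-term).
    """
    recent = clicked_topics if window is None else clicked_topics[-window:]
    counts = Counter(recent)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}

def personalize(results, profile, weight=0.3):
    """Re-rank (title, topic, base_score) tuples by base score plus a profile boost."""
    scored = [(base + weight * profile.get(topic, 0.0), title)
              for title, topic, base in results]
    return [title for _, title in sorted(scored, key=lambda s: s[0], reverse=True)]

# Made-up history: mostly car pages over time, but the last two clicks are about animals.
history = ["cars"] * 8 + ["animals"] * 2
results = [("Jaguar dealers", "cars", 0.55), ("Jaguar habitat", "animals", 0.50)]

long_term = personalize(results, build_profile(history))             # whole history
short_term = personalize(results, build_profile(history, window=2))  # last two clicks
```

With this made-up data, the long-term profile keeps the car result on top, while the short-term profile moves the animal result up because that is what the searcher is doing right now.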

Yet those are only two of the possibilities of how knowing more about a searcher's interests could help improve relevance. Google's implementation is not the only path. There are many ways to show different people different results based on their interests, some of which could prove more helpful than Google's to searchers.

See also some of my earlier posts ([1] [2] [3]) where I criticized Google's approach to personalized search and discussed an alternative.

See also my March 2005 post, "Personalization is hard. So what?", where I said that personalization "doesn't have to be right all the time. It just needs to be helpful."

6 comments:

Anonymous said...

Greg: At some point, the only way to further improve the quality of search results will be to show different people different search results based on what they think is relevant.

I can wholly agree with this.

Greg: At that point, we have personalized search. Showing different results to different people based on what you know of their interests is personalized search.

Well, then, under this definition, "relevance feedback" is a form of personalized search, natch?

Two users each do a query, and get the "average result" ranked lists. Suppose User A marks (in real time, via some thumbs up/down mechanism) documents 1, 3 and 5 as relevant, and 2 and 4 as non-relevant. The system then re-adjusts the remainder of the ranked list, biased towards that rel set and away from the nonrel set.

User B marks documents 2, 4 and 5 as relevant, and 1 and 3 as non-relevant. Again, the system then re-adjusts the remainder of the ranked list toward that rel set away from the nonrel set.

Clearly, User A and User B will see different sets of results, based on that relevance feedback step. As long as any two users give different relevance judgements on the same original query, they will see different, and therefore (by your definition) "personalized" results. Right?
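A minimal sketch of that two-user scenario, assuming bag-of-words vectors and a Rocchio-style query update as the feedback mechanism (one classical choice, not something any particular engine is known to use). The documents, the query, and the `alpha`/`beta`/`gamma` weights are made up for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a document or query."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the relevant docs and away from the non-relevant ones."""
    updated = Counter()
    for term, w in query.items():
        updated[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            updated[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            updated[term] -= gamma * w / len(nonrelevant)
    # Keep only positively weighted terms.
    return {t: w for t, w in updated.items() if w > 0}

def rerank(query_text, docs, judgments):
    """Re-order the not-yet-judged docs using one user's own judgments only."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    relevant = [vectors[d] for d, ok in judgments.items() if ok]
    nonrelevant = [vectors[d] for d, ok in judgments.items() if not ok]
    query = rocchio(vectorize(query_text), relevant, nonrelevant)
    remaining = [d for d in docs if d not in judgments]
    return sorted(remaining, key=lambda d: cosine(query, vectors[d]), reverse=True)

docs = {
    1: "jaguar car dealer", 2: "jaguar cat habitat", 3: "jaguar car review",
    4: "jaguar cat zoo", 5: "jaguar facts",
    6: "jaguar car price", 7: "jaguar cat diet",  # the unjudged remainder of the list
}
user_a = {1: True, 3: True, 5: True, 2: False, 4: False}
user_b = {2: True, 4: True, 5: True, 1: False, 3: False}

list_a = rerank("jaguar", docs, user_a)
list_b = rerank("jaguar", docs, user_b)
```

Nothing here is shared between users: each call sees only one person's judgments, so the same query yields a different ordering of the remaining documents for User A and User B.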

Greg: There are many approaches to do personalized search. To the extent that Google's current algorithm is based on the Kaltix work, it is a coarse-grained approach, building a long-term profile which then subtly influences your future results. I tend to prefer a fine-grained approach that focuses on short-term history to help searchers with what they are doing right now.

So if the Kaltix/Google approach is long-term, and Findory is short-term, the relevance feedback approach is "immediate term". But relevance feedback is still personalization, non? Because the definition of "personalization" does not necessarily include any search history, long-term or short-term. It does not necessarily mean passive search engine information collection (clickthroughs, etc.). Active information collection ("tools") also counts, right? At its very minimum, personalization just means different results for different folks. And relevance feedback is one way of accomplishing this.

Search engines ten years ago tried relevance feedback, and found that it did not really work. However (and we will have to ask Danny Sullivan to get the full lowdown), my understanding is that those engines of yore did not actually offer personalized relevance feedback. They did aggregate/average relevance feedback. They lumped your relevance feedback in with everyone else's. And that totally messed up the rankings. People would use the system to mod down competitors. People would give thumbs down to things that they had seen before (therefore no longer relevant to them), but which were still relevant to someone else. Aggregating all of that is what messed up the rankings.

But true relevance feedback is, by definition, personalized. It uses your immediate actions, your immediate assessments of relevance, to re-order the list. And it passes that information along to no one else. Someone else looking at that exact same original query, at the same time, on their own machine, but giving different relevance judgments, would see a different list ordering.

As far as I know, no web-scale search engine has ever implemented this classical, traditional form of "personalized" relevance feedback. I wonder why that is. I see it as being very useful, as it is a form of personalization.

Toby DiPasquale said...

Seriously, Greg... love the blog, can't get enough of it, but can you change the colors to get rid of the awful yellow on navy blue scheme? I see imprints in my eyes after reading your pages and then switching to another tab. Sheesh.

Greg Linden said...

Yep, Jeremy, I think that individualized relevance feedback -- but not the aggregate relevance feedback tried in the past -- would be personalized search.

Getting people to provide explicit feedback might be a problem -- as we have discussed in the past, most people won't bother -- but using that kind of explicit data on interests would be personalization, I agree.

Greg Linden said...

Thanks, Codeslinger, I have heard that complaint before. My poor original choice of colors is a good indication of my utter lack of design skills.

I'll see if I can change the color scheme to be easier on the eyes sometime soon.

Anonymous said...

but using that kind of explicit data on interests would be personalization, I agree.

Oh, cool! We've reached a point of commonality! Always a good thing.

Getting people to provide explicit feedback might be a problem -- as we have discussed in the past, most people won't bother --

Yup, there are indeed issues around that, I agree. But I have always thought that we don't give the "common user" enough credit. For example, we say users are lazy, because they only type in a few words. Well, first of all, the average length of queries has gone up significantly (75%? More?) over the past few years. People are entering more. Second, I have seen experiments in which people actually enter more query words if you give them a larger space in which to do so. For example, today's search engines only give you a one-line text box, and so people naturally assume that they should only enter a few words. I have seen experiments where researchers give the users a larger (multiline) text field, with no other instructions than "enter your query". On average, the users who are given the multiline text field enter significantly longer queries than the users who are given the single-line text box.

So again, as I was saying in a previous comment (on your Norvig interview), interface design plays an important role in getting the users to do work. Oh, and there is one more thing: because of the aggregated approach to relevance feedback, no users have actually seen the direct, immediate benefits that this "personalized" tool can provide to them. That was probably a strong de-motivating factor.

Arnav Khare said...

Hi Greg,
Love your blog, and your spotlight on personalization. Will you be attending SIGIR this year in Amsterdam?

I am working on a personalized search project very similar to what you propose here, and the project is in an early-beta stage now. I would love to hear your comments on it... Is there any way I can get in touch with you?

Arnav
akhare@inf.ed.ac.uk