I flew back from SES yesterday, which meant I got a chance to catch up on more of my reading on the plane.
Of the papers I plowed through, one was particularly fun: "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews" by Kushal Dave, Steve Lawrence, and David Pennock.
The goal of the paper is pretty ambitious: Take all the reviews out there for each product and summarize them. Tough problem. Nasty natural language issues here.
But the payoff is big. This would be quite useful, especially if some method of determining the credibility or authority of each review were part of the process. People need help differentiating between the vast number of products out there. Summarizing reviews could provide useful information quickly, much more easily than reading each individual review.
One thing that's great about this paper is that the authors walk through the many different approaches they tried, some simple, some more complicated. It is interesting that some of the most effective methods turned out to be fairly simple.
Another fun thing about this paper is the authors. Steve Lawrence was one of the creators of CiteSeer. Kushal Dave and Steve Lawrence are now both at Google. David Pennock was at Overture and is now at Yahoo Research.
By the way, this summarizing reviews idea reminds me a bit of Newsblaster, the research project at Columbia that tries to automatically summarize news articles from many sources. If you haven't seen that yet, it's worth checking out.
Update: Gary Price wrote to let me know about NewsInEssence, a news clustering and summarization research project out of the University of Michigan.