Monday, June 29, 2009

New Google study on speed in search results

Googler Jake Brutlag recently published a short study, "Speed Matters for Google Web Search" (PDF), which looked at how important it is to deliver and render search result pages quickly.

Specifically, Jake added very small delays (100-400ms) to the time to serve and render Google search results. He observed that even these tiny delays, which are low enough to be difficult for users to perceive, resulted in measurable drops in searches per user (declines of -0.2% to -0.6%).

Please see also my Nov 2006 post, "Marissa Mayer at Web 2.0", which summarizes a claim by Googler Marissa Mayer that Google saw a 20% drop in revenue from an accidentally introduced 500ms delay.

Update: To add to the Marissa Mayer report above, Drupal's Dries Buytaert summarized the results of a few A/B tests at Amazon, Google, and Yahoo on the impact of speed on user satisfaction. As Dries says, "Long story short: even the smallest delay kills user satisfaction."

Update: In the comments, people are asking why the effect in this study oddly appears to be an order of magnitude lower than the effects seen in previous tests. Good question there.

Update: By the way, this study is part of a broader suite of tools and tutorials Google has gathered as part of an effort to "make the web faster".

Friday, June 26, 2009

The $1M Netflix Prize has been won

An ensemble of methods from four teams has passed the criteria to win the Netflix Prize.

Other teams have 30 days to beat it, but, no matter what happens, the $1M prize will be claimed in the next month.

Congratulations to the winning team and all the competitors. It was a goal that some thought impossible without additional data, but remarkable persistence has proven the impossible possible.

Please see also my earlier post, "On the front lines of the Netflix Prize", which summarizes an article that describes some of the algorithms that brought the winning team to where it is now.

Update: In an extremely close finish, a different team, The Ensemble, took the prize. Remarkable not only to see another team qualify for the grand prize, but then to see BellKor beaten with only four minutes left in the contest.

Congratulations to all, especially those who shared knowledge and joined together to help the winning teams. The winners discovered a solution to a problem that many thought might never be solved.

Update: It appears it is still unclear who ultimately will be declared the winner of the $1M prize.

Update Good article on the current state of the contest in the New York Times.

Update: Adding to this, there is a colorful story about the nailbiting finish in a blog post from the Pragmatic Theory team (part of the "BellKor's Pragmatic Chaos" submission).

Update: Three months later, Netflix awards the prize to BellKor and starts a second contest.

Tuesday, June 23, 2009

Is mobile search going to be different?

In an amusingly titled WWW 2009 paper, "Computers and iPhones and Mobile Phones, oh my!" (PDF), a quartet of Googlers offer some thoughts on where mobile search may be going.

In particular, based on log analysis of iPhone searches, they claim search on mobile devices is not likely to differ from normal web search once people upgrade to the latest phones. They go on to predict that an important future feature for mobile search will be providing history and personalization synchronized across all of a person's computers and mobile devices.

Some excerpts:
We have consistently found that search patterns on an iPhone closely mimic search patterns on computers, but that mobile search behavior [on older phones] is distinctly different.

We hypothesize that this is due to the easier text entry and more advanced browser capabilities on an iPhone than on mobile phones. Thus we predict that as mobile devices become more advanced, users will treat mobile search as an extension of computer-based search, rather than approaching mobile search as a tool for a distinct subset of information needs.

For [newer] high end phones, we suggest search be a highly integrated experience with computer-based search interfaces .... in terms of personalization and available feature set .... For example, content that was searched for on a computer should be easily accessible through mobile search (through bookmarks, search summaries), and vice versa.

This similarity in queries [also] indicates that we can use the vast wealth of knowledge amassed about conventional computer based search patterns, and apply it to the emerging high-end phone search market, to quickly gain improvements in search quality and user experience.

Thursday, June 18, 2009

Optimizing broad match in web advertising

A paper out of Microsoft Research, "A Data Structure for Sponsored Search" (PDF), has a couple simple but effective optimizations in it that are fun and worth thinking about.

First, a bit of explanation. When trying to match advertisers to search queries, search engines often (90%+ of the time) use broad match, which only requires a subset of the query terms to match. For example, if an advertiser bids for their ads to show on searches for "used books", the ad might also show on searches for [cheap used books], [gently used books], and [used and slightly traumatized books].

You can do this match using a normal inverted index, but it is expensive. For example, on the search for [cheap used books], you need to find all ads that want any of the terms (cheap OR used OR books), then filter out ads that wanted other terms (e.g. an ad that bid on "new books" need to be filtered out at this step). For very popular keywords such as "cheap", many ads can end up being retrieved in the first step only to be filtered in the second.

The clever and nicely simple idea in this paper is to use past data on the frequency of keywords to drop popular terms from the index whenever we possibly can. For example, let's say an advertiser bids for "the great wall of china", we can index the ad only under the lowest frequency word (e.g. "china") and not index under any of the other words. Then, on a search for [the awesomeness], we no longer have to retrieve then filter that ad.

The authors then extend this to deal with bids where all the keywords are popular (e.g. "the thing"). In the paper, they discuss a model that attempts to estimate the cost of throwing multiple words into the index (e.g. indexing "the thing") versus doing the query for each word separately.

The end result is an order of magnitude improvement in performance. All the irrelevant data you get from indexing everything wastefully pulls in many ad candidates that need to be filtered. In this case as well as others, it very much pays off to be careful about what you index.

Please see also my earlier post, "Caching, index pruning, and the query stream", that discusses a SIGIR 2008 paper out of Yahoo Research that explores some vaguely related ideas on index pruning for core search.

Wednesday, June 17, 2009

How much can you do with one server?

At a time when many of us are working with thousands of machines, Paul Tyma provides a remarkable example of how much you can do with just one.

Paul runs the clever Mailinator and Talkinator services. Mailinator lets people receive e-mail to arbitrary addresses under mailinator.com, mostly for disposable e-mail for avoiding spammers or annoying registration requirements. Talkinator is an instant messaging system for easily setting up and joining chat rooms.

Both are examples of removing the login friction normally associated with an application (in this case, mail and instant messaging) to discover a new application with different properties and uses.

In an older post, "The Architecture of Mailinator", Paul describes how he optimized the single server running the service to handle 6M e-mails/day. To summarize, the system is custom-built to the task, all unnecessary goo removed, and focuses on robustness while accepting a very small probability of message loss.

Recently, Paul updated that older post. Now they are processing 2M e-mails/hour on a single machine, 3.1T of data per month, and might have to expand to a second machine, not because the one machine cannot handle the load, but because two machines is the cheapest way of getting the extra bandwidth they need.

Paul also posted details on the architecture of Talkinator, which also is highly optimized to run well on a single server. If you take a peek at that, don't miss his amusing second post where he tested running on a 5W tiny plug-in server.

I am not advocating spending as much effort as Paul does on minimizing server costs. But, his work an inspirational counter-example to the common tendency to throw hardware at the problem. It is an enjoyable tale of building much by staying simple, both in the application features and the underlying architecture.

Tuesday, June 09, 2009

Approaching the limit on recommender systems?

An article at the upcoming UMAP 2009 conference, "I like it... I like it not: Evaluating User Ratings Noise in Recommender Systems" (PDF), looks at inconsistencies when people rates movies.

What is particularly interesting about the article is that the authors argue that "state-of-the-art recommendation algorithms" are nearing the lower bound on accuracy imposed by inconsistencies in ratings, which they and previous work refer to as the "magic barrier".

Natural variability in people's opinions limits how accurate recommender systems can be. According to this paper, we may be almost at that limit already.

Update: For more on this topic, it turns out one of the authors of the paper, Xavier Amatriain, has a blog post, "Netflix Prize: What if there is no Million $ ?", with a good comment thread.

Friday, June 05, 2009

On the front lines of the Netflix Prize

Robert Bell, Jim Bennett, Yehuda Koren, and Chris Volinsky have an article in the May 2008 IEEE Spectrum, "The Million Dollar Programming Prize", with fun tales of their work in the Netflix Prize along with a summary of the techniques that are performing best.

Some excerpts:
The nearest neighbor method works on the principle that a person tends to give similar ratings to similar movies. [For example, if] Joe likes three movies ... to make a prediction for him, [we] find users who also liked those movies and see what other movies they liked.

A second, complementary method scores both a given movie and viewer according to latent factors, themselves inferred from the ratings given to all the movies by all the viewers .... Factors for movies may measure comedy versus drama, action versus romance, and orientation to children versus orientation toward adults. Because the factors are determined automatically by algorithms, they may correspond to hard-to-describe concepts such as quirkiness, or they may not be interpretable by humans at all .... The model may use 20 to 40 such factors to locate each movie and viewer in a multidimensional space ... then [we predict] a viewer's rating of a movie according to that movie's score on the dimensions that person cares about most.

Neither approach is a panacea. We found that most nearest-neighbor techniques work best on 50 or fewer neighbors, which means these methods can't exploit all the information a viewer's ratings may contain. Latent-factor models have the opposite weakness: They are bad at detecting strong associations among a few closely related films, such as The Lord of the Rings trilogy.

Because these two methods are complementary, we combined them, using many versions of each in what machine-learning experts call an ensemble approach. This allowed us to build systems that were simple ... easy to code and fast to run.

Another critical innovation involved focusing on which movies a viewer rated, regardless of the scores. The idea is that someone who has rated a lot of fantasy movies will probably like the Lord of the Rings, even if that person has rated the other movies in the category somewhat low ... This approach nicely complemented our other methods.
Please see also my earlier post, "Netflix Prize at KDD 2008", which points at papers with more details than the IEEE article, including another recent paper by Yehuda Koren.

Thursday, June 04, 2009

The siren song of startups

Over at the blog of the Communications of the ACM, I have a new post on "The Siren Song of Startups".

The article tries to get readers thinking more carefully about why they might and might not want to join a startup.

If you like the post, you might enjoy my other posts over at blog@CACM: "Enjoying Reading Research", "What To Do With Those Idle Cores?", and "What is a Good Recommendation Algorithm?"

Monday, June 01, 2009

Yahoo CEO Carol Bartz on personalization

Interesting tidbit on personalization in a Q&A with Yahoo CEO Carol Bartz:
Yahoo! is the place where millions of people come every day to see what is happening with the people and the things that matter most to them.

That could mean what's happening in the world -- like breaking news, sports scores, stock quotes, last night's TV highlights -- and your world -- like your email, photos, groups, fantasy leagues.

Based on what we know about you ... we can bring you both those worlds. So I think our clear strength is "relevance" -- whether that means knowing what weather to give you or serving up headlines you'll be interested in. It's all about really getting you.
So Yahoo wants to filter and recommend relevant content based on what it knows about you. Is this a new goal for Yahoo?

Two years ago, Yahoo co-founder Jerry Yang spoke about "better tailoring Yahoo's iconic Web portal to individual users, with the help of technology that predicts what they want."

Four years ago, Yahoo CEO Terry Semel said that one of the "four pillars" of Yahoo was "personalization technology to help users sort through vast choices to find what interests them" and Yahoo executive Lloyd Braun called it one of Yahoo's "secret weapons".

Five years ago, Yahoo CEO Terry Semel said, "Personalization will also play more of a role on the Yahoo home page in the coming months" and "painted a picture in which users could tailor the Yahoo home page to suit their particular interests," adding, "We want the home page to be totally personalized."

Making Yahoo content more relevant and useful to people would be fantastic. Personalization is a way to make the site more relevant and useful. But this idea clearly has been around the halls of Yahoo for some time. The key is going to be executing on it quickly.

Please see also my May 2006 post, "Yahoo home page cries out for personalization".

Update: Three weeks later, Forbes Magazine quotes Carol Bartz as admitting that Yahoo's troubles have been "an execution problem ... we are really working on (moving) from 'we can do this, we can do this' to 'we did do this.'"

She also has several other quotes in that same article that seem quite promising for the future of Yahoo, including that "none of us hate ads ... we just hate crappy ads" and that we should see Yahoo doing personalization and recommendations of news and other home page content soon. Great to hear it.

Update: Seven months later, Carol Bartz pumps personalization but admits that "we have been letting great data ... fall to the floor."

Update: One year later, Carol Bartz says, "Tomorrow's Yahoo is going to be really tailored" and "I want it to learn about me ... and cull through the massive amount of information that's out there to find exactly what I want." But, as ex-Yahoo Jeremy Zawodny comments, "The problem is that Yahoo execs have been saying this for literally years now. When will it come true?"