Thursday, November 30, 2006

MSN Search and beating Google

Dare Obasanjo from Microsoft has a good post where he argues, "Competing with Google's search engine is no longer about search results quality, it is about brand and distribution."

Dare goes on to explain that Google has become the default search engine. To win, Microsoft needs to reacquire the channels and mindshare that would make them the default.

I agree pretty much with Dare on this one. Back in 2003, things may have been different, but, at this point, I think Microsoft needs to throw around their market power to win.

Microsoft should lock up channels with partnerships, cut out Google from the defaults, make exclusive advertising deals that suck revenue away from Google, and make Live (or MSN or whatever brand they finally pick) part of the general lexicon. It is not pretty. It is not nice. But it is what they must do to win.

See also my April 2006 post, "Kill Google, Vol. 3", where I said:
Microsoft should strangle Google's air supply, their revenue stream .... Microsoft should use its size to make deals .... Microsoft should use its market power to be the exclusive ad provider for large sites .... Microsoft should ... make being an advertising provider unprofitable for others.

If Microsoft wants to win, it should play to its strengths. It should not seek to change the game. It should seek to end the game.
See also my previous post, "Google dominates, MSN Search sinks".

Update: Four months later, Microsoft appears to be making new efforts to lock up channels with partnerships. John Battelle reports, "Microsoft is offering its large enterprise customers free service and product credits if those customers push Live search inside their enterprises."

Wednesday, November 29, 2006

Google Answers croaks

Andrew Fikes and Lexi Baugher post on the Official Google Blog that Google Answers will stop taking questions and effectively shut down.

Google Answers was a clever but unpopular site where you could ask any question and have it investigated by a small group of professional researchers. Fees were quite high, so the audience was fairly limited. Moreover, it always seemed out of place with Google's normal tendency to focus on automated solutions.

In light of this shutdown, I think it is worthwhile to compare Google Answers to some of the other question answering services out there.

Danny Sullivan has a thoughtful post comparing the now defunct Google Answers to the more successful and free Yahoo Answers. Like Danny, I have been surprised by the relative success of Yahoo Answers given the low quality of both the questions and the answers.

Another interesting comparison is with the fledgling Askville and NowNow question answering services from Amazon. Those services appear to be trying to blend Google Answers (tens of dollars for answers from experts) with Yahoo Answers (free for answers from idiots); Askville and NowNow use Mechanical Turk and will charge under a dollar for answers. I am curious to see if these Amazon Q&A services succeed, or if the lesson from Google Answers is that people are not willing to pay for answers regardless of quality.

Gary Price also reminds us that Ask Jeeves had a Q&A service called Answer Point that they shut down in 2002. It apparently was similar to Yahoo Answers and was free. That may suggest that we should not conclude that the free model of Yahoo Answers is better, but that none of these community-based question answering services, whether free or not free, have legs.

On a slight tangent, I think it is great that Google is shutting down some of its failed experiments to try to keep their focus. Just last week, I was lamenting the number of failed and failing features on Google, Yahoo, and Amazon and wrote, "Old products never die, but they should. To innovate, it is not enough to love creation. We must also love destruction."

See also my previous posts ([1] [2]) about Yahoo Answers.

Update: See also Brady Forrest's thoughts over at O'Reilly Radar.

Update: See also Nick Carr's post, "The five Google products".

Monday, November 27, 2006

Item-to-item collaborative filtering

There appears to be a little confusion in some of the research literature on the earliest work on item-to-item collaborative filtering, a recommender algorithm that focuses on items rather than users.
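For readers who have not seen the algorithm, here is a minimal sketch of the item-to-item idea (the toy data and function names are mine, not anything from Amazon's implementation): precompute item-item similarities from co-occurrence in user histories, then, for a given user, surface the items most similar to what they already have.

```python
from collections import defaultdict
from math import sqrt

# Toy purchase histories (hypothetical): user -> items bought
purchases = {
    "alice": {"book_a", "book_b", "dvd_x"},
    "bob":   {"book_a", "book_b"},
    "carol": {"book_b", "dvd_x"},
}

def item_vectors(purchases):
    # Invert to item -> set of users who bought it
    vecs = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            vecs[item].add(user)
    return vecs

def cosine(users_a, users_b):
    # Cosine similarity between two items' binary user vectors
    return len(users_a & users_b) / (sqrt(len(users_a)) * sqrt(len(users_b)))

def recommend(user, purchases, vecs):
    # Sum similarity to the user's items over all items they do not yet have
    scores = defaultdict(float)
    for owned in purchases[user]:
        for other, users in vecs.items():
            if other not in purchases[user]:
                scores[other] += cosine(vecs[owned], users)
    return sorted(scores, key=scores.get, reverse=True)

vecs = item_vectors(purchases)
print(recommend("bob", purchases, vecs))  # the only new candidate for bob is dvd_x
```

The item-item similarity table is what gets precomputed offline; at request time the lookup is cheap, which is what lets the item-based approach scale to very large catalogs and user bases.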

The earliest work of which I am aware is:
G. Linden, J. Jacobi, and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to, Inc.), Patent and Trademark Office, Washington, D.C., 2001.
That patent was filed in 1998 and issued in 2001.

A later academic paper
Greg Linden, Brent Smith, Jeremy York, Recommendations: Item-to-Item Collaborative Filtering, IEEE Internet Computing, v.7 n.1, p. 76-80, January 2003
is a more friendly description of the work in the 1998 patent. It cites the patent as previous work.

Another paper that appears to be frequently cited is:
Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl, Item-based collaborative filtering recommendation algorithms, Proceedings of the 10th international conference on World Wide Web, p. 285-295, May 01-05, 2001, Hong Kong
Some publications mistakenly have written that Sarwar et al. first "introduced" or "proposed" item-to-item collaborative filtering. Even The Economist got this wrong, later issuing a correction.

This confusion may be because the Sarwar et al. paper did not reference the patent. The 1998 patent preceded Sarwar et al. by three years and was public information well before the Sarwar et al. paper was published. The 1998 work probably should have been cited by Sarwar et al., certainly if any of the authors had reviewed it, but probably even if they had not.

I realize the situation is complicated. The 1998 patent usually is referenced with its 2001 issue date, as is the convention, making it less clear that it preceded Sarwar et al. by three years. The Sarwar et al. paper, the first academic publication on the algorithm, does not reference the patent at all, further confusing the issue.

Nevertheless, please be careful about crediting the earliest work. Please note the earlier publications when writing about item-to-item collaborative filtering.

Saturday, November 25, 2006

Conquering small displays

Much of the UI effort in mobile focuses on the hard problem of picking what information to display on tiny screens. Many of the mobile search startups are focused on this problem exclusively, but the solutions are unsatisfactory.

When I look at this problem and the effort going into it, I wonder if we are just a couple years away from a hardware solution that makes much of it obsolete.

To see what I mean, let me dive back to a year ago when I was enjoying an excellent talk at UW CS by Patrick Baudisch from Microsoft Research. The talk, which is available for download, asked:
How can we display complex documents on displays the size of a stamp? How can users interact with such documents?
Pat proposed summarization and attention-focusing techniques as the solution:
"halo" helps users perform spatial reasoning on large maps; "summary thumbnails" and "collapse-to-zoom" allow users to make sense of web pages by compressing them to the size of the phone screen.
It is a fascinating subject, summarizing information and focusing attention on small devices. But, after watching this talk, I wondered how much of this problem is a real, long-term problem or a temporary one created by our current hardware.

For example, I could imagine a small, monocular-like device that I hold up to my eye. Looking through this, I could see what would appear to be a massive screen covering most or all of my field of vision, not that much different than sitting 12" away from a 20" flat screen display.

Even better, maybe the form factor could be sunglasses and the image could be drawn on the glass or projected directly on the retina.

I tried to get at this with an e-mail question to Pat after the lecture, asking:
Is the problem actually the small screen? Or is it really the low resolution of the small screen? If, for example, screens on cell phones had 1280 x 1024 resolution in a screen only a couple inches on each side, would this change the problem?

As you said in the talk, the problem seems to be centered around readability. If the resolution was high enough that the screens were readable if held close to the eyes, would that change the nature of the problem?
I may have failed to describe the idea well. Pat responded that he was concerned about people with poor eyesight being able to focus on and read a tiny but high resolution screen. However, I am fairly sure that, if the device is held up to the eye, the image could be displayed so that the eye should be focused at infinity, not on the device an inch away.

The idea here is fairly obvious. Small displays do not appear small if they are held close to the eye. A virtual display can appear massive even if coming from a small device.

I suspect all we need is the ability to display at high resolution on a tiny screen.
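A back-of-the-envelope calculation (assuming a 4:3 panel and simple head-on geometry; the numbers are illustrative) suggests why resolution, not physical size, is the crux:

```python
import math

def horizontal_fov_degrees(width_in, distance_in):
    # Angular width of a flat screen viewed head-on from distance_in away
    return 2 * math.degrees(math.atan((width_in / 2) / distance_in))

# A 20" 4:3 monitor is about 16" wide; viewed from 12" away
fov = horizontal_fov_degrees(16, 12)   # roughly 67 degrees
pixels_per_degree = 1280 / fov         # roughly 19 px/degree at 1280 wide

# Normal visual acuity resolves on the order of 60 px/degree, so filling
# that same field of view sharply would take roughly 60 * fov ~ 4000 pixels
# across -- regardless of whether the panel is 20 inches or 2 millimeters wide.
```

The same angular math holds for a tiny screen an inch from the eye with optics focused at infinity, which is the point: once the device is held up to the eye, only the pixel count matters.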

So, is the problem of optimizing content for tiny screens a real, long-term problem? Or is it one that soon will disappear as hardware improves?

Update: About a year later, the NYT reviews the Myvu Universal, virtual display glasses where "the picture appears to float a few feet in front of you."

Update: Fourteen months later, the NYT reports on "the Pico Projector ... a card-sized device that connects to a cell phone or other gadget and uses a laser to project an image at the equivalent size of a 60-inch television screen."

Friday, November 24, 2006

My AOL launches news recommendations

Sam Sethi at TechCrunch UK reports that My AOL integrated "personalized content recommendations" into their beta feed reader.

There are two sections, "People Like Me Content" and "Recommended Content". The "People Like Me Content" help popup says:
As other people use My AOL, they occasionally click on stories that are similar to the items you have selected. Our system recognizes these similarities and provides additional content that might be of interest to you.
The "Recommended Content" help popup says:
These are personalized content recommendations. As you click on headlines within My AOL, we (well, our computers) “learn” what you like and suggest similar stories.
The difference is not apparent to me -- both say they use your reading history to find similar content -- but perhaps one group of recommendations is user-based and the other content-based.
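To make that distinction concrete, here is a toy sketch (hypothetical data and names, nothing to do with AOL's actual system) of the two approaches: one scores unread articles by the clicks of readers who overlap with you, the other by overlapping words in the articles themselves.

```python
from collections import defaultdict

clicks = {  # reader -> articles clicked (toy data)
    "me": {"a1", "a2"},
    "u2": {"a1", "a3"},
    "u3": {"a2", "a3", "a4"},
}
words = {  # article -> salient terms (toy data)
    "a1": {"google", "search"},
    "a2": {"microsoft", "search"},
    "a3": {"sports", "scores"},
    "a4": {"google", "microsoft"},
}

def user_based(reader):
    # "People Like Me": weight each other reader by click overlap with me
    mine = clicks[reader]
    scores = defaultdict(int)
    for other, theirs in clicks.items():
        if other != reader:
            for article in theirs - mine:
                scores[article] += len(mine & theirs)
    return max(scores, key=scores.get)

def content_based(reader):
    # "Similar stories": compare words in unread articles to what I have read
    mine = set().union(*(words[a] for a in clicks[reader]))
    unread = set(words) - clicks[reader]
    return max(unread, key=lambda a: len(words[a] & mine))

print(user_based("me"), content_based("me"))  # the two can disagree
```

On this toy data the two methods pick different articles, which is exactly why a site might run both side by side.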

The service appears quite similar to Findory -- perhaps even inspired by Findory -- but the quality of the recommendations seems a bit off in my tests.

For example, I clicked on six articles on My AOL, a TechCrunch article about Google Blog Search and the five most recent articles from my weblog, Geeking with Greg, which happen to all be about Microsoft, Google, Yahoo, and Amazon. My "People Like Me Content" was:
  • RR of the Day: 1984 Alfa Romeo GTV6
  • Google's poo apparently doesn't smell
  • Big Brother Is Listening
  • Virtualization Disallowed For Vista Home
  • EPA to Regulate Nanoproducts Sold As Germ-Killing
  • Suspect Captured at Miami Herald
  • British Government Attacks Own Citizens
  • USA TODAY: Teacher's Space Goal Delayed 21 Years
Not good at all. However, the "Recommended Content" was quite a bit better, though very tightly focused on search:
  • Interesting Yahoo result in Google Search
  • My favorite blogger/blog of the moment...
  • Tracking a package through MSN Search
  • RSS mashup: Amazon, eBay, Yahoo! product results
  • Finding Search related Jobs
  • Community Powered Search
  • Become helps you search a little closer
  • Yahoo! vs. Google -- more or less peanut butter?
For comparison, here are the recommendations you get if you click the same articles on Findory:
  • Google's Kirkland Office
  • The Inefficiency of Feed Readers
  • Google AdSense Gift 2006: Digital Photo Frame
  • Search Engine Thanksgiving Logos 2006
  • Top Web Apps in Serbia
  • Canadian ISPs Launch Fight Against Child Porn
  • Yahoo & Peanut Butter Market Timing
  • AbbreviationZ : Acronyms & Abbreviations Search Engine
Certainly, My AOL's news recommendations are not bad for a first effort. I would expect them to improve in time as they gather more data and refine their algorithms.

I find it very interesting to see this new feature coming out of AOL. Other than Google Personalized Search, this effort from My AOL looks like the biggest use of recommendations and personalization from the search giants so far, bigger than the recommended stories in Google News and MSN Newsbot and the feed suggestions in Bloglines.

I wonder if we will soon see additional personalization and recommendation features launched by the search giants, not just in news and feeds, but also in podcasts, videos, search, and advertising.

Update: In the comments for this post, Jim Simmons (PM, Personalization, AOL) confirms that the difference between "Recommended content" and "People Like Me Content" is content-based vs. user-behavior-based recommendations.

Thursday, November 23, 2006

Amazon crashes itself with promotion ran a special promotion today that offered an Xbox system for $100, about 1/3 of the normal price, starting at 11am.

Broadband Reports posts about what happened:
So many people were waiting for the promotion that the entire Amazon website - not just the promotion page - sank without a trace from just before 2pm, to at least 2:12pm. The home page, the product pages, everything, were unavailable.
Sounds familiar. When I was at Amazon, every year we in engineering would try to avoid spikes in traffic, especially around peak holiday loads, and every year marketing folks would want to run some promotion specifically designed to create a mad frenzy on the site. Usually, we convinced them to change the promotion, but apparently engineering lost (or was asleep at the switch) this year.

Broadband Reports goes on to point out that this reflects badly on Amazon:
We wonder how many amazon shoppers elsewhere in the site abandoned their purchases halfway through after they found their experience destroyed by the vote rush going on in the next room ... Some people got quite irate.

The poor performance of the amazon site during the giveaway also reflects badly on the Amazon "elastic compute cloud" offering (Amazon EC2) which is designed, supposedly, to offer instant capacity to companies which need to deal with exactly this kind of sudden rush.
I don't think it quite works that way. A DDoS attack, which this effectively was, can generate way over the 10x peak load for which the website would be designed. Even so, it still is pretty lame for Amazon to DDoS itself.
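With made-up but plausible numbers, it is easy to see how far past normal provisioning a promotion like this can go:

```python
normal_rps = 1_000                 # hypothetical steady-state requests/sec
provisioned_rps = 10 * normal_rps  # capacity built for ~10x peak headroom

waiting_users = 500_000            # crowd camped on the promotion page
refresh_interval_s = 2             # everyone hammering reload
flash_rps = waiting_users / refresh_interval_s   # 250,000 requests/sec

overload = flash_rps / provisioned_rps
print(overload)  # 25.0 -- 25x beyond even the 10x-peak provisioning
```

All the inputs here are invented for illustration, but the shape of the problem is real: a synchronized crowd multiplies load far faster than ordinary peak-capacity planning anticipates.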

It appears the contest is running again next week with the same structure. I wonder if Amazon will crash itself again?

Update: It appears Amazon is looking at changing the structure of this promotion to prevent another brownout. Currently, there is a message up that says, "Due to the popularity of Amazon Customers Vote, we are extending the Week 2 voting period. Customers who cast a vote will be sent an e-mail notification of the new sale date."

Update: Mike at TechDirt reports that "Amazon Cries 'Uncle' On Promotion Traffic" by changing the rules to prevent another outage.

Wednesday, November 22, 2006

Google dominates, MSN Search sinks

Danny Sullivan posts detailed numbers on search market share, combining data from Comscore, NetRatings, and Hitwise.

To summarize it all, Google grabbed more share at Microsoft's and AOL's expense.

Particularly bad off was MSN Search, as Danny shows in this graph of Microsoft's market share in search:

Ouchie. As Danny says, "[Not] a pretty picture for Microsoft ... They haven't held share. It's drop, drop, drop."

It really is remarkable how badly Microsoft is doing against Google. I never would have thought that, nearly four years after they started their "Underdog" project to build a Google-killer, Microsoft would not only be badly behind in search, but also actually losing market share.

See also my earlier posts, "Yahoo and MSN cannot compete?" and "Kill Google, Vol. 3".

Update: Corrected the post to say that Google's gains came at the expense of Microsoft and AOL alone. Yahoo and Ask appear to have held share.

Update: For full disclosure, I should say that Chris Payne and I talked about Underdog back in 2003 (before I started Findory). Not to worry, I had no influence to speak of -- Chris and I disagreed on what was necessary to beat Google -- but I certainly am more critical of MSN Search's missed opportunities because of that history.

Update: Erik Selberg (creator of Metacrawler, now at MSN Search) takes issue with those who would criticize Microsoft's progress, saying "Well, what did anyone really expect?" and "It's not realistic to think that it can be done quickly." He also has some thoughts on the problems at Yahoo and upcoming stagnation at Google. Definitely worth reading his point of view.

Update: Coming full circle, Danny Sullivan comments on Erik's post.

Update: And a follow-up post from Erik. Erik says, "Microsoft might beat Google. And Google might beat Microsoft .... Google is pressing ahead, and they've got a big lead ... Unless they do something monumentally stupid, which I doubt, it'll be a long, tough challenge to catch and beat them." He also defends the decision to move to the Live brand. Again, worth reading.

Update: There is also some discussion with Erik and others in the comments for this post.

Update: See also my follow-up post, "MSN Search and beating Google", that includes some good thoughts from Dare Obasanjo.

Update: A couple weeks later, Saul Hansell at the NYT writes:
There is a lot about the way Microsoft has run its Internet business that Steve Berkowitz wants to change. But he is finding that redirecting such a behemoth is slow going.

The pressure is on for Mr. Berkowitz to gain control of Microsoft's online unit, which by most measures has drifted dangerously off course.

Over the last year, its online properties have lost users in the United States. The billions of dollars the company has spent building its own search engine have yet to pay off. And amid a booming Internet market, Microsoft's online unit is losing money.

Google, meanwhile, is growing, prospering, and moving increasingly onto Microsoft's turf.

Microsoft lost its way, Mr. Berkowitz says.
The article goes on to show Microsoft's steep drop in market share, talk about brand confusion between MSN and Live, and discuss how far behind MSN Search appears to be in relevance.

Update: Two months later, Danny Sullivan reports that Microsoft's search market share is continuing to decline.

Tuesday, November 21, 2006

Innovation and learning to love destruction

Marissa Mayer says that Google operates "like small companies inside the large company" and feels "a lot like managing a VC firm."

Jeff Bezos "encourages experimentation ... as much of it as possible" in order to "maximize invention per unit of time", "invent as many things per day per week as you can manage", and get "faster innovation".

Innovation and experimentation, that is seen as the way to get ahead. Build, create, innovate.

For this strategy to work well, companies cannot just be quick to create. They also need to be quick to destroy. If something does not work, the company needs to move on quickly. Failures need to be acknowledged, all possible learning extracted, and then the product should be eliminated.

This is not what happens. Instead, unsuccessful products are left up on the site to rot. Failed experiments become useless distractions, confusing customers who are trying to dig through the options to find what they need and frustrating any customer foolish enough to try them with the obvious lack of support., for example, has 63 links on their "all product categories" page, a confusing mess that paralyzes anyone looking for a book or DVD with irrelevant and useless choices. Why do all these continue to exist? Why do Auctions and zShops hang around for years after they failed to attract an audience? Why do detail pages accumulate more and more "exciting new features" until I cannot find the customer reviews anymore under the sea of crap?

Google has 36 products and another 20 in Google Labs. It is enough that an exasperated Sergey Brin said, "I was getting lost in the sheer volume of the products that we were releasing." Admitting that "myriad product releases were confusing their users", Google is pushing its teams to develop "features, not products."

Yahoo has so many services I cannot even count them, let alone find what I want. As a now infamous internal memo pointed out, many of these products overlap with each other, perform poorly, or both. The memo pleaded for the company to find focus, asking Yahoo's management to "definitively declare what we are and what we are not."

Innovation is the process of creative destruction. Improved products destroy the failed products. Innovation is a churning cauldron of life and death.

Google and Amazon claim to be like VC firms, creating little startups within their company, but they lack the process of destruction. At these companies, old products live forever. Failures become zombies, surviving with skeleton teams and little resources, but still managing to distract the company while confusing users.

Old products never die, but they should. To innovate, it is not enough to love creation. We must also love destruction.

Monday, November 20, 2006

Creating a smart Google

Jeffrey O'Brien at Fortune writes about "The race to create a 'smart' Google". Some excerpts:
Recommender systems ... are sprouting on the Web like mushrooms after a hard rain. Dozens of companies have unveiled recommenders recently to introduce consumers to Web sites, TV shows, other people - whatever they can think of.

The company that can decipher all that information ... will pinpoint your tastes and determine the likelihood that you'll buy any given product. In effect, it will have constructed the algorithm that is you.

There's a sense among the players in the recommendation business ... that now is the time to perfect such an algorithm.

The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.
I couldn't have said it better myself. You cannot search for something if you don't know it exists. Discovery helps surface interesting gems without any effort, without any explicit search, from a sea of information.

There is a good quote in the article from John Riedl, someone who has been working on recommender systems longer than just about anyone:
"The effect of recommender systems will be one of the most important changes in the next decade," says ... professor John Riedl ... "The social web is going to be driven by these systems."
A good friend, Brent Smith, is also quoted:
Amazon realized early on how powerful a recommender system could be and to this day remains the prime example. The company ... [compares] your purchasing patterns with everyone else's and thus narrow a vast inventory to just the stuff it predicts you'll buy.

"Personalized recommendations," says Brent Smith, Amazon's director of personalization, "are at the heart of why online shopping offers so much promise."
The article does focus on promise, taking a negative tone toward well established, lucrative systems at companies like Amazon and Netflix but giving startups, some of which have little more than vaporware, the benefit of the doubt.

It is a little unfortunate. The article leads with a sensationalistic title -- that Google sure ain't that smart, heh, heh, snark, snark -- but then fails to show anything that clearly represents progress toward a smarter Google. In fact, after name dropping Udi Manber and Peter Norvig, the article even holds up Google as the likely leader in the race to build a smarter Google.

But, overall, I agree that recommender systems are growing in importance, especially in terms of application to web search, advertisements, and video, and that future recommendation systems will be even more lucrative than they are now. As a good colleague of mine was fond of saying, "The future will be personalized."

Update: Mike at TechDirt doesn't like the hype either, and then goes a step further by slamming all recommender systems as "far from useful", "exceptionally limited", and "littered with failures". While I think it is going too far to condemn all recommender systems -- I am not sure Mike is aware of how much money personalization features generate for, for example -- his post is good for a contrarian view.

Saturday, November 18, 2006

Yahoo "peanut butter" memo

Paul Kedrosky posts a brutally critical leaked internal memo from Yahoo. Don't miss it.

Some selected excerpts:
We lack a focused, cohesive vision for our company .... We lack clarity of ownership and accountability.

We end up with competing (or redundant) initiatives ...
  • YME vs. Musicmatch
  • Flickr vs. Photos
  • YMG video vs. Search video
  • vs. myweb
  • Messenger and plug-ins vs. Sidebar and widgets
  • Social media vs. 360 and Groups
  • Front page vs. YMG
  • Global strategy from BU vs. Global strategy from Int'l
We have lost our passion to win. Far too many employees are "phoning" it in.

We need to boldly and definitively declare what we are and what we are not .... Focus the vision .... Restore accountability .... Blow up the matrix .... Kill the redundancies ... [Stop] competing against each other.

Change is needed and it is needed soon. We can be a stronger and faster company.
See also my earlier post, "Yahoo's troubles", where I said, "The business is advertising ... To fail to compete on advertising is to fail."

Update: Dare Obasanjo points out:
Yahoo! executives are contemplating firing one in five Yahoo! employees ... Layoffs are a demoralizing affair and often don't eliminate the right people especially since the really smart people know to desert a sinking ship instead of hanging around to see if they draw the short straw.
Google is a mere 5.8 miles down the road. It may be hard for Yahoo to retain their best at this time of instability.

Tuesday, November 14, 2006

Excellent data mining lecture notes

I have been reading and enjoying the slides from the Stanford CS Data Mining class being taught by Anand Rajaraman and Jeff Ullman.

The talk on recommender systems (PDF) was particularly interesting, with a thorough and insightful look at different techniques (e.g. collaborative filtering, item-to-item, content-based, clustering) for personalization and recommendations. Note that one of the options for the class project is working on the Netflix contest.

The talks on association rules (PDF #1, PDF #2) were fun with some clever applications discussed (e.g. detecting plagiarism) and nice optimizations (e.g. sampling the data set at the limit of main memory multiple times to determine which data can be ignored in a full run).
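As a refresher on the building blocks those lectures start from, the core quantities in association rule mining are just counts over baskets. A minimal sketch (the toy data is mine, not from the slides):

```python
def support(itemset, baskets):
    # Fraction of baskets containing every item in itemset
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    # Of baskets with the antecedent, what fraction also have the consequent?
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

baskets = [
    {"milk", "bread"},
    {"milk", "diapers", "bread"},
    {"milk", "diapers"},
    {"bread"},
]

print(support({"milk"}, baskets))                  # 0.75
print(confidence({"milk"}, {"diapers"}, baskets))  # 2/3
```

The memory-limit sampling trick mentioned above is an optimization over exactly these counts: run on a sample sized to main memory to prune candidate itemsets cheaply before committing to a full pass over the data.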

The clustering talks are also worthwhile, focused on handling very large data sets and clearly explained. Finally, if you are working on web search (or are an evil SEO), it is worth reviewing the talks on page rank and web spam.

Looks like a great class. Impressive that this is all being covered at the undergraduate level.

Ruthless enough for a startup?

I have been reading about how several successful startups -- Facebook, MySpace, BitTorrent, YouTube, Skype, and HotOrNot -- fueled the early growth that led to their success.

In all the cases, these startups did things that I probably would not have been willing to do. It makes me wonder how ruthless you have to be to have a successful startup.

Facebook, for example, "had access to the e-mail addresses of Harvard students" and "blasted e-mails to Harvard students to let people know about the site." The site, which allows people to list information about themselves and meet other students, largely seems driven by social interaction, dating, and self-promotion.

Similarly, MySpace "had a database of ~100M e-mail addresses" which they spammed to announce their launch. MySpace is broader than Facebook, but also largely seems driven by social interaction, dating, and self-promotion.

BitTorrent, a P2P filesharing network, launched after "[Bram] Cohen collected a batch of free porn and used it to lure beta testers". The site soon collected "long lists of pirated content" including full length movies and pornographic material.

HotOrNot started as a way to "settle an argument" and soon turned into a popular and lucrative website that is "serving some basic human psychological needs around social validation, ego, and voyeurism" and allows people to "enjoy the voyeuristic aspect of checking people out."

The YouTube founders, when they first launched, "figured the best thing to do would be to get hot chicks involved". Later, they implemented a feature that allowed "one-click emailing to spam a friend about a video." I suspect it is also true that YouTube succeeded where many other video startups failed largely by being less vigilant about purging copyrighted content and soft porn, all easy to find on the site.

Skype's founders originally created Kazaa, a filesharing network that encouraged illegal trading of copyright content. Skype is a clever iteration from Kazaa. It follows a similar theme of giving away stuff that used to cost money for free, but this time it is legal. Like Kazaa, it uses other people's resources (especially those who are blessed with the privilege of being a supernode) to provide the service.

There seem to be some dismal lessons in these stories. It appears the ideal startup will give away something that used to cost money for free (preferably copyright material and porn), use other people's content and resources, appeal to the baser human instincts (especially vanity and sex), and spam massive e-mail lists at launch.

And this makes me wonder, am I ruthless enough? In Findory Video, for example, the system tries to automatically filter the soft porn that appears quite popular on both YouTube and Google Video. Is that a mistake? Findory has never spammed anyone. Findory keeps well within fair use for copyright material. Findory directs traffic to content providers to help them earn revenue from their work. Are those mistakes?

Is ruthlessness the key to success for Web 2.0 startups? Are you ruthless enough to succeed in the same way these others have done?

Update: Yahoo just acquired Bix, a website that features "hot or not and other contests" and "launched barely three months ago". Yet another example.

Sunday, November 12, 2006

AI and "Web 3.0"

When I saw John Markoff's article, "Entrepreneurs See a Web Guided by Common Sense", on the front page of the NYT today, I did not know whether to feel excited or dismayed.

On the one hand, the distant goals of many working on information retrieval were nicely laid out in the article:
Computer scientists and a growing collection of start-up companies are finding new ways to mine human intelligence. Their goal is to add a layer of meaning on top of the existing Web that would make it less of a catalog and more of a guide -- and even provide the foundation for systems that can reason in a human fashion.

In the future, more powerful systems could act as personal advisers in areas as diverse as financial planning, with an intelligent system mapping out a retirement plan for a couple, for instance, or educational consulting, with the Web helping a high school student identify the right college.

The Holy Grail ... is to build a system that can give a reasonable and complete response to a simple question like: "I'm looking for a warm place to vacation and I have a budget of $3,000. Oh, and I have an 11-year-old child."
But then, the article discredits this vision by attaching it to the buzzword "Web 3.0". Readers easily could ignore the caveats in the article, see the absurd claims that Flickr and Digg represent substantial progress in AI, and come away with the impression that intelligent web applications are just a few years away.

Overpromising and underdelivering caused much disenchantment with artificial intelligence in the 1970s and 1980s. It would be a shame to see it happen again.

While I subscribe to the vision and goals laid out, I want to emphasize the words of the skeptics. From the article:
Artificial intelligence, with machines doing the thinking instead of simply following commands, has eluded researchers for more than half a century.

Referred to as Web 3.0, the effort is in its infancy, and the very idea has given rise to skeptics who have called it an unobtainable vision.

Researchers and entrepreneurs say that while it is unlikely that there will be complete artificial-intelligence systems any time soon, if ever ...
It is true that we are building more intelligent Web applications. Some of these systems do simple learning and adaptation using the behavior of their users. For example, Amazon.com's website adapts to the interests of each shopper and improves the more it is used.

But it is a long way from this to the Holy Grail. These early applications work from detecting patterns in data. They have no understanding of language. They cannot reason about user goals. They have no base of knowledge that would allow them to make common sense connections.

There is no way in which these early systems can take a goal like "Plan for me a warm vacation appropriate for my 11 year old" and reason about it like a travel agent would. Building that application, while a noble and worthy challenge, is at least decades off.
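The gap is easy to see in code. A pattern-based system like the ones described can be sketched in a few lines — this hypothetical co-occurrence recommender (names and data my own, not any real system's) finds items viewed together, but it has no model of meaning:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(sessions):
    """Count how often each pair of items appears in the same user session."""
    counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts

def related_items(item, counts, top_n=3):
    """Items most often seen alongside `item` -- pure pattern detection."""
    scored = []
    for (a, b), n in counts.items():
        if a == item:
            scored.append((n, b))
        elif b == item:
            scored.append((n, a))
    return [other for n, other in sorted(scored, reverse=True)[:top_n]]
```

Such a system "learns" that travel guides sell alongside luggage, but it could never reason about what vacation suits an 11-year-old; that requires knowledge it simply does not have.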

AI researchers, do not overpromise and underdeliver again. Cut out the "Web 3.0" hype. Let's be realistic. Even without the chimerical Holy Grail of AI, we can help people find and discover what they need.

Saturday, November 11, 2006

Andrei Broder talk on information supply

Yahoo VP and Research Fellow Andrei Broder is giving an IEEE talk, "The next generation Web Search: From Information Retrieval to Information Supply" on Nov 16 at Stanford University.

The idea of "information supply" is very close to the idea that I spend my time working on and advocating, information personalization and recommendations. From the abstract for Andrei's talk:
The goal of Web IR will widen to include the supply of relevant information from multiple sources without requiring the user to make an explicit query.

[We can] supply relevant information specific to a given activity and a given user, while the activity is being performed. A prime example is the matching of ads to content being read, however the information supply paradigm is starting to appear in other contexts such as social networks, e-commerce, browsers, e-mail, and others.
A Yahoo Research page, "Toward the Next Generation of Search", elaborates on Andrei's thoughts on personalization of information, saying:
Andrei Broder ... foresees pushing search toward information supply: serving up answers to users' questions that they haven't even typed in a search box.

Worry not, users; this isn't mind reading. But with statistical analysis of people's surfing habits and creative algorithms, we ... hope to figure out users' intents and understand their context, so we can supply them with useful information.

It could get displayed in a variety of ways, such as recommended links or intelligent, personalized footnotes dynamically served up on the bottom of a Web page.

"It's a little bit of the 'push' paradigm," Broder says, but he says the way the information is presented to users is key, so that it is unobtrusive but useful.
Andrei has given what appear to be similar versions of this talk over the last year. For example, here are slides (PDF) from a May 2006 talk titled "From query based Information Retrieval to context driven Information Supply". I wrote about that talk back in June 2006.
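As a rough illustration of the information-supply idea, here is a sketch (the function names and scoring are my own, not from Andrei's talk) that ranks candidate items against the content a user is currently reading, using crude keyword overlap instead of an explicit query:

```python
def keywords(text):
    """Crude tokenizer: lowercase alphabetic words of 4+ letters."""
    return {w for w in text.lower().split() if len(w) >= 4 and w.isalpha()}

def supply_score(page_text, candidate_text):
    """Jaccard overlap between the current page and a candidate item."""
    page, cand = keywords(page_text), keywords(candidate_text)
    if not page or not cand:
        return 0.0
    return len(page & cand) / len(page | cand)

def supply(page_text, candidates, top_n=2):
    """Rank candidate items by relevance to what the user is reading now."""
    ranked = sorted(candidates, key=lambda c: supply_score(page_text, c), reverse=True)
    return ranked[:top_n]
```

Real systems would add user history, click behavior, and much better text analysis, but the shape is the same: the user's current context is the query.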

See also my posts ([1] [2] [3]) about related work by Susan Dumais and Eric Horvitz at Microsoft Research.

Thursday, November 09, 2006

Hadoop on Amazon EC2

Hadoop, an open source clone of Google FS and MapReduce, can be run on top of Amazon EC2, a hosting service that allows leasing servers on an hourly basis.

The details of setting this up are available on the "AmazonEC2" page of the Lucene-Hadoop Wiki.

When looking for more about this, I noticed that the hyped-but-not-launched natural language search engine Powerset appears to be leading the charge on using Hadoop on EC2. From the Hadoop mailing list:
From: Gian Lorenzo Thione <>
Date: Fri, 25 Aug 2006 23:04:16 GMT

At Powerset we have used EC2 and Hadoop with a large number of nodes, successfully running Map/Reduce computations and HDFS. Pretty much like you describe, we use HDFS for intermediate results and caching, and periodically extract data to our local network. We are not really using S3 at the moment for persistent storage.

A nice feature of Hadoop as measured against our use of EC2 has been the capability of fluidly changing the number of instances that are part of the cluster. Our instances are set up to join the cluster and the DFS as soon as they are activated and when - for any reason - we lose those machines, the overall process doesn't suffer. We have been quite happy with this, even at significant number of instances.
That is an interesting detail on the recent announcement that Powerset is a heavy user of Amazon's EC2.

I am not sure I have an immediate use for Hadoop on EC2, but it is nice to see. Developers may now be able to rapidly bring up hundreds of servers, run a massive parallel computation on them using Hadoop's MapReduce implementation, and then shut down all the instances, all with low effort and at low cost. Very cool.
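Hadoop's programming model itself is easy to sketch. The toy word count below mimics the map, shuffle, and reduce phases in plain Python — a model of the computation, not actual Hadoop API calls:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) for every word, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key across mapper outputs."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(ones) for word, ones in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["hadoop on ec2", "hadoop mapreduce"])))
```

The appeal of Hadoop on EC2 is that the map and reduce steps parallelize across as many leased machines as the job needs, then the machines go away.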

[Wiki node found via John Krystynak]

Update: Eight months later, Tom White posts a tutorial, "Running Hadoop MapReduce on Amazon EC2 and Amazon S3". [Found via Todd Huff]

Marissa Mayer at Web 2.0

Google VP Marissa Mayer just spoke at the Web 2.0 Conference and offered tidbits on what Google has learned about speed, the user experience, and user satisfaction.

Marissa started with a story about a user test they did. They asked a group of Google searchers how many search results they wanted to see. Users asked for more, more than the ten results Google normally shows. More is more, they said.

So, Marissa ran an experiment where Google increased the number of search results to thirty. Traffic and revenue from Google searchers in the experimental group dropped by 20%.

Ouch. Why? Why, when users had asked for this, did they seem to hate it?

After a bit of looking, Marissa explained that they found an uncontrolled variable. The page with 10 results took 0.4 seconds to generate. The page with 30 results took 0.9 seconds.

Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

This conclusion may be surprising -- people notice a half second delay? -- but we had a similar experience at Amazon.com. In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue.
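A minimal sketch of that kind of analysis (invented numbers, not Amazon's or Google's actual data): bucket sessions by injected delay and compare each bucket's conversion rate against the zero-delay control.

```python
def conversion_rate(sessions):
    """Fraction of sessions in a bucket that ended in a purchase."""
    return sum(1 for s in sessions if s["purchased"]) / len(sessions)

def relative_drop(control, treatment):
    """Percent change in conversion rate versus the control bucket."""
    base = conversion_rate(control)
    return 100.0 * (conversion_rate(treatment) - base) / base
```

With enough sessions per delay bucket, even a 100 millisecond treatment shows up as a statistically meaningful negative number here.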

Being fast really matters. As Marissa said in her talk, "Users really respond to speed."

Marissa went on to describe how they rolled out a new version of Google Maps that was lighter (in page size) and rendered much faster. Google Maps immediately saw substantial boosts in traffic and usage.

The lesson, Marissa said, is that speed matters. People do not like to wait. Do not make them.

Tim O'Reilly on harnessing collective intelligence

Tim O'Reilly just ran a panel here at the Web 2.0 Conference on "Disruption: Harnessing the Collective Intelligence".

Tim's introduction to the panel reminded me of his speech six months ago at UC Berkeley where he said:
A true Web 2.0 application is one that gets better the more people use it ... The real heart of Web 2.0 is harnessing collective intelligence.

[In] the world of Web 2.0 ... we share our knowledge and insights, filter the news for each other, find out obscure facts, and make each other smarter and more responsive.
At the time, after reading Tim's speech, I wrote:
I like this new definition of Web 2.0, "harnessing collective intelligence." I like the idea we are building on the expertise and information of the vast community of the Web. I like the idea that web applications should automatically learn, adapt, and improve based on needs.

I also like the idea that "Web 2.0" should include many companies that people were trying to classify as "Web 1.0". Amazon.com, with its customer reviews and personalized pages, clearly is harnessing the collective wisdom of Amazon shoppers. Google also is constantly improving based on the behavior of searchers.

Web 2.0 applications get better and better the more people use them. Web 2.0 applications learn from the behavior of their users. Web 2.0 applications harness collective intelligence.
The definition of Web 2.0 remains vague in most people's minds. Some describe it as a new dot com boom. Some say it is about tagging or social networks. Some say it is about fancy AJAX widgets.

I think Tim has helped clarify it with his focus on harnessing collective intelligence. If the application does not improve from the contributions and knowledge of its users, if it does not get better and better as more people use it, it is not a Web 2.0 application.

Update: In response to a question about what Tim meant by "intelligence", Tim cited Sturgeon's Law, which he paraphrased as "95% of everything is crap". He said intelligence is surfacing the 5% that is not crap to the right people at the right time.

Update: Tim posted some additional thoughts.

Marten Mickos at Web 2.0

MySQL CEO Marten Mickos just gave a talk at the Web 2.0 Conference on "The Great Database in the Sky".

The idea is that "structured data should be open sourced", linked, and easily accessible. The goal is to do for structured data (database records) something like what Google does for unstructured data (web documents).

This is not a new idea. People usually talk about this as querying heterogeneous distributed databases. The trick is matching up disparate data definitions and smoothing over bad data. And that is quite a trick.

One technique is to require people to publish their data in some format that is easier to merge and process, but that requires all databases to cooperate. Another technique is to wrap databases with some translation layer, but that requires custom (and often fragile) wrappers for each database.
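The second technique, translation wrappers, can be sketched as: each source database gets an adapter that maps its local field names and units into one shared schema. All the schemas and field names below are hypothetical, purely for illustration:

```python
def wrap_store_a(record):
    """Adapter for store A, which prices in dollars under 'cost'."""
    return {"title": record["name"], "price_usd": record["cost"]}

def wrap_store_b(record):
    """Adapter for store B, which prices in cents under 'price_cents'."""
    return {"title": record["product"], "price_usd": record["price_cents"] / 100}

def merged_catalog(a_records, b_records):
    """Query both stores through their wrappers, sorted by price."""
    rows = [wrap_store_a(r) for r in a_records] + [wrap_store_b(r) for r in b_records]
    return sorted(rows, key=lambda r: r["price_usd"])
```

The fragility is visible even in this toy: rename one upstream field or change its units, and the wrapper silently breaks.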

It's an interesting problem. I think there are good examples of doing some of this for specific domains -- metashopping searches like Shopzilla, for example -- but Marten has said that MySQL will be leading a much broader push.

Wednesday, November 08, 2006

Jim Lanzone and Steve Berkowitz at Web 2.0

Ask.com CEO Jim Lanzone and Microsoft SVP Steve Berkowitz were interviewed by John Battelle a couple of hours ago here at the Web 2.0 Conference on "Beating Google at Their Own Game".

The most interesting part to me was near the end when there was a question from the audience asking for their thoughts on personalized search.

Both Jim and Steve's answers struck me as odd. Steve entirely focused on privacy issues. He argued for giving users detailed and complete control of their data. Steve claimed this was being customer-focused, but I felt he was focusing on entirely the wrong customer. Most customers do not want to spend time twiddling configuration settings for their data; they just want to find what they need. Customer-focus for personalized search should mean helping people find and discover the information they need.

Jim also had an unusual focus, saying that "users don't customize", "users are lazy", and "the majority of people won't do it." The questioner followed up at this point, asking about implicit personalization of search, which works from behavior and requires no effort. Both Jim and Steve indicated that they thought this was a good idea, but offered nothing more.

It is surprising to me that Jim and Steve seem to have not thought much about personalized search. I was expecting to hear something deeper from them on this topic.

Personalized search is a potential way to beat Google. Paying attention to what each searcher has done allows individualized relevance ranks and should yield more relevant search results. Whoever can crack this nut could win the search war.
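One simple form of implicit personalization can be sketched as follows — boost results from sites this searcher has clicked before, with zero configuration effort from the user. The scoring scheme is my own illustration, not any engine's actual ranking:

```python
from collections import Counter

def personalized_rank(results, click_history, boost=2.0):
    """Re-rank results: each past click on a site raises that site's score.

    `results` is a list of (site, base_score) pairs; `click_history` is a
    list of sites the searcher clicked in past sessions.
    """
    clicks = Counter(click_history)

    def score(result):
        site, base = result
        return base + boost * clicks[site]

    return sorted(results, key=score, reverse=True)
```

A production system would use far richer signals (queries, dwell time, topics), but the point stands: the behavior is the input, not a settings page.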

Update: There is some broader coverage of this talk and Ray Ozzie's talk, all put in good context, by David Needle at InternetNews.

Update: Another article, this one by Dan Farber at ZDNet, with broader coverage of this talk. [via John Battelle]

Mary Meeker at Web 2.0

Among many other things in her rapid-fire Web 2.0 talk, Mary Meeker emphasized the growing importance of personalization technology.

Mary said that "the best example of personalization is Amazon.com's recommendation engine" and that she expected widespread use of personalization and social filtering in the future.

Also of note was a slide that showed the top e-commerce retailers, most of which are also large offline retailers (like OfficeMax and Staples). Mary pointed out that few had predicted these retailers would appear so high in the rankings. She said traditional media companies should take notice, stop despairing over the threat of online, and focus their efforts on dominating online media.

Update: The slides (PDF) from Mary's talk are now available. [via Paul Kedrosky]

Riya, vaporware, and hard problems

Liz Gannes at GigaOm reports that Riya, a company that promised to revolutionize photo search, has "dramatically changed course". It is now going to be a visual search engine for products.

Apparently, their facial recognition technology never worked very well, so they are shifting to a problem where "the threshold of success is lower."

Riya got a lot of hype about their vaporware facial recognition technology about a year ago. For example, at the time, Michael Arrington said:
Riya leverages potent facial and text recognition technology with an intelligent interface to help people make sense of the thousands of untitled and untagged photos that are building up on their hard drives.

Riya is going to be successful. They have real technology.
It turns out, all Riya had was strong claims about solving hard problems they could not actually solve.

Of course, it is not surprising that Riya couldn't solve the facial recognition problem. It's a very hard problem. What is surprising is that they got such attention for vaporware claims about solving this very hard problem.

I wonder if we are seeing the same thing with the flood of hype and claims about solving natural language search. Powerset, for one example of many, seems to be getting amazing hype for a company with no launched product in a space littered with previous failures.

See also Don Dodge's post, "Riya tries again as Like.com".

See also Danny Sullivan's older post, "Hello Natural Language Search, My Old Over-Hyped Search Friend".

Tuesday, November 07, 2006

Eric Schmidt at Web 2.0

Eric Schmidt spoke with John Battelle on stage here at the Web 2.0 conference a few hours ago. There were some interesting new tidbits in their interview.

Eric broadly laid out two fundamental shifts that he sees in our near future: online video and software as a service. Online video is a shift toward watching videos on the Web with a focus on discovering related videos and sharing videos with friends.

Software as a service is a twist on the old idea of software moving off the disk and on to the network (a remote cluster of machines). The twist is a focus on reliability from using a large, shared cluster and on sharing and collaborating.

The prediction of an ascendance of online video might justify Google's YouTube acquisition. And, on that point, in response to a question, Eric made the incredible claim that YouTube is not violating copyright; whether objectively true or not, the fact that Google thinks it is true may also explain their willingness to pay an exorbitant $1.65B for YouTube.

Finally, a quick tidbit on hiring. In response to a question about attracting and retaining the best and brightest, Eric said, "People don't work for money. They work for impact." Very true. Money is not always the best motivator.

Update: Other good coverage from ValleyWag and PaidContent. [via Danny Sullivan]

Update: Video of this talk appears to be available.

Google and those TPS reports

Googler Chris Sacca has a fun post full of references to the movie "Office Space".

I remember this stage at Amazon.com. As the company grew, the MBAs flooded in, and the culture started to change. As Chris says:
The leaders of Google have realized, from the earliest days, that the company's own growth would be the biggest challenge.

Nevertheless, the potential big company pitfalls are always looming. As the size grows, I see colleagues, particularly those who join Google from other companies, tempted to carve out fiefdoms and mandate SWOT analyses and extensive Excel spreadsheets littered with three letter acronyms. I have seen a few mid-level bosses evoke the traditions of Japanese management and schedule "pre-meetings" to plan, discuss, and approve what will be planned, discussed and approved at the actual meeting itself. MBA-speak creeps into the parlance and these new managers require the filing of more and more TPS reports.

The good news? Google's culture of letting engineers and product folks build great stuff that users want is still winning the day.
See also my April 2004 post, "Kill Google, Vol. 1", where I said:
The first threat to Google is internal. Google needs to maintain a culture that produces and delivers innovative new products. So far, Google has done this by hiring some of the brightest and most creative researchers in the world.

But, as Google grows, having incredible people isn't enough. Communication becomes difficult in a large organization. Accountability drops, free riding increases. Great prototypes are developed, but never get out the door. People don't know who to contact and how to get things done.

Google is well known for having nearly no management -- the controlled chaos of a research lab -- but, unless Google can adjust its organizational structure to its new size, the firm may find its innovation crushed under its own growth.
See also my May 2006 post, "First, kill all the managers", where I talk about Google's management strategy and Googler Steve Yegge's thoughts on middle management.

Friday, November 03, 2006

Doubling down at Amazon.com

The cover story in next week's issue of BusinessWeek, "Jeff Bezos' Risky Bet", looks at Amazon.com's move into web services.

Some extended excerpts:
Amazon founder and Chief Executive Jeffrey P. Bezos ... is back with yet another new idea. Many people continue to wonder if the world's largest online store will ever fulfill its original promise to revolutionize retailing.

But now Bezos is plotting another new direction for his 12-year-old company ... Judging from an advance look he gave BusinessWeek ... it's so far from Amazon's retail core that you may well wonder if he has finally slipped off the deep end.

Bezos wants Amazon to run your business, at least the messy technical and logistical parts of it, using those same technologies and operations that power his $10 billion online store. In the process, Bezos aims to transform Amazon into a kind of 21st century digital utility.

Amazon is starting to rent out just about everything it uses to run its own business, from rack space in its 10 million square feet of warehouses worldwide to spare computing capacity on its thousands of servers, data storage on its disk drives, and even some of the millions of lines of software code it has written to coordinate all that.

Bezos spent hundreds of millions of dollars to build distribution centers and computer systems in the promise that they eventually would pay off with outsize returns ... Lately profits have fallen, dragged down by spending on new technology projects.

[This] has investors restless and many analysts throwing up their hands wondering if Bezos is merely flailing around for an alternative to his retail operation.

What's more, at the same time Bezos is thinking big thoughts, Amazon's retail business faces new threats ... Other sites are fast becoming preferred first stops on the Web. Google, for one, has replaced retail sites such as Amazon as the place where many people start their shopping.

Amazon's mission to be the place where "customers can find and discover anything they might want to buy online" doesn't especially mesh with the goal to be the prime source of services needed to run an Internet Age business ... Can Bezos manage a company that simultaneously sells the most routine stuff to consumers and the most demanding business services to entrepreneurs and corporations?

"We are willing to go down a bunch of dark passageways," [Jeff] says, "and occasionally we find something that really works." As always, investing in Bezos and his company will require faith that there's light at the end of his newest tunnel.
See also my previous posts on some of's web services, "Amazon launches utility computing service", "Amazon Mechanical Turk?", and "Remote storage on Amazon S3".

See also the "Get Big Fast" post from my Early Amazon series.

[BusinessWeek article via Om Malik]

Update: BusinessWeek also has an interview with Jeff Bezos about the business of offering web services. Jeff says:
We're trying to leverage an existing asset ... We cannot operate our consumer business without these pieces of Web-scale infrastructure.

We have three businesses today ... consumer-facing ... seller-facing ... developer-facing. The first two businesses are already financially meaningful businesses for Amazon.com, and we believe the third one can be too.
[Found via Matt Marshall]

Update: See also the comments in a thread on an earlier post for my thoughts on Amazon's business and their web services. An excerpt:
It may be a failure of imagination on my part, but I have a hard time seeing Amazon web services as a profitable venture. I understand the argument that Amazon needs to build a lot of this anyway for internal use, but I think there is a big difference between supporting and servicing internal customers and external customers. I suspect Amazon will find it very expensive to do everything that enterprise-level paying customers will expect of them.

In addition, I find myself confused trying to determine the target market for these web services. Most web services nowadays are free and targeted at hobbyists. Amazon is hoping to extract hundreds of millions from their web services, which means they would need thousands of enterprise level clients. Does this market exist? Other utility computing offerings do not seem to have tapped it, perhaps because substitutes (e.g. leasing servers) work as well or better for many purposes.
Update: Tim O'Reilly, in an interview with Jeff Bezos at Web 2.0, beautifully summarizes the Wall Street reaction to all of this: "But, why?"

Thursday, November 02, 2006

A9 goes after advertising

Amazon's A9 appears to be refocusing on web advertising instead of web search. First, they simplified and scaled down their web search effort. Now, they are testing a new web advertising platform called Clickriver.

Clickriver appears to be AdWords for Amazon -- web ads placed on Amazon.com -- and a first step toward an AdSense competitor. From the Clickriver About page:
Clickriver Ads is an advertising program ... that allows businesses to place sponsored links on Amazon.com.

Clickriver Ads will appear on Amazon.com, on search results pages and on many product detail pages.

With Clickriver, a wide range of complementary products and services can be advertised at the precise moment that someone is interested -- as they shop, browse, and search on Amazon.com. For example, ads from banks can display on pages for finance and investment books, such as on the "Road to Wealth" product page. Ads from photo printing services can display next to search results for cameras and tripods. Ads for hotels, car rentals and travel agencies can display on pages for travel books, sunglasses, suitcases, portable DVD players and other travel accessories sold on Amazon.com.

Clickriver Ads was built by A9.com, a search technologies company based in Palo Alto, California. A9.com is a wholly owned subsidiary of Amazon.com.
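The kind of matching the About page describes can be sketched as a simple category lookup. The categories and ads below are my own hypothetical examples, not Amazon's actual taxonomy or system:

```python
# Hypothetical mapping from product categories to relevant advertiser categories.
AD_CATEGORIES = {
    "finance books": ["banks", "brokerages"],
    "cameras": ["photo printing"],
    "travel books": ["hotels", "car rentals", "travel agencies"],
}

def ads_for_page(product_category, ad_inventory):
    """Pick ads whose category fits the product the shopper is viewing."""
    wanted = set(AD_CATEGORIES.get(product_category, []))
    return [ad for ad in ad_inventory if ad["category"] in wanted]
```

The interesting part, which this sketch omits entirely, is pricing and ranking the competing ads -- that is where AdWords earns its money.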
Back in April, in a post called, "What will become of A9?", I said:
A9 could switch to focusing on AdWords and AdSense-like advertising, building on the success of Amazon Associates.

A9 would compete with Google not on web search, where Google has a strong advantage, but on advertising, where Amazon has its massive catalog and experience selling products to leverage.
Looks like that is exactly what A9 is doing.

[Found via Gary Price, John Battelle, and John Cook]

Update: Om Malik says, "Amazon doesn't really want to be a retailer some days, does it?"