Saturday, June 10, 2006

GFS, MapReduce, and Hadoop

If you are interested in Google's GFS and MapReduce, don't miss Hadoop, a GFS and MapReduce clone being developed by the same folks who wrote the open-source text search engines Lucene and Nutch.

Update: Doug Cutting gave a talk, "Scalable Computing with MapReduce" (PDF), at OSCON 2005 that discusses Hadoop (called NDFS in the talk) and how it helped Nutch scale its crawl to billions of pages. [via Sergey Chernyshev]


Anonymous said...

You know, the day might come when Google regrets being so open about its technology. This is one side effect of aggressively recruiting academic researchers who expect to be able to "publish" their results, despite working for a publicly held company with valuable intellectual property and trade secrets.

In the meantime, I'm happy to consume their learnings for free.

Greg Linden said...

Maybe. Transparency has a lot of benefits.

I suspect Google would argue that any downside from potential competitive issues is outweighed by the upside to innovation and recruiting.

Not all the brains live at Google. When Google engages the research community, people elsewhere collaborate with Googlers, work and publish on problems relevant to Google, and are more likely to eventually join Google.