The Hadoop open source project is building a clone of Google's powerful cluster tools, the Google File System and MapReduce.
I was curious to see how involved Yahoo appears to be in Hadoop. Doug Cutting, the primary developer of Lucene, Nutch, and Hadoop, now works for Yahoo, but at the time, his hiring was described as supporting an independent open source project.
Digging further, I found that Yahoo's role is more complicated. Browsing through the Hadoop developers mailing list, I can see that more than a dozen people from Yahoo appear to be involved in Hadoop.
In some cases, the involvement is deep. One of the Yahoo developers, Konstantin Shvachko, produced a detailed requirements document for Hadoop. The document appears to lay out what Yahoo needs from Hadoop, including such tidbits as handling 10k+ nodes, 100k simultaneous clients, and 10 petabytes in a single cluster.
Also noteworthy is Eric Baldeschwieler, a director of software development at Yahoo, who recently talked about direct support from Yahoo for Hadoop. Discussing "how we are going to establish a testing / validation regime that will support innovation," Eric said, "We'll be happy to help staff / fund such a testing policy."
There is nothing wrong with this, of course. If anything, it should be viewed as noble that Yahoo is supporting an open source version of these powerful tools and making them available to all.
But it is interesting. It is interesting that Yahoo is so involved in building a Google FS and MapReduce clone. It is interesting that Yahoo would choose to open source these tools. It is interesting to see this level of involvement from Yahoo in Hadoop.