Monday, July 16, 2007

HBase: A Google Bigtable clone

HBase appears to be a very early-stage open source project to clone Google Bigtable.

From the project page:
Google's Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

Just as Bigtable leverages the distributed data storage provided by the Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop.

Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).
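
To make that access pattern concrete, here is a minimal sketch in Python of a table keyed by row, with a point lookup for one column and an iterator over a row range. This is a toy in-memory model and not the HBase API: the class name ToyTable, its put/get/scan methods, and the "contents:html" column name are all hypothetical, chosen only for illustration.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory stand-in for a Bigtable-style table (not the HBase API)."""

    def __init__(self) -> None:
        self._keys: List[str] = []                   # row keys, kept in sorted order
        self._rows: Dict[str, Dict[str, str]] = {}   # row key -> {column: value}

    def put(self, row: str, column: str, value: str) -> None:
        """Store a value under (row, column), creating the row if needed."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Point lookup: the column value for a specific row key, or None."""
        return self._rows.get(row, {}).get(column)

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield (row key, columns) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = ToyTable()
table.put("com.example/a", "contents:html", "<html>a</html>")
table.put("com.example/b", "contents:html", "<html>b</html>")
table.put("org.example/x", "contents:html", "<html>x</html>")

# Scan every row whose key starts with "com.example/" ("0" sorts right after "/").
for key, columns in table.scan("com.example/", "com.example0"):
    print(key, columns["contents:html"])
```

Because rows are kept sorted by key, a range scan is just a slice of the key list, which is why there is no need for a query language for this kind of access.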
Note that two of the four early contributors -- Jim Kellerman and Michael Stack -- are from Powerset. Some members of the Powerset team have also been trying to run Hadoop on Amazon EC2.

See also my earlier posts, "Google Bigtable paper" and "Google's Bigtable".

See also my earlier posts, "Yahoo building a Google FS clone?" and "GFS, MapReduce, and Hadoop".

Update: Looking at the HBase source tree, there appear to be other contributors who are not listed on the project page. Not surprising to see that Mike Cafarella and Doug Cutting were deeply involved.

3 comments:

Andrew Hitchcock said...

After you first mentioned Hadoop, I began following the project. I was really excited when HBase was started; it seems to be coming along quickly.

Also, I ran into Michael Stack at the recent Google Scalability Conference, but didn't have a chance to say hi.

burtonator said...

Yeah... I've been meaning to play with it.

It's hard to justify when a lot of others are scaling MySQL fairly well and you can hire developers that already grok this code.

Sharding just works (Facebook, AdWords, LiveJournal, etc).

Not that I wouldn't kill for a good Bigtable implementation.

Unknown said...

I continue to not be excited about this HBase. Why? It's Java. I just don't find deploying Java to be all that much fun on Linux.

Give me C/C++, Python, Perl, PHP... just about anything but Java.