Saturday, June 10, 2006

Talk on large systems at Google

The audio and video quality is horrible, but this Google talk, "Building Large Systems at Google", by Googler Narayanan Shivakumar (aka Shiva), has some interesting tidbits about GFS, MapReduce, BigTable, and Google's infrastructure.

Shiva starts by talking a bit about the Google Kirkland office and some of the projects launched out of that office.

Around 13:00, after Shiva talks a bit about GFS, some questions from the audience pick at the system's vulnerability to failures of the master nodes. Interesting point there.

Around 26:00, Shiva talks about the motivation behind BigTable. Shiva says they tried commercial databases and found them insufficient for their needs. The system they built, BigTable, is a giant, persistent map (huge database of key->value pairs) running across a large cluster.
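
To make the "giant, persistent map" idea a little more concrete, here is a minimal Python sketch of a key->value store spread across several servers. The names (ShardedMap, FakeTabletServer) and the hash-based sharding are my own illustrative assumptions, not BigTable's actual design (BigTable range-partitions a sorted map into tablets), but it shows the basic shape of one big map served by a cluster of machines.

import hashlib

class FakeTabletServer:
    """Stand-in for one machine holding a slice of the map (illustrative only)."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

class ShardedMap:
    """Toy 'giant map': route each key to a server by hashing the key.
    BigTable actually range-partitions a sorted map, which also allows
    efficient scans over key ranges; this is just the simplest sketch."""
    def __init__(self, num_servers=3):
        self.servers = [FakeTabletServer() for _ in range(num_servers)]

    def _server_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def put(self, key, value):
        self._server_for(key).put(key, value)

    def get(self, key):
        return self._server_for(key).get(key)

if __name__ == "__main__":
    table = ShardedMap(num_servers=3)
    table.put("com.example/index.html", "<html>...</html>")
    print(table.get("com.example/index.html"))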

Around 40:00, Shiva makes the interesting statement that, in addition to lower cost for higher performance, a benefit of building a cluster on unreliable hardware is that it keeps programmers from "being lazy." He didn't go into this in depth, but I assume the argument is that more expensive, high availability hardware fails less often but still fails sometimes, so it is better to just accept that hardware will fail and design your systems to survive failures. This advice conflicts somewhat with what Shiva said earlier in the talk, when he claimed that a good way to work around the vulnerability of GFS master nodes is to use higher availability hardware.
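
As a toy illustration of that "assume hardware will fail" mindset, here is a small Python sketch where a read survives individual machine failures by falling back to other replicas. The read_from function and ReplicaDown exception are hypothetical stand-ins I made up for the example; this is not how GFS or MapReduce actually handle failures, just the general pattern of designing the software to tolerate flaky machines.

import random

class ReplicaDown(Exception):
    """Raised by our hypothetical replica when its machine has failed."""
    pass

def read_from(replica_id):
    """Hypothetical read against one replica; fails randomly to mimic cheap hardware."""
    if random.random() < 0.3:
        raise ReplicaDown(f"replica {replica_id} is unreachable")
    return f"data from replica {replica_id}"

def reliable_read(replica_ids):
    """Try each replica in turn; the caller survives individual machine failures."""
    for rid in replica_ids:
        try:
            return read_from(rid)
        except ReplicaDown:
            continue  # expected on commodity hardware; just move on to the next copy
    raise RuntimeError("all replicas failed")

print(reliable_read([1, 2, 3]))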

Around 43:00, Shiva touches on the point that Google is most interested in the performance per watt characteristics of its hardware, since that seems to be the primary constraint in its data centers. For more on that, see my earlier post, "Power, performance, and Google".

On a personal note, the camera never turned to the audience, but I swear that was UW Professor Ed Lazowska who asked several questions during the talk. Either that, or someone who could be his voice double.
