Google Fellow Jeff Dean gave a talk at Google I/O called "Underneath the Covers at Google: Current Systems and Future Directions". Slides (PDF) are also available.
I was going to post some detailed notes on the talk, but James Hamilton's excellent post on the talk already covers most of what I was going to say.
Adding to James' thoughts, let me emphasize two parts of the slides that, even if you have seen this stuff many times before, definitely are worth a peek.
First, Jeff's descriptions on slide 12 of real failures they encountered are excellent. Note that randomly distributing replicas is not enough; you have to make sure that the replicas of a given piece of data never all end up in the same rack.
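To make the rack point concrete, here is a minimal sketch of the difference between purely random placement and rack-aware placement. This is not how Google does it; the function names, the `machines_by_rack` layout, and the one-replica-per-rack policy are all my own assumptions for illustration.

```python
import random

def naive_place(machines_by_rack, n_replicas, rng=random):
    """Purely random placement: nothing stops every replica
    from landing in the same rack."""
    all_machines = [(rack, m) for rack, ms in machines_by_rack.items() for m in ms]
    return rng.sample(all_machines, n_replicas)

def rack_aware_place(machines_by_rack, n_replicas, rng=random):
    """Hypothetical policy: pick n distinct racks first, then one
    random machine inside each, so no two replicas share a rack."""
    racks = [rack for rack, ms in machines_by_rack.items() if ms]
    if n_replicas > len(racks):
        raise ValueError("not enough racks to give each replica its own rack")
    chosen_racks = rng.sample(racks, n_replicas)
    return [(rack, rng.choice(machines_by_rack[rack])) for rack in chosen_racks]

# Made-up cluster layout: rack name -> machine names.
cluster = {
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1", "c2", "c3"],
}

placement = rack_aware_place(cluster, 3, random.Random(0))
racks_used = {rack for rack, _ in placement}
```

With `rack_aware_place`, losing a single rack (a bad switch, a power problem) can take out at most one replica. With `naive_place`, the same failure can occasionally take out all of them.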
Second, slide 37 is on "Future Infrastructure Directions" for Google. Jeff emphasizes the fascinating problem of automated movement and replication of data and code in response to load across clusters and data centers. Very hard but very fun optimization problem there.
All the other Google I/O talks are also online if you are interested.
[Thanks, Dragos, for the pointer to the Google I/O talks.]