Google Principal Engineer Luiz Andre Barroso wrote an ACM article called "The Price of Performance" in which he discusses the problem of power consumption in Google's cluster.
Don't miss the graphs on the first page showing that performance per watt has been relatively flat and that the cost of power over the lifetime of the hardware may soon exceed the cost of the hardware.
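To make that comparison concrete, here is a back-of-the-envelope sketch. Every input (a 300 W average draw, a four-year service life, $0.10 per kWh, a 2x overhead factor for cooling and power distribution, and a $3,000 server) is a hypothetical value chosen for illustration, not a figure from Luiz's article, and `lifetime_power_cost` is just an illustrative helper name.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_power_cost(avg_watts, years, dollars_per_kwh, overhead=1.0):
    """Electricity cost of running one machine over its service life.

    `overhead` multiplies the machine's own draw to account for cooling and
    power distribution (1.0 means no overhead). All inputs are assumptions.
    """
    kwh = avg_watts / 1000.0 * HOURS_PER_YEAR * years * overhead
    return kwh * dollars_per_kwh


# Hypothetical inputs, not figures from the article.
hardware_cost = 3000.0
power_cost = lifetime_power_cost(avg_watts=300, years=4, dollars_per_kwh=0.10, overhead=2.0)

# Years of service at which cumulative power spending equals the purchase price.
breakeven_years = hardware_cost / lifetime_power_cost(300, 1, 0.10, 2.0)

print(f"lifetime power: ${power_cost:,.0f}  hardware: ${hardware_cost:,.0f}")
print(f"power matches hardware cost after {breakeven_years:.1f} years")
```

With these made-up numbers, power comes to roughly 70% of the hardware price and breaks even a bit under six years in. If hardware prices fell by half while performance per watt stayed flat, lifetime power would already exceed the cost of the machine.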
Luiz also makes an interesting point about the slow development of CMP (chip multiprocessor) commodity hardware, saying, "Desktop volume still largely subsidizes the enormous cost of server CPU development and fabrication, the lack of threads in the desktop has made CMPs less universally compelling."
That is the problem with building massive server clusters on commodity hardware: the hardware is cheap, but it is designed to solve a different problem, powering a single box on a desktop.
On a related note, notebook sales exceeded desktop sales for the first time in 2005. If this trend continues, the larger mobile processor market may become the driving force in CPU development.
It will be interesting to see whether Google switches to mobile processors for its cluster. Notebook designs prioritize low power consumption, a concern they share with Google's massive cluster.
And I do wonder how much Google would actually benefit from multiprocessor hardware. Luiz seems to suggest that it would, but I would think that the cluster is mainly bound by disk I/O. If so, the goal is to keep as much data in memory as possible across the cluster, and additional processing per node would be worth less than additional RAM.
If that is right, the surprising conclusion might be that switching to slower, low-power mobile processors could actually increase overall throughput, if the power savings allowed more nodes, and therefore more data held in memory, across the cluster.
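A toy model shows how that trade-off could play out. The sketch assumes a fixed power budget for the whole cluster, uniform access to a working set (so the in-memory hit rate equals the fraction of the working set that fits in cluster RAM), one disk access for every query that misses memory, and one query at a time per node. `NodeType`, `cluster_throughput`, and every number below are hypothetical, not measurements of Google's hardware.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    cpu_watts: float    # power drawn by the processors (hypothetical)
    other_watts: float  # disks, RAM, fans, supply losses (hypothetical)
    cpu_speed: float    # relative per-query CPU speed, 1.0 = baseline
    ram_gb: float       # memory per node available for caching data


def cluster_throughput(node, power_budget_w, working_set_gb, cpu_ms, disk_ms):
    """Queries/second for a cluster of identical nodes under a power budget.

    Toy model: hit rate = share of the working set that fits in cluster RAM,
    each miss costs one disk access, each node serves one query at a time.
    """
    nodes = int(power_budget_w // (node.cpu_watts + node.other_watts))
    hit_rate = min(1.0, nodes * node.ram_gb / working_set_gb)
    ms_per_query = cpu_ms / node.cpu_speed + (1.0 - hit_rate) * disk_ms
    return nodes * 1000.0 / ms_per_query


# Hypothetical node designs with the same RAM; the mobile CPU is slower but cooler.
server = NodeType("server CPU", cpu_watts=90, other_watts=60, cpu_speed=1.0, ram_gb=4)
mobile = NodeType("mobile CPU", cpu_watts=25, other_watts=60, cpu_speed=0.6, ram_gb=4)

for node in (server, mobile):
    qps = cluster_throughput(node, power_budget_w=100_000, working_set_gb=4_000,
                             cpu_ms=1.0, disk_ms=10.0)
    print(f"{node.name}: {qps:,.0f} queries/s")
```

Most of the gap in this example comes from the memory term: about a third of the server-class cluster's queries go to disk, while the low-power cluster fits the whole working set in RAM. With different parameters (a smaller working set, or a wider CPU speed gap) the result can flip, which is exactly why the answer hinges on whether the cluster is really I/O bound.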
Update: Seven months later, a paper on Google's Bigtable mentions that Google is using machines with two dual-core 2 GHz Opteron chips (almost certainly the low-power HE chips). Google seems to be putting a lot of processing in each node, more than I would have expected. The memory per node was not disclosed.