Google Fellow Jeff Dean gave a fun talk last week at the University of Washington Computer Science department titled "Research Challenges Inspired by Large-Scale Computing at Google".
The talk is a "collection of problems we think are interesting/difficult" and, since it is coming from Jeff, has a heavy bias toward infrastructure problems.
The talk starts with energy efficiency in large scale clusters. Jeff pointed out that most work on power optimization targets laptops, not servers, but servers in a cluster have drastically different needs. In particular, laptops save power by shutting down completely, while servers typically run at 20-30% utilization of CPU, memory, and disk; it would be nice if they drew only 20-30% of peak power in that state rather than the roughly 80% they draw today.
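To put rough numbers on that gap, here is a back-of-the-envelope sketch. It is my own illustration, not from the talk: the per-server wattage and cluster size are made up, while the 30% utilization and ~80%-of-peak power figures are the rough percentages Jeff quoted.

```python
# Back-of-the-envelope sketch of the energy-proportionality gap.
# PEAK_WATTS and CLUSTER_SIZE are hypothetical; the utilization and
# power fractions are the rough figures from the talk.

PEAK_WATTS = 500        # assumed peak draw of one server (hypothetical)
CLUSTER_SIZE = 10_000   # assumed number of servers (hypothetical)

utilization = 0.30            # typical load Jeff cited (20-30%)
actual_fraction = 0.80        # ~80% of peak power at that load, today
ideal_fraction = utilization  # energy-proportional goal: power tracks load

actual_kw = CLUSTER_SIZE * PEAK_WATTS * actual_fraction / 1000
ideal_kw = CLUSTER_SIZE * PEAK_WATTS * ideal_fraction / 1000
print(f"today: {actual_kw:,.0f} kW, energy-proportional: {ideal_kw:,.0f} kW")
print(f"potential savings: {1 - ideal_kw / actual_kw:.0%}")
# today: 4,000 kW, energy-proportional: 1,500 kW
# potential savings: ~62%
```

Under these assumptions, energy-proportional hardware would cut the cluster's power draw by more than half without turning anything off.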
At this point, I was wondering why they didn't just shut down part of the cluster to push the utilization of the remaining servers closer to 80% or so. Ed Lazowska apparently was wondering the same thing since, moments later, he asked why Google couldn't just use smarter scheduling to compress the workload onto fewer machines (and, presumably, then put the now idle part into a low power mode). Jeff said that doesn't work because it would hurt responsiveness due to locality issues. His answer was vague and I am still somewhat unclear on what he meant, but, thinking about it, I suspect what he wants is to use all the memory across all the boxes in the cluster, have every box respond immediately, and still draw a lot less power when executing no-ops than the 50% of peak an idle box draws today. Keeping all the memory in the cluster immediately accessible, to maximize how much data can be kept in memory, seems to be a big part of what makes this a challenging problem.
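My reading of the locality objection, as a toy calculation with made-up numbers: powering a server down also powers down its DRAM, so packing the CPU work onto fewer boxes shrinks the in-memory working set the cluster can serve from.

```python
# Toy illustration (numbers are mine, not Google's) of why packing the
# workload onto fewer machines conflicts with in-memory serving.

MACHINES = 10_000
RAM_GB_PER_MACHINE = 16    # hypothetical
CPU_UTILIZATION = 0.30     # the 20-30% figure from the talk
TARGET_UTILIZATION = 0.80  # what smarter packing might drive busy boxes to

# Option A: everything on, every box at ~30% CPU, all DRAM usable.
ram_gb_all_on = MACHINES * RAM_GB_PER_MACHINE

# Option B: the same CPU work packed onto fewer boxes, the rest powered off.
machines_on = int(MACHINES * CPU_UTILIZATION / TARGET_UTILIZATION)
ram_gb_packed = machines_on * RAM_GB_PER_MACHINE

print(f"all on: {ram_gb_all_on:,} GB in RAM across the cluster")
print(f"packed: {ram_gb_packed:,} GB, over 60% of in-memory data evicted")
# all on: 160,000 GB in RAM across the cluster
# packed: 60,000 GB, over 60% of in-memory data evicted
```

If the data evicted in option B has to be served from disk instead, latency on those requests jumps by orders of magnitude, which would explain the responsiveness concern.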
Next, Jeff talked about the OS. He pointed out that the "design point of the original version of Linux is pretty far removed from [our] very large data centers" and wondered how an operating system would be designed differently if it were built specifically for running in a cluster of 10k+ machines. He gave a few examples, such as not really needing paging to disk but perhaps wanting remote paging to the memory of other machines, adapting the network stack to microsecond network distances between machines, and changing the security model to focus on isolating apps running on the same box in order to guarantee performance.
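The talk raises remote paging only at a high level, but a minimal sketch may help show the shape of the idea. Everything concrete here is my own illustration: a conventional LRU page cache where evicted pages go to a peer's spare DRAM over the network (stood in for by an in-process dict) instead of to local disk.

```python
# Minimal sketch of remote paging, assuming an LRU local page cache.
# peer_store is an in-process stand-in for a peer's memory server;
# a real system would make an RPC here (~microseconds over the
# datacenter network, versus ~milliseconds for local disk).

from collections import OrderedDict

class RemotePager:
    def __init__(self, local_frames, peer_store):
        self.local = OrderedDict()   # page id -> bytes, in LRU order
        self.capacity = local_frames
        self.peer = peer_store       # stand-in for a peer's spare DRAM

    def access(self, page_id, default=b""):
        if page_id in self.local:
            self.local.move_to_end(page_id)   # LRU touch
        else:
            # Page fault: fetch from the peer's memory, not local disk.
            self.local[page_id] = self.peer.pop(page_id, default)
            if len(self.local) > self.capacity:
                victim, data = self.local.popitem(last=False)
                self.peer[victim] = data      # evict to remote memory
        return self.local[page_id]

peer = {}   # one peer's spare memory
pager = RemotePager(local_frames=2, peer_store=peer)
for p in ["a", "b", "c", "a"]:
    pager.access(p, default=f"page-{p}".encode())
print(sorted(pager.local), sorted(peer))   # ['a', 'c'] ['b']
```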
Moving up a level again, Jeff described wanting a consistent framework for thinking about and building distributed applications, including reasoning about the consistency of their data.
Up one more level, Jeff wanted people to start thinking about very large scale systems of 10M machines split across 1k different locations, and about how such systems would deal with consistency, availability, latency, failure modes, and adaptively minimizing costs (especially power costs).
Finally, Jeff briefly mentioned very large scale information extraction, speech processing, image and video processing, and machine learning, mostly talking about scale, but also giving a few examples such as moving beyond N-grams to handle non-local dependencies between words and Google's efforts to understand the semi-structured data in tables in web pages and data hidden behind forms on the Web.
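To make the N-gram point concrete, here is a tiny illustration (toy corpus and model, my own) of why a bigram model cannot capture a dependency more than one word back:

```python
# A bigram model conditions each word only on its immediate
# predecessor, so any dependency more than N-1 words back is
# invisible to an N-gram model. Toy corpus for illustration only.

from collections import Counter, defaultdict

corpus = "the dog that chased the cats was tired".split()

bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

# "was" agrees in number with "dog", six words back, but the bigram
# model only ever estimates P(was | cats), so it cannot represent
# that agreement at all.
print(dict(bigrams["cats"]))   # {'was': 1}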
Coming away from the talk, the biggest points for me were the considerable interest in reducing costs (especially power costs), the suggestion that the Google cluster may eventually grow to 10M machines at 1k locations, and the call to action for researchers in distributed systems and databases to think orders of magnitude bigger than they often do: not about running on hundreds of machines in one location, but on hundreds of thousands of machines across many locations.
The talk is available for download in a variety of formats. Light, enjoyable, and worth watching if you are interested in large scale computing.