Northeastern Professor Gene Cooperman recently gave a curious Google engEdu tech talk, "Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing".
Gene's starting point is that "disk is the new RAM" and the "disks of a cluster can serve as if they were RAM" because the aggregate bandwidth to 50 disks is about 5 GB/second, roughly the same as the bandwidth to RAM.
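To see where that figure comes from, here is the back-of-the-envelope arithmetic, assuming roughly 100 MB/second of sequential throughput per commodity disk (my assumption, not a number from the talk):

```python
# Back-of-the-envelope check of the "disk is the new RAM" bandwidth claim.
per_disk_mb_s = 100   # assumed sequential throughput of one commodity disk
disks = 50
aggregate_gb_s = per_disk_mb_s * disks / 1000
print(aggregate_gb_s)  # → 5.0, i.e. ~5 GB/s across the cluster's disks
```

The catch, of course, is that 100 MB/second is a *sequential* number; random I/O to the same disks would be orders of magnitude slower, which is exactly why the batching discussed below matters.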
The talk just gets more fun from there, with Gene claiming that "a compute cluster with 32 quad core nodes, each with 500G of local disk, is a good approximation of ... a single computer with 10 terabytes of RAM and 200 CPU cores."
The premise is, of course, outlandish. The obvious objection is that the latency characteristics of 10TB of RAM are totally different from those of thirty-two 500GB disks.
But, as long as we can batch the reads and writes to the disk, this difference does not matter. Gene gives a few classes of algorithms -- breadth-first state-space search, some algorithms that involve millions of accesses to hash tables, some types of pointer chasing -- that his group has found amenable to the model.
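To make the batching idea concrete, here is a minimal sketch of disk-based breadth-first search with delayed duplicate detection, on a toy state space of my own invention (a 100-node ring). Instead of probing an in-RAM hash table per state, each level's successors are streamed to a file and deduplicated in bulk, turning random lookups into sequential I/O. A real implementation would also keep the seen set on disk as sorted runs and merge them in one sequential pass; this sketch loads it into memory for brevity.

```python
import os
import tempfile

def neighbors(state):
    """Toy state space: integers 0..99 on a ring; moves are +1 and -1."""
    return [(state + 1) % 100, (state - 1) % 100]

def disk_bfs(start, workdir):
    """BFS where each frontier's successors are streamed to disk,
    then deduplicated in one batch (delayed duplicate detection)."""
    seen_path = os.path.join(workdir, "seen.txt")
    with open(seen_path, "w") as f:
        f.write(f"{start}\n")
    frontier = [start]
    depth = 0
    while frontier:
        # 1. Stream all successors of the frontier to disk: sequential
        #    writes only, no per-state random access.
        succ_path = os.path.join(workdir, f"succ_{depth}.txt")
        with open(succ_path, "w") as f:
            for s in frontier:
                for n in neighbors(s):
                    f.write(f"{n}\n")
        # 2. Delayed duplicate detection: sort the batch of successors,
        #    then remove everything already seen in one pass. (A real
        #    implementation would merge sorted files on disk here.)
        with open(succ_path) as f:
            succ = sorted({int(line) for line in f})
        with open(seen_path) as f:
            seen = {int(line) for line in f}
        frontier = [s for s in succ if s not in seen]
        with open(seen_path, "a") as f:
            for s in frontier:
                f.write(f"{s}\n")
        depth += 1
    return depth - 1  # distance to the farthest reachable state

with tempfile.TemporaryDirectory() as d:
    print(disk_bfs(0, d))  # → 50 (on a 100-ring, the farthest state)
```

This is the same trade the Rubik's Cube work makes at scale: accept touching every state in a level at once in exchange for never paying a per-state disk seek.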
This has parallels to MapReduce and the changes we need to make to algorithms so they work well in a MapReduce framework, as one Googler pointed out during the Q&amp;A.
If you only have a few minutes and want the gist of the talk, I would recommend watching 5-10 minutes starting at 31:48.