Friday, March 28, 2008

Talk on disk as the new RAM

Northeastern Professor Gene Cooperman recently gave a curious Google engEdu tech talk, "Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing".

Gene's starting point is that "disk is the new RAM" and the "disks of a cluster can serve as if they were RAM" because the aggregate bandwidth to 50 disks is 5 GB/second, the same as the bandwidth to RAM.
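The arithmetic behind that bandwidth figure can be sketched in a few lines. The per-disk throughput of roughly 100 MB/second is my assumption about a typical disk doing sequential reads, not a number quoted above.

```python
# Back-of-the-envelope check of the aggregate bandwidth claim.
DISKS = 50
PER_DISK_MB_PER_S = 100  # assumed sequential throughput of one disk


def aggregate_gb_per_s(disks, per_disk_mb_per_s):
    """Aggregate sequential bandwidth in GB/second, with all disks streaming in parallel."""
    return disks * per_disk_mb_per_s / 1000


print(aggregate_gb_per_s(DISKS, PER_DISK_MB_PER_S))
```

The figure only holds for sequential streaming with every disk kept busy at once; random access would cut it drastically.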

The talk just gets more fun from there, with Gene claiming that "a compute cluster with 32 quad core nodes, each with 500G of local disk, is a good approximation of ... a single computer with 10 terabytes of RAM and 200 CPU cores."

The premise is, of course, outlandish. The obvious objection is that the latency characteristics of 10T of RAM are totally different from the latency characteristics of 32 500G disks.

But, as long as we can batch the reads and writes to the disk, this difference does not matter. Gene gives a few classes of algorithms -- breadth-first state-space search, some algorithms that involve millions of accesses to hash tables, some types of pointer chasing -- that they have found amenable to the model.
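To make the batching idea concrete, here is a minimal sketch of a breadth-first search that keeps its frontier and visited set in sorted files and only ever streams them sequentially. This is my own illustration, not the code from the talk: the function names, the one-integer-per-line file format, and the in-memory sort (standing in for a real external sort) are all assumptions.

```python
import heapq
import os


def write_states(path, states):
    """Write already-sorted states to disk, one per line (sequential write)."""
    with open(path, "w") as f:
        for s in states:
            f.write(f"{s}\n")


def read_states(path):
    """Stream states back from disk in file order (sequential read)."""
    with open(path) as f:
        for line in f:
            yield int(line)


def subtract_sorted(candidates, visited):
    """Yield sorted candidates that do not appear in the sorted visited stream."""
    visited = iter(visited)
    v = next(visited, None)
    for c in candidates:
        while v is not None and v < c:
            v = next(visited, None)
        if v != c:
            yield c


def disk_bfs(start, neighbors, workdir):
    """Return the number of new states found at each BFS level."""
    visited_path = os.path.join(workdir, "visited.txt")
    frontier_path = os.path.join(workdir, "frontier.txt")
    tmp_path = os.path.join(workdir, "visited.tmp")
    write_states(visited_path, [start])
    write_states(frontier_path, [start])
    level_sizes = [1]
    while True:
        # Expand the whole frontier first, with no per-state lookups.
        candidates = set()
        for s in read_states(frontier_path):
            candidates.update(neighbors(s))
        # Assumption: the batch fits in memory; a real system would
        # use an external (on-disk) sort here.
        batch = sorted(candidates)
        # One sequential pass over the visited file removes all duplicates.
        new_states = list(subtract_sorted(batch, read_states(visited_path)))
        if not new_states:
            return level_sizes
        level_sizes.append(len(new_states))
        write_states(frontier_path, new_states)
        # Merge the new states into the visited file, again sequentially.
        write_states(tmp_path, heapq.merge(read_states(visited_path), new_states))
        os.replace(tmp_path, visited_path)
```

The point of the design is that the "have I seen this state?" question is never asked one state at a time. It is deferred until a whole level has been generated, then answered for the entire batch with a single sorted merge, so the disk sees long sequential reads and writes instead of millions of random seeks.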

This has parallels to MapReduce and the changes we need to make to algorithms so that they work well in a MapReduce framework, as one Googler pointed out during the Q&A.

If you only have a few minutes and want the gist of the talk, I would recommend you at least watch 5-10 minutes starting from 31:48.

2 comments:

Amit said...

With the increasing penalty for cache misses, RAM is the new disk, and disk is the new tape. And Flash memory is I have no idea what ;)

Filip said...

I don't think the point of disk becoming the new RAM is valid in any case but a really specific one (the one mentioned in your article).

There are not just cache misses, there are other problems. Disk is the main component in present-day computers that hasn't kept up with speed increases.
Sure, it may be possible to improve on that with parallelism, but that introduces other problems: more points of failure, heat issues, increased power requirements, etc. Not to mention the physical size requirements.