Wednesday, February 21, 2007

The rise of wildly speculative execution

I enjoyed Tim O'Reilly's post with thoughts from Tim and others on the changes we may see with increased hardware parallelization.

Like them, I was struck by the news of an 80-processor chip prototype from Intel and wondered what changes it might cause.

However, as I think about this, I suspect the changes we will see will go well beyond more threaded programming to parallelize a single task, or wider use of data mining frameworks like MapReduce.

So, if you do not mind, please indulge me while I go all wacky visionary with this post.

With hundreds of processors available on the desktop, I think we will be moving toward a model of wildly speculative execution. Processors will be used to do work that may be necessary soon rather than work that is known to be necessary now.

Modern, single-core, pipelined processors already do this to a very limited extent. Speculative execution sends a processor down the most likely path of a conditional branch, executing a few cycles of machine instructions that may have to be thrown away if the branch prediction was incorrect.

What I think we may see is a radically expanded version of speculative execution, running code seconds or minutes ahead of when it may be needed. Most of this work will be thrown away, but some will be available later just when it is needed.
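As a rough illustration of the idea, here is a minimal Python sketch of speculation at the application level. It assumes a hypothetical `expensive_task` function and some outside guess about which inputs will be requested next. Idle workers compute the likely candidates ahead of time; a correct guess is served from that work, and the rest is thrown away. Threads keep the sketch simple; CPU-bound work in CPython would need `ProcessPoolExecutor` to actually use many cores.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def expensive_task(x: int) -> int:
    """Hypothetical stand-in for slow work (a query, a render, a parse)."""
    time.sleep(0.05)  # simulate latency
    return x * x

class SpeculativeRunner:
    """Starts work for inputs that *may* be requested soon, before they are asked for."""

    def __init__(self, workers: int = 8) -> None:
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending: dict[int, Future] = {}
        self.hits = 0    # requests served from speculative work
        self.misses = 0  # requests that had to be computed on demand

    def speculate(self, candidates: list[int]) -> None:
        # Hand each likely future input to an otherwise idle worker.
        for c in candidates:
            if c not in self.pending:
                self.pending[c] = self.pool.submit(expensive_task, c)

    def request(self, x: int) -> int:
        future = self.pending.pop(x, None)
        if future is not None:
            self.hits += 1
            result = future.result()  # already done, or finishes soon
        else:
            self.misses += 1
            result = expensive_task(x)
        # Discard the guesses that were not needed. cancel() only stops tasks
        # that have not started; running ones finish and are simply ignored.
        for leftover in self.pending.values():
            leftover.cancel()
        self.pending.clear()
        return result

    def close(self) -> None:
        self.pool.shutdown(wait=True)

if __name__ == "__main__":
    runner = SpeculativeRunner()
    runner.speculate([3, 4, 5])        # a predictor guesses the next input is 3, 4 or 5
    print(runner.request(4))           # 16, served from speculative work
    print(runner.request(7))           # 49, a miss computed on demand
    print(runner.hits, runner.misses)  # 1 1
    runner.close()
```

Most speculative results here are wasted by design; the payoff is that a correct guess returns without waiting, which only makes sense when the extra processors would otherwise sit idle.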

It is easier to imagine how this might work for some tasks than others. For example, I could imagine a speech recognition engine running on your desktop that simultaneously runs hundreds of models analyzing and voting on what you have said and what you are about to say. I could imagine a computer immune system that uses a fraction of the processors to search for anomalous patterns in the usage of the rest of the hardware, growing in size as potential threats are detected, shrinking away as the threat passes.
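A toy version of the voting step might look like the Python below. The "models" are hypothetical stubs that each return a transcription guess for the same clip; a real recognizer would run genuinely different acoustic and language models. The sketch only shows the shape of the idea: run every model in parallel, then take the majority answer along with how strongly the models agreed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A "model" takes an input (a placeholder for audio) and returns a transcription guess.
Model = Callable[[str], str]

def vote(models: Sequence[Model], audio: str, workers: int = 8) -> tuple[str, float]:
    """Run every model on the same input in parallel and return the majority
    guess plus the fraction of models that agreed with it."""
    if not models:
        raise ValueError("need at least one model")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        guesses = list(pool.map(lambda model: model(audio), models))
    winner, count = Counter(guesses).most_common(1)[0]
    return winner, count / len(guesses)

# Hypothetical stub models that disagree about an ambiguous clip.
models: list[Model] = [
    lambda audio: "recognize speech",
    lambda audio: "wreck a nice beach",
    lambda audio: "recognize speech",
    lambda audio: "recognize speech",
    lambda audio: "wreck a nice beach",
]

if __name__ == "__main__":
    print(vote(models, "<audio clip>"))  # ('recognize speech', 0.6)
```

The agreement fraction is one simple way to decide when the consensus is weak enough that more models, or more context, should be brought in.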

I think our programming model for many tasks may shift from controlled, orderly execution of code to letting loose many competing predictive models executing in parallel.

In that sense, I think Larry Page was right when he said, "My prediction is that when AI happens, it's going to be a lot of computation, and not so much clever blackboard/whiteboard kind of stuff, clever algorithms. But just a lot of computation." Larry then went on to compare this vision of computer AI to how the brain works.

The brain is a giant pattern-matching, prediction engine. On its 100,000,000,000 processors, it speculatively matches patterns, creates expectations for the future, pits potential outcomes against each other, and finds consensus. These predictions are matched against reality, then adapted, improved, and modified.

With a few hundred processors, we are a long way from the parallel processing abilities of the human brain. Yet, as we look for uses for the processing power we soon will have available, I suspect the programs we see will start to look more like the messy execution of the prediction engine in our head than the comfortable, controlled, sequential execution of the past.

Update: Nine months later, Andrew Chien (Director of Intel Research) says something similar in an interview with MIT Technology Review:
Terascale computing ... [is] going to power unbelievable applications ... in terms of inference. The ability for devices to understand the world around them and what their human owners care about is very exciting.

In order to figure out what you're doing, the computing system needs to be reading data from sensor feeds, doing analysis, and computing all the time. This takes multiple processors running complex algorithms simultaneously.

The machine-learning algorithms being used for inference are based on rich statistical analysis of how different sensor readings are correlated, and they tease out obscure connections.
[Chien interview found via Nick Carr]


Meme chose said...


The truth is, we're already there. The difference between a search engine and for example metasearch is that the search engine has already asked your question (via its web crawl) and pre-formatted its answer before you thought to ask it.

Aron said...

Most CPUs are currently idle. The idea of putting them back to work could theoretically be applied right now. You are forecasting a software innovation.

I nevertheless agree with the direction. We're still heavily bound by these kludgy memory subsystems and the combinatorial explosion of prediction is a problem.

How do multiple processors in your vision differ from a single cpu merely sped up by 80x?

I still personally want my computer to be monitoring me and preparing appropriate wiki and google footnotes for my perusal on demand. (fed into my spiffy robo eyes of course) ;)

Greg Linden said...

How do multiple processors in your vision differ from a single cpu merely sped up by 80x?

The difference is more with what cannot run on 80 separate processors than what can.

A non-parallelized program will run 80 times slower on 80 separate processors than on a single CPU sped up 80 times. Many of today's programs can be difficult to parallelize effectively.
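This gap is usually quantified with Amdahl's law: if only a fraction p of a program's work can be spread across n processors, its speedup is 1 / ((1 - p) + p / n). The short Python sketch below compares that against a single core that is simply 80 times faster; the parallel fractions are illustrative values, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup on `processors` cores when only `parallel_fraction` of the
    original runtime can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

if __name__ == "__main__":
    # A single core that is simply 80x faster speeds up every program by 80x,
    # whether or not it was written to run in parallel.
    for p in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"parallel fraction {p:.2f}: "
              f"{amdahl_speedup(p, 80):5.1f}x on 80 cores, 80.0x on one fast core")
```

With nothing parallelized, the 80 cores give no speedup at all, and even a program that is 99% parallel gets only about 45x, which is the sense in which serial programs fare far worse on many slow processors than on one fast one.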

Other types of software approaches -- computing things that may not be used and running many competing predictive models against each other -- become attractive only when there is lots of excess and otherwise idle processing power available.

Anonymous said...

How about perhaps saving some power and not precomputing. How about being a little kinder to the environment or at least your power (and related: cooling) bill?

I'm perhaps in a small minority, but given how incredibly wasteful modern software stacks tend to be, I'd like to see a return to systems that just do what they need to do. It reminds me of the joke about the EE guy and the CS guy asked to design a toaster...

Aron said...

I think you missed my point, Greg. Your vision is saying, look, we have parallel hardware emerging, so let's analogize to the brain and run multiple predictive branches. However, parallel hardware is no more efficient at accomplishing that than the mere extrapolation of Moore's law on a single processor. In sum, if we were going to head down the path of parallelism at the software level before, that doesn't change with the introduction of an 80-core CPU.

I still agree we're headed down that path eventually, but the software innovations don't appear to be coming quickly. The pre-indexing of desktop search, as another commenter mentioned, is perhaps a good example.

I don't think there's enough real-world data piping into the system to support software of this nature at this point. The system has to have more contextual awareness of its user.

I think Page is way off. It may be true that strong AI simply requires inordinately more computational power than researchers traditionally forecast. However, I think at some point it will be insights into how our grey goo performs optimizations that get us to the end goal faster than the underlying hardware acceleration. How he concludes that our brain doesn't have complicated algorithms because it fits into 600MB is beyond me. I suspect the information density of DNA is much higher than your standard program (OS or otherwise).

Quartz said...

I do fully agree with Aron.
And it's funny that so many people keep assuming that parallelism is a solution per se just because our brain is parallel; painting CPUs grey would follow the very same logic.
We need evolution in software first, not in hardware, and that means changing our minds first instead of waiting for the Holy Grail to magically solve our problems. And for that we should begin to understand something of our brain, but we're still quite far... (no, all that computer vision etc. stuff doesn't get any closer than chess playing).
Speculative execution can be useful in making interaction more fun (ah no, they call that "responsive and empowering"), but what happened to the old rule GIGO? So in the end I will only really advocate it once we get serious software in the first place, and for now also I stay in the "minority" avoiding resource wastes, be it keeping CPUs cool or occupied with more urgent stuff (btw for HPC and efficiency check