
Intel's current prototype chips, each the size of a fingernail, deliver computing power that 11 years ago took 2,000 square feet of computer equipment and 10,000 chips.
This week, Intel introduced a prototype chip with 80 cores that is capable of more than a trillion floating-point calculations per second (one teraflop).
“It’s not too difficult to find two or four independent things you can do concurrently, finding 80 or more things is more difficult, especially for desktop applications. It is going to require quite a revolution in software programming.” – Dr Mark Bull, at the Edinburgh Parallel Computing Centre

Within scientific computing, and life sciences specifically, I can think of several codes I’d like to try on a system like this: Smith-Waterman, BLAST, CLUSTALW. While traditional supercomputing codes (NASTRAN, CFD, weather modeling) would benefit from the tightly integrated interconnects that a single-die supercomputer makes possible, most of the life science codes we run are less tightly coupled and are often bound by memory bandwidth instead.
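To make the "less tightly interdependent" point concrete, here is a minimal Python sketch of a Smith-Waterman database search, where each query-versus-sequence comparison is an independent task that could go to its own core. The function names (`sw_score`, `score_database`), the scoring values (match 2, mismatch -1, gap -2), and the linear gap penalty are my own illustrative assumptions, not anything taken from a production aligner.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def score_database(query, database, workers=None):
    """Score the query against every sequence; no task depends on another."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(sw_score, query), database))
```

Calling `score_database("ACGT", ["ACGT", "TTTT"])` from under an `if __name__ == "__main__":` guard spreads the comparisons over worker processes. The cores never need to exchange messages, but every one of them streams sequence data from memory, which is where the bandwidth pressure comes from.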
Speeds and feeds will still be important, perhaps even more so now that the processing is centered on a single die. The Intel press piece talks a bit about ‘network on a chip’ routing of messages between cores, but says little about the memory architecture. It suggests reaching tera-scale memory bandwidth with 3-D stacked memory connected directly to the processor. Each processor would have both a local and a shared memory cache: would that be NUMA, ccNUMA, or some other variation?
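A back-of-the-envelope way to see why memory bandwidth matters this much is the simple bound: attainable rate = min(peak compute, memory bandwidth × flops performed per byte moved). The sketch below assumes the roughly one-teraflop peak quoted above; the bandwidth and flops-per-byte values in the usage note are made-up inputs for illustration, not Intel figures, and the function names are my own.

```python
PEAK_FLOPS = 1.0e12  # roughly one teraflop, the figure quoted for the prototype


def attainable_flops(bytes_per_second, flops_per_byte, peak_flops=PEAK_FLOPS):
    """Upper bound on sustained flop/s: limited by compute or by memory traffic."""
    return min(peak_flops, bytes_per_second * flops_per_byte)


def bandwidth_to_saturate(flops_per_byte, peak_flops=PEAK_FLOPS):
    """Memory bandwidth (bytes/s) needed before compute becomes the limit."""
    return peak_flops / flops_per_byte
```

For example, a code that performs only 0.5 flops per byte it reads would, under these assumptions, need `bandwidth_to_saturate(0.5)` = 2e12 bytes per second to keep a teraflop chip busy. With a hypothetical 1e11 bytes per second of memory bandwidth, `attainable_flops(1e11, 0.5)` caps out at 5e10 flop/s, a twentieth of peak.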
I’ll be interested to get my hands on these in the coming years.
Check out the ‘Tera-scale Architectural Vision’ Flash presentation on the Intel site. (link)
New York Times: (article)