
Ron Wilson

EDN Executive Editor Ron Wilson explores how IC design teams really work: the struggle for power efficiency and performance, wrestling with semiconductor processes and design methodologies, the challenges of global design teams. How do we somehow herd architecture, IP, design and verification into a successful tape-out?




Tuesday, August 26, 2008

Heard at Hot Chips: 20 years of what worked and what didn't in CPU architecture

Aug 26 2008 11:58AM | Comments (9)


You might imagine that somewhere in a back room at conferences the old hands at microprocessor architecture get together over dinner and a few bottles of wine--the sort that must be concealed in expense reports--lean back in their chairs, and talk long into the night about lessons learned and lessons repeated. Last night the minority of Hot Chips attendees who stayed around after dinner had the privilege of listening in on just such a discussion, staged as a panel.

And an august group it was. Chaired by Nick Tredennick, who has been in microprocessor design since the Motorola 68000 development, the panel included Insight 64 analyst Nathan Brookwood, Intel vice president and low-power processor pioneer Dave Ditzel, Techvision principal and MIPS pioneer John Mashey, the legendary Berkeley professor and RISC champion David Patterson, former Intergraph guru Howard Sachs of Telarity, and Microprocessor Report founder Michael Slater. As Tredennick said in his opening remarks, "these men need no introduction--if you don't recognize them, just ask your parents."

Setting the tone, Tredennick characterized the industry as having been "fooled by randomness." Architecture is not a science, he argued, because it has only self-validation. "An architect creates a new architecture, and then we let him tell us about all its advantages and conceal all its problems. It's like trying to form an opinion of kids by asking their parents. We should be asking the neighbors."

The tone of the panel remained similarly light. But inevitably, when people of this caliber get together, some profound thoughts precipitated out of the levity. And given the diversity of experience on the panel, their observations were remarkably consistent. Taken together, they could almost form a little handbook of how to, and not to, do a CPU architecture.

Perhaps the first point on which many of the panelists remarked was the inverse correlation--or perhaps, after anti-x86 bias is removed, lack of correlation--between architectural elegance and market success in the processor world. (Given their backgrounds, the panelists focused on the PC and server processing world, not the embedded space.) Hardly anyone would consider the x86 instruction set architecture to be pleasant, let alone elegant, but it has utterly dominated the market. In fact one of the lessons many of the panelists cited was a simple truth: don't go up against x86.

Another key point was the persistence of software. Noting that C, now an ancient language, had defeated all attempts to supplant it, and that some applications still used FORTRAN, Mashey observed "Chips come and go, but software is forever." To apply this aphorism, he pointed out that a new instruction set architecture requires someone to write new operating software, and that is extremely hard and slow. "A really cool chip can mean there is no software for it," Mashey said. Special mention in the category of creating an impossible software problem was reserved for IBM's Cell processor, which, Ditzel pointed out, has not one but two proprietary instruction sets on one chip.

Another point made by the panelists was that, on the whole, the pursuit of instruction-level parallelism had proved frustrating. Nearly a decade ago academic researchers warned that the best average number of instructions per clock on a broad mix of codes might be below 4, no matter how clever the compiler technology. Ignoring this, CPU architects pushed for ever more aggressive hardware, including wide instruction buffers, out-of-order execution, branch prediction, and even speculative execution, to make the machines capable of executing more and more instructions in parallel. The ultimate expression of this trend was the long infatuation in some circles with VLIW (very long instruction word) machines, in which the compiler bore almost all the responsibility for recognizing and preparing instructions for parallel execution, and the number of parallel execution pipes reached toward a dozen. But in the end, the academics proved to have been correct. "I think we have to call VLIW and superscalar approaches near misses," Patterson said.

In contrast, Patterson pointed out—and others agreed—that one concept that had caught on was SIMD (single-instruction, multiple-data) instructions for exploiting parallelism that lies in the data, rather than in the instruction stream. Ditzel even suggested that vector processors, an elaboration on the SIMD concept, might see a resurgence of interest as architects struggle to exploit data parallelism while minimizing energy consumption.
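To make the SIMD idea concrete, here is a minimal sketch in Python. It is only an illustration, not anything the panelists presented: the function names and the 4-wide `LANES` value are assumptions chosen for the example, and the "instruction" counts are a toy model, not a measurement of real hardware.

```python
# Toy model of scalar vs. SIMD execution. All names and the lane width
# are hypothetical; real SIMD units differ in width and behavior.
LANES = 4  # assumed vector width for this sketch


def scalar_add(a, b):
    """Add element by element: one 'instruction' issued per pair."""
    out = []
    issued = 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued


def simd_add(a, b, lanes=LANES):
    """Add in chunks: one 'instruction' handles up to `lanes` pairs."""
    out = []
    issued = 0
    for i in range(0, len(a), lanes):
        # A single vector add covers the whole chunk.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return out, issued
```

Both functions return the same sums; only the count of issued "instructions" differs. The parallelism comes from the shape of the data, not from the hardware discovering independent instructions at run time.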

The question of power—whether power density for thermal reasons or total energy consumption for battery-life reasons—also marked a number of comments during the discussion. Ditzel perhaps put it most succinctly: "The quest for low power is driving a return to simplicity in processor architecture." He pointed out, for example, that the just-announced Intel Larrabee processor uses a simple, short execution pipeline based on a 1992 Pentium design—for power reasons.

Other panelists added to this point. Patterson cited superpipelining, in which some architects claimed performance advantages for pipelines as long as 50 stages, as one of the conspicuous failures in the search for performance. He observed that lately, pipelines have been getting shorter, not longer. Brookwood said "performance/Watt considerations will bring us back to using simple, in-order execution pipelines. We just can't afford the power necessary to run all that extra hardware to manage out-of-order execution, and to speculatively execute instructions that we will subsequently throw away."

Instead, the panelists saw a return to simplicity throughout the architectural world. The old debate between architects who relied on complexity to execute ever more instructions per clock and those who relied on simplicity to increase clock frequency seems to have finally been resolved, in favor of simplicity. In the future, it was suggested, we will use our growing transistor budgets to build lots of fast, simple machines with aggressive, smart power management, rather than to build highly complex machines. Moore's Law, which in the past conquered bipolar ECL, GaAs, and SiGe, appears to have slain architectural complexity as well.




Reader Comments



at 8/27/2008 2:05:20 PM, stiggle said:
Massively parallel, it is amazing what the human brain can do with less than 25W of power consumption...



at 8/27/2008 11:04:36 PM, Peter Glaskowsky said:
Especially amazing when it's running at only a few hundred cycles per second.

On the other hand, we're lucky to maintain a few bits of precision anywhere, we save only a tiny fraction of our computational results to anything resembling long-term storage, and we have NOTHING resembling a backup strategy.



at 8/28/2008 10:18:50 AM, Jes said:
stiggle: what, or more accurately how, the human brain does what it does has very little to do with what or how microprocessors do what they do. Specifically, the human brain (and its sensory organs like the eyes, ears, etc.) discards most of what it receives by pattern matching to a very narrow target. This occurs at enough levels that the actual amount of information hitting our conscious brain is only a faint whisper of what hits our eyes or ears in raw EM or acoustic energy information content (a la Shannon). In short, the brain uses heuristics akin to cheap parlor tricks to make you think it's doing a lot of sophisticated things, but in most cases it is regurgitating whatever you learned or experienced in the past, based on replication of patterns triggered by what you are experiencing at the moment.

The persistence of software can be attributed to the fact that while electronics and computers can almost directly pass through the scaling value afforded by Moore's Law, software creation doesn't scale well at all, being bounded at best linearly or logarithmically against Moore's Law's exponential.

The complexity of both software and hardware shouldn't be surprising, though it will take us to biological designs, which will carry the frustration that neural nets often do today: it works, but we can't analytically explain why in specific cases, only in general ones. The moves in robotics toward "embodied computing," which mimic the types of "dumb processing that becomes emergent intelligence," have shown the true path. Similar "dumbing down" through embodied design will be the only path to achieving completely opaque and long-sought Turing Test positives for intelligence.

We will, however, find that these systems have some of the same idiosyncrasies we hate in ourselves, including distraction, not understanding models that can't be represented in embodied metaphors, etc.



at 8/29/2008 4:10:56 PM, stiggle said:
Thanks Jes. From my above statement it must be quite obvious that I'm stupid and know very little about anything...



at 8/29/2008 5:45:44 PM, Bluebear said:
In retrospect, leadership in the desktop and entry-level server environment belonged to the marketocrat, not the technocrat. The answer to the question of why elegance in CPU architecture did not lead to commercial success is self-evident in this blog. For example, Patterson's reference to the market success of SIMD implies the growth of Harvard-type DSPs, which manipulate digital samples in the time domain to achieve results in the frequency domain by multiplying a filter kernel word with a data word, using two independent data streams feeding one instruction stream. That really has nothing to do with Cray's kind of vector processing, which manipulates matrices to solve many real-life problems in parametric models by solving linear algebra representations of large sets of equations in multiple unknowns. Ditzel made a wrong generalization from SIMD to vector. In a lab, the "smart" architects would not have much of a career choice but to listen to Ditzel because of his political clout. Except for clear-cut differences like the von Neumann SISD vs. the Harvard SIMD, most of the other techniques, such as caching, pipelining, and speculative branch prediction, have really no way to prove their advantages over alternate designs, so the technically uninformed marketer bosses have been safe to be the ones who made the calls. Going up the levels from hardware, from the rear-end binary instruction set and the front-end meta instruction set to the grammar, lexicon, and token rules that tools like yacc and lex connect to high-level languages such as C, there are huge investments in the software tooling industry, which acts both as an entry barrier and as a stabilizer against hardware design whims. At the same time, the materials science guys have done a great job of continuing to shrink Si features to improve the raw speed of the gates, so that CISC or RISC or other inefficiencies in the ad hoc hardware models of the middle architecture have been masked away at incredible speed. In prospect, however, a new trend seems to be emerging.
The market is maturing, so the marketocrat is being replaced by the financiocrat, who, for cash-flow survival reasons, favors the bottom line (profit) over the top line (gross sales). We should see more turns in architecture toward neither technical elegance nor established popularity, but any contrivance that reduces product cost.



at 9/2/2008 7:51:46 AM, Chris PE said:
It has been 25 years since the first solid applications of CPUs, not counting experimental computers that had integrated CPUs in them years earlier. Interesting article and quite a choice of comments. I admire people for their knowledge, but my general feeling is that I am like Stiggle. I find myself dumb and still with TONS to learn after my 35 years with 2 masters in bioelectronics and electronics. By the way, an interesting look at the brain, Jes...



at 9/2/2008 10:51:03 AM, Stiggle said:
Another interesting aspect is asynchronous processing..



at 9/4/2008 11:11:49 PM, Chuck said:
Let's hear it for gnats! They have brains smaller than the head of a pin and yet they can see and avoid threats, fly, walk, find food, find a mate, etc. How many computers can do all that?



at 9/5/2008 11:12:35 AM, Stiggle said:
I agree Chuck!
Now that's what I call micro power processing...(Parallel, Asynchronous, and low power.)


©1997-2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.