Leibson's Law: It takes 10 years for any disruptive technology to become pervasive in the design community. This blog is about the disruptive technologies that either have or will win over electronic engineers, some that won't, and why. Written by Steve Leibson, Tensilica's Technology Evangelist. See my history site at www.hp9825.com. You can email me by taking the first letter of my first name, appending that to my last name, then the magic email symbol, followed by the name of the company I work for, and then a dot followed by com.
Jul 2 2008 4:16PM
I’ve just returned from Maastricht, Netherlands, where I attended the 2008 edition of the MPSOC (Multi or Many Processor SOC) conference. This is entry #11, my last MPSOC ’08 entry. It’s about my own presentation at the conference, which was based on a longer presentation created by Tensilica’s founder Chris Rowen.
Direct energy use for all information devices (PCs, telephony equipment, consumer electronics, i.e. the electronic stuff we design) accounts for 6% of the world’s generated electricity. That’s 200 trillion Watt-hours per year. Generating that electricity also dumps 150 million tons of CO2 into our atmosphere each year, which is about the same as the CO2 output of 30 million automobiles. So there is a very real reason, besides reducing operating costs, to want to create more energy-efficient system designs.
For the past 40 years, system designers have hopped aboard Moore’s Law and ridden the train to higher performance and lower power. However, we only thought we were riding the Moore’s Law train. In reality, we were riding the Dennard Scaling train. Moore’s Law gave us more transistors per unit area of silicon. Dennard Scaling gave us lower power and faster transistors. The Dennard Scaling train ran off the rails at the 90nm process node. We still get more transistors at each IC process node from Moore’s Law, but they don’t get that much faster and the dynamic power doesn’t drop all that much. Thanks to dropping device thresholds, the tiny nanometer transistors leak like crazy, which drives up static power dissipation. That’s the new physical reality of IC-centric system design.
The old calculus of system design said that to get twice the performance you simply clocked the processor twice as fast and used faster memory. To do that, you might have to tweak the core operating voltage and device thresholds to get the clock rate up and keep power and energy consumption somewhat in check, but you were probably going to need a bigger heat sink. Relatively few people remember when the standard operating voltage for all logic was 5V. Now it’s down around a volt or less because of energy concerns. It’s been dropping for years as an effective way to limit power.
Thanks to the derailment of Dennard Scaling, there’s a new calculus of system design. If one processor can do X amount of work for Y amount of power, two processors running at half the clock frequency and half the core operating voltage will deliver X amount of work at a quarter of the power. That’s because dynamic power scales roughly with the clock frequency times the square of the operating voltage, so each half-speed, half-voltage processor burns about one eighth of Y. The cost is double the number of transistors, which keeps area use constant from one IC node to the next thanks to Moore’s Law. In the old system-design calculus, you’d be crazy to double the number of transistors because simply doubling the clock frequency was easier. With Dennard Scaling dead but Moore’s Law intact, it makes a lot of sense to double the number of transistors but cut 75% of the operating power.
Which leads to this formula for energy efficiency success:
Many small processor cores plus processor instruction optimization (so the processor can clock as slowly as possible) equals an energy breakthrough.
This breakthrough extends battery life, simplifies both IC and end-product packaging by reducing cooling requirements, reduces product manufacturing and operating costs, and lowers the environmental impact of making and using the product.
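To check the quarter-power arithmetic for the two-processor case, here is a minimal Python sketch. It assumes the usual first-order model for dynamic power (switched capacitance times voltage squared times frequency), ignores leakage entirely, and assumes the work splits perfectly across both processors. The function names are mine, purely for illustration.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order CMOS dynamic power: P = C * V^2 * f (leakage ignored)."""
    return capacitance * voltage ** 2 * frequency


def relative_power(n_cores, voltage_scale, frequency_scale):
    """Total power of n_cores scaled cores, relative to one unscaled core."""
    baseline = dynamic_power(1.0, 1.0, 1.0)
    scaled = n_cores * dynamic_power(1.0, voltage_scale, frequency_scale)
    return scaled / baseline


def relative_throughput(n_cores, frequency_scale):
    """Work per second relative to one unscaled core (assumes perfect parallelism)."""
    return n_cores * frequency_scale


# Two processors, each at half the voltage and half the clock frequency.
print(relative_throughput(2, 0.5))   # 1.0  -> same amount of work
print(relative_power(2, 0.5, 0.5))   # 0.25 -> a quarter of the power
```

Real silicon won't hit these numbers exactly, since leakage doesn't scale this way and few workloads split perfectly, but the sketch shows where the 75% figure comes from.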
Let’s look at the idea of using optimized instructions a bit more closely. One optimized instruction can replace from 5 to 50 general-purpose processor instructions. Looked at one way, if you’re replacing the instructions in an inner loop that determines the maximum required clock rate for your processor, optimized instructions can cut that maximum required clock rate by 5x to 50x for a big savings in energy consumption.
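A quick back-of-the-envelope sketch in Python shows how that works. The numbers are hypothetical, not from any real design: I'm assuming an inner loop that must run 2 million times per second, one cycle per instruction, and an optimized instruction set that shrinks the loop body from 50 instructions to 5.

```python
def required_clock_hz(instructions_per_iteration, iterations_per_second,
                      cycles_per_instruction=1.0):
    """Minimum clock rate needed to keep up with the inner loop."""
    return instructions_per_iteration * cycles_per_instruction * iterations_per_second


ITERATIONS_PER_SECOND = 2_000_000  # hypothetical real-time requirement

# Hypothetical loop body: 50 general-purpose instructions per iteration.
baseline_hz = required_clock_hz(50, ITERATIONS_PER_SECOND)

# Same loop after optimized instructions collapse it to 5 instructions.
optimized_hz = required_clock_hz(5, ITERATIONS_PER_SECOND)

print(baseline_hz / 1e6)            # 100.0 (MHz)
print(optimized_hz / 1e6)           # 10.0 (MHz)
print(baseline_hz / optimized_hz)   # 10.0 -> a 10x lower required clock rate
```

A lower required clock rate is what opens the door to a lower operating voltage, which is where the big energy savings come from.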
You also need to take a close look at the way you interconnect processors inside of the MPSOC. The knee-jerk approach based on the last 40 years of system design is to connect all of the processors to a global bus or to a bus hierarchy. This is just wrong. WRONG! That approach is based on the way we designed systems for printed-circuit boards that had limited interconnect capacity. I asked the audience at MPSOC ’08, “If you’ve already decided to meet your performance requirements by distributing the work over a number of processors, why would you then choose a bottleneck like a global bus or bus hierarchy to move data and instructions to and from these processors?” There are much better interconnection schemes for on-chip processors that exploit the massive interconnect routability on nanometer silicon. Such methods include point-to-point connections with or without dual-ported RAM or FIFO queue buffers and on-chip networks (NoCs).
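Here's a deliberately crude Python sketch of why the global bus becomes the bottleneck. It assumes a shared bus with a fixed total bandwidth divided evenly among the processors, and dedicated point-to-point links that each carry their full bandwidth independently. Both are simplifying assumptions of mine, and the bandwidth figures are made up for illustration.

```python
def shared_bus_bandwidth_per_core(bus_bandwidth, n_cores):
    """Each core's share of a single global bus (assumes an even split)."""
    return bus_bandwidth / n_cores


def point_to_point_aggregate_bandwidth(link_bandwidth, n_links):
    """Total bandwidth of independent links (assumes no contention between links)."""
    return link_bandwidth * n_links


BANDWIDTH = 8.0  # hypothetical units, e.g. GB/s, for the bus and for each link

for n in (2, 4, 8):
    per_core = shared_bus_bandwidth_per_core(BANDWIDTH, n)
    aggregate = point_to_point_aggregate_bandwidth(BANDWIDTH, n)
    print(n, per_core, aggregate)
# 2 4.0 16.0
# 4 2.0 32.0
# 8 1.0 64.0
```

In this toy model, every processor you add to the bus shrinks everyone's share, while every point-to-point link you add increases the total.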
I concluded my talk by noting that the US, Canada, and Australia/New Zealand lead the world in per-capita CO2 emissions but that the rest of the world was eager to catch up through a general rise in the standard of living. “It’s our job to stop them,” I said, to general laughter. However, my meaning was clear. We stop the general rise in unnecessary energy consumption and increased greenhouse-gas emissions not by holding back the advance in the global standard of living but by finding ways to design systems that are much more efficient.