SoC power and variations: exploring fine-grained voltage manipulation

March 24, 2010

Two university papers at the International Symposium on Quality Electronic Design this week illustrated some very interesting thinking on fine-grained manipulation of power grids. One, presented by Kimiyoshi Usami of Shibaura Institute of Technology, explored the use of fine-grained power gating to reduce static power consumption. The other, presented by Kyu-Nam Shim of Texas A&M, examined schemes for fine-grained VDD manipulation to compensate for process variations.

It seems logical that designers would begin investigating fine-grained power management. Clock gating has followed a trajectory from full-die—just turning off all the clocks entering the chip—to block-level, all the way to latch-level. It’s plausible that power gating would follow a similar path. But dealing with power rails can be more complex than just suppressing clock pulses. Changing VDD involves saving state, isolating the logic, testing for stability, restoring state, and reconnecting the block to the rest of the chip. The overhead in time, area, and energy can be significant.

Usami’s paper explored designs in which a functional block may be idle occasionally, but for an unpredictable length of time. You would like to gate off an idle block to reduce leakage current. But the gating process itself consumes energy, so there is what Usami called a break-even time: the minimum time the block has to remain powered down in order to save as much energy as it costs to gate it. The challenge then becomes guessing when to gate off the power so that you can leave it off for at least the break-even time. Note that as the granularity of the gating gets finer, prediction becomes more difficult. It’s easy to figure out that you don’t need the application processor when the handset is asleep. But it’s less obvious to determine when it is safe to shut down the multiplier in a running application processor.
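
The break-even idea can be made concrete with a little arithmetic. The sketch below is not taken from Usami's paper. It assumes a deliberately simple model in which gating a block costs a fixed energy overhead and, while gated, the block saves a constant leakage power. The function names and the example numbers (2 nJ overhead, 1 mW leakage) are hypothetical, and effects such as residual leakage through the sleep transistor or wake-up latency are ignored.

```python
def break_even_time(gating_energy_j: float, leakage_power_w: float) -> float:
    """Off-time (seconds) at which leakage saved equals the gating overhead.

    Assumed model: energy saved while gated = leakage_power_w * off_time.
    """
    if leakage_power_w <= 0:
        raise ValueError("leakage power must be positive")
    return gating_energy_j / leakage_power_w


def net_energy_saved(off_time_s: float, gating_energy_j: float,
                     leakage_power_w: float) -> float:
    """Net energy (joules) saved by gating for off_time_s; negative means a loss."""
    return leakage_power_w * off_time_s - gating_energy_j


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the calculation.
    overhead_j = 2e-9   # energy spent switching the block off and back on
    leakage_w = 1e-3    # leakage power avoided while the block is gated
    t_be = break_even_time(overhead_j, leakage_w)
    print(f"break-even time: {t_be:.2e} s")
    for off_time in (1e-6, 5e-6):
        print(off_time, net_energy_saved(off_time, overhead_j, leakage_w))
```

Under this model, gating for less than the break-even time loses energy, and every moment beyond it is a net saving, which is why the quality of the idle-time guess matters so much.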

Usami’s survey of the literature turned up only a couple of interesting ideas. The obvious one is to ask the compiler for hints. But even assuming you could get the attention of the compiler writers, and assuming the thread followed a deterministic path, the compiler would not be able to predict the sequence of events in a multitasking, operating system-governed system. Another idea was to use a statistical guess: if the block has been idle for one break-even time, shut it off: it will probably remain idle for at least that much longer.

To explore decision strategies in a realistic environment, the researchers built a model based on a MIPS R3000 CPU, and extracted power data. They examined four functional blocks for power-gating: the ALU, shifter, multiplier, and divider. The analysis was done while executing two routines, a discrete cosine transform (DCT) and a quick-sort. The methodology was to apply various decision strategies to each unit while running each program, and to calculate the energy saved or lost.

One surprise was that the simplest strategy the team examined, simply shutting down a unit each time it became idle, wasn’t that bad. It actually saved some energy in the quick-sort routine, but was less efficient than the other strategies in the DCT.

Other strategies the team examined included the above-mentioned one of shutting the unit down after one break-even time; the same, but with the break-even time adjusted for die temperature; employing a bit that remembered whether the strategy had guessed right last time, much like a branch-prediction bit; and some refinements thereupon. The calculated data showed that some of the more refined algorithms were nearly as good at saving energy as perfect foreknowledge would have been. The conclusion is that it is feasible to power-gate functional blocks in a CPU based only on information local to the block itself, without reference to global information, external hints, or prescience.
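
To get a feel for how such local strategies compare, here is a toy sketch. It is not the researchers' model and does not reproduce their data. It assumes idle intervals measured in cycles, energy counted in units of one cycle of leakage, and a gating overhead equal to t_be of those units, so that t_be is the break-even time. The history-bit policy shown is one possible reading of the idea (gate immediately only if the previous idle interval outlasted break-even), and the interval list is hypothetical.

```python
from typing import Dict, Sequence


def saved_immediate(t: int, t_be: int) -> int:
    """Gate as soon as the unit goes idle: saves t, pays the overhead t_be."""
    return t - t_be


def saved_timeout(t: int, t_be: int) -> int:
    """Wait one break-even time, then gate for the rest of the interval."""
    return (t - t_be) - t_be if t > t_be else 0


def saved_oracle(t: int, t_be: int) -> int:
    """Perfect foreknowledge: gate immediately only when it pays off."""
    return max(0, t - t_be)


def total_history_bit(intervals: Sequence[int], t_be: int) -> int:
    """One-bit predictor: gate immediately if the last interval was long."""
    predict_long = False  # assumed initial state: do not gate
    total = 0
    for t in intervals:
        if predict_long:
            total += saved_immediate(t, t_be)
        predict_long = t > t_be  # remember whether gating would have paid off
    return total


def compare(intervals: Sequence[int], t_be: int) -> Dict[str, int]:
    """Total energy saved (in leakage-cycle units) under each policy."""
    return {
        "immediate": sum(saved_immediate(t, t_be) for t in intervals),
        "timeout": sum(saved_timeout(t, t_be) for t in intervals),
        "history_bit": total_history_bit(intervals, t_be),
        "oracle": sum(saved_oracle(t, t_be) for t in intervals),
    }


if __name__ == "__main__":
    # Hypothetical idle-interval lengths for one functional unit.
    print(compare([2, 50, 60, 3, 40], t_be=10))
```

Every policy here uses only the unit's own idle history, which is the property the paper's conclusion emphasizes. How the policies rank depends entirely on the distribution of idle intervals, so the numbers from this toy trace say nothing about the paper's measured results.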

The second paper looked at an entirely different aspect of power control: this time not for energy savings but to compensate for delay variations. Shim pointed out that manufacturing variations in such parameters as threshold, channel length, contact resistance, and interconnect impedance could lead to significant delay variations even within a local area on one die. And aging effects make such variations change over time. Therefore some kind of circuit that measures actual delays and applies compensation is increasingly desirable.

Shim said that adaptive body bias has been used for this purpose, even in fine-grained schemes. But applying body bias increases junction leakage. Another alternative would be to adjust VDD to compensate for increasing delay. Shim cited a 2007 paper by an IBM team that proposed two supply rails—one for standard VDD, and one for a slightly higher voltage: not enough for the logic to require level shifters, but enough to overcome an increase in path delays through the logic. Each block would select one rail or the other through high-side switches similar to the sleep transistors used in power gating. Such an approach could be used at the block, critical-path, or even cell level, but the overhead for routing and switches grows as the granularity gets finer.

Shim and his team proposed another alternative for fine-grained power control. Instead of a high-current regulator distributing a second fixed voltage across the die, the team tried a dual-rail approach in which one adjustable voltage was supplied from an off-die regulator and distributed normally. But then the team placed small local linear regulators—also with adjustable voltages—in each of the critical blocks, where they could supply an alternative voltage just to a particular logic cloud through local routing. This approach allowed the team to adjust for die-wide delay degradation with the external supply, and to touch up a local delay problem by connecting the slow block of logic to its local regulator, turned up sufficiently for the logic to meet timing.
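
The supply-selection logic described above can be sketched as follows. This is an illustrative assumption, not the team's algorithm. It supposes that delay measurements have already been converted into a minimum supply voltage per block, that dynamic power scales with capacitance times voltage squared, and that regulator losses and the source feeding the local regulators are not modeled. The Block class, the field names, and all voltages are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    name: str
    v_required: float        # lowest supply (V) at which the block meets timing
    has_local_reg: bool      # True if a local adjustable regulator is available
    rel_cap: float = 1.0     # relative switched capacitance (assumed weight)


def plan_supplies(blocks: List[Block]) -> Tuple[float, Dict[str, Tuple[str, float]]]:
    """Pick the global rail voltage and a rail assignment for each block."""
    # The global rail must satisfy every block that has no local regulator.
    no_local = [b.v_required for b in blocks if not b.has_local_reg]
    v_global = max(no_local) if no_local else min(b.v_required for b in blocks)
    plan = {}
    for b in blocks:
        if b.v_required <= v_global:
            plan[b.name] = ("global", v_global)
        else:
            # Slow block: switch it to its local regulator, turned up just enough.
            plan[b.name] = ("local", b.v_required)
    return v_global, plan


def relative_power(blocks: List[Block], plan: Dict[str, Tuple[str, float]]) -> float:
    """Relative dynamic power, assuming P is proportional to C * V^2."""
    return sum(b.rel_cap * plan[b.name][1] ** 2 for b in blocks)


if __name__ == "__main__":
    blocks = [
        Block("a", 1.00, False),
        Block("b", 1.02, False),
        Block("c", 1.10, True),   # locally slow block with its own regulator
    ]
    v_global, plan = plan_supplies(blocks)
    dual = relative_power(blocks, plan)
    # Single global adaptive supply: every block runs at the worst-case voltage.
    v_worst = max(b.v_required for b in blocks)
    single = sum(b.rel_cap * v_worst ** 2 for b in blocks)
    print(v_global, plan, dual, single)
```

The point of the comparison is qualitative: raising only the slow block's voltage avoids paying a V-squared penalty across the whole die, which is the intuition behind preferring local touch-up over a single global adaptive supply.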

This approach considerably simplified the routing problem, eliminated the need for many off-chip regulators, and significantly reduced the system power needed to overcome an increase in delay. The team calculated that their trial designs saved 36 percent on power compared to simply over-margining to account for variations, and 17 percent compared to a global adaptive supply voltage scheme. They estimated that the area overhead for the dual-adaptive supply scheme was about 15 percent.

This work may prefigure a next step in which each logic block on a chip has its own tiny point-of-use regulator, with the voltage set to meet timing and power requirements at the current operating frequency, based on delay measurements. This blend of fine-grained adaptive supply voltage with dynamic voltage-frequency scaling becomes more feasible if the variable-voltage regulators are implemented on a second die or redistribution layer stacked with the logic die, or even if they are implemented using Toshiba’s over-metal thin-film transistor technology.

Posted by Ron Wilson on March 24, 2010
