Discovering the last unrealized power reduction

Power-optimized architectures help engineers designing chips with blocks that can power down or operate at reduced frequencies and voltages.

Jay Chiang, Synopsys -- EDN, September 9, 2010

At A Glance

  • Design engineers are increasingly employing advanced techniques to meet the more stringent power requirements of next-generation chips.
  • It pays dividends to aggressively reduce power for circuits that are active in standby mode because it can lead to significant savings in battery life.
  • Power gating isn’t feasible for circuits that must continuously remain active, so the only choice is to make the circuit intrinsically low power.
  • Traditionally, datapath generators produce the most area-economic architectures that still meet the timing constraints.
  • Because power is a physical-domain characteristic, your standard-cell library can affect the power-optimization result.


Power has become one of the most important design criteria for almost all design projects, and the industry, in response, has invested a lot of effort to address this challenge. Consequently, we have seen a plethora of low-power design techniques and new technologies emerge. Some of these techniques are relatively easy to adopt. For example, clock gating and multiple-threshold-voltage cells have become mainstream design practices because they are effective. In addition, EDA tools can automate their implementation. Some techniques, on the other hand, require more planning. For example, design engineers can group SOC (system-on-chip) circuits into multiple blocks so that they can power down some blocks or operate them at reduced frequencies or voltages when operating conditions allow it. Although these more advanced techniques take more deliberate effort to implement, design engineers are increasingly employing them to meet the more stringent power requirements in next-generation chips.

When applying low-power design techniques, design engineers typically concentrate on only the few modules, such as embedded processors and on-chip memories, that consume more power than the other blocks. Although this focus is necessary, it is incomplete. Engineers often overlook the fact that many low-power-consuming blocks have a greater impact on energy consumption than their power-consumption number suggests. If you correctly plan a chip’s power-management strategy, the power-consumption profile and energy-consumption profile should not correlate closely. You should keep the active period of the high-power-consuming modules as short as possible. The modules that remain powered for a long time should not consume too much power. Even though these modules consume less power than other blocks, they consume a higher proportion of energy once you factor in their extended active time.

Consider a hypothetical cellular-phone design. Under typical usage, the cellular phone is mostly in standby mode. During standby, most circuits, except the wireless receiver or receivers, are off. Although standby mode consumes only a fraction of the power that the other modes consume, it still consumes 36% of the total energy, after factoring in the active period. In other words, it pays dividends to aggressively reduce power for circuits that are active in the standby mode because it can lead to significant savings in battery life (Table 1).

[Table 1]
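
The relationship between power and energy in the cellular-phone example can be sketched in a few lines of Python. The per-mode power and time figures below are illustrative assumptions, not the values from Table 1; the point is only that a mode's energy share is its power weighted by the time spent in that mode.

```python
# Hypothetical per-mode figures, chosen for illustration only.
# They are NOT the values from Table 1.
MODES = {
    # mode: (average power in mW, fraction of time spent in the mode)
    "standby": (5.0, 0.90),
    "talk": (200.0, 0.04),
    "data": (300.0, 0.06),
}


def energy_shares(modes):
    """Return each mode's share of total energy, where energy = power x time."""
    energy = {name: power * time for name, (power, time) in modes.items()}
    total = sum(energy.values())
    return {name: e / total for name, e in energy.items()}


shares = energy_shares(MODES)
for name, share in shares.items():
    print(f"{name:8s} {share:6.1%}")
```

With these assumed numbers, standby accounts for about 1% of the summed mode power but a far larger share of the energy, because the phone spends most of its time there.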

Such opportunities for energy reduction exist in most SOCs. In general, if the chip has multiple power domains, it has multiple power modes. If you identify the power modes that are most active, you can isolate the circuits that have higher impact on the chip’s energy consumption, and you can more aggressively pursue power reduction in these focused areas to reduce the overall energy footprint of the chip.

Analysis of these circuits in further detail uncovers some interesting characteristics. These modules must remain on for extended periods because they perform essential functions for the chip in that operating mode. They are often continuously calculating data or processing signals. In addition to the cellular-phone example, other circuits, such as audio or video processors in playback or talk mode and signal-processing blocks, such as equalizer, modulation, or cryptology units, in wireless and networking applications, have more datapath content than control logic and can benefit considerably from low-power techniques.

If you consider the technology horizon, a new generation of connected devices aiming to deliver better user experiences and higher data rates is driving many new design starts. Consequently, these new projects will demand higher audio quality, higher video resolution, more pixel support, more complex signal processing, faster data rates, and so forth. Increases in the size and complexity of the signal-processing blocks in turn lead to a higher energy footprint in the new designs. The impact of this design complexity requires design engineers to more closely manage the power consumption for these blocks.

Low-power datapaths

Power gating isn’t feasible for circuits that must continuously remain active, so the only choice is to make the circuit intrinsically low power. The first step is to lower the voltage, the operating frequency, or both without missing the performance target. However, slower clock frequencies mean deeper logic levels, and these circuits usually include more datapath logic than control logic. Datapath logic is notoriously prone to glitches—unwanted transitions that settle before the next clock edge—and switching because any spurious transitions propagate downstream and ripple throughout the entire datapath tree. Although glitches pose no functional issues, these transitions still consume power.
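
As a rough guide to why both steps matter, the standard first-order model of CMOS switching power is P = a × C × V² × f, where a is the switching-activity factor. The sketch below uses that model with made-up capacitance, voltage, and frequency values (all assumptions) to show that lowering voltage and frequency cuts dynamic power, while extra glitch activity raises the effective activity factor and gives some of the savings back.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power in watts: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# All operating points below are illustrative assumptions.
baseline = dynamic_power(0.2, 1e-9, 1.2, 200e6)
scaled = dynamic_power(0.2, 1e-9, 1.0, 100e6)    # lower voltage and frequency
glitchy = dynamic_power(0.3, 1e-9, 1.0, 100e6)   # same V and f, more spurious toggles

print(f"baseline {baseline * 1e3:.1f} mW")
print(f"scaled   {scaled * 1e3:.1f} mW")
print(f"glitchy  {glitchy * 1e3:.1f} mW")
```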

It is critical to avoid increasing power in other areas while reducing it in one area. Making this power-reduction approach more effective requires more balanced, shallower architectures that can limit the propagation of the transitions. Although most EDA tools do an adequate job producing timing- and area-optimized architectures that designers later optimize for power at the gate level, they are less effective in considering the power consequence of architectural selections upfront.

Some design engineers try various means of writing power-optimized architectures into RTL (register-transfer-level) code to save power. However, most low-power architectural-RTL coding focuses on reducing area, based on the assumption that using fewer cells equates to less power consumption. For example, some design engineers in networking and multimedia applications truncate the LSBs (least-significant bits) of the data when precision is not critical.

Although this technique is useful, you must understand the details of how to implement it. Datapaths differ from other logic circuits in that they perform computer arithmetic that generates carries and sums, requiring carry-propagating adders to add together the carry and sum to produce a binary number. For RTL coded at a high level, EDA tools usually can generate datapath architectures that keep all the numbers in redundant format (representing each value as a carry and sum pair) until the last level of the output.

If you code the datapath at a lower level, you might turn to coding practices that divide a larger datapath block into several small ones, forcing the RTL-synthesis tools to insert carry-propagation adders into the final stage of every smaller block (Figure 1a), hence increasing area and delay. The resulting increased area sometimes offsets the entire power gain from the LSB truncation. For optimal results, you must consider RTL-coding practices that allow the merging of datapath blocks to avoid unnecessary binary conversions (Figure 1b).
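
The cost of splitting a datapath can be illustrated with a small behavioral model. This is a sketch on plain Python integers, not a description of what any synthesis tool emits: `carry_save_add` is a 3:2 compressor that keeps the result as a sum and carry pair, and the hypothetical helpers `merged_sum` and `split_sum` count how many carry-propagate additions each coding style needs.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on non-negative ints.

    Returns (sum, carry) such that sum + carry == a + b + c,
    without propagating carries across bit positions.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def merged_sum(operands):
    """Keep the running total in redundant form; one carry-propagate add at the end."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c, 1  # (value, number of carry-propagate adds)


def split_sum(operands, block_size):
    """Convert each small block to binary, then add the block results."""
    block_results, adds = [], 0
    for i in range(0, len(operands), block_size):
        value, n = merged_sum(operands[i:i + block_size])
        block_results.append(value)
        adds += n
    total, n = merged_sum(block_results)
    return total, adds + n


operands = [13, 7, 22, 5, 9, 30, 1, 18]
print(merged_sum(operands))    # one final carry-propagate add
print(split_sum(operands, 4))  # one per block plus one to combine
```

Both styles compute the same total, but the split version pays for a carry-propagate adder at the end of every block as well as the final one.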

Some design engineers also try to code isolation logic in front of the datapath logic so that they can suppress the switching and transition of the datapath tree until there is valid data. Depending on the input-data profile and how frequently the data is valid, this approach could save significant dynamic power. The concept, operand isolation, is similar to clock gating, except that it takes place on the datapath instead of the clock paths (Figure 2). The concept, also known as data gating or datapath gating, is appealing, but it is sometimes difficult to implement in practice. Unlike clock gating, adding isolation logic to datapaths increases the path delay. This timing overhead can make it tricky to close timing. Some RTL-synthesis tools can automatically insert the isolation logic; however, engineers do not widely use the feature because it degrades timing.
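
The effect of operand isolation on input switching can be approximated by counting bit toggles. The model below is a simplification under stated assumptions: the isolation cell is assumed to hold the last valid operand (latch style) rather than force the input to zero, the input stream is made up, and the timing overhead discussed above is not modeled.

```python
def toggles(prev, curr):
    """Number of bit positions that differ between two words."""
    return bin(prev ^ curr).count("1")


def input_toggles(stream, isolate):
    """Count bit toggles seen at the datapath input.

    stream is a list of (data, valid) pairs. With isolate=True, the
    input is held at its last valid value whenever valid is False.
    """
    seen = 0
    total = 0
    for data, valid in stream:
        nxt = data if (valid or not isolate) else seen
        total += toggles(seen, nxt)
        seen = nxt
    return total


# Hypothetical 4-bit operand stream: only the first and last words are valid.
stream = [(0b1010, True), (0b0101, False), (0b1111, False), (0b1000, True)]
print(input_toggles(stream, isolate=False))
print(input_toggles(stream, isolate=True))
```

The less often the data is valid, the larger the gap between the two counts, which matches the dependence on the input-data profile described above.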

An alternative approach

Datapath generators traditionally produce the most area-economic architectures that still meet the timing constraints. Engineers then optimize the generated designs for power at the gate level. At this level, the scale of optimization involves only a few gates. The flows don’t provide power-optimized architectures, so some designers manually code them in low-level RTL, which can hinder datapath optimization and degrade the quality of results.

To improve this situation, the first step is to understand what kind of datapath architectures consume less power so that you can use the knowledge to create more low-power architectures. Second, you should characterize the power costs of the datapath structures at a high level so that you can fully consider the power consequences when making architectural decisions.

Examples include the power-stingy architectures of the Synopsys DesignWare minPower components. These low-power datapath architectures are flatter, shallower, and more balanced than traditional architectures to produce fewer spurious transitions. When these unwanted transitions occur, datapath structures with smarter cell selections can limit their propagation. For example, instead of using common XOR-based datapath cells, such as full adders or XOR-based Booth encoders, the minPower components employ architectures that favor more AND or NAND cells so that fewer transitions ripple throughout the datapath tree.
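
One way to see why AND or NAND cells pass fewer transitions than XOR cells is to enumerate two-input gates exhaustively. The sketch below is a gate-level toy model, not a description of the minPower architectures themselves; it assumes the side input is equally likely to be 0 or 1 and asks how often a toggle on one input changes the output.

```python
from itertools import product

GATES = {
    "XOR": lambda a, b: a ^ b,
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
}


def propagation_rate(gate):
    """Fraction of toggles on input `a` that change the output, over all (a, b)."""
    cases = list(product((0, 1), repeat=2))
    changed = sum(gate(a, b) != gate(1 - a, b) for a, b in cases)
    return changed / len(cases)


rates = {name: propagation_rate(fn) for name, fn in GATES.items()}
for name, rate in rates.items():
    print(f"{name:5s} {rate:.2f}")
```

An XOR output follows every single-input toggle, whereas an AND or NAND output follows it only when the other input is 1, so a spurious transition is more likely to stop at the gate.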

Integrating these power-friendly architectures yields some advantages. Aside from being easier to use, these architectures allow designers to capture power-saving opportunities that are hard to realize with a manual approach. Because power consumption depends on operating conditions, it is not enough to consider the circuit architecture outside the design’s context or independently of circuit switching.

To achieve the best result, you must reevaluate the architecture using a logic-synthesis tool, such as Design Compiler, employing the timing model and switching profile.

For example, consider a two-input multiplier with uneven switching activities on the operands. Although the multiplication is a commutative operation, the dynamic-power consequence is not. If you use the high-activity input for partial-product generation, the multiplier will consume more dynamic power due to a higher level of switching activities that propagate through the rest of the multiplier. If you switch the high-activity input to the input of the partial-product selector, you can lower the switching activity in the partial-product generator as well as the overall multiplier (Figure 3). This kind of optimization is hard to plan in the RTL-coding stage and is more suitable to perform during synthesis.

Applying this concept on a larger scale enables you to achieve more power savings. In general, irregularity in the data or a circuit provides a power-saving opportunity. For example, multimedia data usually has uneven activities among the data bits. It usually has lower activities in MSBs (most-significant bits) and higher activities in LSBs. If you are aware of this phenomenon, you can design datapath architectures so that the LSBs feed into the datapath tree downstream, hence reducing the dynamic power for audio- or video-signal processing. Likewise, you can use the circuit’s irregularity to lower internal power and leakage power. For example, you can substitute regular cells with slower high-threshold voltage or low-drive cells whenever there is timing slack.
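
The uneven bit activity of slowly varying data is easy to measure. The sketch below uses a synthetic 8-bit sine wave as a stand-in for multimedia samples (an assumption; real audio or video data will differ in detail) and counts how often each bit position toggles between consecutive samples.

```python
import math


def bit_toggle_counts(samples, width):
    """Count toggles per bit position between consecutive samples (index 0 = LSB)."""
    counts = [0] * width
    for prev, curr in zip(samples, samples[1:]):
        diff = prev ^ curr
        for bit in range(width):
            counts[bit] += (diff >> bit) & 1
    return counts


# One period of a slowly varying 8-bit waveform, 256 samples (illustrative data).
samples = [int(127.5 + 100 * math.sin(2 * math.pi * k / 256)) for k in range(256)]
counts = bit_toggle_counts(samples, 8)

for bit in reversed(range(8)):
    print(f"bit {bit}: {counts[bit]} toggles")
```

A per-bit profile like this is the kind of switching information that lets an architecture place high-activity bits where their transitions propagate through less logic.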

You can configure the DesignWare minPower architecture to create more timing slack to maximize this effect. However, manually exploiting the circuit’s irregularity is difficult because it is imperative to balance the power cost against the area cost to avoid any adverse effect from over-aggressive power optimization. You must automatically consider timing and design needs during the architecture selection to realize the power savings with minimal area overhead.

The biggest advantage of power-friendly architectures is that they do not disrupt design flows. The power savings come from making power-smart choices when implementing the microarchitectures for the RTL code. This approach requires no changes to the higher-level software, system, or RTL design. After you select the architectures, the netlists go through the same gate-, physical-, or process-level optimization in the back end. You need not change design flows or design-database formats except for adding a new knowledge base, a synthetic-library database (.sldb file), to the RTL-synthesis stage. The savings come on top of those from the design project’s original power strategy: as much as 42% additional power reduction at the block level and as much as 24% at the chip level.

This architecture-level power-optimization approach does have some limitations, however. To get the power benefit, you must integrate in-house or third-party IP (intellectual property) into the design at the RTL because the optimization takes place at the RTL. The automatic IP insertion relies on a logic-synthesis tool, such as Design Compiler, to extract the datapath architecture from the RTL; therefore, the code must be in a style that the synthesis tool recognizes. In other words, if the datapath is in low-level RTL that already prescribes the architectures, the synthesis tool cannot alter the design’s intent.

To enable architectural-level power optimization, designers should start from high-level RTL code using as much operator inference as possible. To allow extraction of larger datapath blocks, you should consider using automatic retiming instead of manually inserting a pipeline. Whenever possible, use a realistic representative switching profile, which usually improves the result, especially for applications that have unevenly distributed activities on the input.

Because power is a physical-domain characteristic, your standard-cell library can affect the power-optimization result. A standard-cell library with a collection of datapath cells that have good drive strength and threshold-voltage variations allows wider architecture selections.

Some libraries support special datapath cells but have few or no drive-strength variations or have them only with standard threshold-voltage implementations. These cells often go unselected, which limits the number of available architectures. To improve results, use a standard-cell library with more drive-strength and threshold-voltage variations that have accurately characterized power numbers.

You can’t optimize what you can’t observe. To lower the energy consumption of your next SOC project, you must first identify which portions of the SOC are consuming the most energy. It is worth distinguishing power consumption from energy consumption. To get a more energy-efficient design, you must pay attention to the circuits that remain on for a long time. Therefore, you must carefully analyze the power modes to identify the best energy-saving opportunities.

When working on these modules, decide early in the chip-planning stage to run these circuits at low clock frequencies and low voltage. The additional power-saving opportunities must come from designing more power-friendly circuits that require less switching activity and are built with a higher percentage of low-leakage and low-drive cells. The most inexpensive way of achieving this goal is by using power-friendly architectures that require no costly design-flow re-engineering efforts.