FPGA architectural power-saving techniques at 40 nm

As geometries shrink, FPGAs must begin to employ design-specific power-management techniques in order to save power while meeting timing.

By Seyi Verma, Altera Corp -- EDN, 9/23/2009

The 40-nm process technology node offers clear benefits over prior nodes, including the 65-nm node and the more recent 45-nm node. One of the most attractive benefits is higher integration. This enables semiconductor manufacturers to pack greater functionality into less physical space at lower costs. Although increased density and performance are valuable benefits, one of the most pressing design considerations for today's system developers is power consumption. The need for low power consumption is being driven today by the trends toward compactness of form factor, portability, and power efficiency.

In addition to silicon processing techniques, reducing power consumption requires architectural innovations. Smaller geometries reduce dynamic power consumption through lower parasitic capacitance, but they also raise static power unacceptably through increased leakage currents if no steps are taken to reduce it. While intelligent systems are designed to minimize dynamic power (for example, by powering down unused sections of a chip or unused logic), the focus of this article is on architectural innovations that minimize the constant drain of power known as static power. Static power is like the annoying leaky water faucet in the house that keeps you awake at night when you are trying to sleep.

How do FPGA designers minimize static power at smaller process geometries without affecting chip performance? To answer this question, let's first examine the basics of power and then look at how static power can be minimized.

The basics of power



Power consumption is composed of both static and dynamic power. Dynamic power is the power consumed through the operation of the device, caused by signals toggling and capacitive loads charging and discharging. Dynamic power decreases with Moore's Law by taking advantage of process shrinks to reduce capacitance and voltage. Figure 1 shows the variables affecting the dynamic power portion of the total power consumed in an FPGA. The challenge is that more circuits are implemented with each process shrink and the maximum clock frequency increases. So while the power of an equivalent circuit declines from process node to process node, the capacity of the FPGA doubles and the maximum clock frequency rises, pushing total dynamic power up. Since dynamic power varies with the square of the voltage applied, scaling down the operating voltage limits that increase.
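The relationship described above is commonly written as P = activity x C x V^2 x f. The following is a minimal sketch of that first-order model; the function name and every numeric value are hypothetical, chosen only to illustrate the scaling and not taken from any device data sheet.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power: activity factor * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical older-node design: 1 nF switched capacitance, 1.2 V, 200 MHz.
old_node = dynamic_power_w(0.125, 1.0e-9, 1.2, 200e6)

# Assumed next node: twice the switched capacitance (more logic) and
# twice the clock frequency, first at the same voltage, then at 0.9 V.
same_voltage = dynamic_power_w(0.125, 2.0e-9, 1.2, 400e6)
lower_voltage = dynamic_power_w(0.125, 2.0e-9, 0.9, 400e6)

growth_same_voltage = same_voltage / old_node    # about 4x
growth_lower_voltage = lower_voltage / old_node  # about 2.25x
```

Under these assumed numbers, the lower supply voltage cuts the growth in dynamic power from about 4x to about 2.25x, because the voltage term is squared.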

Static power is the power consumed by an FPGA when it is programmed and there are no clocks operating, and hence no active circuits. The sources of static power in 40-nm devices are shown in Figure 2, while Table 1 shows the processing techniques used to reduce static power.



At submicron geometries, semiconductor static power consumption can increase dramatically with the migration to more advanced processes such as 40 nm and beyond. With the migration to smaller process nodes, transistor channel lengths and gate oxide thicknesses shrink, making it easier for current to leak across the shorter physical distances and thereby increasing static power. Source-to-drain leakage, also known as subthreshold leakage, is one of the dominant forms of leakage. Here, current flows from the source to the drain even when the transistor is turned off. As transistors get smaller, it is increasingly difficult to prevent this current from flowing; therefore, the smaller 40-nm transistors tend to exhibit much greater source-to-drain leakage than transistors with longer channels if all other parameters are equal. The threshold voltage (Vt) of the transistor also influences the amount of source-to-drain leakage. The Vt of the transistor is the gate voltage at which the channel begins to conduct current between the source and the drain. Small, high-speed transistors need a lower Vt to maintain the speed with which the transistor can be turned on and off via the gate, but this increases the leakage because the transistor channel cannot be turned off completely.
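The sensitivity of subthreshold leakage to Vt can be illustrated with the standard first-order model, in which off-state current falls exponentially as Vt rises. The sketch below assumes a subthreshold slope factor of 1.5 and room-temperature operation; both values, and the function names, are illustrative assumptions rather than figures for any particular 40-nm process.

```python
import math

THERMAL_VOLTAGE_V = 0.0259  # kT/q at roughly 300 K

def relative_off_leakage(vt_v, slope_factor=1.5):
    """Off-state (Vgs = 0) leakage relative to a Vt = 0 device.

    First-order model: I_off is proportional to exp(-Vt / (n * kT/q)).
    """
    return math.exp(-vt_v / (slope_factor * THERMAL_VOLTAGE_V))

def leakage_ratio(low_vt_v, high_vt_v, slope_factor=1.5):
    """How many times more a low-Vt device leaks than a high-Vt device."""
    return (relative_off_leakage(low_vt_v, slope_factor)
            / relative_off_leakage(high_vt_v, slope_factor))

# Hypothetical pair of devices whose thresholds differ by 100 mV.
ratio = leakage_ratio(0.30, 0.40)
```

With these assumed parameters, a 100 mV reduction in Vt raises off-state leakage by roughly an order of magnitude, which is why a fast, low-Vt transistor is so costly in static power.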

A related issue is gate oxide thickness, which—along with doping—influences Vt. A thinner gate oxide allows the transistor to be switched on and off faster, but it also allows greater leakage from the gate through the oxide to the substrate. These sources of leakage current increase as decreasing process geometries make smaller gate lengths possible.

Techniques used to minimize static power

Silicon processing techniques used to lower static power in FPGAs include lowering the operating voltage, using transistors of varying gate oxide thickness, and employing low-k inter-metal dielectric, copper interconnects, and super-strained silicon to balance performance and power reduction at the transistor level.

While processing techniques require careful deliberation to judiciously trade off between power and performance of the transistors in an FPGA, these techniques are available only to the FPGA designer, not to the user. Since the FPGA chip designer has no idea what configuration the user will program into the part, he cannot employ the wide range of design-specific power-management techniques available to the ASIC designer.

But suppose that the FPGA user could participate in one specific and very important decision: whether to use high-speed or low-leakage transistors in a particular path. A relatively new technology known as PPT (Programmable Power Technology) allows FPGA users to do just that, and hence to reduce static power further in their own designs while not affecting the performance of the FPGA.

PPT adds another layer of static power savings through architectural innovations that allow the design tools, in response to the user's needs, to manipulate the behavior of some transistors in an FPGA. The manipulation of transistors starts at a software level, during synthesis and place and route, and ends at the silicon level, configuring the FPGA silicon to optimally deliver the lowest static power possible for every unique user design while still meeting the design's performance requirements.

To explore how this idea works, consider the structure of a modern FPGA. FPGA cores are fundamentally made up of logic, memory, and DSP (digital signal processing) blocks. In traditional FPGAs, all of the blocks are designed to run at only one speed, the highest possible speed, as depicted by the yellow blocks in Figure 3. This is of course necessary in these parts so that the user's critical paths can employ the fastest available transistors.

But always using the fastest transistors means always using low-Vt transistors, resulting in excessively high static power consumption.

Yet experiments have shown that the majority of designs have very few critical paths that actually need the highest-performance logic to meet timing, while the majority of the design paths have ample excess timing margin, or timing slack. With PPT, all of the logic blocks in the array except those designated as timing critical are set to low-power mode, as depicted by the blue blocks in Figure 3. With only the timing-critical blocks set to high-speed mode, static power dissipation in FPGAs goes down substantially.

In addition, Programmable Power Technology puts unused logic, memory, and DSP blocks into low-power mode, which further decreases static power. At a very high level, Figure 4 shows how development software controls the transistors to switch between high-performance and low-power modes. In any design, the software automatically determines the slack available in each path from the timing constraints of the design. It then automatically sets the transistors, and hence the logic blocks, to the appropriate mode (high performance or low power) by adjusting the back-bias voltage of the transistors.

For example, to set an n-MOS transistor in the FPGA core to low-power mode, the software reduces the back-bias voltage, making it more negative, which makes the transistor difficult to turn on. This minimizes subthreshold leakage currents and unwanted static power in non-timing-critical circuit paths. Conversely, to set the same transistor to high-performance mode, the software increases the back-bias voltage, making it less negative, which makes the transistor easier to turn on. This increases the subthreshold leakage in the transistor, but it also increases the switching speed for those few timing-critical paths to help meet the design's specified timing constraints and deliver maximum performance.
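The effect of back bias on an n-MOS transistor can be sketched with the textbook body-effect equation, Vt = Vt0 + gamma * (sqrt(2*phiF + Vsb) - sqrt(2*phiF)). The body-effect coefficient, the surface potential term, and the function name below are hypothetical values picked for illustration; they do not describe the actual devices in any FPGA.

```python
import math

def threshold_with_back_bias(vt0_v, body_bias_v, gamma=0.4, two_phi_f_v=0.8):
    """n-MOS threshold voltage under body bias, with the source grounded.

    body_bias_v is the body voltage relative to the source; a negative
    value is reverse bias. gamma and two_phi_f_v are assumed parameters.
    """
    v_sb = -body_bias_v  # source-to-body voltage
    if two_phi_f_v + v_sb < 0:
        raise ValueError("forward bias too large for this simple model")
    return vt0_v + gamma * (math.sqrt(two_phi_f_v + v_sb)
                            - math.sqrt(two_phi_f_v))

high_speed_vt = threshold_with_back_bias(0.30, 0.0)   # no back bias
low_power_vt = threshold_with_back_bias(0.30, -0.5)   # more negative bias
```

In this sketch the more negative back bias raises Vt by roughly 0.1 V, which makes the transistor harder to turn on and, through the exponential dependence of subthreshold current on Vt, sharply reduces its leakage.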

Similar techniques work to set the p-MOS transistors to the appropriate mode. To understand the static power savings seen on an FPGA, Figure 5 shows that in an example design without PPT, a 680K LE (logic element) FPGA would consume 37% more static power than it would if the FPGA user employed PPT.
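Because the 37% figure is expressed relative to the PPT design, it is worth converting it into a saving relative to the non-PPT design. The short sketch below does only that arithmetic; the function name is hypothetical.

```python
def savings_from_overhead(overhead_fraction):
    """Convert 'baseline uses X more than optimized' into the fraction of
    baseline power that the optimized design saves."""
    return 1.0 - 1.0 / (1.0 + overhead_fraction)

# Without PPT the example design uses 37% more static power than with PPT.
ppt_savings = savings_from_overhead(0.37)  # about 0.27
```

In other words, a design that consumes 37% more static power without PPT sees a static power reduction of about 27% when PPT is enabled.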

With PPT, the FPGA user can call on only the exact amount of high-speed logic required for a design to reach its desired performance with a very high degree of precision. In our implementation, the software controls the choice between high-speed and low-power logic on a per-tile basis. Each tile contains two LABs, or a LAB and a DSP block, or a TriMatrix memory, all with associated routing. For example, on a large device more than 5,000 tiles on the FPGA can be individually controlled as high speed or low power to get the lowest possible power for the design (Figure 6).

The development software automatically optimizes the design by placing tiles into high-speed or low-power modes, requiring no user effort. Each time the software compiles a design for an FPGA, it automatically optimizes the design to meet specified timing constraints while minimizing both dynamic and static components of power. The resulting programming file loaded into the FPGA includes information that sets each tile into its high-speed or low-power configuration (Figure 7). Overall, PPT reduces the static power of a user design, but because the performance of the design is limited by the slowest critical paths rather than the average speed, PPT does not affect the performance of the FPGA in the user's application.
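The per-tile decision can be pictured as a slack-driven assignment: start with every tile in low-power mode and promote tiles to high-speed mode only on paths that would otherwise miss timing. The sketch below is a simplified greedy illustration of that idea, not the algorithm used by any vendor tool; the data layout, tile names, and delay values are all hypothetical.

```python
def assign_tile_modes(tile_delays, paths):
    """Return the set of tiles that must run in high-speed mode.

    tile_delays: {tile: (high_speed_delay_ns, low_power_delay_ns)}
    paths: list of (required_ns, [tiles on the path])
    Every tile starts in low-power mode; tiles are promoted only on
    paths that fail timing, largest speed-up first.
    """
    high_speed = set()

    def path_delay(tiles):
        return sum(tile_delays[t][0] if t in high_speed else tile_delays[t][1]
                   for t in tiles)

    for required_ns, tiles in paths:
        # Candidates ordered by delay saved when promoted (name breaks ties).
        candidates = sorted(
            set(tiles) - high_speed,
            key=lambda t: (-(tile_delays[t][1] - tile_delays[t][0]), t))
        for tile in candidates:
            if path_delay(tiles) <= required_ns:
                break
            high_speed.add(tile)
        if path_delay(tiles) > required_ns:
            raise ValueError("path misses timing even in high-speed mode")
    return high_speed

# Hypothetical four-tile design with one tight path and one relaxed path.
tile_delays = {"A": (1.0, 1.5), "B": (1.0, 1.2), "C": (2.0, 2.6), "D": (1.0, 1.3)}
paths = [(4.5, ["A", "B", "C"]), (5.0, ["C", "D"])]
fast_tiles = assign_tile_modes(tile_delays, paths)
```

In this example only tiles A and C end up in high-speed mode, while B and D stay in low-power mode. Promoting a tile can only shorten paths, so paths checked earlier remain satisfied as later ones are processed.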

DOCT (dynamic on-chip termination) is another feature available in specific FPGAs that helps to minimize power. For example, migrating from a DDR2 application operating at 1.8V to a DDR3 application at 1.5V provides about 30% static power savings on the I/O due to the voltage reduction. To reduce I/O static power further, FPGAs can provide the ability to turn on and off the series termination (RS) and parallel termination (RT) dynamically during data transfer. This technique also provides improved power performance without interfering with the DDR interface's ability to meet specifications. During a write cycle, RS is turned on and RT is turned off to match the line impedance, while during the read cycle, RS is turned off and RT is turned on as the FPGA implements the far-end termination of the bus, as shown in Figure 8. On a typical 72-bit DIMM, this technique delivers up to 1.9W of I/O static power savings at 1067 Mbps when compared with a standard FPGA using DDR2 without DOCT.

Parallel and serial OCT provide the appropriate line termination and impedance matching for both the read and write busses. This removes the need for external resistors at the FPGA and saves on external component costs, board space, and routing complexity.
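Two points from the termination discussion lend themselves to a small sketch: the roughly 30% saving from moving a 1.8V interface to 1.5V, and the way RS and RT swap roles between write and read cycles. The code below assumes that termination power scales with the square of the I/O voltage (as it would through a fixed resistance); the function names and the dictionary layout are hypothetical.

```python
def io_power_ratio(new_voltage_v, old_voltage_v):
    """Power ratio for a fixed resistive load, assuming P scales as V^2."""
    return (new_voltage_v / old_voltage_v) ** 2

# DDR2 at 1.8 V to DDR3 at 1.5 V.
ddr3_vs_ddr2 = io_power_ratio(1.5, 1.8)
voltage_savings = 1.0 - ddr3_vs_ddr2  # about 0.31

def termination_state(operation):
    """Which on-chip terminations are enabled at the FPGA for each cycle."""
    if operation == "write":
        return {"RS": True, "RT": False}   # series termination drives the line
    if operation == "read":
        return {"RS": False, "RT": True}   # FPGA is the far-end termination
    raise ValueError("operation must be 'read' or 'write'")
```

Under the V^2 assumption the voltage change alone accounts for a saving of about 31%, consistent with the "about 30%" quoted above. Because RT is enabled only during reads, it draws no static current during write cycles.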

Overall, moving to smaller geometries delivers the expected Moore's Law benefits of increased density and performance, but to deliver the lowest power in conjunction with the highest performance, FPGA architectural innovations are needed beyond processing innovations. Unique technologies such as Programmable Power Technology and DDR3 with DOCT enable high-end FPGAs to deliver the lowest possible power without compromising on performance of next-generation designs.

Author Information
Seyi Verma is a senior member of the high-end technical analysis staff at Altera. He has been with the company for 11 years and is responsible for technical product analysis, FPGA architecture, and technology solutions for Altera’s high-end FPGA product lines. Verma has a bachelor’s degree in electrical engineering from the Rochester Institute of Technology (Rochester, NY). He may be reached at newsroom@altera.com.

