Altera at 28nm: rethinking the FPGA
Altera this morning took the unusual step of discussing its architectural strategy for the next generation of FPGAs far in advance of even product sampling. The reason is that, for Altera at least, 28nm will mark a clear inflection point between the days when FPGA roadmaps were driven by Moore’s-Law scaling and the era in which it will require major architectural innovations to deliver better performance in customers’ applications.
"In the network, we see transmission going to 100 Gb, then right on to 400 Gb," says Altera senior director of component products Luanne Schirrmeister. "Today it takes over 350K logic elements to implement the front-end block for 100G Ethernet: that includes media access controllers (MACs) and the Interlaken interface. To move this up to 400 Gb is just impractical in 40nm FPGAs. But even moving to 28nm by itself doesn’t solve the problem."
The transceivers won’t be the issue. Schirrmeister says Altera’s 28nm chips will offer 28 Gb transceivers—and enough of them to support 400 Gb ports. The problem will be the speed, density, and power of the programmable logic fabric. The hard truth is that there will not be that much more logic density or that much less power in moving from 40nm to 28nm, and little increase in speed.
So Altera will turn to architectural changes. To address the problem of speed and power, the company will introduce Embedded HardCopy. Altera and its customers will be able to implement certain blocks using the company’s HardCopy metal-programmed ASIC capability: a step roughly halfway between implementing in programmable logic fabric and doing a full-up cell-based ASIC implementation. So blocks put into HardCopy will be much denser, faster, and lower in power than if they were done in programmable logic, though they will not match the cell-based portions of the FPGA, such as the DSP blocks, on any of those measures.
Altera will implement these HardCopy blocks inside the FPGA chip, placed for optimum routing to the other resources. The result will be, in effect, an application-directed FPGA, with certain functional blocks hard-embedded in an otherwise field-programmable chip. Some of these blocks will come from Altera, and others will be customer designs.
The company will employ the existing HardCopy physical design flow, which is entirely internal to Altera’s engineering department. According to Altera senior director of HardCopy ASICs David Greenfield, embedding areas of metal-configured logic into the FPGA will require designating certain footprints on the die that may be converted from programmable logic fabric to HardCopy blocks. It will also mean dealing with routing issues such as blockages due to the HardCopy blocks interrupting long routing paths for the programmable logic. And, since Altera intends to let the embedded blocks run at the full HardCopy speed, rather than hobbling them down to programmable-logic speed, the devices will require some new prototyping methodology.
The second architectural change will address the issue of diminishing returns in density. While 28nm does bring increased active-component density, even after allowing for the increased design-rule limitations at the new node, the increase will not be enough to meet the design requirements Altera is anticipating from its networking customers. So the company is implementing partial reconfiguration—the ability to change the programming on a portion of the logic fabric on the fly.
The importance of this feature is that it allows the hardware equivalent of virtual memory. Because you can page functional blocks into the FPGA as they are needed and overlay them when they are not active, the size of the FPGA you need is determined by the size of the blocks (and other resources, such as memory ports) that need to be active simultaneously, not by the total size of your design. At the simplest level, for example, a line-card vendor for an aggregation box can load different front-end blocks into the line-card FPGA at different times of day, depending on what kinds of networks are delivering the most traffic at that time. At the most sophisticated level, designers can develop activity charts for the functional blocks in their system, divide the operation of the design into modes, and determine which blocks need to be active in which modes. Then on a mode change, the design would reconfigure portions of the FPGA to include only the blocks needed in the desired mode.
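The sizing argument can be sketched in a few lines of Python. The block names, the modes, and all of the logic-element counts below are invented for illustration (only the 350K figure echoes the 100G Ethernet front end quoted earlier), and the sketch ignores any fabric reserved for static logic or for the reconfiguration machinery itself. It shows only that the fabric required is set by the largest mode rather than by the sum of all blocks.

```python
# Hypothetical block sizes in logic elements (LEs). Illustrative numbers only,
# not Altera data; 350_000 echoes the 100G front-end figure quoted in the article.
BLOCK_SIZES = {
    "ethernet_frontend": 350_000,
    "sonet_frontend": 200_000,
    "packet_classifier": 120_000,
    "traffic_manager": 150_000,
}

# Hypothetical operating modes and the blocks that must be active in each.
MODES = {
    "daytime_ethernet": {"ethernet_frontend", "packet_classifier", "traffic_manager"},
    "overnight_sonet": {"sonet_frontend", "traffic_manager"},
}


def static_fabric_needed(block_sizes):
    """Fabric needed when every block is resident at all times."""
    return sum(block_sizes.values())


def reconfigurable_fabric_needed(block_sizes, modes):
    """Fabric needed when only the blocks of the current mode are resident."""
    return max(sum(block_sizes[name] for name in blocks) for blocks in modes.values())


if __name__ == "__main__":
    print("all blocks resident:", static_fabric_needed(BLOCK_SIZES))
    print("largest single mode:", reconfigurable_fabric_needed(BLOCK_SIZES, MODES))
```

With these made-up numbers the design totals 820K logic elements, but the largest mode needs only 620K, so partial reconfiguration would let it fit in a smaller device.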
Like Embedded HardCopy, partial reconfiguration is not a trivial exercise for the design team. It requires a precise understanding of the design, and of the process by which the system freezes and captures volatile data before partial reconfiguration, and then restarts after the reconfiguration process. "Customers should design with the system hierarchy in mind in order to really exploit partial reconfiguration," Schirrmeister counsels. Altera has built the capability as an extension of its existing incremental compile facility, so at least it will be convenient to generate the various configuration layers without having to generate a full design for each possible configuration.
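The freeze, capture, reconfigure, and restart sequence described above can be modeled as a toy state machine. This is a software sketch only, not Altera's flow or tool API: the `Region` class, the `swap_block` function, and the step that restores previously saved state are all assumptions made for illustration.

```python
class Region:
    """Toy model of one reconfigurable region of the fabric (hypothetical)."""

    def __init__(self, block, state):
        self.block = block        # name of the block currently loaded
        self.state = dict(state)  # stand-in for volatile data (registers, RAM)
        self.running = True


def swap_block(region, new_block, saved_states):
    """Replace the block in `region`, preserving volatile data across the swap."""
    # 1. Freeze the region so its state stops changing.
    region.running = False
    # 2. Capture the outgoing block's volatile data.
    saved_states[region.block] = dict(region.state)
    # 3. Reconfigure: load the new block into the region.
    region.block = new_block
    # 4. Restore any state saved for the incoming block (an assumption of this
    #    sketch; a real design might instead reinitialize the block).
    region.state = dict(saved_states.get(new_block, {}))
    # 5. Restart the region.
    region.running = True
    return region


if __name__ == "__main__":
    saved = {}
    region = Region("ethernet_frontend", {"packets_seen": 3})
    swap_block(region, "sonet_frontend", saved)
    swap_block(region, "ethernet_frontend", saved)
    print(region.block, region.state)
```

The point of the ordering is that capture must happen after the freeze and before the new configuration overwrites the region; otherwise the saved data could be inconsistent or lost.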
At this point it should be clear why Altera is sharing their thinking so early. Getting a significant improvement in speed, power consumption, and density out of the 28nm devices will require customers to think differently about their designs. That may or may not be a welcome exercise for a particular design team, but it would be universally unwelcome as a surprise. To those who would have preferred to just have bigger, faster FPGAs, all one can say is welcome to the real future, in which Moore’s Law continues, but entirely in the footnotes.