Xilinx FPGA introductions hint at new realities
Xilinx announced its 40-/45-nm generation of FPGAs this morning with the usual accoutrements of a major product launch, but also with several major departures from the Moore's-Law driven traditions of the FPGA industry. These departures say a great deal about the company's strategy, about the realities of electronics below 65 nm, and about the roles FPGAs will play in the future of the industry.
The company announced two new families, the Virtex-6 family based on UMC's 40-nm gp process, and the Spartan-6 family built on Samsung's half-node-earlier 45-nm process. Xilinx claims the two families each can deliver up to twice the logic capacity and half the power consumption of previous-generation devices in the same lines. Conspicuously, the company is not making any such claims about performance.
By the numbers, the Spartan-6 family will comprise two branches, LX and LXT. The former, Spartan-6 LX, will start from a low-end of 2,104 logic cells, 176 Kbits of RAM, four DSP slices, and 120 user I/Os in a 144-pin package. The largest part in the LX family will offer 92,160 logic cells, 6,182 Kbits of RAM, 182 DSP slices, and 498 user I/Os in packages up to an FGG 786. The parts offer both 1.2V and 1.0V core-voltage options (see "Power and FPGAs" below for more). The LXT family is similar in scope, starting with a 14,856-cell device and ranging up to the same maximum density as the LX family. But the LXT also adds—unusually for a low-priced FPGA family—high-speed serial resources: up to eight low-power 3.2 Gbit/s transceivers, four hard-wired memory controllers, and one PCI-Express endpoint block.
The Virtex-6 family, by contrast, is divided into three branches, and along quite different lines. The Virtex-6 LXT sports all the familiar features of the Virtex line, including block-RAM with embedded FIFO and ECC logic, DSP blocks, PCI-Express controllers, Ethernet MAC blocks, and high-speed transceivers. The LXT family ranges up to 474,240 logic cells, 864 DSP slices, and 1,200 user I/Os. The SXT family covers a smaller range in terms of logic-cell count, but packs in from 1,344 to 2,016 DSP slices, making the parts essentially single-chip supercomputers. The HXT family is similar to the LXT but with a limited number of 11.2 Gbit/s transceivers.
As an interesting footnote to these statistics, Xilinx seems never to tire of numbers games. In moving to the new families Xilinx has slightly altered its logic-cell arrangement so that a logic slice comprises four six-input look-up tables (LUTs) and eight flip-flops. This compares to four LUTs and four flip-flops in the previous generation. Xilinx has thoughtfully "adjusted" the logic-cell counts in its literature to reflect what it feels is a more representative number rather than the actual number of physical LUTs on the chip. Thus the top-of-the-line Virtex-6 SXT is shown in the literature as having 476,160 "logic cells" when in fact it has 297,600 actual LUT/2-bit-register clusters. Whether this represents an attempt to help designers in estimating the resources on the chips or whether it represents simple specsmanship is an unanswerable question.
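The size of that adjustment can be worked out from the two figures quoted above. The short Python sketch below is illustrative only: the constant and function names are invented here, and the slice and flip-flop totals are derived on the assumption that every physical LUT sits in a slice of four LUTs and eight flip-flops, as described above. They are not taken from Xilinx literature.

```python
# Figures quoted in the article for the top-of-the-line Virtex-6 SXT.
MARKETED_LOGIC_CELLS = 476_160
PHYSICAL_LUT_CLUSTERS = 297_600

# Slice makeup described in the article for the new families.
LUTS_PER_SLICE = 4
FLIPFLOPS_PER_SLICE = 8


def cells_per_cluster(marketed: int, physical: int) -> float:
    """Return how many marketed logic cells are counted per physical LUT cluster."""
    return marketed / physical


factor = cells_per_cluster(MARKETED_LOGIC_CELLS, PHYSICAL_LUT_CLUSTERS)

# Derived totals (assumption: all LUTs belong to four-LUT slices).
slices = PHYSICAL_LUT_CLUSTERS // LUTS_PER_SLICE
flipflops = slices * FLIPFLOPS_PER_SLICE

print(f"Marketed logic cells per physical cluster: {factor:.2f}")
print(f"Implied slices: {slices:,}")
print(f"Implied flip-flops: {flipflops:,}")
```

On these numbers each physical LUT cluster is counted as 1.6 logic cells, so a designer sizing a part by LUT count would divide the advertised logic-cell figure by about 1.6.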
Using either set of numbers it is clear that the new-generation Xilinx parts will be huge, with the Spartan family in particular dwarfing the conventional notion of a low-cost FPGA. But there are other significant trends to be seen from the announcement, as well.
One is the emphasis on density rather than performance. Xilinx is stating no performance guidelines, but it appears that the same block, compiled for both the Virtex-5 and Virtex-6, would generally see little difference in maximum clock frequency—perhaps less difference than one would see with variations in placement in a Virtex-5. This trend is visible in recent Altera announcements, as well. Of course there are other ways of impacting system performance beyond raw clock frequency, including exploiting the greater capacity of the new parts to parallelize or pipeline critical paths, and employing the more extensive embedded (non-configurable) blocks such as the PCI-Express controllers and FIFO logic blocks.
Another noteworthy point is the growing importance of high-speed serial transceiver performance. Even the Spartan family now offers high-speed serial I/O, albeit limited to 3.2 Gbit/s, and a limited PCI-Express controller. In the Virtex-6, the number of high-speed pins skyrockets. The PCI-Express controller is gen-2-capable. And the third branch of the Virtex-6 family, the HXT, raises the maximum bit rate of the fastest serial pins from the LXT and SXT's 6.5 Gbit/s to 11.2 Gbit/s.
It remains to be seen just how functional these very fast I/Os will prove in real systems. In the recent past, FPGA high-speed serial links have been designed primarily for point-to-point use between chips in near-proximity, not for backplane or cable use. The demands on the pre-emphasis, receiver, and equalization circuitry for long-haul use have been impractical for the FPGA designs. Lacking clear specs for comparing these attributes of the I/Os, we are left with just the maximum bit-rate figures to compare.
Split manufacturing strategy
A third interesting observation is the fragmentation of Xilinx's foundry strategy at the new node. Xilinx Director of Product Marketing Brent Przybus explained that the company chose to split production of the two families between UMC and Samsung for both time-to-market and capacity reasons.
"Samsung's 45-nm process was maybe six to seven months ahead of UMC's 40-nm half-node process," Przybus said. "So it matched up with our goal of sampling the Spartan-6 devices now. But there were advantages to the UMC process for the Virtex-6, although there is little difference in underlying logic density." Also, Przybus said, Xilinx was attracted to the higher capacity available in Samsung's fabs to support the Spartan family.
The result of the split strategy is a slight separation in schedule for the two families. Spartan-6 parts are in engineering samples now. Xilinx plans to deliver engineering samples of the Virtex-6 in the second quarter of this year. Production runs on both families should begin by the end of 2009.
There will also be a slight complication for customers. Even though the two families nominally use the same logic and routing architectures, because of the differences in timing and power a design developed on Virtex-6 and moved to Spartan-6 would have to be reverified and the new parts requalified.
Power and FPGAs
Finally, there is the question of power. The new families get dynamic power reductions from the intrinsically lower dynamic power of the 45- and 40-nm processes, due to reduced capacitance, shorter interconnect distances, and lower drive currents. They also benefit from the option to reduce core voltage to as low as, in the case of the Virtex-6, 0.9V. This latter point will help with static power, as well.
Interestingly enough, the lowest operating voltage in the Spartan family, 1.0V on the LX only, may mean that in some designs, particularly if the duty cycle is low, the overall system power may be lower using a Virtex than a Spartan chip. The low-voltage operation must be chosen for the entire core, by the way. There is no provision for splitting the core into two voltage regions, or for changing the core voltage after design time.
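To give a rough feel for what the core-voltage options are worth, the sketch below applies the standard first-order CMOS approximation that dynamic power scales with C·V²·f. It assumes switched capacitance and clock frequency stay fixed, ignores static power entirely, and does not account for differences between the two foundry processes, so the results are ballpark ratios rather than Xilinx figures. The function name is invented for this illustration.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new to dynamic power at v_old.

    First-order model: P_dyn is proportional to C * V^2 * f. With C and f
    held constant (an assumption), the ratio reduces to (v_new / v_old)^2.
    """
    if v_new <= 0 or v_old <= 0:
        raise ValueError("core voltages must be positive")
    return (v_new / v_old) ** 2


# Spartan-6 core-voltage options quoted in the article: 1.2V and 1.0V.
spartan_ratio = dynamic_power_ratio(1.0, 1.2)

# Virtex-6 minimum (0.9V) against the Spartan-6 minimum (1.0V).
virtex_vs_spartan = dynamic_power_ratio(0.9, 1.0)

print(f"1.2V -> 1.0V: dynamic power falls by about {1 - spartan_ratio:.0%}")
print(f"1.0V -> 0.9V: dynamic power falls by about {1 - virtex_vs_spartan:.0%}")
```

Under these assumptions, dropping a Spartan-6 core from 1.2V to 1.0V trims dynamic power by roughly 31 percent, and the 0.9V floor of the Virtex-6 is worth roughly a further 19 percent relative to 1.0V. Real savings would depend on the design's static power and on any clock-frequency derating at the lower voltage.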
But FPGAs still lag far behind ASICs and ASSPs in other means of power management. Voltage islands, dynamic power gating, dynamic voltage-frequency scaling, and threshold-voltage manipulation are still not available in the new Xilinx families. For that matter, of these techniques only programmable threshold-voltage control is available from Altera. So the designer's ability to manage power consumption in critical designs is quite limited. Przybus did point out that Xilinx offers, although it does not exactly encourage, partial reprogramming of devices during operation. This technique could be used with great care to provide a slow, coarse-grained version of power gating.
Xilinx designers have attempted to compensate for this lack of power-management knobs, Przybus noted. The fabric architects added more fine-grained routing resources between the logic blocks, reducing the average routing load per node. Tool designers added routing algorithms that optimize not just for performance but also for power, further reducing this source of dynamic power consumption. In addition, synthesis-tool vendors are being encouraged to recognize implied uses for the families' embedded resources, such as the DSP slices, memory controllers, and even the PCI-Express controller, and to use the much lower-power embedded resources in preference to logic fabric whenever possible.
These help with dynamic power. To help with static power, Xilinx relies on its own designers being able to choose from three different oxide thicknesses on a transistor-by-transistor basis, only using low-threshold transistors when absolutely necessary. But there is no way to extend that choice to the users, as well. The only way to reduce the leakage current in a logic block is to not use it, permitting the tools to power-gate it.
To some extent, these choices reflect the reality of FPGA design. It would simply require too much silicon to offer dynamic voltage islands, dynamic power-gating, and controllable threshold voltage to the user. It could also vastly complicate the design tools. And that is the opposite of the direction in which Xilinx as a company is moving.
Xilinx President and CEO Moshe Gavrielov sees many potential new users coming into the FPGA world, not from the ranks of experienced FPGA designers or even ASIC designers, but from the realm of software and systems engineering. In his vision, the FPGA becomes nearly a slate on which systems designers can write a systems-level design, without concerning themselves with the details of implementation. It is a bold, unifying vision that must contend carefully with the reality of the need for fine-grained control over the implementation at some points in many designs.