Achronix says it will triple the speed of FPGAs
Claiming moderate-density FPGAs clocking at 1.5 GHz, Achronix Semiconductor is taking the wraps off a full product line of field-programmable devices today. The Speedster chips, in densities from 24,576 to 163,840 look-up tables (LUTs) and with the familiar array of block and distributed RAM, 18-by-18 multipliers, fast SerDes, and PLLs, are aimed at moderate-sized logic designs that simply need to go faster than is possible with Altera or Xilinx FPGAs.
The key to Speedster’s performance lies in the heritage of the company: academic research in asynchronous logic design. The heart of the FPGAs is, in fact, a fabric of asynchronous logic. But the fabric has been designed so that its asynchronous nature is invisible to designers and to front-end synthesis tools. To the world, the FPGA looks like any other LUT-based, SRAM-configured device. The asynchronous part only becomes apparent when the user runs timing analysis and gets a look at the maximum clock frequency. How Achronix has done this is an interesting story in itself.
The tale starts with one of the less popular approaches to asynchronous logic: two-wire signaling with a separate acknowledge wire, also known as three-wire asynchronous logic. (A big hint toward what Achronix architects have been up to appeared in IEEE Computer Society transactions in 2004.) In this system, when a logic gate creates an output, it codes the 1 or 0 on two separate wires. That allows for three meaningful states: 1, 0, and no-signal. The next gate in the net is designed to wait until all of its inputs have changed from the no-signal state into unambiguous 1s or 0s before acting. Once the gate has satisfied its hold time, it sends an acknowledge signal back up the net to the sources of its inputs, allowing them to move on to their next state.
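The two-wire coding described above is usually called dual-rail encoding, and the "wait until every input is valid" rule can be sketched in a few lines of Python. This is a textbook-style model, not Achronix's circuit: the wire-to-value mapping and the function names `encode`, `decode`, and `self_timed_and` are assumptions made for illustration, and timing details such as hold time are abstracted into a single boolean acknowledge.

```python
from typing import Optional, Tuple

# One dual-rail signal = (true_rail, false_rail). Assumed convention:
#   (0, 0) no data yet ("no-signal"), (1, 0) logic 1, (0, 1) logic 0,
#   (1, 1) illegal.
DualRail = Tuple[int, int]
NULL: DualRail = (0, 0)


def encode(bit: int) -> DualRail:
    """Place a logic value on the two rails."""
    return (1, 0) if bit else (0, 1)


def decode(sig: DualRail) -> Optional[int]:
    """Return 1, 0, or None while the signal is still in the no-signal state."""
    if sig == (1, 1):
        raise ValueError("illegal dual-rail state")
    if sig == NULL:
        return None
    return 1 if sig == (1, 0) else 0


def self_timed_and(a: DualRail, b: DualRail) -> Tuple[DualRail, bool]:
    """AND gate that waits for every input to resolve before producing data.

    Returns (output, ack); ack models the acknowledge wire sent back to the
    gates that drove a and b. Hold-time checks are abstracted away.
    """
    va, vb = decode(a), decode(b)
    if va is None or vb is None:
        return NULL, False        # keep waiting, no acknowledge yet
    return encode(va & vb), True  # valid output, tell the sources to move on


print(self_timed_and(encode(1), NULL))       # ((0, 0), False)
print(self_timed_and(encode(1), encode(1)))  # ((1, 0), True)
```

Because "no data" is a distinct code rather than a moment on a clock, the gate itself can tell when its inputs are complete; that completion detection is what lets the logic run without a clock.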
In this way an entire logic net is self-timed. Signals propagate through the network limited only by wire delays and the time it takes each logic element to actually do its work. As soon as all of the inputs have arrived at a LUT and it has received acknowledge signals from all the gates it fans out to, it will look up the correct output, transfer it to its output pins, and signal the LUTs that created its input signals that they need not wait any longer. The effect is similar to a technique known as wave pipelining, which allows pipelines to operate without the use of internal latches.
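A toy timing model helps show why this kind of self-timed net is fast. In the sketch below (an assumption, not Achronix's circuit), each stage holds one token, and a stage may accept a new token only when its predecessor has produced it and its successor has acknowledged the previous token. The function name `simulate_self_timed_pipeline` and the stage delays are hypothetical, and the return-to-no-signal phase of a real handshake is ignored.

```python
from typing import List


def simulate_self_timed_pipeline(stage_delays: List[float], n_tokens: int) -> List[float]:
    """Return the time each token leaves the last stage of a self-timed pipeline.

    Toy model: stage i may start on token k once (a) stage i-1 has finished
    token k and (b) stage i+1 has accepted token k-1, which is the
    acknowledge that frees stage i. The sink accepts tokens on arrival.
    """
    n = len(stage_delays)
    start = [[0.0] * n for _ in range(n_tokens)]
    finish = [[0.0] * n for _ in range(n_tokens)]
    for k in range(n_tokens):
        for i in range(n):
            # Data arrival: the source always has the next token ready.
            ready = finish[k][i - 1] if i > 0 else 0.0
            if k > 0:
                # Acknowledge: downstream has taken token k-1.
                freed = start[k - 1][i + 1] if i + 1 < n else finish[k - 1][i]
                ready = max(ready, freed)
            start[k][i] = ready
            finish[k][i] = ready + stage_delays[i]
    return [finish[k][n - 1] for k in range(n_tokens)]


print(simulate_self_timed_pipeline([1.0, 3.0, 1.0], 3))  # [5.0, 8.0, 11.0]
```

With stage delays of 1, 3, and 1 time units, a new result emerges every 3 units, the delay of the slowest stage, rather than every 5 units, the total path delay. Real handshakes add overhead per stage, but the throughput is still set by individual stages rather than by the whole path.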
This point is vital to understanding how Achronix can achieve such high speeds. Each LUT, by waiting until its inputs are ready and holding its output until its clients have acknowledged, in effect acts as a self-timed latch. So setting aside all of the details of how asynchronous logic works, it is legitimate to think of the Achronix fabric as simply a logic fabric in which each logic stage has its own built-in latch. In other words, the fabric implements a logic design by turning each logic level into a pipeline stage. What might have been six levels of logic between registers in the netlist now becomes a six-stage pipeline. The maximum logic depth for any path in the design becomes exactly one. And since maximum clock frequency is determined by the maximum logic depth between registers, the Achronix devices can clock at extraordinarily high frequencies. That is still an approximation to what is really going on, but it is a reasonable one.
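The frequency argument in the paragraph above is first-order arithmetic: a clock period must cover the deepest chain of logic between storage elements, plus some fixed per-stage overhead. The sketch below uses hypothetical delays (0.5 ns per LUT level including routing, 0.5 ns of register or handshake overhead); they are illustrative values, not Achronix figures.

```python
def fmax_mhz(levels_per_stage: int, t_level_ns: float, t_overhead_ns: float) -> float:
    """First-order clock rate: the period covers the deepest stage plus overhead."""
    period_ns = levels_per_stage * t_level_ns + t_overhead_ns
    return 1000.0 / period_ns


# Hypothetical numbers, chosen only for illustration.
T_LEVEL_NS = 0.5     # one LUT level plus routing
T_OVERHEAD_NS = 0.5  # register or handshake overhead per stage

conventional = fmax_mhz(6, T_LEVEL_NS, T_OVERHEAD_NS)  # six levels between registers
pipelined = fmax_mhz(1, T_LEVEL_NS, T_OVERHEAD_NS)     # one level per stage
print(f"{conventional:.0f} MHz -> {pipelined:.0f} MHz")  # prints "286 MHz -> 1000 MHz"
```

In this model, the speedup comes entirely from dividing the logic depth per stage by six; how much of that survives in practice depends on how large the per-stage overhead is relative to one level of logic.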
And Achronix has gone to great lengths to save users from ever having to go beyond this level. All of the asynchronous signals are local to the logic fabric and cannot be accessed by users. The fabric is surrounded by a fully synchronous ring of registers which, with a large helping of secret sauce, resynchronize the activity inside the fabric so that at the pins the device appears to be a fully synchronous, conventional FPGA that just happens to have a 1.5 GHz maximum clock frequency. Users don’t need to know, and Achronix apparently has no intention of telling, just how the self-timed logic signals are resynchronized. But the result is that conventional RTL synthesis tools, including those from Synplicity and Mentor, work just fine with Achronix’s proprietary back-end flow, just as with any other FPGA.
So that is the magic. You enter your RTL, synthesize it, and map it onto familiar-looking 4-LUT logic elements. The Achronix back-end tool, in effect, pipelines all of your logic into stages one gate delay deep and turns the clock up accordingly. And you get a fast, apparently fully synchronous FPGA design.
The initial implementation of this bright idea is the Speedster family, built in TSMC’s 65 nm G+ CMOS process. Achronix chairman and CEO John Holt explained that the company wanted to focus its start-up resources on the logic fabric, and so has relied heavily on IP partners for the other good things that go into a modern FPGA. The drivers and receivers come from Dolphin, the 10.3 Gbit/s multiprotocol SerDes comes from Snowbush, and so on. The embedded interface IP also includes DDR3/DDR2 controllers. This allowed the relatively small (roughly 60-person) company to bring a fully featured FPGA to market in a reasonable length of time.
There are other niceties in the design as well, such as an auto-loading feature, some interesting ideas in the programming sequence to avoid excessive inrush current, and on-chip AES decryption of the programming bitstream, with e-fuse-based key management. There is also provision for turning down the core operating voltage.
That last bit is useful because, in exchange for getting you to ASIC-like clock frequencies, the parts will warm your heart, or anything else you put near them. Operating power, according to the company, is about 5 to 8 W of leakage and anywhere from 20 to 40 W of dynamic power. In this regard, three-wire asynchronous logic brings both bad and good news. The bad news, as the numbers show, is that it is not exactly the most efficient way to implement logic.
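The company's figures add up to a total operating range, and the standard first-order rule that dynamic power scales with the square of supply voltage (P ≈ αCV²f) suggests why the core-voltage adjustment mentioned earlier matters. The sketch below assumes a hypothetical 10% voltage reduction at an unchanged operation rate; leakage also drops with voltage but is not modeled, and in self-timed logic a lower voltage also slows the fabric.

```python
from typing import Tuple

# Power figures quoted by the company, in watts (low, high).
LEAKAGE_W: Tuple[float, float] = (5.0, 8.0)
DYNAMIC_W: Tuple[float, float] = (20.0, 40.0)


def total_power_range(leak: Tuple[float, float], dyn: Tuple[float, float]) -> Tuple[float, float]:
    """Sum the low and high ends of the leakage and dynamic ranges."""
    return (leak[0] + dyn[0], leak[1] + dyn[1])


def scaled_dynamic_power(p_w: float, v_old: float, v_new: float) -> float:
    """First-order CV^2 scaling of dynamic power at the same switching rate."""
    return p_w * (v_new / v_old) ** 2


print(total_power_range(LEAKAGE_W, DYNAMIC_W))  # (25.0, 48.0)
# Hypothetical 10% core-voltage reduction applied to the 40 W worst case:
print(round(scaled_dynamic_power(DYNAMIC_W[1], 1.0, 0.9), 2))  # 32.4
```

The total of 25 to 48 W assumes the low and high ends of the two ranges coincide, which the quoted figures do not guarantee.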
The good news, however, is that because the heart of these FPGAs is self-timed, there are no huge clock networks running through the logic fabric. In fact there are no clocks in the logic fabric at all; they are all in the synchronization ring that surrounds the fabric. That means the chip does not suffer the heavy power dissipation (both dynamic and static) in clock networks that conventional FPGAs must have. Nor does it exhibit the huge supply-current spikes characteristic of any large synchronous design.
This is an important point. Because the majority of the circuits in the chip are self-timed, you don’t have all the logic transitions in a clock domain happening at the same time, on each clock edge. The transitions are smeared out over time. Looking at a trace of supply current vs. time, you simply don’t see the huge current spikes aligned with clock edges that so drive FPGA designers mad worrying about decap insertion, instantaneous IR drop, signal integrity and electromigration. "When we first looked at our design with Magma’s power analysis tool, it reported zero IR violations," Holt said. "We thought we’d broken the tool, and so did Magma. It took us a while to realize that this is just the nature of self-timed logic."
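The "smeared out" supply current described above can be illustrated with a toy count of transitions per time slice of one clock period. The model below is an assumption for illustration only: current is taken as proportional to the number of transitions in each slice, clocked transitions all fire right after the clock edge, and self-timed transitions are spread uniformly at random across the period. The function name `switching_profile` is hypothetical.

```python
import random
from typing import List


def switching_profile(n_transitions: int, n_bins: int,
                      synchronous: bool, seed: int = 0) -> List[int]:
    """Count logic transitions per time bin over one clock period (toy model).

    Supply current is taken to be proportional to the transitions in a bin;
    real waveforms, rise times, and decoupling are ignored.
    """
    rng = random.Random(seed)
    bins = [0] * n_bins
    for _ in range(n_transitions):
        if synchronous:
            t = 0  # every transition is triggered by the clock edge
        else:
            # Self-timed: each transition happens whenever its data arrives,
            # assumed here to be uniformly spread across the period.
            t = rng.randrange(n_bins)
        bins[t] += 1
    return bins


clocked = switching_profile(10_000, 100, synchronous=True)
self_timed = switching_profile(10_000, 100, synchronous=False)
print("peak per bin, clocked:", max(clocked))        # all 10,000 in one bin
print("peak per bin, self-timed:", max(self_timed))  # near the 100-per-bin average
```

Both cases do the same total work, so average power is unchanged; what differs is the peak, which is what drives IR-drop, decoupling, and electromigration concerns.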
This is an obvious benefit to the chip designers. But it has a bright side for users as well. You don’t have to worry about coming up with a design that happens to be a corner case the evaluation engineers at the FPGA vendor never thought about. And you get to use performance that a synchronous design might have had to sacrifice just to put a guard band around the unknowns in the thermal and signal-integrity analyses.
Achronix says that Speedster parts will ship in the third quarter of this year, as will the development tools and a board-level development kit. Prices in volume quantities for the FPGAs will range from under $200 for the smallest device to around $2,500 for the largest.