Tabula FPGAs: this one could be game-changing
Tabula, a heavily-funded FPGA start-up led by a who’s-who of FPGA- and EDA-industry insiders, this morning unveiled a new FPGA architecture that challenges fundamental assumptions about RAM-configured logic devices. Tabula claims it will deliver FPGAs that in the same device can offer 1.6 GHz clock rates on critical paths, logic and memory capacity comparable to the largest new devices from Altera and Xilinx, rich SerDes, DSP, and memory resources, and yet a die size small enough to sell at a fraction of Stratix IV or Virtex 6 prices. And, despite the startling claims, the devices will use a familiar tool chain and will look to the user like traditional FPGAs.
Such claims clearly require an explanation. Tabula is not claiming a revolution in process technology (the design employs fairly normal 40nm CMOS) or in logic architecture: to the user, the Tabula devices will appear as an entirely familiar array of look-up tables (LUTs), latches, and configurable interconnect. Rather, Tabula’s main innovation is to exploit, brilliantly, a growing imbalance in conventional FPGA implementation.
That imbalance is the disparity between the die area required by interconnect and that required by the logic elements and switches. By the 40nm generation, FPGA logic fabric has become a dense network of interconnect wires covering a very sparse array of LUTs, latches, multiplexers, and buffers. There is space on the silicon for more logic cells, but there is no room to get interconnect to them. You could make the LUTs, which are essentially 1×8 or 1×16 SRAMs, much larger, but studies have shown that this would not improve logic density for real designs. So while the lower routing and via layers are crowded, the space on the surface of the silicon is increasingly underutilized.
Tabula isn’t interested in giving a detailed description of what it has done (the company much prefers conceptual metaphors), but here’s the idea. Instead of putting a single set of interconnect muxes, a LUT, and a latch in each logic cell, Tabula puts in eight of everything. Then it time-domain multiplexes those eight sets of hardware on a 1.6 GHz master clock, so that the physical logic cell has a whole new personality (new interconnect routing, new LUT, and new latch configuration) every 625ps. Over the course of 5ns, the physical logic cell is, in effect, eight different logic cells.
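The timing arithmetic behind that description is easy to check. The short Python sketch below uses only the two figures quoted above, the 1.6 GHz master clock and the eight configurations per cell; the function names are illustrative and not anything Tabula has published. It shows that the exact period at 1.6 GHz is 625 ps, that eight periods take 5 ns, and that any one configuration of a given cell therefore recurs at 200 MHz.

```python
# Back-of-the-envelope timing for the time-multiplexed fabric.
# Only the two constants come from the article; the helper names are illustrative.
MASTER_CLOCK_HZ = 1.6e9  # master clock quoted by Tabula
FOLDS = 8                # configurations per physical logic cell


def fold_period_ps(clock_hz: float) -> float:
    """Time one configuration stays active, in picoseconds."""
    return 1e12 / clock_hz


def full_rotation_ns(clock_hz: float, folds: int) -> float:
    """Time to step through every configuration once, in nanoseconds."""
    return folds * 1e9 / clock_hz


def recurrence_rate_mhz(clock_hz: float, folds: int) -> float:
    """Rate at which any single configuration comes around again, in MHz."""
    return clock_hz / folds / 1e6


if __name__ == "__main__":
    print(f"fold period:     {fold_period_ps(MASTER_CLOCK_HZ):.0f} ps")
    print(f"full rotation:   {full_rotation_ns(MASTER_CLOCK_HZ, FOLDS):.1f} ns")
    print(f"recurrence rate: {recurrence_rate_mhz(MASTER_CLOCK_HZ, FOLDS):.0f} MHz")
```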
Now the rest of the secret sauce. Tabula embeds transparent latches in the interconnect where it passes through the physical cell, and controls these latches with the time-multiplexing circuitry as well. So on each clock cycle, Tabula captures the state of the interconnect and logic cell in the latches. This allows the chip to pass the output of the LUT, for instance, to the input of the same or a nearby physical LUT on the next clock cycle. All the state that goes in flight during an eight-cycle sequence is available to drive cells on subsequent cycles. It is almost as if the FPGA had eight times as many logic cells as it actually does.
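A toy software model may make the mechanism more concrete. The sketch below is not Tabula's implementation, whose details the company has not disclosed. It assumes a single physical 2-input LUT with eight configurations and one latch per configuration, and every name in it (FoldedCell, parity4, the source tuples) is hypothetical. It computes a four-input parity function, which would normally need three 2-input LUTs, by reusing one physical LUT on three successive clock cycles and feeding latched results forward.

```python
from typing import Dict, List, Optional, Sequence, Tuple

FOLDS = 8  # configurations per physical cell, per the article

# Truth table for a 2-input XOR LUT, indexed by (a << 1) | b.
XOR2 = (0, 1, 1, 0)

# A source is ("in", name) for an external signal, or ("latch", fold) for a
# value captured on an earlier fold. This encoding is an assumption of the sketch.
Source = Tuple[str, object]
Config = Tuple[Sequence[int], Tuple[Source, Source]]


class FoldedCell:
    """Toy model of one physical 2-input LUT reused across FOLDS configurations."""

    def __init__(self, configs: List[Optional[Config]]) -> None:
        if len(configs) != FOLDS:
            raise ValueError(f"expected {FOLDS} configurations")
        self.configs = configs
        self.latches: Dict[int, int] = {}

    def tick(self, fold: int, inputs: Dict[str, int]) -> None:
        """Run one master-clock cycle: load a configuration, evaluate, latch."""
        config = self.configs[fold]
        if config is None:  # this fold is unused by the design
            return
        lut, sources = config
        a, b = (inputs[key] if kind == "in" else self.latches[key]
                for kind, key in sources)
        self.latches[fold] = lut[(a << 1) | b]

    def rotate(self, inputs: Dict[str, int]) -> Dict[int, int]:
        """Step through all folds once and return the latched outputs."""
        self.latches.clear()
        for fold in range(FOLDS):
            self.tick(fold, inputs)
        return dict(self.latches)


def parity4(a: int, b: int, c: int, d: int) -> int:
    """Four-input parity on one physical LUT, using three of its eight folds."""
    configs: List[Optional[Config]] = [None] * FOLDS
    configs[0] = (XOR2, (("in", "a"), ("in", "b")))
    configs[1] = (XOR2, (("in", "c"), ("in", "d")))
    configs[2] = (XOR2, (("latch", 0), ("latch", 1)))  # consumes latched state
    cell = FoldedCell(configs)
    return cell.rotate({"a": a, "b": b, "c": c, "d": d})[2]
```

Calling parity4(1, 0, 1, 1) steps the cell through its folds and returns the value latched on the third one. The remaining five folds are left idle here; in a real design the tools would presumably fill them with other logic.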
Tabula illustrates this concept as a three-dimensional chip of eight layers. Each logic cell connects to nearby cells around it on one layer, and to nearby cells above it on the next layer, in an expanding cone. In practice, users can visualize their design spread across the three dimensions, or mapped onto a single flat FPGA. However you choose to visualize it, you submit a netlist with timing constraints, and the tools map your nets across physical logic cells and interconnect, and across clock phases, to meet your constraints. Critical nets get mapped vertically, where they can often stay within one physical cell with essentially zero flight time. Nets with more slack get spread more widely across the die.
The architecture has several important implications. First, it packs about three times more logic into a given area than a conventional FPGA. Second, Tabula can emulate eight-port embedded RAM blocks by time-multiplexing the inputs and outputs of a single-port physical RAM, so the company can implement memory that is physically faster, denser, and lower in both static and dynamic power than the true eight-port blocks on a conventional high-end FPGA.
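The multi-port RAM trick can be sketched in a few lines of Python. This is an illustrative model only: the class names, the request format, and the rule that ports are served in ascending order (so a write on a lower-numbered port is visible to a read on a higher-numbered port in the same user cycle) are all assumptions of the sketch, not details Tabula has disclosed.

```python
from typing import List, Optional, Tuple

PORTS = 8  # one virtual port per fold of the master clock


class SinglePortRam:
    """Physical RAM that can perform exactly one access per master-clock cycle."""

    def __init__(self, depth: int) -> None:
        self.mem = [0] * depth
        self.accesses = 0

    def read(self, addr: int) -> int:
        self.accesses += 1
        return self.mem[addr]

    def write(self, addr: int, data: int) -> None:
        self.accesses += 1
        self.mem[addr] = data


class TimeMultiplexedRam:
    """Presents up to PORTS virtual ports by serving one request per fold."""

    def __init__(self, depth: int) -> None:
        self.ram = SinglePortRam(depth)

    def user_cycle(self, requests: List[Optional[Tuple]]) -> List[Optional[int]]:
        """Serve one user-visible cycle.

        Each request is None (port idle), ("r", addr), or ("w", addr, data).
        Returns one entry per port: the value read, or None for writes and idle ports.
        """
        if len(requests) > PORTS:
            raise ValueError(f"at most {PORTS} ports per user cycle")
        results: List[Optional[int]] = []
        for request in requests:  # one loop iteration models one fold
            if request is None:
                results.append(None)
            elif request[0] == "r":
                results.append(self.ram.read(request[1]))
            else:
                self.ram.write(request[1], request[2])
                results.append(None)
        return results
```

For example, after ram = TimeMultiplexedRam(16), the call ram.user_cycle([("w", 3, 42), ("r", 3), None, ("r", 0)]) writes on port 0, reads the new value back on port 1, leaves port 2 idle, and reads an untouched location on port 3, all through the one physical port.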
So yes, the claims on speed, density, and cost are plausible. Power is a more complex question. The devices implement a fine-grained clock-gating algorithm based on user signal activity, according to Tabula president and CTO Steve Teig. And the greater density means that the average interconnect length is 80 percent shorter than on a conventional FPGA at the same geometry. Both of these factors sharply reduce dynamic power. But at the same time, there is the circuitry that manages the time-domain multiplexing activity, spread across the die in stripes and running at that 1.6 GHz clock frequency. "The net power compared to a conventional FPGA is design-dependent," Teig says. "The architectural overhead may or may not swamp out the savings."
So there is the story: speed, density, low cost. Tabula is aiming initially at the network switch sockets that make up the sweet spot of the high-end FPGA market. Presumably, the company will offer the configurations, on-chip peripherals, and IP those applications require. Specific product announcements should be coming soon, Teig says.