Radical FPGA takes on packet processing
By Ron Wilson, Executive Editor - July 10, 2006
Start-up Cswitch Corp. this month introduced a novel configurable-logic chip targeting packet processing in networking, wireless base stations, and telecom infrastructure. The device comprises a heterogeneous array that intersperses rows of general-purpose logic cells, much like those in conventional FPGAs, with rows of SRAM-configured RAM and CAM (content-addressable-memory) blocks, ALUs (arithmetic-logic units), and specialized packet-processing blocks. The intent, according to the company’s president and chief executive officer, Doug Laird, is to serve the growing number of applications that must process packetized data at wire speed. For these applications, the company aims for a device much faster and lower in power than a conventional FPGA but requiring much less investment and time to market than an ASIC. In effect, the product is an application-specific FPGA.
I/O surrounds the configurable fabric. Configurable SERDES (serializer/deserializer) blocks, each of which can support PCI Express, XAUI (10-Gbit attachment-unit interface), Fibre Channel, or gigabit-Ethernet connections, line two edges of the chip. Similarly configurable MAC (media-access-controller) blocks back up these SERDES blocks. Programmable I/O pins, some of which can serve as configurable, high-speed DRAM ports, fill the other two edges of the die.
The fabric covering the interior of the die includes alternating rows of six kinds of configurable blocks. The most familiar of these, configurable-logic blocks, use a conventional four-input-look-up-table architecture. Rows of 1-GHz octal ALUs perform computational or statistical operations on packet contents. Packet-processing blocks parse headers and extract payloads at 800 MHz.
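To make "parse headers and extract payloads" concrete, the following minimal software sketch splits a frame into header fields and payload. It assumes an untagged Ethernet II frame, which the article does not specify, and the function name parse_ethernet is hypothetical. It illustrates the task, not the way the Cswitch blocks implement it.

```python
import struct

# Assumed layout: untagged Ethernet II header in network byte order:
# 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType.
ETH_HEADER = struct.Struct("!6s6sH")


def parse_ethernet(frame: bytes) -> dict:
    """Split a raw frame into header fields and payload (hypothetical helper)."""
    if len(frame) < ETH_HEADER.size:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = ETH_HEADER.unpack_from(frame)
    return {
        "dst": dst.hex(":"),          # requires Python 3.8 or later
        "src": src.hex(":"),
        "ethertype": ethertype,
        "payload": frame[ETH_HEADER.size:],
    }


# Example frame: broadcast destination, EtherType 0x0800 (IPv4), 5-byte payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"hello"
parsed = parse_ethernet(frame)
```

In hardware, the same field extraction would happen on each packet as it streams through the block instead of on a buffered byte string.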
In support of these blocks, the chip provides rows of 1-GHz specialized memory blocks that you can configure as RAM, binary CAM, or ternary CAM for buffering, address mapping, pattern searching, or even, with clever use of the other blocks, regular-expression processing. The chip also has rows of conventional single- and dual-port RAMs. The application-specific architecture provides smaller blocks of dual-port RAM, assuming that they will act as interblock-buffer memories, and larger blocks of single-port RAM for parameter and packet storage.
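For readers unfamiliar with CAMs, the sketch below models in software what a ternary CAM does for address mapping. Each entry stores a value and a mask of "care" bits, and a lookup returns the first entry whose care bits match the key. A binary CAM is the special case in which every bit is a care bit. The class and function names are hypothetical, and the loop stands in for a comparison that hardware performs on all entries in parallel.

```python
from typing import List, Optional, Tuple


class TernaryCam:
    """Software model of a ternary CAM; entries are (value, care_mask, result)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int, str]] = []

    def add(self, value: int, care_mask: int, result: str) -> None:
        # Bits where care_mask is 0 are "don't care" and match any key bit.
        self.entries.append((value & care_mask, care_mask, result))

    def lookup(self, key: int) -> Optional[str]:
        # Hardware compares every entry at once; this loop models the
        # priority encoder by returning the lowest-index match.
        for value, care_mask, result in self.entries:
            if (key & care_mask) == value:
                return result
        return None


def ipv4(text: str) -> int:
    """Convert dotted-quad text to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


cam = TernaryCam()
cam.add(ipv4("10.1.2.0"), 0xFFFFFF00, "port 3")  # most specific prefix first
cam.add(ipv4("10.1.0.0"), 0xFFFF0000, "port 2")
cam.add(0, 0, "default")                          # all bits don't-care
```

Ordering entries from most to least specific makes the first match the longest prefix, which is the usual way to map addresses with a ternary CAM.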
Interconnect for the chip differs dramatically from that for general-purpose FPGAs. Because designers can express most data-plane packet processing as data-flow architectures, Cswitch replaces the elaborate mesh of varying-length, varying-orientation interconnect segments typical of an FPGA with simple, nearest-neighbor, orthogonal routing. These short segments are fast and 20 bits wide, and you can subdivide them into groups of 5 bits. Each terminates in a registered, fully populated crossbar switch that connects the interconnect segments into the logic fabric and to each other. Thus, a flow-through datapath design that uses nearest-neighbor interconnection becomes a fully registered pipeline. This approach allows the chip to receive, edit, classify, and store packets at a 1-GHz rate, according to Laird. Designs that require less orderly interconnect must daisy-chain signals through segments and crossbars, resulting in longer, but highly predictable, interconnect delays.
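A rough model shows why daisy-chained delays are predictable. The sketch assumes that blocks sit on a grid, that a signal moves one block per hop in a horizontal or vertical direction, and that each registered crossbar adds one clock cycle. The one-cycle-per-hop figure and the grid coordinates are assumptions for illustration, not published Cswitch specifications, and the function names are hypothetical.

```python
CLOCK_HZ = 1_000_000_000   # 1-GHz rate quoted in the article
SEGMENT_BITS = 20          # width of one interconnect segment
GROUP_BITS = 5             # smallest subdivision of a segment


def hops(src, dst):
    """Orthogonal nearest-neighbor routing: hop count is the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def latency_ns(src, dst, cycles_per_hop=1):
    """Latency in ns, ASSUMING each registered crossbar costs cycles_per_hop clocks."""
    return hops(src, dst) * cycles_per_hop * 1e9 / CLOCK_HZ


def segments_needed(bus_bits):
    """Number of 20-bit segments needed to carry a bus (ceiling division)."""
    return -(-bus_bits // SEGMENT_BITS)


groups_per_segment = SEGMENT_BITS // GROUP_BITS   # four 5-bit groups per segment
neighbor = latency_ns((0, 0), (0, 1))             # adjacent blocks: one hop
distant = latency_ns((0, 0), (3, 4))              # seven hops under this model
```

Under these assumptions, delay depends only on block placement, so a tool could compute it before detailed routing.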
The utility of such a design depends on its tools, for which Cswitch has partnered with Magma Design Automation, establishing a design flow that incorporates Blast Create and Blast FPGA tools, along with application-specific libraries and Cswitch-specific mapping and timing files. Designs for implementation in the Cswitch chip would typically combine complex library functions, explicit instances of the various Cswitch configurable blocks, and Verilog. Magma is inferring Cswitch structures directly from Verilog, according to Magma product director Sanjay Bali, but only for relatively obvious cases, such as mapping combinatorial logic onto the logic blocks and multiplications onto the ALUs.
So far, Cswitch has done test chips of the SERDES blocks with its foundry, Chartered Semiconductor, in its 90-nm CMOS process. Laird expects a full tape-out of the company’s high-end chip in September, followed by a first release of the design tools, with sampling to follow.