Design Feature: July 20, 1995
Designing for speed with a PLD depends on what you need to accomplish. PLDs come in many flavors: complex PLDs (CPLDs), field-programmable gate arrays (FPGAs), generic-array-logic (GAL) devices, and programmable-array-logic (PAL) devices. Each type uses a different means of enhancing speed. Propagation delays and benchmarks can help you make a choice. However, you must consider not only the devices' operating speeds but also your time to market. Easy-to-use software can quickly get your design working. As speed increases, however, so do the high-frequency side effects on the pc board on which the PLD resides.
PALs, the simplest PLDs, offer predictable propagation delays from an input to an output. GALs are electrically erasable PALs. In PALs and GALs, dedicated inputs drive programmable AND arrays, which, in turn, drive programmable macrocells. PALs have limited density, however. For example, the popular 22V10 PAL, available from many sources, has only 10 macrocells. If you want to use multiple PALs in a large design, you must run the signal lines into and out of each device, which causes high pad capacitance and long board-trace delays, thus slowing performance. Vendors of PALs and GALs include Advanced Micro Devices (AMD), Atmel, Cypress Semiconductor, and Lattice.
As the least dense PLDs, PALs and GALs typically offer the shortest propagation delays because the interconnect is smaller in a PAL or a GAL than in a higher density device. For example, AMD's and Lattice's 22V10s, including Lattice's GAL22V10-5 ($13 (1000)), have a propagation delay as low as 5 nsec between input and output. To contain a high-density design in a single device, consider a higher density CPLD or FPGA. You may even want to perform the higher speed functions on a 22V10 and the lower speed functions on a CPLD or FPGA. With CPLDs and FPGAs, however, one size does not fit all.
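The cost of spreading a design across several PALs can be estimated with simple arithmetic. The sketch below is illustrative only: the 5-nsec figure is the 22V10 delay quoted above, but the 2-nsec pad-plus-trace delay per hop and the function name `cascaded_delay_ns` are assumptions, not data-sheet values.

```python
def cascaded_delay_ns(device_tpd_ns, stages, board_hop_ns):
    """Worst-case delay through `stages` devices wired in series.

    Each stage adds one device propagation delay; each hop between
    two devices adds an assumed pad-plus-trace delay.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    hops = stages - 1
    return stages * device_tpd_ns + hops * board_hop_ns


TPD_22V10_NS = 5.0  # from the text: fastest 22V10 propagation delay
BOARD_HOP_NS = 2.0  # assumption: pad capacitance plus board-trace delay per hop

for n in (1, 2, 3):
    total = cascaded_delay_ns(TPD_22V10_NS, n, BOARD_HOP_NS)
    print(f"{n} device(s) in series: {total} nsec")
```

Under these assumed numbers, a signal that passes through three 22V10s takes 19 nsec, slower than a single pass through one of the CPLDs discussed in this article.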
CPLDs comprise a single chip with multiple PALs interconnected on-chip by a fast switching matrix. FPGAs, finer grained structures than CPLDs, comprise universal logic elements in an interconnection framework that often resembles a grid. FPGAs typically have short, routable interconnection lines that join logic elements having granularities ranging from a few gates to medium-sized macrocells.
The Cypress CYC371 CPLD, a member of the Flash370 family, has 32 macrocells in two logic blocks. Each logic block contains 16 macrocells, a 72x86 product-term array, and an intelligent product-term allocator. The logic blocks connect via the Programmable Interconnect Matrix (PIM), a fast, predictable routing resource. The $4.35 (10,000) CYC371 specifies an 8.5-nsec propagation delay between input and output. The specification is only for one macrocell, however. Using additional macrocells slows operating speed. The $180 (100) Xilinx XC73144-7 EPLD integrates 16 22V10 PALs and has a universal interconnect matrix that provides access to all of the logic resources. The 144-macrocell device specifies a 7.5-nsec propagation delay from input to output for a single macrocell and 24-mA output drive current, which makes it Peripheral Component Interconnect (PCI)-compliant.
The CPLD-vs-FPGA trade-off
CPLDs are typically faster and have more predictable timing than FPGAs. FPGAs, however, are generally denser and contain more flip-flops and registers than CPLDs do, although operating speed generally decreases as PLD density increases. Your design may fit into a CPLD; if not, you must use an FPGA--a switch that involves a mental shift. With CPLDs, you try to fit as much logic as possible into a macrocell to take advantage of the CPLD's predictability. When you design with an FPGA, on the other hand, large fan-outs from one logic cell to others can cause long delays. Placing two buffers at the output of the logic cell lets you split the fan-out, which decreases the overall delay even after you account for the added buffer delay.
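The buffer-splitting trade-off can be sketched with a first-order linear delay model, in which every load a cell drives adds a fixed increment of delay. All of the numbers below (cell delay, buffer delay, delay per load) are hypothetical and chosen only to show the shape of the trade-off; real values come from a vendor's timing analyzer.

```python
import math


def direct_delay_ns(t_cell, k_per_load, fanout):
    """Delay when one logic cell drives every load directly."""
    return t_cell + k_per_load * fanout


def split_delay_ns(t_cell, t_buf, k_per_load, fanout, n_buffers=2):
    """Delay when the cell drives n_buffers buffers that share the loads."""
    loads_per_buffer = math.ceil(fanout / n_buffers)
    driver_stage = t_cell + k_per_load * n_buffers       # cell now sees only the buffers
    buffer_stage = t_buf + k_per_load * loads_per_buffer  # each buffer sees its share
    return driver_stage + buffer_stage


# Hypothetical values, in nsec
T_CELL, T_BUF, K_PER_LOAD = 2.0, 1.5, 0.5

for fanout in (4, 16):
    direct = direct_delay_ns(T_CELL, K_PER_LOAD, fanout)
    split = split_delay_ns(T_CELL, T_BUF, K_PER_LOAD, fanout)
    print(f"fan-out {fanout}: direct {direct} nsec, split {split} nsec")
```

With these assumed values, splitting helps at a fan-out of 16 (8.5 nsec versus 10 nsec) but hurts at a fan-out of 4 (5.5 nsec versus 4 nsec), so the technique pays off only on heavily loaded nets.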
Altera's EEPROM-based Max 7000 family of CPLDs has members with pin-to-pin propagation delays as low as 6 nsec. The maximum density is 5000 usable gates.
Because of their interactive networks, FPGAs perform comparative functions better than do CPLDs. Because they have many registers and flip-flops, FPGAs also work better in synchronous logic designs or in data-path functions. In contrast, CPLDs usually perform better in large-state-machine designs in which propagation time is important. Other important issues include whether your design is combinatorial, requires fast clock-to-output time, or requires fast-output enables. Setup time is also important, especially if your design communicates with a µP, and the µP doesn't provide the data early in the cycle.
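A quick slack calculation shows why setup time matters when a µP delivers its data late in the cycle. This is a minimal sketch: the 40-nsec clock period, the 30-nsec data-valid time, the 1-nsec trace delay, and the two setup times are all hypothetical values, and `setup_slack_ns` is an illustrative name.

```python
def setup_slack_ns(clock_period_ns, data_valid_ns, trace_ns, setup_ns):
    """Time left over after the data arrives and the PLD's setup time is met.

    A negative result means the PLD input register cannot capture the
    data reliably on the clock edge that ends this cycle.
    """
    return clock_period_ns - (data_valid_ns + trace_ns + setup_ns)


CLOCK_PERIOD_NS = 40.0  # assumption: 25-MHz bus clock
DATA_VALID_NS = 30.0    # assumption: the processor drives data late in the cycle
TRACE_NS = 1.0          # assumption: board-trace delay

for setup_ns in (7.0, 10.0):  # two hypothetical PLD input-setup times
    slack = setup_slack_ns(CLOCK_PERIOD_NS, DATA_VALID_NS, TRACE_NS, setup_ns)
    verdict = "meets timing" if slack >= 0 else "fails timing"
    print(f"setup {setup_ns} nsec: slack {slack} nsec, {verdict}")
```

In this example a part with a 7-nsec setup time leaves 2 nsec of margin, whereas a part with a 10-nsec setup time misses by 1 nsec.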
Xilinx's PCI-compliant XC73144 erasable PLD has a density of 3500 usable gates and a propagation delay of 7.5 nsec.
Other considerations in weighing CPLDs against FPGAs are how long it takes to complete a design and get it to market, how many chips the board has, and the board's manufacturability. System designers must consider these factors and their cost. On a manufacturing and cost basis, it may be better to use a higher speed FPGA or CPLD because these devices allow communication with lower speed logic on the board.
If you intend to convert your design prototype to an ASIC, consider the QYH500 family of laser-programmable gate arrays (LPGAs) from Chip Express. These devices combine the performance and density of a traditional gate array with the turnaround time of an FPGA. The LPGAs feature a 0.3 Ohm metal-to-metal, laser-made on-resistance that connects rows to columns. The company characterizes the delays of the interconnect wires so that you can accurately predict the speed of the device. You can submit a netlist on one day and receive two tested, laser-cut prototypes on the next day. Standard NRE charges for prototyping services, which include two fully tested parts, start at $12,500. Production devices of 20,000-gate versions in 128-pin QFPs cost $21 (100,000).
Ample routing wires and ViaLink fuses allow QuickLogic's place-and-route tool, SpDE 5.0, to guarantee the hookup to this PCI bus application using the 8000-gate QL24x32B FPGA.
Carefully choose architecture
Deciding which part best fits which application is not a clear-cut decision. Cypress also offers the $9.45 (10,000) CYC381 and $9.95 (10,000) CYC382 FPGAs, which have 1000 usable gates. The devices specify a 6.5-nsec input plus logic-cell plus output delay and a 120-MHz chip-to-chip speed. The FPGAs work with the same Warp 3 VHDL software as the company's Flash370 CPLD family, which lets you see whether a CPLD or an FPGA best fits your design. Using the same software also saves you the time of learning a new language. Xilinx also offers a variety of FPGAs, including the PCI-compliant XC3100A family. An XC3195-3 has 7500 usable gates and costs $138.85 (100). You program the part using XACT software, part of the Xilinx Alliance Program, which supports more than 50 third-party software vendors.
If your completed design is still 1 or 2 nsec too slow, define critical paths to determine what is causing the delays. For example, you can split an FPGA's output buffer; hand place and route to get the timing to work fast enough; or simply buy a faster part. Selecting another part can mean using a device that uses metal fuse links or EPROM- or SRAM-based cells for interconnects. Cypress's FPGAs use a metal-to-metal fuse, and the company's CPLDs use a flash EPROM cell.
Actel and QuickLogic also use metal-to-metal fuse links. Altera and Xilinx use SRAM-based cells. The QuickLogic metal-to-metal ViaLink antifuse achieves a 50 Ohm on-resistance. This on-resistance is significantly lower than those of the 200 Ohm Plice metal-to-metal antifuse Actel uses or the approximately 1000 Ohm forward on-resistance that SRAM cells exhibit. Consequently, QuickLogic claims to have the fastest--although not the densest--FPGAs. For example, the QL24x32B has a maximum of 8000 usable gates and costs $78.90 in a 144-pin TQFP. The QuickLogic Wildcat family boasts a less-than-6-nsec input plus logic-cell plus output delay, an up to 110-MHz chip-to-chip operating speed, and a 150-MHz internal state-machine frequency. The devices have high current drivers and two dedicated clock networks to minimize fan-out problems. QuickLogic's Quick Tool software scans a design and automatically identifies where buffers will increase fan-out.
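A first-order RC estimate gives a feel for why link on-resistance matters. The resistances below are the figures quoted above; the 1-pF load per link is purely an assumption, and the real capacitance depends on wire length and on the parasitics of each switch type, which differ among the three technologies.

```python
# On-resistance figures quoted in the text, in ohms
LINK_RESISTANCE_OHM = {
    "ViaLink antifuse": 50,
    "Plice antifuse": 200,
    "SRAM-based cell": 1000,
}

ASSUMED_LOAD_PF = 1.0  # assumption: lumped capacitance seen by each link


def rc_time_constant_ps(r_ohm, c_pf):
    """RC time constant in picoseconds (1 ohm x 1 pF = 1 ps)."""
    return r_ohm * c_pf


for name, r_ohm in LINK_RESISTANCE_OHM.items():
    tau = rc_time_constant_ps(r_ohm, ASSUMED_LOAD_PF)
    print(f"{name}: {tau} ps per link")
```

Because a routed net may pass through several links in series, the per-link difference accumulates along a critical path.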
The Cypress CYC371 CPLD has two logic blocks, each containing 16 macrocells, which connect via a fast Programmable Interconnect Matrix (PIM) that provides predictable 8.5-nsec propagation delays.
With Quick Tool, you enter the critical paths, and the SpDE place-and-route tools automatically link fragments of logic so that the design has little interconnect delay and short critical paths. The ViaLink architecture is one-time-programmable and not reprogrammable as SRAM and EPROM structures are. QuickLogic and other PLD vendors have partnered with Synplicity, which develops libraries of parametrized modules (LPMs). Under these agreements, the PLD vendors submit descriptions of the logic modules for their families to Synplicity, which adds the descriptions to its library. Any synthesis tool can then use the LPMs for entering logic for a vendor's device. This approach avoids the VHDL method of treating each building block as a simple gate.
Also an LPM user, Altera manufactures the EEPROM-based Max 7000 CPLD and the SRAM-based Flex 8000 FPGA families, which use a fast-track, continuous-routing structure to interconnect rows and columns. The $1.95 (100,000) EPM7032, a 32-macrocell device, specifies a 6-nsec propagation delay from an input to an output going through a macrocell. You can fit much logic into one macrocell, however. For example, address decoding fits into one macrocell and experiences only one macrocell delay. Although propagation delay is an important parameter for speed, benchmarks provide an additional measure of a device's performance. When comparing Programmable Electronics Performance (PREP) benchmarks, make sure to compare FPGAs with FPGAs and CPLDs with CPLDs (see box, "Using the PREP benchmarks").
Altera's Max+Plus II software offers some help in choosing a PLD. The software includes a timing-driven compilation tool that lets you enter the maximum operating speed, the clock-to-output delay, the setup times, the output-enable times, and other critical timing parameters. The software then automatically identifies the Altera device that meets your specifications. The software handles entries from schematics as well as entries from a VHDL or Verilog hardware-description language.
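The idea of matching timing requirements to a device can be sketched as a simple filter over data-sheet numbers. The sketch below does not represent how Max+Plus II works internally; it uses the macrocell counts and single-macrocell propagation delays quoted elsewhere in this article, and the `Device` type and `pick_device` function are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Device(NamedTuple):
    name: str
    macrocells: int
    tpd_ns: float  # single-macrocell propagation delay quoted in this article


CANDIDATES = [
    Device("22V10", 10, 5.0),
    Device("ispLSI 2032", 32, 5.5),
    Device("EPM7032", 32, 6.0),
    Device("XC73144-7", 144, 7.5),
    Device("CYC371", 32, 8.5),
]


def pick_device(min_macrocells: int, max_tpd_ns: float,
                candidates: Sequence[Device] = CANDIDATES) -> Optional[Device]:
    """Return the smallest, then fastest, device that meets both limits."""
    fits = [d for d in candidates
            if d.macrocells >= min_macrocells and d.tpd_ns <= max_tpd_ns]
    if not fits:
        return None
    return min(fits, key=lambda d: (d.macrocells, d.tpd_ns))


print(pick_device(24, 6.0))   # needs 24 macrocells at 6 nsec or better
print(pick_device(100, 8.0))  # needs 100 macrocells at 8 nsec or better
print(pick_device(100, 6.0))  # no listed device meets both limits
```

A real selection would also weigh clock-to-output delay, setup time, price, and the slowdown that occurs when a path uses more than one macrocell.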
According to PREP benchmarks, SRAM-based products can perform as fast as ViaLink devices in some tests. Vendors submitting PREP results tailor their designs to the test at hand. They give no consideration to the pinouts corresponding to an actual board layout, which would slow the device. Nevertheless, the PREP benchmarks for the SRAM-based Altera Flex 8000 series show that it is almost as fast as the ViaLink-based QuickLogic devices. According to Altera, the Flex 8000's continuous interconnect scheme is the reason for its speed. If a logic element in one corner of an Altera chip must communicate with a logic element in another corner of the chip, the signal needs to pass through only three SRAM-based connections. One connection links the corner cell to a row, one connects that row to a column, and one connects that column to the other corner cell. QuickLogic's logic, in contrast, needs multiple segmented ViaLinks to connect elements separated by the same distance. An Altera EPF81188A has 12,000 usable gates and 1188 registers and costs $29 (10,000).
Actel is not a proponent of the PREP benchmarks because PREP uses a step-and-repeat method that doesn't represent real-world applications, in which routing must occur all over the chip. Instead, Actel chooses to display some real-world examples using its Act 3 FPGA family. For example, you can use an A1460A chip to design an integrated four-DRAM controller operating faster than 40 MHz, an A1425A-3 chip to design a 250-MHz serial communications port, an A1460A chip to design a 1000-Mbps Ethernet receiver, and an A1425A chip to design a 100-MHz concentrator for document retrieval. The Act 3 family operates as fast as 250 MHz on-chip, has a clock-to-output delay as low as 7.5 nsec, has as many as 10,000 gates, and has input-setup times as low as 1.3 nsec.
| Using the PREP benchmarks |
|---|
| The goal of Programmable Electronics Performance (PREP) Corp, a consortium of programmable-logic companies, is to further the awareness, understanding, and use of high-capacity PLDs to benefit system designers. PREP achieves this goal by using a benchmark suite that measures PLDs' performance and logic capacity. |
| PREP selected a suite of nine benchmark circuits representing a wide range of logic types to exercise most PLD architectures. The benchmarks test data paths, timer/counters, small state machines, large state machines, arithmetic, 16-bit accumulators, 16-bit counters, 16-bit synchronous prescaled counters, and memory maps. To submit a device, a vendor must submit data for all nine benchmarks using automatic placement-and-routing or manual techniques. A vendor can submit two versions of the benchmarks: one for capacity and the other for performance. To measure the capacity of a device, PREP implements each benchmark as many times as possible in the device using a step-and-repeat method. The number of benchmark instances a device can implement represents the device's capacity. The average benchmark speed is a simple average of the speed of the internal logic between benchmark instances within a device (FMAX-MEAN) and the external speed (FEXT), the speed at which one device can communicate with another device in a system. The benchmarks report the worst, the best, and the mean values of FMAX. For more details about the PREP benchmarks, write to PREP Corp, 504 Nino Ave, Los Gatos, CA 95032; e-mail: PrepTalk@netcom.com. |
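The averaging that the sidebar describes can be written out in a few lines. The frequencies below are made-up values used only to show the arithmetic; they are not published PREP results, and the function names are illustrative.

```python
def fmax_summary(fmax_values_mhz):
    """Worst, best, and mean internal speed across benchmark instances."""
    worst = min(fmax_values_mhz)
    best = max(fmax_values_mhz)
    mean = sum(fmax_values_mhz) / len(fmax_values_mhz)
    return worst, best, mean


def average_benchmark_speed_mhz(fmax_mean_mhz, fext_mhz):
    """Simple average of internal (FMAX-MEAN) and external (FEXT) speed."""
    return (fmax_mean_mhz + fext_mhz) / 2


# Hypothetical measurements, in MHz
fmax_instances = [70.0, 80.0, 90.0]  # internal speed of each benchmark instance
fext = 60.0                          # device-to-device speed

worst, best, mean = fmax_summary(fmax_instances)
print("FMAX worst/best/mean:", worst, best, mean)
print("Average benchmark speed:", average_benchmark_speed_mhz(mean, fext), "MHz")
```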
Nobody's perfect
Making mistakes and changing designs can affect your pinout allocation, which may require a complete re-layout of the board. You can avoid this problem by choosing an AMD Mach 3 or 4 CPLD. The Mach devices have an output-switch matrix, which connects an I/O cell to any of the macrocells on the chip. The switching matrix adds no overall delay to the chip and allows for design changes without pinout changes. The propagation delay of a Mach device is as low as 7.5 nsec. The in-circuit programmability of the Lattice CPLDs also lets you make a mistake and correct it without having to change the board's layout. The 32-macrocell Lattice ispLSI 2032 ($18 (1000)) has a 5.5-nsec propagation delay, a 154-MHz maximum operating frequency, and high PREP benchmarks.
High speed and high density bring with them high-frequency noise. Designers must be aware of ground-bounce effects, crosstalk, EMI, and transmission-line characteristics on the traces of the board that carry these high-speed signals. Ensure good capacitive decoupling of the power supplies and adhere to good grounding practices. Terminate transmission lines to prevent ringing. Some experienced PLD designers build their own macrocells from macro-library functions, such as a multiplexer or a look-up table, and add sufficient gates to create custom macrocells. This approach gives them more control over the final design of the product.
| Looking ahead |
|---|
| PLD vendors are leapfrogging each other in performance and capacity, and there's no end in sight. Most vendors use 0.6-µm or larger technology and haven't yet dug into the deep-submicron area, as ASIC vendors have. When PLD vendors do move into deep-submicron technology, however, another jump in speed and capacity will occur. In the meantime, the metal-to-metal-vs-SRAM-based-link controversy may settle. Xilinx plans to announce a metal-to-metal-link field-programmable gate-array family this year. |
Reference
| For more information... | | |
|---|---|---|
| Actel Corp, Sunnyvale, CA, (408) 739-1010 | Advanced Micro Devices, Sunnyvale, CA, (408) 732-2400 | Altera Corp, San Jose, CA, (408) 894-7000 |
| Atmel Corp, San Jose, CA, (408) 436-4243 | AT&T, Allentown, PA, (215) 439-6011 | Chip Express, Santa Clara, CA, (408) 988-2445 |
| Cypress Semiconductor, San Jose, CA, (408) 943-2600 | Lattice Semiconductor, Hillsboro, OR, (503) 681-0118 | QuickLogic, Santa Clara, CA, (408) 987-2000 |
| Synplicity Inc, Los Altos, CA, (415) 961-4962 | Xilinx Inc, San Jose, CA, (408) 559-7778 | |