Low-cost programmable logic: How low should you go?

Brian Dipert - March 16, 2000

If you believed everything the silicon manufacturers say, you'd all be designing with 1-million-gate FPGAs, 256-Mbit DRAMs, 64-Mbit flash memories, and 800-MHz processors. All it takes to come back to reality, however, is a glance at the discussion traffic on electronics-focused Internet newsgroups or the download statistics from EDN Access or other Web sites that deliver electronics documentation. Some of the most requested data sheets are those for 22V10 PALs, 256-kbit SRAMs, 1-Mbit EPROMs, and 8-bit microcontrollers. Few engineers work at the leading edge of technology. For most, products from last year's press releases are adequate and represent a safer, more cost-effective design path.

This article discusses programmable-logic devices and conducts a cost-versus-feature comparison for each vendor that offers both low-cost and enhanced-feature PLDs and FPGAs. For PLDs, the common comparison point is macrocell count. The FPGA comparisons occur mostly at a common logic-cell count, a task complicated in some cases by the logic-cell differences among architectures. However, this article avoids comparing gate counts whenever possible, because on-chip resources other than logic, such as embedded memory, affect this often-nebulous specification, which also greatly depends on your FPGA application (Reference 1).

Common ground

Having first warned you not to generalize one vendor's low-cost and enhanced-feature differentiation to all programmable-logic suppliers, I'll now contradict myself by discussing some of the frequently encountered similarities in cost-versus-feature trade-offs. However, make sure you also read through the vendor-specific sections that follow to get a complete picture.

Manufacturers frequently employ proven, high-volume manufacturing processes to construct their low-cost devices instead of the leading-edge lithographies that their premium product lines use (Reference 2). This decision results in several trade-offs. Low-cost product families often don't extend to the largest logic capacities available with feature-rich devices, though they frequently extend below the size of the smallest premium parts. As a result, with moderately sized designs, you don't have to pay for logic resources you don't need. The same reasoning holds for embedded memory: Cost-conscious architectures often omit it or restrict its size and function, but if you don't need it, why pay for it?

Low-cost devices typically run at a 5V or 3.3V core voltage versus 2.5V or lower for parts based on advanced processes. Don't automatically assume that a lower-voltage part is also lower power; the higher current draw of the more complex device's additional transistors can swing instantaneous power results in favor of the less advanced low-cost device. You should also calculate time-averaged power and energy consumption (power integrated over time) to gain a complete picture (Reference 3). Plus, if the rest of your system runs at more conventional voltages, a less advanced PLD saves you the cost and hassle of adding a separate low-voltage supply and power traces to the board.
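A quick calculation illustrates the point. The sketch below assumes a hypothetical 5V low-cost part and a hypothetical 2.5V premium part, each alternating between an active interval and an idle interval; every current and duration is an illustrative assumption, not vendor data. It computes peak power, energy (voltage times current times time, summed over the intervals), and time-averaged power (energy divided by elapsed time).

```python
from dataclasses import dataclass


@dataclass
class PowerProfile:
    """Hypothetical two-state supply profile for one device.

    All numbers used with this class are illustrative assumptions,
    not figures from any vendor data sheet.
    """
    core_volts: float
    active_amps: float     # supply current while the logic is switching
    idle_amps: float       # supply current with no input transitions
    active_seconds: float  # time spent switching per measurement window
    idle_seconds: float    # time spent idle per measurement window

    def peak_watts(self) -> float:
        # Highest instantaneous power drawn in either state.
        return self.core_volts * max(self.active_amps, self.idle_amps)

    def energy_joules(self) -> float:
        # Energy = V * I * t, summed over the active and idle intervals.
        return self.core_volts * (self.active_amps * self.active_seconds
                                  + self.idle_amps * self.idle_seconds)

    def average_watts(self) -> float:
        # Time-averaged power = total energy / total elapsed time.
        return self.energy_joules() / (self.active_seconds + self.idle_seconds)


if __name__ == "__main__":
    # Hypothetical parts: the 2.5V device draws more current because of its
    # larger transistor count, so its lower core voltage does not win here.
    low_cost = PowerProfile(5.0, active_amps=0.10, idle_amps=0.01,
                            active_seconds=1.0, idle_seconds=9.0)
    premium = PowerProfile(2.5, active_amps=0.25, idle_amps=0.02,
                           active_seconds=1.0, idle_seconds=9.0)
    for name, part in [("5V low-cost", low_cost), ("2.5V premium", premium)]:
        print(f"{name}: peak {part.peak_watts():.3f} W, "
              f"average {part.average_watts():.4f} W, "
              f"energy {part.energy_joules():.3f} J")
```

With these made-up currents the 5V part wins on peak power, average power, and energy; different idle currents or duty cycles can flip the outcome, which is why the comparison needs currents that reflect your actual design.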

Cost-conscious product lines often offer a smaller, standard set of packages, including PLCCs and QFPs, but you probably won't see ultra-fine-pitch versions, high-pin-count BGAs, or heat-tolerant and radiation-resistant ceramic variations. Speaking of pins, the I/O buffers in low-cost PLDs are also often less flexible than those on premium parts in output-voltage options, input-voltage tolerance, differential-signaling capability, advanced protocol support, and other respects.

Low-cost devices that operate at speeds comparable with those of the highest-performance premium devices are rare. Even if the ordering specifications look the same, a premium part frequently delivers lower macrocell or logic-block propagation delays and higher register-toggle rates (Reference 4). This fact reflects both the trailing-edge processes manufacturers use to fabricate low-cost parts and the manufacturers' desire to maximize yield on those parts across all speed grades. Again, if your design has moderate performance needs, the availability of lower-speed, low-cost products keeps you from spending extra money for performance you don't require.

Speed isn't the only specification that vendors relax with a low-cost PLD or FPGA; these parts sometimes burn more power than their premium counterparts, again to maximize manufacturing yield. Reduced general-purpose and feature-specific routing resources reflect a process with fewer metal layers and a wider metal pitch than the leading-edge alternative. As a result, using low-cost architectures may complicate your ability to fit your design into a part and hit desired performance targets. Finally, a low-cost architecture often lacks specialty circuits, such as digital delay-locked loops (DLLs) or PLLs, arithmetic gates, and the like. Although certain designs take full advantage of these circuits, many applications can't use them, and, even if they could, the design tools or methodology you employ might not let you access them (References 5 and 6).

Although I contacted all the programmable-logic manufacturers for this article, not all of them had products that fit into the "low-cost" and "feature-rich" categories. For example, Atmel offers two FPGA product lines, the AT6000 and AT40K families, but forcing them into these categories is inappropriate. Rather, the families have features targeting specific applications. Another example is the Lucent Technologies Orca3 FPGA family, which has more features and costs less than the previous generation Orca2 devices. Also, both Atmel and Lucent are evolving their product lines beyond pure programmable-logic devices to hybrid ASIC-plus-FPGA architectures (Reference 7).

A few other comments: This article briefly covers devices that represent the vendors' product-line intentions, not necessarily devices they are currently shipping. Contact the manufacturers for more accurate availability information, and verify the prices in this article with vendor salespeople or distributor contacts. For more information on the PLDs and FPGAs mentioned here, search EDN's past issues and see Table 1.
Actel presented the most difficult comparison challenge of any of the vendors mentioned in this article, and, in this case, I was forced to match up low-cost and premium FPGA product lines based on gate count. The company's current mainstream family is the 2.5V, antifuse-based SX-A; earlier-generation SX and MX devices satisfy 3.3 and 5V design requirements. The SX-A family has 12,000 to 108,000 "system" gates, 8000 to 72,000 "typical" gates, and 768 to 6036 logic modules.

Actel's premium FPGA architecture, the Gatefield-developed, 2.5V ProASIC 500K family, incorporates no antifuses. It uses flash memory as its configuration technology and a finer-grained logic-cell "tile" structure with reduced input fan-in; tile count ranges from 5376 to 51,200 (Figure 1). ProASIC 500K also offers 14 to 138 kbits of embedded memory; this memory inflates the gate-count specification, making it higher than that of an SX-A device with comparable logic capacity. Maximum system gates range from 98,000 to 1.1 million, and the corresponding typical gate count is 43,000 to 410,000.

With those differences in mind, Actel reports that the flash-based 500K130, a device with perhaps 5% more logic capacity than the antifuse-based A54SX72A, costs approximately 65% more than the SX-A FPGA at comparable speeds, packaging, and order quantities. Both antifuse and flash technologies are nonvolatile, so both yield single-chip solutions that require no configuration PROM, and both are also low-power. Flash memory, however, is reprogrammable and can be reconfigured on-board, although implementing on-board reconfiguration is difficult.
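Because the two Actel parts differ in capacity as well as price, cost per unit of logic is a fairer yardstick than device price alone. The helper below is a hypothetical illustration, not an Actel tool; it treats Actel's "perhaps 5%" capacity difference and "approximately 65%" price difference as exact inputs, so its result is only a rough estimate.

```python
def per_capacity_premium(price_premium: float, capacity_ratio: float) -> float:
    """Price premium per unit of logic capacity (hypothetical helper).

    price_premium:  fractional price difference, e.g. 0.65 for "65% more".
    capacity_ratio: premium part's logic capacity divided by the baseline
                    part's, e.g. 1.05 for "5% more logic".
    """
    if capacity_ratio <= 0:
        raise ValueError("capacity_ratio must be positive")
    # Relative price divided by relative capacity gives relative cost per unit of logic.
    return (1.0 + price_premium) / capacity_ratio - 1.0


if __name__ == "__main__":
    # Approximate figures quoted for the ProASIC 500K130 vs. the A54SX72A.
    premium = per_capacity_premium(0.65, 1.05)
    print(f"Cost premium per unit of logic: {premium:.1%}")
```

With these inputs the flash part's premium per unit of logic comes out near 57%, somewhat below the 65% device-to-device figure.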

The ProASIC FPGA fine-grained logic and routing structures are also ASIC-like, providing a more straightforward ASIC-prototyping platform and an easier programmable-logic learning curve for ASIC designers. Embedded RAM is useful for integrating FIFO memories and other memory-based circuits, and the higher ProASIC maximum capacity gives your designs expansion headroom they would lack if you implemented them in SX-A FPGAs.

Altera tailors its terminology

A few words on jargon: What Altera calls product-term-based programmable-logic devices, the rest of the industry calls complex PLDs (CPLDs). What Altera calls look-up-table-based programmable-logic devices (the result of a nasty legal battle with Xilinx a few years ago), the rest of the industry calls FPGAs. This article uses the more common "CPLD" and "FPGA" to simplify your comparisons of Altera with other vendors. Historically, the Max7000 family represented Altera's mainstream CPLD architecture, with the segmented-array premium Max9000 family having the most macrocells. However, Altera is winding down its marketing of the Max9000 line, and the Max7000 family's 512-macrocell maximum nearly matches that of the largest Max9000 alternative. As the Max7000 family takes over the premium-device mantle, the new Max3000A family replaces it at the low end.

Max3000A devices come with a similar number of macrocells—32 to 256—and identical speeds to their Max7000 equivalents. In fact, Altera builds the Max3000A and Max7000 parts from the same die, although dedicated Max3000A silicon will appear in the future. However, each Max3000A consumes 30% more power than its same-size and -speed Max7000 equivalent, and Max3000A CPLDs also lack the Max7000 architecture's fast input registers.

Altera claims that Max3000A devices cost an average of 30 to 40% less than their Max7000 counterparts. The Max3000A family also carries quantity-independent pricing; the Max3032A, for example, costs $1 regardless of order size, whereas Altera offers similar pricing on only one Max7000 part, the Max7032A. Max3000A devices have fewer packaging options and I/O pins than their Max7000 equivalents; they devote those pins to additional power and ground connections to accommodate the devices' higher power consumption.

Now for FPGAs. Altera no longer promotes—but continues to support designs incorporating—its first-generation, look-up-table-based Flex 8000 programmable-logic architecture. The introduction of Altera's Apex 20K products has relegated the company's former premier FPGA line, the Flex 10K family, to midrange status. And for lowest cost, Altera offers the Flex 6000 architecture. How do you sort out all these product names?

All three families share a common logic-cell structure, simplifying comparison. Flex 6000 lacks the Flex 10K family's ceramic-packaging options, on-chip PLL, and embedded-array blocks (so it has no on-chip memory), and it shifts the routing mix away from global interconnect toward local, direct block-to-block and block-to-I/O-buffer interconnect (Figure 2). Altera reports that Flex 6000 sells for 50 to 65% less than comparable members of the Flex 10K family. Conversely, Apex 20K's embedded-array blocks are more flexible than those on Flex 10K, supporting content-addressable memory (CAM) and limited product-term logic functions in addition to the more common FIFO and multiport-RAM configurations. Apex 20K also delivers multiple on-chip PLLs, a higher memory-bit-to-logic-block ratio, and a more flexible I/O-buffer configuration than does the Flex 10K.

Altera's "heir apparent" to the low-cost FPGA mantle is the AceX architecture, which will become available for sampling this month, according to the vendor. As Xilinx did with Spartan, Altera will base AceX on smaller-lithography, lower-cost versions of Flex 10K and Apex 20K rather than on a unique architecture like Flex 6000. Initial AceX parts will run at 2.5V on a hybrid 0.25/0.18-µm process, and 1.8V, pure-0.18-µm versions will follow this year. The first-generation AceX line comprises four devices with 576 logic cells ($3.50 in 250,000-unit quantities) to 4992 logic cells, 12 to 48 kbits of embedded RAM, and a PLL, making the devices sound a lot like a smaller-lithography Flex 10KE. Altera predicts that 2.5V AceX FPGAs will cost approximately 40 to 60% less than its other 2.5V look-up-table-based programmable-logic offerings.
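Altera quotes its low-cost savings as percentages less than the premium part, whereas most other vendors here quote premiums as percentages more than the low-cost part, and the two are not symmetric. If the low-cost part costs a fraction d less, the premium part costs 1/(1 - d) - 1 more. The function below is a hypothetical helper that applies this conversion to the ranges Altera quotes.

```python
def less_to_more(discount: float) -> float:
    """Convert "the low-cost part costs d less" into "the premium part costs x more".

    If low = premium * (1 - d), then premium = low / (1 - d),
    so the premium part's markup over the low-cost part is 1 / (1 - d) - 1.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 / (1.0 - discount) - 1.0


if __name__ == "__main__":
    # Discount ranges Altera quotes in the text.
    comparisons = [
        ("Max3000A vs. Max7000", 0.30, 0.40),
        ("Flex 6000 vs. Flex 10K", 0.50, 0.65),
        ("AceX vs. other 2.5V Altera FPGAs", 0.40, 0.60),
    ]
    for label, low, high in comparisons:
        print(f"{label}: premium part costs "
              f"{less_to_more(low):.0%} to {less_to_more(high):.0%} more")
```

For example, Flex 6000's 50 to 65% discount means that a comparable Flex 10K part costs roughly 2 to 2.9 times as much, a wider gap than the percentages alone suggest.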

Cypress: radical differences

Cypress Semiconductor's Ultra37000 and Delta39K CPLD product lines represent a case study in contrasts. The EEPROM-based Ultra37000, the more traditional of the two, has 32 to 512 macrocells; as many as 16 product terms per macrocell with no speed penalty; and a monolithic, highly routable logic-block-to-logic-block interconnect. The parts run at 5 and 3.3V and come in a variety of packages, including fine-pitch BGA.

The segmented, SRAM-based Delta39K, on the other hand, runs at 1.8V, with onboard regulators that also let it accept an externally supplied 3.3 or 2.5V. The devices have 256 to 5376 macrocells and use SRAM not only for device configuration but also for embedded arrays. Cypress places SRAM within each logic-block cluster (two 8192-bit arrays) and also between clusters (4096 bits per logic-block cluster), where multiple clusters can access it in dual-port configurations (Figure 3). Total embedded-RAM densities range from 40 to 840 kbits.
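The cluster memory figures can be checked against the quoted totals. Assuming each logic-block cluster carries the two 8192-bit arrays plus its 4096-bit share of the between-cluster memory, and that "kbit" means 1024 bits, the sketch below computes how many clusters each total implies and how many macrocells that leaves per cluster. This is an inference from the article's numbers, not a Cypress specification.

```python
BITS_PER_KBIT = 1024  # assumption: kbit = 1024 bits

# Per-cluster RAM as described in the text: two 8192-bit in-cluster arrays
# plus 4096 bits of between-cluster memory attributed to each cluster.
CLUSTER_RAM_BITS = 2 * 8192 + 4096


def clusters_for(total_kbits: int) -> float:
    """Number of clusters implied by a total embedded-RAM figure."""
    return total_kbits * BITS_PER_KBIT / CLUSTER_RAM_BITS


if __name__ == "__main__":
    # Smallest and largest Delta39K figures quoted in the text.
    for kbits, macrocells in [(40, 256), (840, 5376)]:
        n = clusters_for(kbits)
        print(f"{kbits} kbits -> {n:g} clusters, "
              f"{macrocells / n:g} macrocells per cluster")
```

Both ends of the range work out to a whole number of clusters (2 and 42) and the same 128 macrocells per cluster, which suggests that memory and logic both scale with cluster count.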

Delta39K devices also integrate a PLL, support a number of I/O-bus protocols, and include an optional separate flash memory for configuration storage in a multidie, single-package configuration. Surprisingly, Cypress claims that, at equivalent macrocell counts, Ultra37000 and Delta39K devices will be comparably priced, reflecting the Delta39K's more cost-effective, pure-logic fabrication technology. Delta39K parts aren't yet available for sampling, so Cypress' availability and pricing projections are subject to change, but if those plans hold up, Delta39K will be a bargain if your design can take advantage of the logic capacity, embedded memory, and other features.

"Peel"-ing away a strategy

Low-cost-versus-enhanced-feature differentiation isn't restricted to high-logic-capacity CPLDs and FPGAs. International CMOS Technology (ICT) offers two enhanced product lines that drop into the pinout of a conventional 22V10 PAL, which the manufacturer also supplies. The 5V or 3.3V Peel 22V10AZ device doesn't match the speed of the fastest conventional 22V10 versions, but its "sleep" capability significantly lowers power consumption in the absence of input transitions. Peel devices also have more than twice the product terms, triple the I/O-macro configurations, and 25% more on-chip registers than a conventional 22V10. The Peel devices also include Schmitt-triggered inputs for deglitching in noisy system environments. They cost approximately 50% more than a conventional 22V10, according to ICT.

For the ultimate in logic capacity, however, choose a 7024 Peel array (Figure 4). This chip contains four times more on-chip registers, including 20 I/O registers, but it also costs 2.5 to three times more than a 22V10. The full PLA interconnect means you have no routing constraints, and the part also offers numerous clocking options. ICT claims that the price will decrease to less than twice that of a 22V10 when the company moves this year to a smaller lithography. Members of the Peel array family having as many as 60 registers are also available.

Spring cleaning, Lattice style

It's been nearly a year now since Lattice acquired AMD's Vantis subsidiary. The company has since sorted through the resulting multiple partially overlapping product lines and, although it will continue to support all existing customers on all architectures, has figured out which CPLDs it will market in the future. The company will promote two mainstream architectures: the 2.5, 3.3, and 5V ispLSI 2000 family and the 3.3 and 5V Mach 4A family. Lattice also offers two premium lines: the 3.3V ispLSI 5000 and 5V (with 3.3V planned) ispLSI 8000 product families.

Why does Lattice market two mainstream product lines with comparable pricing? Both ispLSI 2000 and Mach 4A have 32 to 192 macrocells, and Mach 4A extends beyond this range to 512 macrocells. The Mach 4A family offers predictable speeds through 18 allocated product terms per macrocell, whereas ispLSI 2000 speeds degrade beyond four product terms because of the delay of the logic that reroutes the additional terms. However, with simple product-term configurations, ispLSI 2000 parts are faster than their Mach 4A counterparts.

Lattice calls its ispLSI 5000 family the SuperWide architecture, reflective of the 68 inputs allocated to each 32-macrocell logic block (Figure 5). Macrocell counts range from 256 to 512, and, at sizes comparable with mainstream alternatives, ispLSI 5000 parts are approximately 20% more expensive, according to Lattice. The ispLSI 8000 SuperBig family has a narrower input fan-in, but its macrocell count begins at 840. Per-macrocell prices for the ispLSI 8000 will be comparable with those of the ispLSI 5000 family, with equivalent packaging, speeds, and order quantities.

QuickLogic improves its memory

When QuickLogic added embedded memory to its pASIC 3 FPGAs, it renamed them the QuickRAM family, the first in a series of embedded-standard-product (ESP) offerings that has grown to include hybrid chips with ASIC-housed PCI cores and arithmetic units. The QuickRAM family spans five devices with 160 to 1584 logic cells, corresponding to 9000 to 90,000 PLD gates and 9216 to 25,344 bits of RAM (Figure 6).

In comparison, the pASIC 3 family trades QuickRAM's smallest 160-logic-cell device for an even smaller 96-logic-cell part. In most other respects—operating voltages, logic speeds, packaging options, and other features—the two product families are nearly identical. Within each product family, clock and control resources increase as logic-cell count grows. At equivalent logic-cell counts, QuickRAM FPGAs are approximately 30% more expensive than their pASIC 3 equivalents, according to QuickLogic.

Xilinx cools down

The 5 and 3.3V XC9500 product lines represent Xilinx's mainstream CPLD families. Xilinx announced 2.5V XC9500 variants last year and continues to advertise the parts on its Web site, but it has withdrawn them for rework and plans to reintroduce them this year. Macrocell counts are 36 to 288.

CoolRunner XPLA3, a product line Xilinx acquired from Philips last year, is Xilinx's premium CPLD family. Plans include extending macrocell counts to 384. XPLA3, like ICT's Peel array devices, incorporates a full PLA structure ahead of each logic block's macrocells. Also like ICT, XPLA3 replaces the traditional PLD sense amp with CMOS logic for very low standby-power consumption and no sleep-mode wake-up-delay penalty. Xilinx reports that, at equivalent macrocell counts, CoolRunner XPLA3 devices cost 15% more than their XC9500 equivalents.

As with primary competitor Altera, Xilinx's mainstream FPGA history is somewhat convoluted. Xilinx's first attempt at a low-cost architecture, the XC5200 family, included no embedded-memory capability (similar to Altera's Flex 6000). The company based its next attempt, the 5V Spartan, followed by the 3.3V Spartan-XL, on the XC4000E FPGA architecture, which meant that each logic block's look-up tables could also function as small distributed-RAM elements. However, the on-chip routing resources were more limited than those in the then-more-advanced XC4000XL. Also, you couldn't configure the part through the parallel-interface option, and Xilinx deleted a few other minor features.

XC4000XL and its XLA and XV derivatives are no longer at the high end of Xilinx's product line, however; the company has replaced them with the newer Virtex and Virtex-E architectures. Xilinx has also advanced the low-cost Spartan brand, and the latest iteration, Spartan-II, is nearly identical to Virtex; it incorporates both block and distributed RAM and on-chip DLLs. How do 2.5V Spartan-II and 1.8V Virtex-E compare? With Virtex-E, you get much larger logic and memory capacities, higher logic and I/O-buffer performance, a significantly higher ratio of memory to logic cells at the upper end of the product line, and twice the number of DLLs, all thanks to the advanced 0.18-µm manufacturing process (Figure 7).

The Virtex-E family ranges from 1728 to 15,552 logic cells, with corresponding block-RAM sizes of 64 to 288 kbits and system-gate counts of 71,693 to 985,882, including memory. Spartan-II devices have 432 to 3888 logic cells; corresponding block-RAM sizes are 16 to 48 kbits, and system-gate counts are 15,000 to 150,000. Virtex-E also provides greater I/O-buffer-configuration flexibility and more packaging options. For all these improvements, Xilinx asks you to pay 10 to 15% more for Virtex-E than for Spartan-II at comparable logic-cell sizes, speeds, and packaging. If, however, you compare the slowest Spartan-II option with the slowest Virtex-E equivalent, you'd pay roughly 30% more for the Virtex-E part; this gap reflects Spartan-II's lower speed options and offers a potential cost savings if your design has moderate performance requirements.
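The memory-to-logic claim can be checked roughly from the endpoint figures above. The sketch below divides block-RAM bits by logic cells for the smallest and largest member of each family, assuming 1 kbit = 1024 bits; it ignores distributed RAM and uses only the ranges quoted here, so treat it as a coarse comparison rather than a data-sheet analysis.

```python
BITS_PER_KBIT = 1024  # assumption: kbit = 1024 bits

# (logic cells, block-RAM kbits) for the smallest and largest family members,
# taken from the ranges quoted in the text.
FAMILIES = {
    "Spartan-II": [(432, 16), (3888, 48)],
    "Virtex-E": [(1728, 64), (15552, 288)],
}


def bits_per_cell(logic_cells: int, ram_kbits: int) -> float:
    """Block-RAM bits available per logic cell."""
    return ram_kbits * BITS_PER_KBIT / logic_cells


if __name__ == "__main__":
    for family, members in FAMILIES.items():
        ratios = [bits_per_cell(cells, kbits) for cells, kbits in members]
        print(f"{family}: smallest {ratios[0]:.1f}, "
              f"largest {ratios[1]:.1f} bits per logic cell")
```

Both families start at the same ratio, about 38 bits per logic cell, but the largest Virtex-E offers about 19 bits per cell versus about 12.6 for the largest Spartan-II, consistent with the upper-end advantage described above.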

Author info

Contact Technical Editor Brian Dipert at 1-916-454-5242, fax 1-530-937-8147, bdipert@pacbell.net.

REFERENCE