Feature
FPGAs "DiSP"lay their processing prowess
Does the performance-power-price product of your software-centric approach no longer compute? Do you need a nimbler platform than a hard-wired ASIC can provide? Programmable logic may be your answer, but carefully calculate the trade-offs to correctly solve your problem.
By Brian Dipert, Technical Editor -- EDN, 10/3/2002

Mention "DSP" in conversation around the water cooler, and what vision will pop into other folks' heads? Probably a picture of a piece of silicon from a company such as Analog Devices, Motorola, or Texas Instruments. This image isn't necessarily wrong, mind you, but it's a bit like (as the old saying goes) "putting the cart before the horse." First and foremost, DSP stands for digital-signal processing; that is, converting analog signals to the digital domain, arithmetically transforming them in some way and then translating them back to the analog domain for human sensory consumption.
Digital-signal processors from the aforementioned companies and many others are only one vehicle for implementing digital-signal-processing functions. The earliest DSP chips, after all, were little more than otherwise general-purpose CPUs with Harvard architectures (separate instruction and data buses and separate caches), befitting their algorithms' data-centric nature. As general-purpose CPUs have grown speedier and particularly as they've added onboard arithmetic coprocessors and signal-processing-optimized instruction-set extensions, they're increasingly taking over the calculation burden that might have formerly required a separate compute engine. (This article uses "digital-signal processing" to refer to the function and "DSP" to refer to the processor.)
On the other end of the hardware-versus-software-implementation spectrum are ASICs, housing hard-wired arithmetic logic blocks and state machines. Inflexible? Yes. Do you need to design the hardware yourself? Yes, unless you license predesigned IP (intellectual property), which you must still stitch together with the remainder of your chips' circuits. But fast, low-power, and inexpensive (the cost based on the amount of silicon consumed to construct the function)? Yes. The ASIC-based approach (note, again, we're talking about hardwired functions, not a processor core you've integrated into your ASIC) is particularly attractive when your end system will sell in high enough volumes to justify the NRE (nonrecurring-engineering) costs, when time-to-market isn't critical, and when standardization and design experience maximize your first-silicon-functional confidence and preclude the need for post-sale upgrades.
In attempting to simultaneously stay ahead of general-purpose CPUs and prevent you from jumping ship to hard-wired ASICs, the DSP vendors have evolved their high-end architectures into multicore VLIW (very-long-instruction-word) "engines" and have incorporated their own hardware-acceleration capabilities in the form of dedicated Viterbi decoders, matrix multipliers, and the like. Although signal-processing functions contain a great deal of parallelism, the incremental performance gain with each added engine is less than 100%, and multiprocessors are challenging to program. Hardware acceleration also tends to make the resultant DSP more application-specific and, therefore, more expensive than a general-purpose alternative.
Silicon foundations
Digital-signal processing, thanks to explosive growth in wired and wireless networks and in multimedia, represents one of the hottest areas in electronics. So it's no surprise that dozens if not hundreds of stand-alone-chip and embedded-core vendors are chasing after the business, representing both the software- and the hardware-centric implementation extremes. But at least one in-between option bears your consideration (see sidebar "Theme and variation"). Like software, a programmable-logic device is almost infinitely customizable, and, as with a processor, the silicon physical-design work is already done for you. FPGAs aren't quite as low-power, fast, or dense as ASICs, but they're superior to processors in those regards (see sidebar "Evaluating performance: FPGAs versus DSPs"). You can buy FPGAs, unlike ASICs, in small quantities with no upfront NRE charges, and you need not wait for months' worth of fab, packaging, and test delays after your design's done to obtain a working chip.
FPGA manufacturers have for years now been trumpeting their chips' ability to implement digital-signal processing, even before the emergence of low-latency carry-chain-routing lines that sped addition and subtraction operations spanning multiple logic blocks. The next significant improvement in FPGA arithmetic capability appeared with Atmel's AT40K architecture. An embedded AND gate within each logic block, working in concert with block-to-block diagonal routing lines, boosted performance when the chips were crunching array-multiplication calculations (Figure 1). Ironically, however, AT40K provides no carry chains.
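To see why an AND gate in every logic block helps, recall that an array multiplier forms each partial-product bit as the AND of one multiplicand bit and one multiplier bit, then sums the shifted rows with adders. The following Python sketch is a bit-level model of that idea for unsigned operands only; the function name is hypothetical, and the code makes no attempt to model the AT40K's actual cell or routing structure.

```python
def array_multiply(a: int, b: int, width: int) -> int:
    """Bit-level model of an unsigned width x width array multiplier."""
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in 'width' unsigned bits")
    result = 0
    for j in range(width):                  # one row per multiplier bit
        b_bit = (b >> j) & 1
        row = 0
        for i in range(width):              # one AND gate per cell in the row
            a_bit = (a >> i) & 1
            row |= (a_bit & b_bit) << i     # partial-product bit
        result += row << j                  # adder row sums the shifted partial product
    return result
```

A width-bit multiplier built this way needs width x width AND terms, which is why folding the gate into the logic block saves both area and routing delay.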
Atmel's FPGAs are partially reprogrammable. (Lattice's ORCA line and Xilinx's Virtex devices are also partially reprogrammable, but their development tools do not neatly expose the silicon potential.) Theoretically, this capability means that you could dynamically time-swap various logic engines into a common silicon fabric. Pragmatically, the more likely scenario involves your ability to, for example, optimize an imaging filter's coefficients in a digital-still-camera or videocamera application as ambient-light conditions change or tweak processing parameters in response to varying communication-channel SNRs.
The now-dominant logic block, which pairs a LUT (look-up table) with a register, is relatively efficient at addition and subtraction when combined with fast carry chains. It's not, however, optimal in cost, performance, and power for multiplication and division functions. As a result, Altera (with Stratix), QuickLogic (with QuickDSP, now renamed Eclipse Plus), and Xilinx (with Virtex-II and Virtex-II Pro) have all taken a page from the ASIC book of tricks and embedded dedicated multiplier-function blocks on-chip (Figure 2). Altera and QuickLogic move even further along the integration path, providing full-blown MACs (multiply-accumulators; see Table 1 for specifications and Reference 1 for prices). Altera calls its version the DSP block (Table 2); QuickLogic, the first of the three to take the dedicated-arithmetic-unit-integration plunge, refers to its configurable variant as an embedded computational unit (Table 3).
In examining Altera's and Xilinx's arithmetic structures, you might question why the companies chose nonstandard 18-bit data inputs versus the more common 16-, 24-, and 32-bit lengths. One answer is that they wanted to match the bus widths of the FPGA's parity-inclusive embedded-RAM blocks. But a more general answer also exists, and it leads to a subtle but powerful FPGA strength. A signal-processing function rarely demands exactly 16-, 24-, or 32-bit precision. If it requires less, your CPU or DSP is wasting pins, external memory, and bus bandwidth. If more, you have to implement performance-sapping multiple-pass algorithms.
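The cost of a "multiple-pass" algorithm is easy to quantify. As a minimal sketch, assume a processor whose native multiplier handles only 16-bit unsigned operands; a 32 x 32-bit product then takes four native multiplies plus shifts and adds. The helper names below are hypothetical, and `mul16` merely stands in for the hardware instruction.

```python
from typing import Tuple

MASK16 = 0xFFFF

def mul16(x: int, y: int) -> int:
    """Stand-in for a native 16 x 16 -> 32-bit unsigned multiply."""
    assert 0 <= x <= MASK16 and 0 <= y <= MASK16
    return x * y

def mul32_multipass(a: int, b: int) -> Tuple[int, int]:
    """Return (64-bit product, number of 16-bit multiply passes)."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    partials = (
        (a_lo, b_lo, 0),    # low  x low
        (a_lo, b_hi, 16),   # low  x high
        (a_hi, b_lo, 16),   # high x low
        (a_hi, b_hi, 32),   # high x high
    )
    product = 0
    passes = 0
    for x, y, shift in partials:
        product += mul16(x, y) << shift   # each pass costs one native multiply
        passes += 1
    return product, passes
```

Needing even one bit more than the native width quadruples the multiplier work in this model, whereas an FPGA datapath can simply be built one bit wider.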
With an FPGA, particularly if you use general-purpose LUT and register structures, you can implement exactly the data precision you need to do the job—at least from a logic standpoint. Memory is the only remaining wrinkle; both the large embedded-memory blocks and the external memories come in predefined bus widths. FPGAs that can alternatively employ LUTs not only for logic functions, but also as small RAM arrays, such as Xilinx's various product families and Lattice's ORCA chips, are helpful here, because they give you density and bus-width flexibility to supplement or replace dedicated RAM arrays. LUTs have another valuable function in calculation-intensive signal processing: They are a more efficient alternative to registers for storing intermediary values.
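One way to decide "exactly the data precision you need" is to track worst-case bit growth through the datapath. The sketch below, with a hypothetical function name, sizes the accumulator for a sum of signed products using the conservative rule that an a-bit by b-bit signed product fits in a + b bits and that summing N terms adds ceil(log2 N) bits.

```python
def accumulator_bits(data_bits: int, coeff_bits: int, num_terms: int) -> int:
    """Width (in bits) that holds the sum of num_terms signed products without overflow."""
    if data_bits < 1 or coeff_bits < 1 or num_terms < 1:
        raise ValueError("all arguments must be positive")
    product_bits = data_bits + coeff_bits        # signed a-bit x b-bit product
    growth_bits = (num_terms - 1).bit_length()   # equals ceil(log2(num_terms))
    return product_bits + growth_bits
```

For example, a 64-tap filter with 12-bit samples and 10-bit coefficients needs a 28-bit accumulator under this rule, a width that no standard 16-, 24-, or 32-bit processor datapath matches exactly.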
Design contortions
Alas, the semiconductor graveyard is littered with the bones of great FPGA hardware ideas that went by the wayside because of inadequate design-software support. How can you ensure that your signal-processing algorithms take full advantage of the power, performance, and efficiency of the embedded circuitry within these advanced chips? The answer depends at least to some extent on your design background.
If you're used to creating hard-wired circuits in ASICs, you'll be treading the easier of the two paths to FPGA nirvana. In a perfect world, the combination of a third-party synthesis compiler and the FPGA-vendor-provided place-and-route software should be able to automatically infer from your HDL code where to use specialized multipliers and MACs, much as they should do with embedded memory arrays and other on-chip function blocks (references 2 and 3). The vendors' documentation often makes recommendations on coding styles, which help guide the design software to the ideal end result (Reference 4).
In the real world, you still sometimes need to explicitly instantiate references to these and other device primitives in your HDL, a task that leaves your code less architecture- and device-generic, thereby contradicting a key motivation for using an HDL in the first place. The vendors have all developed pushbutton utilities that greatly simplify your creation of higher level functions, such as filters and transforms. Enter a few parameters, click "OK," and out comes a netlist "black box," which your HDL code references. At minimum, you have the option of creating a fully serial (to minimize pin count and logic usage) or fully parallel (to maximize performance) circuit. Depending on the size of your design and device and on your performance and power budgets, you might want to choose an in-between approach; in this case, make sure that the vendor's core generator tools support such flexibility.
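The serial-versus-parallel trade-off can be illustrated with a small behavioral model. This Python sketch takes one reading of it, namely time-sharing a pool of multiply-accumulate units across a FIR filter's taps; it does not model bit-serial arithmetic or any vendor's core generator, and the function and parameter names are hypothetical. Setting `num_macs` to 1 gives the fully serial case, setting it to the tap count gives the fully parallel case, and anything between is the in-between approach.

```python
def fir_semi_parallel(samples, coeffs, num_macs):
    """Compute FIR outputs and the clock cycles each output costs.

    num_macs=1 models a fully serial filter; num_macs=len(coeffs)
    models a fully parallel one.
    """
    taps = len(coeffs)
    if not 1 <= num_macs <= taps:
        raise ValueError("num_macs must be between 1 and the number of taps")
    cycles_per_output = -(-taps // num_macs)      # ceiling division
    history = [0] * taps                          # delay line, newest sample first
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for start in range(0, taps, num_macs):    # one clock cycle per group of taps
            stop = min(start + num_macs, taps)
            for k in range(start, stop):          # these MACs operate side by side
                acc += coeffs[k] * history[k]
        outputs.append(acc)
    return outputs, cycles_per_output
```

The outputs are identical for every setting; only the hardware cost (number of MACs) and the cycles per output sample change, which is the knob you would want a core generator to expose.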
Most of you, though, will probably be coming to FPGAs from a software background, and, ideally, you'd like to port your legacy code to hardware as straightforwardly as possible. In this case, the news is less optimistic. In fact, at least one in-the-know FPGA power user suggests that, unless a DSP has become a completely unpalatable option from a power, performance, or cost standpoint, you shouldn't bother taking even one step down the FPGA path (see sidebar "Consult with an expert"). Ironically, in developing your software in the first place, you've taken what was in all likelihood a highly parallelizable function and gone through a lot of work to convert it to a time-sequential serial algorithm in C or assembly code. Now, to port the algorithm to an FPGA, you need to reparallelize it and, sometimes, convert it from floating-point back to sufficiently precise fixed-point arithmetic.
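The floating-point-to-fixed-point step amounts to scaling each value by a power of two, rounding to an integer, and saturating at the limits of the chosen word width. Here is a minimal Python sketch, with hypothetical helper names and round-to-nearest plus saturation chosen as assumptions; a real design might truncate or wrap instead.

```python
def to_fixed(value: float, frac_bits: int, total_bits: int) -> int:
    """Quantize value to a signed fixed-point code with saturation."""
    scaled = round(value * (1 << frac_bits))      # round to nearest (ties to even in Python)
    lowest = -(1 << (total_bits - 1))             # most negative two's-complement code
    highest = (1 << (total_bits - 1)) - 1         # most positive code
    return max(lowest, min(highest, scaled))

def to_float(code: int, frac_bits: int) -> float:
    """Convert a fixed-point code back to a float for error checking."""
    return code / (1 << frac_bits)
```

Comparing `to_float(to_fixed(x, f, n), f)` against `x` over representative data is a quick way to judge whether a candidate word width is "sufficiently precise" before committing it to hardware.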
Companies such as Adelante Technologies (formerly, Frontier Design, with A|RT), Celoxica (with Handel-C), and Synopsys (the original developer of SystemC) all claim the ability to generate hardware designs from C code. And, to a degree, they all deliver on their claims. None of them, though, fully supports generic ANSI C code. (Pointers are particularly problematic.) Their languages are tailored to the task of hardware creation. So, although they might be acceptable for brand-new designs, they will be less help with your years' worth of existing software.
The designs created by C-to-hardware tools are less efficient than those you develop from the beginning in an HDL (an analogy to programming in C versus assembly language), but, thanks to the near-guaranteed performance boost in software-to-hardware conversion, lost efficiency may be a small concern. If you've developed your software algorithms in The MathWorks' Matlab, you might be in better luck. Both Altera and Xilinx offer tool sets that interface to Matlab and Simulink and output vendor-optimized hardware code and IP blocks. Xilinx also supports Cadence's SPW (Signal Processing Worksystem, Figure 3).
Intermediate approaches
Although we're talking about digital-signal processing, you shouldn't assume that you've got at your disposal a "binary" set of implementation options—either fully software or fully hardware. The spectrum of choices is, in reality, far more "analog," reflecting a possible partitioning of tasks among both hardware for optimum speed, power consumption, or per-unit cost and software for ease of development and legacy compatibility (Figure 4). Both Atmel and Xilinx, for example, have, with their respective FPSLIC and Virtex-II Pro product lines, combined arithmetic-tuned programmable logic and "hard" CPU cores on a single device. If you develop your signal-processing algorithm in Matlab and Simulink, you can iteratively direct portions of it to C code and the remainder to FPGA hardware. Otherwise, don't waste too much time on tedious hardware-versus-software partitioning and repartitioning; focus your efforts on obvious software bottlenecks that benefit from FPGA acceleration.
Altera's ARM-based Excalibur chips and QuickLogic's QuickMIPS line do not include the hardware MACs in the vendors' respective Stratix and Eclipse Plus products. However, years' worth of successful design examples suggest that, although MACs might speed signal processing, they aren't an absolute requirement. Embedded memory blocks, for example, can also find use as multipliers. For the same reason, don't forget about Triscend's A7 and E5 chips. If you do need dedicated arithmetic circuits to hit your hardware-design targets, a "soft" CPU core might conversely provide you with sufficient software capabilities. Altera's Nios fits inside Stratix, and Xilinx's MicroBlaze works with Virtex-II and Virtex-II Pro. Supplementing Virtex-II Pro's "hard" PowerPC cores with one or multiple "soft" MicroBlaze cores results in some interesting single-chip, multiprocessor-architecture possibilities.
Look, for example, at Altera's recently announced DSP-development kit, an expanded version of last year's DSP Builder tool. Working hand in hand with Matlab and Simulink, it enables you to develop custom instruction-set and hardware-accelerated variants of the Nios processor that tap into the MACs and other resources in the programmable-logic fabric. Altera claims to have more than 60 DSP cores in its IP library, spanning specific functions, such as encryption, error correction, image processing, and modulation, as well as general-purpose functions, such as filters and transforms.
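The earlier remark that embedded memory blocks can serve as multipliers rests on a simple trick: preload the memory with every possible product and use the two operands, concatenated, as the address. The Python sketch below models that look-up approach for small unsigned operands; the function names are hypothetical, and the table layout is an assumption rather than any vendor's documented scheme.

```python
def build_multiplier_rom(width: int) -> list:
    """Precompute all products of two unsigned width-bit operands.

    The table has 2**(2*width) entries; the address is a in the upper
    bits and b in the lower bits.
    """
    size = 1 << width
    return [a * b for a in range(size) for b in range(size)]

def rom_multiply(rom: list, a: int, b: int, width: int) -> int:
    """Multiply by a single memory read instead of arithmetic logic."""
    size = 1 << width
    if not (0 <= a < size and 0 <= b < size):
        raise ValueError("operands must fit in 'width' unsigned bits")
    address = (a << width) | b       # concatenate operands to form the address
    return rom[address]
```

A 4 x 4-bit multiplier needs only 256 eight-bit entries, but the table grows exponentially with operand width, so wider multiplies are typically assembled from several small look-ups plus adders.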
| For more information... | ||
| When you contact any of the following manufacturers directly, please let them know you read about their products in EDN. |
||
| Altera 1-408-544-7000 www.altera.com | Atmel 1-408-441-0311 www.atmel.com | QuickLogic 1-408-990-4000 www.quicklogic.com |
| Triscend 1-650-968-8668 www.triscend.com | Xilinx 1-408-559-7778 www.xilinx.com | |
| OTHER COMPANIES MENTIONED IN THIS ARTICLE | ||
| 3DSP www.3dsp.com | Actel www.actel.com | Adelante Technologies www.adelantetechnologies.com |
| Analog Devices www.analog.com | Andraka Consulting Group www.andraka.com | Berkeley Design Technology Inc www.bdti.com |
| Cadence www.cadence.com | Celoxica www.celoxica.com | Improv Systems www.improvsys.com |
| Leopard Logic www.leopardlogic.com | The MathWorks www.mathworks.com | Motorola www.motorola.com |
| QuickSilver Technology www.qstech.com | Synopsys www.synopsys.com | Tensilica www.tensilica.com |
| Texas Instruments www.ti.com | ||
| Author Information |
Technical Editor Brian Dipert wonders how long it will be before it's possible to squeeze all of the electronics in a personal video recorder or a digital-cell-phone base station into a single hybrid FPGA—including the mixed-signal front- and back-end stuff. Seriously. You can reach Brian the fantasist at 1-916-454-5242, fax 1-916-454-5101, bdipert@edn.com, and www.bdipert.com. |
| References |
|
| Acknowledgments | ||
| Kudos to Jeff Bier from Berkeley Design Technology and to Ray Andraka from the Andraka Consulting Group for their editorial contributions. | ||