
FPGAs "DiSP"lay their processing prowess

Does the performance-power-price product of your software-centric approach no longer compute? Do you need a nimbler platform than a hard-wired ASIC can provide? Programmable logic may be your answer, but carefully calculate the trade-offs to correctly solve your problem.

By Brian Dipert, Technical Editor -- EDN, October 3, 2002

AT A GLANCE
  • FPGAs' sweet spot straddles software's flexibility and hardware's thriftiness, low power, and performance.

  • Latest-generation programmable-logic architectures embed arithmetic blocks for increased acceleration.

  • Silicon strengths strapped by software shortcomings will leave you unsatisfied.

  • Custom processors ease the software-to-hardware transition.

Sidebars:
Consult with an expert
Evaluating performance: FPGAs versus DSPs
Theme and variation

Mention "DSP" in conversation around the water cooler, and what vision will pop into other folks' heads? Probably a picture of a piece of silicon from a company such as Analog Devices, Motorola, or Texas Instruments. This image isn't necessarily wrong, mind you, but it's a bit like (as the old saying goes) "putting the cart before the horse." First and foremost, DSP stands for digital-signal processing; that is, converting analog signals to the digital domain, arithmetically transforming them in some way, and then translating them back to the analog domain for human sensory consumption.
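To make that three-step definition concrete, here's a minimal Python sketch of the chain: an idealized ADC, one simple arithmetic transformation (a moving-average low-pass filter), and an idealized DAC. The function names, the 8-bit resolution, and the ±1.0 full-scale range are illustrative assumptions, not properties of any particular converter.

```python
import math

def quantize(x: float, bits: int, full_scale: float = 1.0) -> int:
    """Idealized ADC: map x in [-full_scale, full_scale) to a signed integer code."""
    levels = 2 ** (bits - 1)
    code = round(x / full_scale * levels)
    # Clamp to the representable two's-complement range.
    return max(-levels, min(levels - 1, code))

def dequantize(code: int, bits: int, full_scale: float = 1.0) -> float:
    """Idealized DAC: map a signed integer code back to an analog-like value."""
    return code / 2 ** (bits - 1) * full_scale

def moving_average(codes: list[int], taps: int) -> list[int]:
    """Integer moving-average filter; samples before the first one are treated as zero."""
    out = []
    for n in range(len(codes)):
        window = codes[max(0, n - taps + 1): n + 1]
        out.append(sum(window) // taps)  # floor division keeps everything integer
    return out

# Analog in -> digital codes -> arithmetic transformation -> analog out.
samples = [math.sin(2 * math.pi * n / 16) for n in range(16)]
codes = [quantize(s, bits=8) for s in samples]
filtered = moving_average(codes, taps=2)
analog_out = [dequantize(c, bits=8) for c in filtered]
```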

Digital-signal processors from the aforementioned companies and many others are only one vehicle for implementing digital-signal-processing functions. The earliest DSP chips, after all, were little more than otherwise general-purpose CPUs with Harvard architectures (separate instruction and data buses and separate caches), befitting their algorithms' data-centric nature. As general-purpose CPUs have grown speedier, and particularly as they've added onboard arithmetic coprocessors and signal-processing-optimized instruction-set extensions, they're increasingly taking over the calculation burden that might have formerly required a separate compute engine. (This article uses "digital-signal processing" to refer to the function and "DSP" to refer to the processor.)

On the other end of the hardware-versus-software-implementation spectrum are ASICs, housing hard-wired arithmetic logic blocks and state machines. Inflexible? Yes. Do you need to design the hardware yourself? Yes, unless you license predesigned IP (intellectual property), which you must still stitch together with the remainder of your chip's circuits. But fast, low-power, and inexpensive (the cost based on the amount of silicon consumed to construct the function)? Yes. The ASIC-based approach (note, again, we're talking about hard-wired functions, not a processor core you've integrated into your ASIC) is particularly attractive when your end system will sell in high enough volumes to justify the NRE (nonrecurring-engineering) costs, when time-to-market isn't critical, and when standardization and design experience maximize your first-silicon-functional confidence and preclude the need for post-sale upgrades.

In attempting to simultaneously stay ahead of general-purpose CPUs and prevent you from jumping ship to hard-wired ASICs, the DSP vendors have evolved their high-end architectures into multicore VLIW (very-long-instruction-word) "engines" and have incorporated their own hardware-acceleration capabilities in the form of dedicated Viterbi decoders, matrix multipliers, and the like. Although signal-processing functions contain a great deal of parallelism, the incremental performance gain with each added engine is less than 100%, and multiprocessors are challenging to program. Hardware acceleration also tends to make the resultant DSP more application-specific and, therefore, more expensive than a general-purpose alternative.

Silicon foundations

Digital-signal processing, thanks to explosive growth in wired and wireless networks and in multimedia, represents one of the hottest areas in electronics. So it's no surprise that dozens if not hundreds of stand-alone-chip and embedded-core vendors are chasing after the business, representing both the software- and the hardware-centric implementation extremes. But at least one in-between option bears your consideration (see sidebar "Theme and variation"). Like software, a programmable-logic device is almost infinitely customizable, and, as with a processor, the silicon physical-design work is already done for you. FPGAs aren't quite as low-power, fast, or dense as ASICs, but they're superior to processors in those regards (see sidebar "Evaluating performance: FPGAs versus DSPs"). You can buy FPGAs, unlike ASICs, in small quantities with no upfront NRE charges, and you need not wait for months' worth of fab, packaging, and test delays after your design's done to obtain a working chip.

FPGA manufacturers have for years now been trumpeting their chips' ability to implement digital-signal processing, even before the emergence of low-latency carry-chain-routing lines that sped addition and subtraction operations spanning multiple logic blocks. The next significant improvement in FPGA arithmetic capability appeared with Atmel's AT40K architecture. An embedded AND gate within each logic block, working in concert with block-to-block diagonal routing lines, boosted performance when the chips were crunching array-multiplication calculations (Figure 1). Ironically, however, AT40K provides no carry chains.
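An array multiplier shows why per-block AND gates and fast carry paths matter: each partial-product bit is just an AND of one bit from each operand, and the rows are summed with adders whose carries ripple from block to block. The following bit-level Python model is a textbook unsigned array multiplier, offered as an illustrative sketch; it isn't Atmel's actual AT40K netlist, and the helper names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder returning (sum, carry_out); the carry path is what
    a dedicated carry chain speeds up."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add(x: list[int], y: list[int]) -> list[int]:
    """Add two equal-length little-endian bit lists; the result has one extra bit."""
    carry = 0
    out = []
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out

def to_bits(value: int, width: int) -> list[int]:
    """Little-endian bit list: index 0 is the least-significant bit."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))

def array_multiply(a: int, b: int, width: int) -> int:
    """Unsigned array multiplier: partial-product bits come from AND gates,
    and the shifted rows are summed with ripple-carry adders."""
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    acc = [0] * (2 * width)
    for i, bi in enumerate(b_bits):
        # Partial-product row i: (a AND b[i]) shifted left by i positions.
        row = [0] * i + [aj & bi for aj in a_bits]
        row += [0] * (2 * width - len(row))
        # Running sums never exceed the final product, so dropping the top carry is safe.
        acc = ripple_add(acc, row)[: 2 * width]
    return from_bits(acc)
```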

Atmel's FPGAs are partially reprogrammable. (Lattice's ORCA line and Xilinx's Virtex devices are also partially reprogrammable, but their development tools do not neatly expose the silicon potential.) Theoretically, this capability means that you could dynamically time-swap various logic engines into a common silicon fabric. Pragmatically, the more likely scenario involves your ability to, for example, optimize an imaging filter's coefficients in a digital-still-camera or videocamera application as ambient-light conditions change or tweak processing parameters in response to varying communication-channel SNRs.
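The coefficient-tweaking scenario boils down to swapping a filter's constants while the rest of the design keeps running. This Python sketch models that idea in software only; the FirRegion class, the two coefficient sets, and the 10-dB switching threshold are all hypothetical values chosen for illustration, not a real reconfiguration API.

```python
from dataclasses import dataclass, field

@dataclass
class FirRegion:
    """Hypothetical stand-in for a filter in a partially reprogrammable region:
    reloading swaps only the coefficients; the sample history is preserved."""
    coeffs: list[int]
    history: list[int] = field(default_factory=list)

    def reload(self, new_coeffs: list[int]) -> None:
        # Analogue of partial reconfiguration: new constants, same running state.
        self.coeffs = list(new_coeffs)

    def step(self, x: int) -> int:
        self.history.insert(0, x)             # newest sample first
        del self.history[len(self.coeffs):]   # keep only as many samples as taps
        return sum(c * h for c, h in zip(self.coeffs, self.history))

# Hypothetical coefficient sets: both have a DC gain of 4 and one sample of delay,
# so switching between them changes only the amount of smoothing.
SMOOTH = [1, 2, 1]   # low-pass smoothing for a noisy channel
SHARP = [0, 4, 0]    # no smoothing for a clean channel
SNR_THRESHOLD_DB = 10.0

def choose_coeffs(snr_db: float) -> list[int]:
    return SHARP if snr_db >= SNR_THRESHOLD_DB else SMOOTH
```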

The now-dominant LUT (look-up-table)-plus-register logic-block combination, together with fast carry chains, is relatively efficient when implementing addition and subtraction operations. It's not, however, optimal in cost, performance, and power for multiplication and division functions. As a result, Altera (with Stratix), QuickLogic (with QuickDSP, now renamed Eclipse Plus), and Xilinx (with Virtex-II and Virtex-II Pro) have all taken a page from the ASIC book of tricks and embedded dedicated multiplier-function blocks on-chip (Figure 2). Altera and QuickLogic move even further along the integration path, providing full-blown MACs (multiply-accumulators; see Table 1 for specifications and Reference 1 for prices). Altera calls its version the DSP block (Table 2); QuickLogic, the first of the three to take the dedicated-arithmetic-unit-integration plunge, refers to its configurable variant as an embedded computational unit (Table 3).
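Each tap of an FIR filter is one multiply followed by one add, so a whole filter is a chain of MAC operations; that is the pattern dedicated MAC blocks target. Here's a minimal integer model in Python, assuming zero initial filter state; the function names are illustrative, not any vendor's DSP-block interface.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b

def fir_sample(coeffs: list[int], window: list[int]) -> int:
    """y[n] = sum over k of h[k] * x[n-k], computed as a chain of MACs.
    `window` holds x[n], x[n-1], ... (newest sample first)."""
    acc = 0
    for h, x in zip(coeffs, window):
        acc = mac(acc, h, x)
    return acc

def fir(coeffs: list[int], samples: list[int]) -> list[int]:
    """Filter a block of samples, treating samples before index 0 as zero."""
    out = []
    for n in range(len(samples)):
        window = [samples[n - k] if n - k >= 0 else 0 for k in range(len(coeffs))]
        out.append(fir_sample(coeffs, window))
    return out
```

Built from LUTs alone, every `a * b` above would consume a multiplier assembled from many logic blocks, while the additions map comfortably onto carry chains; a hard MAC absorbs both.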

In examining Altera's and Xilinx's arithmetic structures, you might question why the companies chose nonstandard 18-bit data inputs versus the more common 16-, 24-, and 32-bit lengths. One answer is that they wanted to match the bus widths of the FPGA's parity-inclusive embedded-RAM blocks. But a more general answer also exists, and it leads to a subtle but powerful FPGA strength. A signal-processing function rarely demands exactly 16-, 24-, or 32-bit precision. If it requires less, your CPU or DSP is wasting pins, external memory, and bus bandwidth. If more, you have to implement performance-sapping multiple-pass algorithms.
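The width arithmetic behind that precision argument is simple enough to check by hand. The Python helpers below, a sketch with hypothetical names, compute how many bits a signed product needs (the sum of the operand widths) and how many guard bits an accumulator needs (the ceiling of log2 of the number of terms), then verify the worst case for an 18 x 18 multiply.

```python
def product_bits(a_bits: int, b_bits: int) -> int:
    """Width needed to hold any signed a_bits x b_bits product without overflow."""
    return a_bits + b_bits

def accumulator_bits(a_bits: int, b_bits: int, terms: int) -> int:
    """Width needed to sum `terms` such products. (terms - 1).bit_length()
    equals ceil(log2(terms)) for terms >= 1."""
    if terms < 1:
        raise ValueError("terms must be at least 1")
    return product_bits(a_bits, b_bits) + (terms - 1).bit_length()

def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

# Worst case for a signed 18 x 18 multiply: both operands at their most negative.
worst = (-(1 << 17)) * (-(1 << 17))
```

For instance, a hypothetical filter with 12-bit data, 10-bit coefficients, and 33 taps needs exactly 28 accumulator bits. An FPGA can build the datapath at that width; a fixed 32-bit machine would carry 4 unused bits, and a 16-bit one would need multiple passes.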

With an FPGA, particularly if you use general-purpose LUT and register structures, you can implement exactly the data precision you need to do the job, at least from a logic standpoint. Memory is the only remaining wrinkle; both the large embedded-memory blocks and the external memories come in predefined bus widths. FPGAs whose LUTs can serve not only as logic functions but also as small RAM arrays, such as Xilinx's various product families and Lattice's ORCA chips, are helpful here, because they give you density and bus-width flexibility to supplement or replace dedicated RAM arrays. LUTs have another valuable function in calculation-intensive signal processing: They are a more efficient alternative to registers for storing intermediate values.
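A LUT can double as memory because it is memory: a k-input LUT is simply a 2^k-entry, 1-bit-wide table addressed by its inputs. This Python model assumes a 4-input LUT; loading it once at configuration time makes it a logic function, and writing it at run time makes it a 16 x 1 RAM. The class and method names are hypothetical.

```python
class Lut4:
    """Model of a 4-input LUT: 16 stored bits addressed by the 4 input bits."""

    def __init__(self, init: int = 0) -> None:
        # Configuration bits: bit i is the output when the inputs encode i.
        self.bits = [(init >> i) & 1 for i in range(16)]

    @staticmethod
    def index(a: int, b: int, c: int, d: int) -> int:
        return (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3)

    def read(self, a: int, b: int, c: int, d: int) -> int:
        return self.bits[self.index(a, b, c, d)]

    def write(self, address: int, value: int) -> None:
        # RAM mode: the same storage cells, written through an address at run time.
        self.bits[address & 0xF] = value & 1

# Logic mode: the init value 0x6996 programs a 4-input XOR (odd parity).
parity = Lut4(0x6996)

# RAM mode: the same structure used as a 16-word x 1-bit memory.
ram = Lut4()
ram.write(5, 1)
```

Placing several such LUTs side by side yields a 16-word RAM of whatever width the datapath needs.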

Design contortions

Alas, the semiconductor graveyard is littered with the bones of great FPGA hardware ideas that went by the wayside because of inadequate design-software support. How can you ensure that your signal-processing algorithms take full advantage of the power, performance, and efficiency of the embedded circuitry within these advanced chips? The answer depends at least to some extent on your design background.

If you're used to creating hard-wired circuits in ASICs, you'll be treading the easier of the two paths to FPGA nirvana. In a perfect world, the combination of a third-party synthesis compiler and the FPGA-vendor-provided place-and-route software should be able to automatically infer from your HDL code where to use specialized multipliers and MACs, much as they should do with embedded memory arrays and other on-chip function blocks (references 2 and 3). The vendors' documentation often makes recommendations on coding styles, which help guide the design software to the ideal end result (Reference 4).

In the real world, you still sometimes need to explicitly instantiate references to these and other device primitives in your HDL, a task that leaves your code less architecture- and device-generic, thereby contradicting a key motivation for using an HDL in the first place. The vendors have all developed pushbutton utilities that greatly simplify your creation of higher level functions, such as filters and transforms. Enter a few parameters, click "OK," and out comes a netlist "black box," which your HDL code references. At minimum, you have the option of creating a fully serial (to minimize pin count and logic usage) or fully parallel (to maximize performance) circuit. Depending on the size of your design and device and on your performance and power budgets, you might want to choose an in-between approach; in this case, make sure that the vendor's core generator tools support such flexibility.
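The serial-versus-parallel choice is a trade of multipliers for clock cycles. This Python sketch, built around a hypothetical fir_folded function, computes one filter output using a configurable number of multipliers: 1 models a fully serial (tap-at-a-time) implementation, the tap count models a fully parallel one, and anything in between is a folded design. It deliberately ignores pipelining and adder-tree latency.

```python
def fir_folded(coeffs: list[int], window: list[int], multipliers: int) -> tuple[int, int]:
    """Compute one FIR output with `multipliers` parallel multipliers.
    Returns (result, clock_cycles), where clock_cycles = ceil(taps / multipliers)."""
    taps = len(coeffs)
    if not 1 <= multipliers <= taps:
        raise ValueError("multipliers must be between 1 and the number of taps")
    acc = 0
    cycles = 0
    for start in range(0, taps, multipliers):
        # One "clock": up to `multipliers` products are formed and summed together.
        group = zip(coeffs[start:start + multipliers], window[start:start + multipliers])
        acc += sum(h * x for h, x in group)
        cycles += 1
    return acc, cycles

coeffs = list(range(1, 17))   # 16 illustrative taps
window = [1] * 16
for m in (1, 4, 5, 16):
    print(m, fir_folded(coeffs, window, m))
```

The result never changes; only the multiplier count and cycles per output do, which is exactly the knob you want the core generator to expose.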

Most of you, though, will probably be coming to FPGAs from a software background, and, ideally, you'd like to port your legacy code to hardware as straightforwardly as possible. In this case, the news is less optimistic. In fact, at least one in-the-know FPGA power user suggests that, unless a DSP has become a completely unpalatable option from a power, performance, or cost standpoint, you shouldn't bother taking even one step down the FPGA path (see sidebar "Consult with an expert"). Ironically, in developing your software in the first place, you've taken what was in all likelihood a highly parallelizable function and gone through a lot of work to convert it to a time-sequential serial algorithm in C or assembly code. Now, to port the algorithm to an FPGA, you need to reparallelize it and, sometimes, convert it from floating-point back to sufficiently precise fixed-point arithmetic.
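Converting floating-point coefficients to fixed point usually means scaling them by a power of two, rounding, and doing all the arithmetic in integers. This Python sketch assumes a format with frac_bits fractional bits and uses illustrative five-tap coefficients and input samples; it compares the integer-only result against the floating-point reference. The function names are hypothetical.

```python
def to_fixed(values: list[float], frac_bits: int) -> list[int]:
    """Quantize floating-point coefficients to integers scaled by 2**frac_bits."""
    scale = 1 << frac_bits
    return [round(v * scale) for v in values]

def fixed_fir_sample(q_coeffs: list[int], window: list[int], frac_bits: int) -> int:
    """Integer-only dot product; the final arithmetic right shift removes the
    coefficient scaling (rounding toward negative infinity)."""
    acc = sum(c * x for c, x in zip(q_coeffs, window))
    return acc >> frac_bits

def float_fir_sample(coeffs: list[float], window: list[int]) -> float:
    """Floating-point reference, as the original software would compute it."""
    return sum(c * x for c, x in zip(coeffs, window))

coeffs = [0.1, 0.2, 0.4, 0.2, 0.1]   # illustrative low-pass taps
window = [100, -50, 300, 20, 7]      # illustrative integer input samples
q = to_fixed(coeffs, frac_bits=8)
fixed_result = fixed_fir_sample(q, window, frac_bits=8)
float_result = float_fir_sample(coeffs, window)
```

Raising frac_bits shrinks the coefficient-rounding error at the cost of wider multipliers; "sufficiently precise" means picking the smallest width whose error your application tolerates.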

Companies such as Adelante Technologies (formerly, Frontier Design, with A|RT), Celoxica (with Handel-C), and Synopsys (the original developer of SystemC) all claim the ability to generate hardware designs from C code. And, to a degree, they all deliver on their claims. None of them, though, fully supports generic ANSI C code. (Pointers are particularly problematic.) Their languages are tailored to the task of hardware creation. So, although they might be acceptable for brand-new designs, they will be less help with your years' worth of existing software.

The designs created by C-to-hardware tools are less efficient than those you develop from the beginning in an HDL (an analogy to programming in C versus assembly language), but, thanks to the near-guaranteed performance boost in software-to-hardware conversion, lost efficiency may be a small concern. If you've developed your software algorithms in The MathWorks' Matlab, you might be in better luck. Both Altera and Xilinx offer tool sets that interface to Matlab and Simulink and output vendor-optimized hardware code and IP blocks. Xilinx also supports Cadence's SPW (Signal Processing Worksystem, Figure 3).

Intermediate approaches

Although we're talking about digital-signal processing, you shouldn't assume that you've got at your disposal a "binary" set of implementation options—either fully software or fully hardware. The spectrum of choices is, in reality, far more "analog," reflecting a possible partitioning of tasks among both hardware for optimum speed, power consumption, or per-unit cost and software for ease of development and legacy compatibility (Figure 4). Both Atmel and Xilinx, for example, have, with their respective FPSLIC and Virtex-II Pro product lines, combined arithmetic-tuned programmable logic and "hard" CPU cores on a single device. If you develop your signal-processing algorithm in Matlab and Simulink, you can iteratively direct portions of it to C code and the remainder to FPGA hardware. Otherwise, don't waste too much time on tedious hardware-versus-software partitioning and repartitioning; focus your efforts on obvious software bottlenecks that benefit from FPGA acceleration.
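Amdahl's law quantifies why accelerating an obvious bottleneck pays off and why chasing minor functions doesn't: if only a fraction f of the run time is accelerated by a factor s, the overall speedup is 1/((1 - f) + f/s). Here's a direct Python transcription; the function name is hypothetical.

```python
def overall_speedup(accelerated_fraction: float, acceleration: float) -> float:
    """Amdahl's law: overall speedup when only `accelerated_fraction` of the
    original run time is sped up by a factor of `acceleration`."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if acceleration <= 0.0:
        raise ValueError("acceleration must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / acceleration
    return 1.0 / remaining
```

Even an arbitrarily fast FPGA accelerator applied to a function that consumes 10% of the run time caps the overall gain at about 1.11x, whereas moving a 90% hot spot into hardware with a 10x speedup yields roughly 5.3x.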

Altera's ARM-based Excalibur chips and QuickLogic's QuickMIPS line do not include the hardware MACs in the vendors' respective Stratix and Eclipse Plus products. However, years' worth of successful design examples suggest that, although MACs might speed signal processing, they aren't an absolute requirement. Embedded memory blocks, for example, can also find use as multipliers. For the same reason, don't forget about Triscend's A7 and E5 chips. If you do need dedicated arithmetic circuits to hit your hardware-design targets, a "soft" CPU core might conversely provide you with sufficient software capabilities. Altera's Nios fits inside Stratix, and Xilinx's MicroBlaze works with Virtex-II and Virtex-II Pro. Supplementing Virtex-II Pro's "hard" PowerPC cores with one or multiple "soft" MicroBlaze cores results in some interesting single-chip, multiprocessor-architecture possibilities.
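One common way to use memory as a multiplier is the constant-coefficient multiplier (often called a KCM): precompute the coefficient times every possible value of a few input bits, store the table in RAM or LUTs, then look up each input digit and add the shifted results. This Python sketch assumes unsigned inputs and a fixed coefficient; it illustrates the general technique, not a specific vendor core, and the function names are hypothetical.

```python
def build_kcm_rom(constant: int, addr_bits: int = 4) -> list[int]:
    """Precompute constant * k for every k that an addr_bits-wide address can hold.
    In hardware, this table would occupy an embedded memory block or LUTs."""
    return [constant * k for k in range(1 << addr_bits)]

def kcm_multiply(x: int, rom: list[int], addr_bits: int = 4) -> int:
    """Multiply unsigned x by the ROM's constant using only lookups, shifts, and
    adds: split x into addr_bits-wide digits, look each one up, and sum."""
    if x < 0:
        raise ValueError("this sketch handles unsigned inputs only")
    mask = (1 << addr_bits) - 1
    result = 0
    shift = 0
    while x:
        result += rom[x & mask] << shift   # partial product for this digit
        x >>= addr_bits
        shift += addr_bits
    return result

rom = build_kcm_rom(37)   # 37 is an arbitrary example coefficient
```

Because the coefficient is baked into the table, each input digit costs one memory read instead of a full multiplier, which suits filters whose coefficients rarely change.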

Look, for example, at Altera's recently announced DSP-development kit, an expanded version of last year's DSP Builder tool. Working hand in hand with Matlab and Simulink, it enables you to develop custom instruction-set and hardware-accelerated variants of the Nios processor that tap into the MACs and other resources in the programmable-logic fabric. Altera claims to have more than 60 DSP cores in its IP library, spanning specific functions, such as encryption, error correction, image processing, and modulation, as well as general-purpose functions, such as filters and transforms.




OTHER COMPANIES MENTIONED IN THIS ARTICLE
3DSP
www.3dsp.com
Actel
www.actel.com
Adelante Technologies
www.adelantetechnologies.com
Analog Devices
www.analog.com
Andraka Consulting Group
www.andraka.com
Berkeley Design Technology Inc
www.bdti.com
Cadence
www.cadence.com
Celoxica
www.celoxica.com
Improv Systems
www.improvsys.com
Leopard Logic
www.leopardlogic.com
The MathWorks
www.mathworks.com
Motorola
www.motorola.com
QuickSilver Technology
www.qstech.com
Synopsys
www.synopsys.com
Tensilica
www.tensilica.com
Texas Instruments
www.ti.com
   





References
  1. Dipert, Brian, "EDN's third annual programmable-logic directory," EDN, Sept 5, 2002, pg 44.

  2. Dipert, Brian, "Lies, damn lies, and benchmarks: The race for the truth is on," EDN, May 27, 1999, pg 54.

  3. Dipert, Brian, "Synthesis shoot-out at the EDN corral," EDN, Sept 11, 1998, pg 95.

  4. Dipert, Brian, "Getting a handle on HDLs," EDN, May 7, 1998, pg 71.


Acknowledgments
Kudos to Jeff Bier from Berkeley Design Technology and to Ray Andraka from the Andraka Consulting Group for their editorial contributions.



Author Information
Technical Editor Brian Dipert wonders how long it will be before it's possible to squeeze all of the electronics in a personal video recorder or a digital-cell-phone base station into a single hybrid FPGA—including the mixed-signal front- and back-end stuff. Seriously. You can reach Brian the fantasist at 1-916-454-5242, fax 1-916-454-5101, bdipert@edn.com, and www.bdipert.com.

For more information...

When you contact any of the following manufacturers directly, please let them know you read about their products in EDN.

Altera
1-408-544-7000
www.altera.com

Atmel
1-408-441-0311
www.atmel.com

QuickLogic
1-408-990-4000
www.quicklogic.com

Triscend
1-650-968-8668
www.triscend.com

Xilinx
1-408-559-7778
www.xilinx.com

 
