BDTI unveils FPGA C-synthesis certification: Can C beat RTL?

January 18, 2010

With the appearance of higher speeds and more DSP macrocells in low-cost FPGAs, more and more design teams are seeing the configurable chips not as glue, but as a way to accelerate the inner loops of numerical algorithms, either in conjunction with or in place of the traditional DSP chip. It’s well understood that encoding critical kernels in an FPGA can increase performance by more than an order of magnitude compared with even the fastest DSP chip, often reducing energy consumption as well.

But there’s a problem. You code for a DSP chip in C, and you implement using a conventional software tool chain with familiar software debug tools. You configure an FPGA starting in Verilog or VHDL—superficially similar to C but in practice profoundly different—and you implement using a hardware design flow. The two approaches require very distinct skill sets.

That’s where so-called Electronic System Level (ESL) tools come in. An ESL synthesis tool lets you write your code in C, synthesize RTL from the C automatically, and then feed the RTL into your FPGA flow. So every competent DSP programmer becomes an FPGA developer, in a manner of speaking. In reality, such tools meet with extreme skepticism from both hardware and software engineers, who suspect them of poor quality of results, unreliability, and other vices.

But is that fair? Berkeley Design Technology Inc. (BDTI), long a dominant force in DSP benchmarking, invested two years of planning and one year of implementation to find out. Today the company released the first results of its certification program for high-level synthesis tools. The first evaluation covers two such tools: AutoESL’s AutoPilot and Synfora’s PICO. Further evaluations are planned.

The bottom line finding is simple: both tools produced results in reasonable time that were far higher in performance than software on a DSP chip, and comparable in density and performance to hand-coded RTL. But beneath that level, there is a wealth of information in the fine print.

First, the methodology—always a compromise between realism and practicality. BDTI’s initial benchmark is a fully functional Optical Flow design. The design comprises a three-ring binder and a DVD, which in turn contain a text description of the algorithm, the algorithm in about 600 lines of ANSI C, and a Xilinx reference design—the Video Starter Kit—which includes a board, FPGA, and, critically, IP for such items as video and DRAM interfaces, buffers, and a sophisticated programmable buffer controller.

BDTI turns the kit over to the ESL vendor, which tunes the C code for its tool and produces a design. BDTI engineers then independently repeat the process. The goal on the Optical Flow core is to achieve maximum throughput using all the resources available in the Spartan IIIA FPGA.

Unsurprisingly, both ESL vendors produced designs with about 40 times the throughput of the best that BDTI engineers could achieve on a TI DM6437 DSP chip. More interestingly, the amount of work required to do the FPGA design, from C to programming file, was similar to the work required to program the DSP, according to BDTI president Jeff Bier. But there were significant differences in the two tasks. Optimizing the C code for one of the ESL tools caused the code to balloon from the original 559 lines supplied by BDTI to 1604 lines of C. Even so, Bier says, the work involved in the optimization was somewhat less than was required to optimize the code for the DSP chip. "It turned out that the DSP had a serious memory bottleneck that we had to code around," he explains. The synthesis tool then generated over 38,000 lines of Verilog from the optimized C.

Here’s where the major difference hit. BDTI engineers, experienced DSP programmers, could handle the entire flow for the TI chip. But they were pretty much stumped by a huge pile of Verilog and a stack of Xilinx tools. They ended up calling in an RTL expert to shepherd the RTL through the Xilinx tool chain, debug it, and produce the configured FPGA.

As a second test, BDTI wanted to compare the C-level synthesis against a hand-crafted RTL design. But 38K lines of code were too daunting—producing a decent design would have required multiple engineer-years from an experienced RTL team. So BDTI opted to create a second reference design, a DQPSK receiver core. This design was similar in size to the Optical Flow at the C level, with 514 lines of C. But optimization only expanded the code to 635 lines, and synthesis produced a manageable 11,000 lines of Verilog. (The results from both AutoPilot and PICO were similar in size and performance, though quite different in structure, Bier says.)
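To put the line counts quoted for the two reference designs side by side, the short C sketch below computes how much each design grew at each step. The figures are the ones given in the article; the struct, array, and function names are hypothetical, and the 38,000 figure is a lower bound because the article says "over 38,000."

```c
#include <assert.h>
#include <stdio.h>

/* Line counts as quoted in the article (names here are illustrative). */
typedef struct {
    const char *name;
    int c_original;     /* C lines supplied by BDTI */
    int c_optimized;    /* C lines after tuning for the ESL tool */
    int verilog_lines;  /* Verilog lines emitted by synthesis */
} design_size;

static const design_size designs[] = {
    {"Optical Flow",   559, 1604, 38000},  /* "over 38,000": lower bound */
    {"DQPSK receiver", 514,  635, 11000},
};

/* Ratio of a later line count to an earlier one. */
double expansion_ratio(int before, int after)
{
    assert(before > 0);
    return (double)after / (double)before;
}

/* Print both growth factors for every design in the table. */
void print_all_expansions(void)
{
    for (size_t i = 0; i < sizeof designs / sizeof designs[0]; ++i) {
        const design_size *d = &designs[i];
        printf("%s: C grew %.2fx, Verilog is %.1fx the optimized C\n",
               d->name,
               expansion_ratio(d->c_original, d->c_optimized),
               expansion_ratio(d->c_optimized, d->verilog_lines));
    }
}
```

Calling `print_all_expansions()` from a `main` shows that the Optical Flow source nearly tripled during optimization while the DQPSK source grew by roughly a quarter, which is consistent with BDTI's decision to use the smaller design for the hand-coded comparison.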

As a comparison, a Xilinx engineer hand-coded this design in RTL, working from the text description and the C, employing good coding practices but not pulling out the stops for optimization. "The point was to compare good efforts on both tools, not to compare against what an FPGA vendor team could do by hand," Bier explains. Here, the results are more surprising: meeting the 18.75 Msample/sec input rate requirement, the two ESL tools and the RTL hand-design all produced about the same size core: about 6 percent of the FPGA capacity.
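A fixed input-rate requirement like this one translates into a clock-cycle budget per sample, which is what lets a tool or a designer trade speed for area. The C sketch below shows the arithmetic. The 18.75 Msample/sec figure comes from the article; the function name and any clock frequency passed to it are assumptions for illustration only, since the article does not state the clock rate used in the evaluation.

```c
#include <assert.h>

/* Input rate required of the DQPSK receiver, as stated in the article. */
#define DQPSK_INPUT_RATE_HZ 18750000UL

/*
 * Whole clock cycles available per input sample at a given clock rate.
 * The clock frequency is NOT given in the article; callers supply a
 * hypothetical value.
 */
unsigned long cycles_per_sample(unsigned long clock_hz,
                                unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return clock_hz / sample_rate_hz;  /* integer division rounds down */
}
```

For instance, with a purely hypothetical 150 MHz clock, `cycles_per_sample(150000000UL, DQPSK_INPUT_RATE_HZ)` gives a budget of 8 cycles per sample. Any budget above one cycle leaves room to reuse the same multipliers across several cycles instead of building one per operation, which is one plausible reason a design that only has to meet a rate target can stay small.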

But did the ESL users require hardware design skills? No, and yes. Xilinx senior marketing manager Tom Hill says the skills necessary to optimize the C code for ESL synthesis are closer to software optimization skills for a DSP target than they are to hardware design skills. So no: in the beginning of the flow a DSP software team will do just fine. But the ESL tools produce RTL, not a Xilinx programming file. Bier says that familiarity with RTL, the Xilinx tools, and FPGA hardware is still necessary to complete the design. Debug is especially problematic, he observed, because the coupling between the ESL and implementation tools is less than ideal.

There’s room in the BDTI report for just about everyone to say "I told you so." But clearly it is no longer prudent for design teams working with computationally-oriented cores to ignore ESL synthesis tools. And the analogy to the days when RTL synthesis was just beginning to displace schematic capture and Karnaugh maps, or for that matter the time when embedded software started to be done in C instead of assembler, is irresistible. Stand by for change.

Posted by Ron Wilson on January 18, 2010 | Comments (6)

April 1, 2010
In response to: BDTI unveils FPGA C-synthesis certification: Can C beat RTL?
Oxygen Chu commented:

i'm a Catapult user (not a formal user, just got interest) in China for months. actually speaking the HLS tools are not mature enough, since it started only several years ago. just think about RTL tools, which have over decades history! i use it for FPGA code generation, and i find it really can have a rapid speed than hand-code RTL. and the most important is that it can be used for rapid algorithm prototyping on FPGA. however, the quality of auto-generated RTL is really somewhat awesome: neither friendly-for-reading, nor efficiently-for-implementing. example: no matter how hard i tried to optimize the C/C++ code following the Coding-Style-Guide, the generated fully-pipelined FIR filter just can't have systolic micro-structure which can be easily hand-coded by RTL or by instantiating DSP48 cascaded. The result is a big chunk of LUT/FF is wasted to build cascading logic, which could be fully absorbed into DSP48 (so cause no LUT/FF at all). the HLS tools are not mature at present, but it really open a way to gain more productivity, just like many years ago when engineers doing hardware from schematic-drawing to code-writing. at last, if any guy who interested in HLS, i'd like to share more with him/her. oxygen.xilinx@gmail.com
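For readers unfamiliar with the structural point in the comment above, here is a minimal C sketch of the same FIR filter written two ways. It is not taken from any tool's coding-style guide, and the type names, function names, and tap count are all hypothetical. The direct form keeps a delay line of samples and sums the products; the transposed form keeps a chain of registered partial sums, so each tap is a multiply-add feeding its neighbor. That multiply-add-register chain is the kind of structure a cascade of DSP slices implements naturally. Whether a given HLS tool actually infers such a cascade from either version is exactly what the commenter is questioning, and nothing here guarantees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TAPS 4  /* hypothetical filter length */

/* Direct form: delay line of past inputs, then a sum of products. */
typedef struct {
    int16_t delay[NUM_TAPS];
} fir_direct_state;

int32_t fir_direct_step(fir_direct_state *s,
                        const int16_t coeff[NUM_TAPS], int16_t x)
{
    assert(s != NULL && coeff != NULL);
    /* Shift the delay line by one sample and insert the new input. */
    for (size_t i = NUM_TAPS - 1; i > 0; --i)
        s->delay[i] = s->delay[i - 1];
    s->delay[0] = x;

    /* All products meet in a single accumulation. */
    int32_t acc = 0;
    for (size_t i = 0; i < NUM_TAPS; ++i)
        acc += (int32_t)coeff[i] * s->delay[i];
    return acc;
}

/* Transposed form: a chain of registered partial sums. */
typedef struct {
    int32_t partial[NUM_TAPS - 1];
} fir_transposed_state;

int32_t fir_transposed_step(fir_transposed_state *s,
                            const int16_t coeff[NUM_TAPS], int16_t x)
{
    assert(s != NULL && coeff != NULL);
    /* Output is the first tap's product plus the nearest partial sum. */
    int32_t y = (int32_t)coeff[0] * x + s->partial[0];

    /* Each stage: multiply by its coefficient, add the neighbor's old sum.
       Ascending order reads partial[k + 1] before it is overwritten. */
    for (size_t k = 0; k + 2 < NUM_TAPS; ++k)
        s->partial[k] = (int32_t)coeff[k + 1] * x + s->partial[k + 1];
    s->partial[NUM_TAPS - 2] = (int32_t)coeff[NUM_TAPS - 1] * x;
    return y;
}
```

Starting from zero-initialized state, both functions produce the same output sequence for the same inputs; they differ only in where the registers sit, which is the micro-architectural choice the commenter says could not be steered from the C source.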


March 26, 2010
In response to: BDTI unveils FPGA C-synthesis certification: Can C beat RTL?
RSNikhil commented:

Re. "synthesis produced a manageable 11,000 lines of Verilog...As a comparison, a Xilinx engineer hand-coded this design in RTL" How many lines of Verilog was the hand-coded RTL? I.e., is the size of synthesis-generated Verilog a good predictor of the size it would take if hand-coded?


March 26, 2010
In response to: BDTI unveils FPGA C-synthesis certification: Can C beat RTL?
RSNikhil commented:

I have a question re. "They ended up calling in an RTL expert to shepherd the RTL through the Xilinx tool chain, debug it, and produce the configured FPGA." What were they debugging? Presumably not the design itself, since with high-level synthesis tools you're supposed to debug the source and not the generated RTL?


February 2, 2010
In response to: BDTI unveils FPGA C-synthesis certification: Can C beat RTL?
BobsUrUncle commented:

BullS*t benchmark. They handed a folder to the vendor and let them craft an optimized design. Wonder how many people evaluate tools like that? The biggest problem with these tools is how to convert standard C into something the tool can recognize as a high level hardware description. And no, it's not like software optimization at all. It's just another RTL language. You need a real Hardware Engineer to drive these tools -- not some SW guy who read a verilog book. These tools are really adjuncts for the HW engineer to convert an algorithm into functional HW. Anything else is pure B.S.


January 21, 2010
In response to: BDTI unveils FPGA C-synthesis certification: Can C beat RTL?
Steve Cox commented:

Great post, Ron. And great initiative, BDTI. I look forward to see where this all leads. Very useful for the industry, in my opinion, and this real data can really help ESL get past the conceptual debates and into the next level - wider proliferation. Still, I'm confused why this article is written focusing on FPGAs. Offload from merchant processors is as common on-chip (e.g. look at any SmartPhone chip or Network Processor) as it is "on-board" (i.e. in the FPGA context that you describe). Of course, the technique is useful for both. Additionally, however, your readers may also be interested to realize that there are still other alternatives to C synthesis that provide the RTL-like efficiency, yet still allow direct C-source code debug - without the need for the programmers to think in terms of how the algorithm mapped to HDL. Such opportunities exist in the configurable processor space - especially when using an ASIP design tool like those available from Target Compiler Technologies or Coware. If overcoming the debug problem (and the RTL skill requirement mentioned by Jeff) are important issues for you, you should check into these alternatives.


January 19, 2010
In response to: BDTI unveils FPGA C-synthesis certification: Can C beat RTL?
cms commented:

Great post, very informative. So, as Bier says, the main problem of the HLS tools now is absence of debug capability?

© 2012 UBM Electronics. All rights reserved.