Is Algorithmic Synthesis the One True ESL Path? A Clue: No

September 18, 2008

Yesterday, I pointed out a couple of articles in the latest DAC eZine to my friend and colleague Grant Martin. We both agreed to take a crack at discussing the topics in these articles in our respective blogs. The articles are “Algorithmic Synthesis: the ‘Killer App’ for broad ESL deployment” by Synfora’s Sunil Ashtaputre and “ESL Synthesis Innovation: the importance of choosing the right technology” by Mike Meredith of Forte Design Systems. Both of these articles describe the use of high-level or algorithmic synthesis tools and portray them as the one true path to ESL (electronic system level) design. T’ain’t so. As my friend Grant wrote: “I am sure everyone knows the old saying ‘When all you have is a hammer, everything looks like a nail.’” In this blog entry, I’m going to argue that there are at least two paths for SOC designers wanting to travel the road to ESL.


The state of SOC design

How did we get where we are today with SOC design? Figure 1 shows my answer to this question regarding high-tech history. The custom-IC revolution started with gate arrays in the early 1980s. Prior to the introduction of gate arrays, the development of custom chips was restricted to a very few companies that had the money and resources to develop their own ICs. Vendors offering gate arrays hid much of the complexity and cost of physical IC design from their customers and allowed logic-design teams not versed in the arcane particulars of IC design to cast their designs in silicon instead of circuit boards.

Figure 1: As custom IC design evolved from gate arrays to multicore SOCs, verification as a component of overall development has grown to become larger than design in terms of cost and time.

The introduction of gate arrays produced an electric result. Many more companies could suddenly afford to develop custom chips using the gate-array route. System complexity jumped almost immediately. By 1990, with the introduction of logic synthesis, logic-design teams could tackle even more complex system designs by creating standard-cell ASICs using written descriptions in Verilog and VHDL instead of drawn schematics. System complexity jumped once again because HDL descriptions proved to be better than schematics for creating large designs. The cost to use this design methodology also jumped to accommodate a growing list of required EDA tools, but these tools allowed logic designers to create ASICs while staying relatively aloof from the nitty-gritty aspects of physical design.

By the SOC era (an SOC is nothing more than an ASIC with an on-chip processor, RAM, and ROM) and the advent of nanometer IC lithographies, the party started to wind down. Design complexities veered into many millions of logic gates. System designs became astronomically complex and gate-level verification became a real burden because there were just too many gates to simulate in a reasonable amount of time. The cost and time required for gate-level verification now dominate the project cycle.

The industry is now in the era of the multicore SOC and chip-level complexities have hit even higher highs. For many SOC design projects, verification approaches or even exceeds 70% of the development cost. Design is no longer the leading consideration in such projects. To me, that’s a very sad state of affairs.

Something has clearly gone wrong.

Usually, companies promising a solution to this growing problem invoke the mantra-like phrase “higher level of abstraction.” But like many mantras, no one is sure about the exact meaning of that phrase and many people remain confused as to why a higher abstraction level will help.


The starting point: algorithms

The place to start our analysis of that abstraction-level mantra is with the most fundamental part of system design: algorithm development. Systems are collections of algorithms, operating both independently and dependently. For example, a multimedia player uses one algorithm to decompress a media stream, another to decode the video into a series of images, and yet another algorithm to decode the audio into sound. These algorithms operate relatively autonomously with some synchronization required among them. Baseband radio chips similarly employ a very large variety of algorithms such as FFTs and inverse FFTs, rake filters, Viterbi and Turbo decoders, etc.

All electronic systems are based on the execution of one or more algorithms. Algorithm development is the first step in the system-development process because it creates the building blocks needed to construct the system. Algorithms are a higher level of design abstraction.

Take a look at Figure 2, which I’ve adapted from Thomas Bollaert’s chapter on Catapult C, Mentor Graphics’ algorithmic-synthesis tool. This chapter appears in the book High-Level Synthesis, from Algorithm to Digital Circuit. The left side of Figure 2 illustrates the conventional path for getting from an algorithm to a piece of silicon. It’s important to understand this well-trodden path because it embodies that “lower level of abstraction” that’s causing all the troubles described above.

Figure 2: RTL-centric design, shown on the left, forces SOC engineers to manually implement algorithms at low abstraction levels. As system complexity increases, this approach becomes expensive and unwieldy. Algorithmic synthesis eliminates the low-level manual design and speeds the process of transforming an algorithm into a gate-level design.

The left side of Figure 2 illustrates a design flow that starts with algorithm development. These days, algorithm developers predominantly use C or C++ (sometimes MATLAB). They debug and finalize their code on fast, inexpensive PCs or, in some cases, an array of fast, inexpensive PCs. When the algorithm is proven, it will usually exist as a floating-point model, which is then converted into a fixed-point model to ease the eventual conversion into a gate-level design.
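To make the floating-point-to-fixed-point step concrete, here is a minimal C sketch. The Q15 format, the dot-product kernel, and every function name are assumptions chosen for illustration; none of them comes from the tools or articles discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Q15 format (an assumption for this sketch): 1 sign bit and 15
 * fractional bits, so a raw value r represents r / 32768. */
#define Q15_ONE 32768

/* Convert a float in [-1.0, 1.0) to Q15, rounding to nearest and
 * saturating at the ends of the 16-bit range. */
static int16_t float_to_q15(float x)
{
    float scaled = x * (float)Q15_ONE;
    if (scaled >= 32767.0f)  return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Floating-point reference model, as an algorithm developer might
 * first write and debug it on a PC. */
static float dot_float(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Fixed-point version of the same kernel. Each Q15 x Q15 product is
 * a Q30 value; products accumulate in 64 bits, then the sum is
 * rounded back to Q15 and saturated. The right shift of a negative
 * value is implementation-defined in C; this sketch assumes the
 * usual arithmetic shift. */
static int16_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    acc = (acc + (1 << 14)) >> 15;   /* round, then drop 15 fraction bits */
    if (acc > INT16_MAX) return INT16_MAX;
    if (acc < INT16_MIN) return INT16_MIN;
    return (int16_t)acc;
}
```

With inputs {0.5, 0.25} and {0.5, 0.5}, the floating-point model returns 0.375 and the Q15 model returns 12288, which is 0.375 x 32768. The fixed-point version uses only integer multiplies, adds, and shifts, which is what makes it easier to map onto gates.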

Then, designers take the C or C++ code and manually translate it into a hardware description language (HDL) such as Verilog or VHDL. The HDL description is the language of logic synthesis and this description drives the automated portion of the conventional flow that transforms the algorithm block design from C into gates. After the initial design is done, gate-level simulation can indicate areas to be optimized to meet performance, timing, or power goals. Much of this optimization is manual as well. Finally, when simulation and logic verification indicate that a design is correct and sufficiently verified, it’s submitted for logic synthesis and then physical design.

At this point, it’s critical to note that much of the manual design effort goes into converting the algorithmic C or C++ code into HDL. Manual conversion is slow and error-prone, and consequently expensive. It’s so last millennium. Also note (and note very well) that the algorithm is already coded and proven in C/C++. It already runs successfully on a processor. However, merely running the C/C++ version of the algorithm on a general-purpose processor or DSP core may not meet one or more of the design’s power/price/performance criteria. Further optimization is often needed.

However, if the project criteria are all met by running the C/C++ algorithm code on a processor core, there is no need to develop hardware—the processor itself is good enough.

Unfortunately, a general-purpose processor often isn’t good enough. That’s why the conventional design approach shown on the left side of Figure 2 calls for manual conversion of the C or C++ algorithm into a hardware-description language. The right side of Figure 2 shows the algorithmic-synthesis alternative design approach. This is the approach advocated in the two DAC eZine articles cited in the first paragraph of this blog entry. A high-level synthesis tool accepts the algorithm as written to run on a processor and converts the algorithm into gates (a so-called “C-to-gates” conversion). Sounds great if the conversion is efficient and produces a good quality of results.

A decade ago, behavioral synthesis attempted to do the same thing but the approach quickly fell out of favor because the quality of results was poor. The resulting designs were very large and ran very slowly. The results were so bad that the term “behavioral synthesis” was banished from the EDA lexicon, probably forever. Today’s algorithmic-synthesis tools such as Mentor Graphics’ Catapult C, Synfora’s PICO Express, or Forte’s Cynthesizer reportedly do a better job on certain types of problems. (I don’t know myself because I haven’t used them.)

However, I do know that there is a middle path. It’s called processor customization. After all, if an algorithm is already written to run on a processor, it makes sense to try to keep it that way to minimize development effort. All you need is a more efficient processor—one that’s tailored to the algorithm. This path has been open since the first days of SOCs, when processors first appeared on ASICs. As long as the silicon is already going to be custom tailored for a specific application, the processor can be custom tailored as well.

There have been roadblocks to using custom-tailored processors on SOCs including the scarcity of experienced processor designers and the complexity of developing and maintaining an associated software-development tool chain for every custom processor. Historically, it hasn’t been easy to create an efficient processor design that performs well and it’s harder still to develop and support a software-development tool chain for a custom processor core. The costs for developing a custom processor core and the associated software-development tools simply haven’t been reasonable.

If your SOC design team were to approach a conventional processor vendor and ask for a custom processor that ran a specific algorithm efficiently, along with the requisite software-development tools, they’d likely get one of two answers. The first answer would simply be “No.” The second answer would be a cost quote that would place this alternative design approach out of reach.

That’s no longer true, however. Automated EDA tools are now available from multiple vendors that allow logic designers and software developers (specifically not processor designers) to custom tailor processors for specific on-chip SOC tasks (algorithms). The customized processors run the targeted algorithms faster than general-purpose processors and DSPs can because new registers and instructions have been added so that multi-instruction sequences on non-customized processors become single instructions on the customized processor. Because they are guaranteed correct-by-construction, these customized processors require much less gate-level verification.
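As a rough illustration of how a multi-instruction sequence can collapse into one instruction, the C sketch below models a hypothetical custom instruction, here called SAD4, that adds the absolute differences of four packed bytes to an accumulator. The instruction, its name, and the sum-of-absolute-differences kernel are assumptions made for this example. A real customization tool would describe the instruction in its own format, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one byte per iteration. On a general-purpose core each
 * iteration costs several instructions (two loads, a subtract, an
 * absolute value, an add, plus loop overhead). */
static uint32_t sad_bytewise(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)a[i] - (int)b[i];
        sum += (uint32_t)(d < 0 ? -d : d);
    }
    return sum;
}

/* Plain-C model of the HYPOTHETICAL custom instruction SAD4.
 * In customized hardware the four byte lanes would be processed in
 * parallel in a single instruction; the loop only models the result. */
static uint32_t sad4_model(uint32_t acc, uint32_t a, uint32_t b)
{
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        acc += (x > y) ? (x - y) : (y - x);
    }
    return acc;
}

/* Pack four bytes into one 32-bit word, lowest address in lane 0. */
static uint32_t pack4(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The same kernel rewritten around SAD4: one modeled instruction per
 * four bytes. This sketch assumes n is a multiple of 4; any leftover
 * bytes are ignored. */
static uint32_t sad_with_sad4(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i + 4 <= n; i += 4)
        acc = sad4_model(acc, pack4(a + i), pack4(b + i));
    return acc;
}
```

Both versions return the same sum for the same inputs. The difference is how many instructions the processor issues to get there, and that is where a customized core would gain its speed.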

Figure 3 shows how the design flow changes when changing from a design flow that emphasizes low-level RTL design to a more processor-centric approach. Note the similarity to Figure 2. High-level synthesis and processor customization are both used in similar ways to speed algorithmic execution on an SOC.

Figure 3: A processor-centric approach also eliminates the low-level manual RTL design and speeds the process of getting to a gate-level design.

Automated tools that directly produce a custom processor from an algorithm’s C/C++ source code can improve algorithmic performance on the order of 3-10x. This approach to processor customization can take less than a day because an existing processor is merely being customized, not built from the ground up. A 3-10x boost in performance is good enough to satisfy many project needs and the algorithmic block is then ready to be incorporated into the overall SOC design.

Manual processor customization performed by developers trained to use the processor-customization tools (although still not experienced processor designers) can produce algorithmic execution speeds that are 10-100x faster and often match the speed of direct, gate-level RTL conversions. After all, the only real difference between a processor executing an algorithm and a hardwired datapath/state-machine execution engine is that the processor’s state machine is controlled by firmware. Everything else can be the same.
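The point about firmware control can be sketched in C. In the toy model below, the same datapath (an accumulator with a multiply-accumulate operation) is driven two ways: by a hardwired state machine and by a small program held in a table. The sum-of-squares task, the opcodes, and all names are assumptions invented for this illustration, not a description of any vendor’s processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared datapath: an accumulator and an input index. */
typedef struct { int32_t acc; size_t idx; } datapath_t;

static void dp_clear(datapath_t *dp) { dp->acc = 0; dp->idx = 0; }

static void dp_mac(datapath_t *dp, const int16_t *x)
{
    dp->acc += (int32_t)x[dp->idx] * (int32_t)x[dp->idx];
    dp->idx++;
}

/* Control option 1: a hardwired state machine. The sequencing is
 * fixed in the "logic" (the switch statement). */
static int32_t sum_squares_hardwired(const int16_t *x, size_t n)
{
    enum { S_CLEAR, S_MAC, S_DONE } state = S_CLEAR;
    datapath_t dp = {0, 0};
    while (state != S_DONE) {
        switch (state) {
        case S_CLEAR:
            dp_clear(&dp);
            state = (n > 0) ? S_MAC : S_DONE;
            break;
        case S_MAC:
            dp_mac(&dp, x);
            state = (dp.idx < n) ? S_MAC : S_DONE;
            break;
        default:
            break;
        }
    }
    return dp.acc;
}

/* Control option 2: the same datapath sequenced by firmware, modeled
 * as a table of hypothetical opcodes. */
typedef enum { OP_CLEAR, OP_MAC, OP_LOOP_IF_MORE, OP_HALT } opcode_t;
typedef struct { opcode_t op; size_t target; } instr_t;

static const instr_t SUM_SQUARES_PROG[] = {
    { OP_CLEAR, 0 },
    { OP_MAC, 0 },            /* skipped by the guard below when no input remains */
    { OP_LOOP_IF_MORE, 1 },   /* jump back to the MAC while input remains */
    { OP_HALT, 0 },
};

static int32_t sum_squares_firmware(const instr_t *prog,
                                    const int16_t *x, size_t n)
{
    datapath_t dp = {0, 0};
    size_t pc = 0;
    for (;;) {
        instr_t in = prog[pc++];
        switch (in.op) {
        case OP_CLEAR:        dp_clear(&dp); break;
        case OP_MAC:          if (dp.idx < n) dp_mac(&dp, x); break;
        case OP_LOOP_IF_MORE: if (dp.idx < n) pc = in.target; break;
        case OP_HALT:         return dp.acc;
        }
    }
}
```

Both functions call the same `dp_clear` and `dp_mac` operations and produce the same result. Only the source of the control sequence differs, and in the firmware version that sequence can be changed after the silicon is built.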

I’m a big advocate of the toolbox approach to engineering. A good design engineer has a variety of tools in the box and can select the appropriate tool for the task at hand. I am not an advocate of the “one-size-fits-all,” “rabbit-out-of-the-hat” approach to tool selection. Hence this blog entry.

Posted by Steve Leibson on September 18, 2008 | Comments (4)

October 7, 2008
In response to: Is Algorithmic Synthesis the One True ESL Path? A Clue: No
Sandeep Shukla commented:

I find this article very well written and interesting. However, Tensilica being the company for processor customization, I would like to see some data points that show that processor customization gives the right power/performance/cost trade-offs compared to, say, creating custom IP for co-processor synthesis to get the right performance within reasonable cost and power. Could you please point us to some data points?


September 20, 2008
In response to: Is Algorithmic Synthesis the One True ESL Path? A Clue: No
Grant Martin commented:

Not meaning to pile on here, as I am a colleague of Steve's, but I think those in the ESL field who provide virtual prototype models and tools tend to think their offerings are the true "killer apps" of ESL. I think it would be better, more accurate, and more modest, to say that High Level Synthesis and Virtual Prototypes/Platforms are the two current ESL application areas seeing some current growth and user takeup, and seem to be areas that are fulfilling some of the promises of ESL. However, it is a long way to go from growth in HLS to talking about repeating what happened at the RTL level, and needs a lot of evidence beyond what we are currently seeing.


September 20, 2008
In response to: Is Algorithmic Synthesis the One True ESL Path? A Clue: No
Steve Leibson commented:

Sunil, what kind of technical argument is "But the killer-app for ESL is synthesis of the application-specific hardware"? The only difference between high-level synthesis and customized processors with expanded execution pipelines is whether or not the state machine is firmware controlled. How does that difference consign configurable processors to some sort of design ghetto (you wrote "But the killer-app for ESL is synthesis of the application-specific hardware.")? Your logic completely escapes me.


September 20, 2008
In response to: Is Algorithmic Synthesis the One True ESL Path? A Clue: No
Sunil Ashtaputre commented:

Designers are building application-specific custom hardware, they have been doing it for many years, and they will continue to do it for many years into the future. Configurable processors do have a place for some types of designs. But the killer-app for ESL is synthesis of the application-specific hardware. RTL Synthesis became a $400M market, and enabled markets like Power Analysis, Physical Synthesis, RTL Verification, RTL Linting, Equivalence Checking, etc., worth > $1B. HLS is on the verge of repeating this at the next higher level of abstraction.
