Mentor’s Wally Rhines on EDA: Everything is Broken
Apr 2 2008 4:41PM
I was listening to Mentor Graphics’ CEO Wally Rhines’ keynote yesterday at the Globalpress event in San Francisco. He was discussing the state of IC EDA and I could not help but think of Bob Dylan’s new song Everything is Broken:
Broken lines, broken strings,
Broken threads, broken springs,
Broken idols, broken heads,
People sleeping in broken beds.
Ain't no use jiving
Ain't no use joking
Everything is broken.
Rhines always gives a good speech. Good because he infuses what he says with history and facts, not just opinions, slogans, and sales pitches. When Rhines says he’s a student of EDA history, you can believe him. When he says that the EDA tool that breaks most frequently is place and route, you can believe him. And that’s exactly what he said in his keynote speech, which focused on what’s broken in EDA as we move into the 45nm era.
“Engineers love innovation,” said Rhines. “They just don’t like to change their tool flows or learn new tools.” Consequently, they don’t change their tools very often. “Change only comes when the tools don’t work at all. That’s where we are with 45nm SOC design today.”
“What breaks most often?” asked Rhines. “Place and route,” he answered. On average, says Rhines, place-and-route EDA tools break every two IC process generations. The tools simply stop working for the most advanced designs. He then projected the following graph showing the history of process generations from 1.5 microns to 45nm and walked us through the times when place-and-route tools broke during the three-decade era of 3rd-party EDA tools.

The first commercial place-and-route tool, said Rhines, appeared in 1984 just after the start of the 1-micron generation of ICs. These were the early days of gate arrays, when ASICs were just starting to become popular with system designers. This time also marks the start of the era of fabless semiconductor vendors. That same year, IC vendors rolled out the 0.75-micron generation of fabrication processes and the available place-and-route tools started to falter because of complexity. This first break provided an opening for Tangent’s router, which solved the routing complexity for “advanced” 0.75-micron chips. I remember Tangent Corp and Tangate, the routing tool that really made gate-array design possible for mere mortals. It was a very big deal back then. So big that Cadence acquired Tangent in 1989. By 1997, Cadence was selling $300 million worth of place-and-route IC tools per year for standard-cell IC design. A big deal for sure.
However, by 1997, we’d graduated from standard-cell ASICs to 0.25-micron SOCs. What’s the difference? SOCs contain microprocessors, which are really tough to route—especially around the critical general-purpose register file with its multiple read and write ports. The wheels fell off the gate-array-centric routers at that point and place-and-route tools were again broken.
About that same time, Gerry Hsu and several other Cadence employees left to form ArcSys, which developed a superior router that could successfully route the new 0.35/0.25-micron chips. ArcSys became Avanti! and was soon selling $300 million worth of place-and-route tools per year for the 0.18-micron generation of ASICs. Cadence then sued Avanti! for trade-secret theft and Avanti! eventually ended up as part of Synopsys, but that’s a different story. (A very interesting one, I might add.)
The next place-and-route crisis, at the 0.13-micron node, was caused by widespread failure of the existing tool flows to achieve timing closure. Enter Magma with a timing-driven place-and-route tool. By the time the 90nm IC node became established, Magma had become the place-and-route tool vendor of choice.
Now we’re at the 65nm and 45nm nodes and the wheels have again fallen off of the place-and-route tools. This time, says Rhines, process variability across a chip is the culprit. In addition, we’ve squeezed a lot of available margin out of designs. We push timing hard. Our designs are truly huge. We’ve reduced operating voltages to the point that 100mV or even 50mV of supply ripple cannot be tolerated. Meandering interconnect lengths are so long and vary so much from wire to wire and from layer to layer that critical paths abound.
We must now place huge constraints on chip designers. Where TSMC might have a design team sign off on four “corners” (temperature/voltage corner cases) for a 130nm SOC design and 10 corners for a 90nm design, it now requires signoff on as many as 21 corners for a 65nm design. That requirement represents a lot of design constraints. As a result, place-and-route tools are again “broken” because they are not designed for and cannot account for all of this variability. Existing tool vendors cannot fix their tools because it’s not possible through simple extension and the vendors must continue to support their existing customer base. That’s the reason that these broken tools are always replaced with new ones from startup EDA vendors (there’s a strongly implied invocation of Clayton Christensen’s Innovator’s Dilemma here).
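To make the corner idea concrete, here is a minimal sketch in Python of what multi-corner signoff means for a single timing path. Everything in it is a hypothetical illustration: the derating factors, the 18-corner grid, and the `corners` and `worst_slack` helpers are my assumptions, not TSMC data and not any vendor's tool. It only checks setup slack. The point is that a path has to meet timing at every corner, so a design that looks comfortable at nominal conditions can still fail signoff.

```python
from itertools import product

# Hypothetical derating factors (assumed for illustration only):
# values above 1.0 slow a path down, values below 1.0 speed it up.
PROCESS = {"slow": 1.15, "typical": 1.00, "fast": 0.88}
VOLTAGE = {"low": 1.10, "nominal": 1.00, "high": 0.93}
TEMPERATURE = {"cold": 0.97, "hot": 1.06}


def corners():
    """Yield (corner_name, delay_scale) for every process/voltage/temperature mix."""
    for p, v, t in product(PROCESS, VOLTAGE, TEMPERATURE):
        yield (p, v, t), PROCESS[p] * VOLTAGE[v] * TEMPERATURE[t]


def worst_slack(nominal_delay_ns, clock_period_ns):
    """Return (slack_ns, corner) for the corner with the least setup slack."""
    return min(
        (clock_period_ns - nominal_delay_ns * scale, corner)
        for corner, scale in corners()
    )


if __name__ == "__main__":
    # A path with 4.0 ns of nominal delay against a 5.0 ns clock has
    # 1.0 ns of slack at the typical corner, yet fails at the worst one.
    slack, corner = worst_slack(4.0, 5.0)
    print(f"{len(list(corners()))} corners checked")
    print(f"worst corner {corner}: slack {slack:.3f} ns")
```

With these made-up numbers the path passes at nominal conditions and fails at the slow/low-voltage/hot corner. Every corner added to the signoff list is another case like this one that the place-and-route tool has to close.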
Which leads us to the other tools in the design chain that are also broken at 65nm and 45nm. Functional-verification tools are “broken” because verification now consumes 50-70% of the SOC design effort. That’s huge and getting bigger. At this rate, said Rhines, functional-verification jobs would soon be available for “every man, woman, and child in India.” Although it’s nice to contemplate full employment for any country, this particular future is not economically viable and something will have to give.
What gives, says Rhines, is low-level ASIC design in the form of Verilog and VHDL. The abstraction level supported by these languages is too detailed for the design of chips with hundreds of millions of transistors. Rhines predicts that system-level design using languages like SystemVerilog or VHDL with PSL (“property specification language” for assertion-based design) is the future. He also asserts that SOC designers will need to get a lot more serious about embracing pre-verified IP blocks (including embedded processors) for their designs.
Another factor breaking the tools is the need for better design of low-power systems. If the SOC design team fails to successfully design the chip because they can’t route it, achieve timing closure, or verify the design, then their company can’t ship the chip. If the design team succeeds in designing the chip but it draws too much power, their company will not be able to sell the chip. Both of these paths lead to product failure.
For far too long, SOC designers have been relying on RTL twiddling, circuit-design advances, and process tricks to alleviate the ills of unoptimized system architectures. “If you’re really after a low-power design,” said Rhines, “you must create a low-power design at the beginning. You have a big impact at the architectural level.” As the graph below shows, designers have far more control over system power at the architectural-design level, but they use precious few tools to explore alternative architectures to find power-optimized configurations. “Most of the EDA industry’s development has been far below the system level,” said Rhines. Unlike gate-level design, architectural design is still more craft than engineering.

The latest break at 45nm “will force people to use real system-level design tools,” said Rhines. The following graphic captures this situation nicely. Using RTL tools alone to explore a design space, the design team can only explore a small part of the space because RTL simulation takes a long time. There just aren’t enough days in a project to cast the exploration net widely. What’s needed is faster system simulation, which you can only get if you design at the system level rather than the gate level. That’s a more precise way of saying that you need to raise the level of design abstraction.
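A back-of-the-envelope sketch shows why simulation speed limits how much of the design space a team can explore. The numbers are hypothetical: I assume a 30-day exploration budget, an RTL run that takes 12 hours, and a system-level model that runs the same workload in about a minute. None of these figures come from Rhines' talk, and `explorable_points` is just an illustrative helper.

```python
def explorable_points(budget_days, minutes_per_run, parallel_machines=1):
    """Count how many candidate designs can be simulated within the budget."""
    budget_minutes = budget_days * 24 * 60 * parallel_machines
    return budget_minutes // minutes_per_run


if __name__ == "__main__":
    # Assumed run times: 12 hours (720 minutes) per RTL simulation,
    # 1 minute per system-level simulation of the same workload.
    rtl_points = explorable_points(budget_days=30, minutes_per_run=720)
    system_points = explorable_points(budget_days=30, minutes_per_run=1)
    print(f"RTL simulation:          {rtl_points} design points")     # 60
    print(f"System-level simulation: {system_points} design points")  # 43200
```

Under these assumptions the RTL-only team evaluates 60 candidate designs in a month while the system-level team evaluates 43,200. The exact ratio matters less than the fact that it tracks simulation speed directly.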

Rhines foresees much wider use of SystemC and other system-modeling and -simulation languages to develop complex SOC designs. “The algorithms are already written in C,” he said. “Simulate them in C.” I fully concur. Our opinions diverge when it comes to describing what will happen to that C code once the system is designed. Rhines believes the C code will be fed to a compiler and turned into hardware. No doubt he believes that because he’s the CEO of Mentor, which sells a tool called Catapult C that transforms C code into RTL hardware descriptions. On the other hand, I believe that much of that C code will be run on embedded processors tailored to the task at hand. After all, C is first and foremost a language designed to run on processors. No doubt my beliefs spring from my day job, working for a vendor that offers such processors. Rhines and I are both firm in our divergent convictions, so this seems like a very good place to end this blog entry.
(Note to other CEOs: If you want to see nice long blog entries about your keynote speeches, take a page out of Wally Rhines’ playbook. That guy can speak.)
Reader Comments
at 4/2/2008 5:54:27 PM, jayk said:
Steve:
thanks for the review. Wally is indeed a very entertaining and illuminating speaker--in no small part owing to his graphics. The other no small part, of course, is that he's a very engaging lectern presence.
I have another question, in a parallel vein, to put to you in your Tensilica role: Intel and MS are sponsoring multi-core (MC for now) software development efforts at UCB. Generally, I think, their feeling is that multi-core sw development is also broken. Do you concur?
Also, UCB's published stuff seems to indicate full MC exploitation may well require application-specific core/sw combinations, and perhaps application-specific languages too. What's your thought on all this?
/jay
at 4/2/2008 7:27:27 PM, Steve Leibson said:
Jayk, thanks for the comments. You're right about Wally Rhines' graphics. I love them. I suspect he's got a really great graphics team on hand. I can't imagine that a CEO has all that time to create such nice graphics.
As for the new multicore software efforts by Intel and Microsoft, I believe they recognize that a lot of potential for SMP (symmetric multiprocessing) is not being realized because of current software coding methods. Intel wants to further use of its multicore SMP Pentiums and Microsoft wants to write more code. However, there are other ways to exploit the potential of multiple processor cores right now that require no new tools and no new programming methods. Tensilica's Grant Martin and I have written on this topic recently. Try the following links: www.edn.com/blog/980000298/post/50023005.html, www.edn.com/blog/980000298/post/1610023761.html, www.scdsource.com/article.php?id=87, and www.eetimes.com/news/design/showArticle.jhtml?articleID=206900262
at 4/2/2008 10:24:07 PM, Harry the ASIC guy said:
I think that we are at a crossroads at 45nm for a few reasons that Wally describes. I'd summarize it like this:
Can you design it - For a 25M gate chip, using RTL design methods is like doing schematic entry on a 1M gate chip. Not only will it take too long to design, but the number of design errors introduced will take forever to debug. IP reuse, the migration of hardware into software running on embedded cores (like Tensilica's), and higher levels of abstraction will all help. For the time being, I have not seen good C++ level design solutions except for algorithmic functions (data flow), but perhaps they are on the way.
Can you verify it - The challenge here is fairly obvious and has been described often since verification grows faster than O(n) with gate count. HLV methods help, but they mostly shift the burden to class library development. I think pre-verified IP and higher levels of abstraction will help, but I also think virtual and hardware prototyping will have a greater place.
Can you build it - 45nm and below is presenting numerous challenges. Process variability, yield, leakage power, IR drop during test, test pattern length, you name it. Beyond 32nm will also take some innovation. Intel has introduced Hi-K dielectrics, so look for that to go mainstream and buy us a few nodes.
Harry the ASIC guy
theasicguy.com
at 4/3/2008 12:53:04 AM, Steve Leibson said:
Harry, thanks for the comments. One note: John East, president and CEO of Actel, discussed Hi-K dielectrics in his keynote speech at Globalpress. At best, Hi-K dielectrics help postpone problems with static leakage caused by gate-oxide tunneling and help somewhat with sub-threshold leakage. However, they do not help with dynamic leakage, which must be attacked with architectural-level solutions in my opinion. They also do not help miscellaneous other types of on-chip leakage such as lateral tunneling, junction leakage, impurity traps, etc.
at 4/3/2008 10:36:43 AM, Dave J said:
EDA has always been "broken" and, as long as the technology continues to evolve, the EDA tool designed last year is going to be difficult to wield effectively on this year's problem. I also agree that IP is the way to go. I have less enthusiasm for virtual prototyping as a solution specifically because, as Harry states, verification problems are growing faster than O(n), meaning your simulator/prototype, if it is getting faster linearly, is going to fall behind. What can keep up, though, is careful decomposition into subsystems with thoughtfully defined, strictly adhered-to, and easily verifiable *interfaces*. Reducing the verification burden is a big chunk of what the IP business is about (or should be about).
at 4/4/2008 11:01:46 AM, Matt S said:
Hi Steve:
Nice wrap on Rhines' comments. Just one aside - your opening quote is only a "new" Dylan song in relative terms. It was released in September 1989 on the "Oh Mercy" album (and it would even have been a vinyl/CD release back then).
at 4/4/2008 3:14:02 PM, Steve Leibson said:
Matt S, thanks for letting me know how out of date my Dylan knowledge is. I only heard this song last year for the first time, sung live by Dylan at the mid-state fair rodeo arena in Paso Robles. I haven't bought records for a while, I guess.
at 4/8/2008 12:41:04 AM, Robert N. Blair said:
First - great piece Wally.
As one person above reminded us - EDA was always broken - it has just been a question of how badly and how engineers have been able to paper over the problems. But Wally is quite right to re-emphasise the matter at 45nm - the depth of the cracking is getting deeper, and is at risk of breaking in two - or three - or .....
Which leads me to a point that Wally may have chosen to overlook in his otherwise great piece - analog EDA is totally broken - in this new found digital world. Remember how we started with separate analog chips and digital chips - remember the 709 and the 7400? We moved through MSI to microprocessors to ASICs to the SOC - the magical all in one chip.
The process technology has continued on Moore's law - as it of course should, and EDA has continued on the square root of Moore's law at best - as the most informed folks sort of expected.
Analog design with digital EDA tools is not really a technical or economic option at 45nm. And so I would suggest that we have reached a new Y junction in the IC design road map - with our broken tools in hand. The left fork sign says continue to use digital EDA tools to design digital functions at 45nm. The right hand fork sign now says forget the digital design tools - design analog functions the old fashioned way - by doing the physical layout at 130nm, and then put it on a separate analog chip - not on the 45nm so called SOC chip. We are therefore unfortunately, or fortunately depending on whether you are a design engineer or a production engineer, back to where we started - separate analog and digital chips for optimum system design, time to market, and design cost. No surprise really - physics has a habit of prevailing.
In fact we are intending to found a new analog company based on this strategy - any believers out there care to help us?
at 4/9/2008 9:01:13 AM, Stefan Doll said:
I never quite understand people's fascination with C. It's a low-level programming language which allows you to write high performance code. It's not a higher abstraction level than Verilog; it just doesn't understand timing and doesn't have a way to represent highly parallel systems and design hierarchies. Instead it provides completely useless features for design, like dynamic memory allocation and pointers.
If you use it to represent RTL, then all you have is a more awkward way to work on the RTL level. If you want a higher abstraction level you need to come up with a tool which synthesizes more abstract concepts - it doesn't really matter which language that is - most of the available languages support abstraction concepts which the tools can't handle.
at 4/9/2008 2:58:22 PM, Grant Martin said:
Stefan....you are right about C not being a good hardware language. This is why anyone wanting to use C to model hardware usually ends up using C++ often supplemented by specific class libraries to represent HW aspects, such as concurrency, bit manipulation etc. Witness SystemC. However, there is also a significant difference between modelling systems and HW, and designing HW. Most people are not talking about designing HW at the RTL level in C (which as we all agree, is a non-starter). Rather, it happens to be a widely used notation to capture algorithms, which as Steve Leibson discussed, may run on embedded processors, OR (as Wally Rhines advocated) may be turned into HW via high-level synthesis tools like Catapult C. The issue there is not whether C is good to represent RTL; it is whether current generation behavioural synthesis tools like Catapult C get good quality of results in creating HW implementations out of algorithms captured in C. Such synthesis tools must convert sequential C code into good HW that is efficient and exploits the right levels of concurrency; whether the current generation of tools (including Mentor competitors such as Forte Cynthesizer) are good enough to be very useful needs a lot more discussion, although reports from companies like NEC and Furiyama-san of Toshiba at DATE 2007 seem to indicate progress is being made.
at 4/15/2008 3:36:17 AM, Shawn McCloud said:
This is an old argument in the hardware-software partitioning "tug-of-war," namely that it is better to run algorithms on embedded processors than in dedicated hardware. The reality is that a 100MHz 5Watt FPGA beats (hands-down) a high-performance 2GHz 50Watt general-purpose processor on application-specific signal- and video-processing tasks. On the other hand, a hardware solution (ASIC or FPGA) does not have the flexibility of a software solution. In practice, the two approaches are complementary -- hardware blocks are placed next to the embedded processors to more efficiently (less power, faster, cheaper silicon) perform certain tasks such as video decoding, decryption, etc. An ASIC may consume less than 1Watt, occupy minimal board space, be harder to reverse-engineer than the software solution, and, given high production volume, cost about $1.00--all of which are beneficial. The partitioning choice is of paramount importance, and often is the factor that makes or breaks a project based on cost, time-to-market, or power and area concerns.
To back up Wally's point for using C++ to implement hardware, Catapult C Synthesis is a total solution based on industry standard ANSI C++ and combines numerical refinement, hardware architecture optimization, interface synthesis, and a closed loop verification to easily verify the RTL output as well as tap into existing SystemC verification platforms. In HDTV and next-generation wireless, where the performance per watt cannot be achieved by software alone, algorithmic C++ synthesis is the difference in finding a differentiating solution ahead of the competition.
--Shawn McCloud, Product Line Director - Catapult, Mentor Graphics.
at 4/15/2008 7:51:47 AM, Steve Leibson said:
Shawn, thanks for the comments and for the advertisement for Mentor's Catapult C, and especially for having the honesty to clearly identify your position and therefore your leanings. I think what you say is basically correct except for the part about "hardware blocks are placed next to the embedded processors to more efficiently (less power, faster, cheaper silicon) perform certain tasks such as video decoding, decryption, etc." That statement represents an old, time-honored, and now outdated approach that was true before the era of configurable processors. For ASIC design, it is now far more efficient (in power, performance, and price or silicon cost) to move the specialty computation hardware into the on-chip processors' execution units and keep the specialized, task-specific operations under direct control of the firmware running on the processor. As Tensilica's Technology Evangelist, I've been banging that drum for 7 years, with a growing body of solid positive results across a very wide application spectrum, from multimedia consumer products including cell phones and printers to the biggest networking equipment sold.