Leibson's Law: It takes 10 years for any disruptive technology to become pervasive in the design community. This blog is about the disruptive technologies that either have or will win over electronic engineers, some that won't, and why. Please feel free to link to these blog entries! Written by Steve Leibson, a marketing consultant specializing in lead generation and content creation for high-tech companies, former VP of Content for Reed Business, and former Editor in Chief of EDN. See my consulting Web site at www.sleibson.com and my history site at www.hp9825.com. You can email me at steven.leibson followed by the magic email symbol @ followed by att.net.
Apr 12 2009 1:58PM | Permalink |Comments (4) |
Patrick Groeneveld, Chief Technologist at Magma Design Automation, gave a terrific keynote speech with great technical eloquence about exploiting parallelism for EDA tools at last week’s Electronic Design Processes Workshop in Monterey. Everyone’s interested in trying to use multicore processors to speed up the process of designing chips and Patrick’s looked into the problem. As soon as you look in this direction, Amdahl’s Law rears its head. Algorithms are overwhelmingly written as single-threaded task chains and Amdahl’s Law tells you that you the parts of the algorithm you can’t parallelize will limit the amount of speedup you can get.
Patrick converted Amdahl’s Law into a resistor network for his talk. Every task is a series resistance. When you parallelize a task, you add several resistances in parallel with the resistor that represents the task your parallelizing, which drops the effective resistance of that resistor but leaves the rest of the resistors alone, like this:

Then you can look at the cold, hard, unhappy numbers—and Patrick has. Even if you can parallelize 95% of the sequential tasks and make them infinitely fast, the best you can get is a 20x speedup. That would be great for a lot of IC-design tasks, but there is overhead associated with parallelization—primarily setting up the parallel subtasks and then merging the results. For example, consider parallelizing some image-processing tasks for a photo. First, you slice up the photo into regions. That takes time. Then you hand out each region to a processor and let it process that region. Then you reassemble the results, knit the photo back together, and fix up any problems with the boundaries between the split-up regions. All of these set-up and tear-down tasks take time that wasn’t in the original single-threaded code. EDA design is just like this, but even harder due to the complexity.
As a result, says Patrick, the real effective speedup appears to be about 2x to 3x and it doesn’t take many processors before you’ve gotten all the benefit you’re going to see from parallelization. In the real world of his customers, says Patrick, people prefer to wait two days to get a good chip design rather than get a poor one in a couple of hours.
I asked Patrick a biased question. I know that it’s possible to write C code that smoothly passes through a compiler’s optimizer. (I co-wrote an article with Tensilica's Senior Director of Software Engineering Dror Maydan about this a few years ago: Optimizing C programs for embedded SoC applications.) It’s also possible to write poorly organized source code that the compiler cannot optimize. So I asked Patrick if the same wasn’t true about chip design. Would it not be possible to design chips so that the physical layout and design of the chip was more easily parallelizable?
Yes, said Patrick, it is. Creating a well-organized chip with well-defined block boundaries helps a lot. Making sure that every block’s inputs and outputs are registered also helps a lot. However, said Patrick, EDA vendors must develop tools that accept any sort of design—good, bad, or indifferent—because EDA customers do not want the EDA vendors telling them how to design chips so that they can pass through the tools quicker, more easily, and with better results. Aye, there’s the rub. Neither do C programmers, as it turns out.
Here are Patrick’s secrets of parallel programming:
All is not bleak however. Here are Patrick’s solid secrets for faster chip design without parallelization (by reducing the number of times through the synthesis/analysis loop):
Related entries in: Design Strategies | EDA | SOC |