EDA GRAFFITI, WITH EDA VETERAN PAUL MCLELLAN, DIGS INTO THE WORLD OF DESIGN TO FIND OUT HOW WE GOT HERE, WHERE WE ARE GOING, AND WHY EDA IS DIFFERENT.



Thursday, July 30, 2009

DAC: denial computing


I went to the keynote today by nVidia’s (and Stanford’s) William Dally. The topic was the end of what he called denial architecture and the rise of throughput computing. Denial architecture is so called because it is in denial about two things: it pretends that the world is sequential and that memory is flat. Throughput computing turned out to mean, surprise, surprise, the type of engines produced by nVidia.

As everyone knows, the performance of a single processor is increasing only slowly due to power considerations. Instead, we have to take our increased computing power in the form of additional processors. The performance of architectures like this, such as nVidia’s chips, should continue to increase at about 70% per year for the foreseeable future. That is what I like to call “core’s law”: the number of cores on a chip is increasing exponentially. It’s just not all that obvious yet since we are still on the flat part of the exponential curve.

Dally had some interesting analysis of the energy required to do a computation (such as a floating point multiply) versus the energy required to move the data a short distance, across the chip or off-chip. The bottom line is that computation is very cheap in both area and energy provided the data required is local, already close to the computational unit. When a lot of data is used in any sort of pipelined computation, where the output from one stage is immediately consumed by the next, cached memory is a particularly bad architecture, something I’d never realized before. Writing the data out causes the cache line to be fetched, then the data is read once. Finally, the value, which will never be used again, is written back to main memory.
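To make the pipelining point concrete, here is a minimal CUDA sketch of my own, not something shown in the keynote. It assumes a made-up two-stage pipeline (scale, then offset); the kernel and helper names are hypothetical. The two-kernel version pushes the intermediate value out to off-chip memory only for it to be read back exactly once, while the fused version keeps it in a register.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Stage 1 of a hypothetical pipeline: the intermediate is written to global memory.
__global__ void scale(const float* in, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = 2.0f * in[i];
}

// Stage 2: the intermediate is read back once and never used again.
__global__ void offset(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;
}

// Fused version: the intermediate stays in a register, close to the arithmetic.
__global__ void scale_offset_fused(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = 2.0f * in[i];
        out[i] = t + 1.0f;
    }
}

// Runs both versions on the same input and counts elements that differ.
int count_mismatches(int n) {
    std::vector<float> h_in(n), h_two_pass(n), h_fused(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 1000);

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_tmp, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                      // illustrative block size
    const int blocks = (n + threads - 1) / threads;

    // Two passes: the intermediate array makes a round trip through global memory.
    scale<<<blocks, threads>>>(d_in, d_tmp, n);
    offset<<<blocks, threads>>>(d_tmp, d_out, n);
    cudaMemcpy(h_two_pass.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    // One pass: no intermediate array is touched at all.
    scale_offset_fused<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_fused.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_two_pass[i] != h_fused[i]) ++mismatches;
    return mismatches;
}

int main() {
    int mismatches = count_mismatches(1 << 20);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches != 0;
}
```

Both versions compute the same result. The only difference is where the intermediate lives, which is exactly the cost the energy analysis is about.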

To take advantage of all this compute power, the programmer has to worry about managing the concurrency and about which memories are used to store which data. Programmers like to deal in abstractions, which is why sequential programming and flat memory work so well. There are only three numbers in computer science: 0, 1 and infinity. Numbers like 50 processors each with 2K of memory are not something that the programmer wants to have to worry about.

But it seems there is no choice. The CUDA programming architecture gives a framework for writing these kinds of programs, and certainly some of the results on computationally expensive algorithms are impressive. Done right, it is a one-time cost to get back onto the performance curve as process generations unfold into the future. But it seems more like assembly language programming in some ways, since so many of the details of the hardware have to be taken into account. Chips like nVidia’s (and IBM’s Cell architecture used in the PlayStation) are notoriously hard to deal with because of this mismatch between computational resource and the programmer’s mental model of what has to be done.
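For a flavor of how much hardware detail leaks into the code, here is a minimal sketch of a per-block sum in CUDA. It is my own illustration, not an example from the talk; the names and the block size of 256 are assumptions. The programmer explicitly places data in on-chip shared memory, picks the number of threads per block, and inserts the synchronization by hand.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; must be a power of two here

// Each block sums its slice of the input in on-chip shared memory.
__global__ void block_sum(const float* in, float* partial, int n) {
    __shared__ float tile[kThreads];              // small, fast, per-block memory
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;           // one trip to off-chip memory
    __syncthreads();                              // concurrency managed by hand

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = tile[0];  // one result per block
}

// Sums n ones on the GPU and finishes the last step on the host.
double gpu_sum_of_ones(int n) {
    const int blocks = (n + kThreads - 1) / kThreads;
    std::vector<float> h_in(n, 1.0f), h_partial(blocks);

    float *d_in = nullptr, *d_partial = nullptr;
    cudaMalloc(&d_in, static_cast<size_t>(n) * sizeof(float));
    cudaMalloc(&d_partial, static_cast<size_t>(blocks) * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), static_cast<size_t>(n) * sizeof(float),
               cudaMemcpyHostToDevice);

    block_sum<<<blocks, kThreads>>>(d_in, d_partial, n);
    cudaMemcpy(h_partial.data(), d_partial,
               static_cast<size_t>(blocks) * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_partial);

    double total = 0.0;
    for (float p : h_partial) total += p;         // combine per-block partial sums
    return total;
}

int main() {
    const int n = 1 << 20;
    std::printf("sum = %.0f, n = %d\n", gpu_sum_of_ones(n), n);
    return 0;
}
```

A sequential programmer would write this as a three-line loop. Here the block size, the shared array, and the barriers are all the programmer's problem, which is the assembly-language feel described above.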

This stuff is now being taught in universities so it will be interesting to see if a new generation of programmers who think this way find it any easier. It still seems really hard to take a lot of small computers and put them together so that they behave as one really huge one. But the payoff when it can be done is enormous. However, getting the software right continues to be the biggest problem in software.


Reader Comments



at 7/30/2009 9:35:03 AM, Daniel Payne said:
Paul,
Interesting points from NVIDIA. In EDA we as Nascentric use the GPU to speed up their FastSPICE circuit simulator by 4X; however, the company is now defunct.



at 7/30/2009 9:36:47 AM, Daniel Payne said:
Oops, meant to type, "we saw Nascentric".



at 7/30/2009 10:41:29 AM, Joao Geada said:
Interesting talk, but also very interesting to contrast it with Tim Mattson's talks (other conferences) on the same subject.

But I believe it is important to note that CUDA is applicable to a limited set of problems that are (simplifying) math dominated with relatively regular structure. Compilers, logic simulators and whole classes of EDA tools would see limited potential for this technology. Note that the CUDA compiler itself has not been CUDA'ed for these exact same reasons.

And NVIDIA still has not addressed the inherent software distribution issues with this technology: not all machines in a server farm have graphics cards; in fact (up until now?) high-end graphics cards tend to be one of the first components removed from servers to lower their power and cost footprint. And retrofitting entire server farms to have this class of card is unlikely in the near term. Classic chicken/egg problem.

This is interesting technology, but, in my opinion, making blanket 30-100x speed up claims strays a bit too far over the marketing line.




at 7/30/2009 3:59:21 PM, garydpdx said:
Thanks, Joao! Based on first the abstract and then the reports from the actual Dally talk, including EDN's, it seems that the professor was basically promoting a sort of VLIW, but not in those words. And for each kind of processing element, a GPU for example, the specialized architecture means targeting a certain class of problems, not the general ones that so-called 'denial' architectures and now multicore (i.e. MIMD) address. If a chip of that sort is good for DSP, what happens if the user problem to be solved is ... search?



at 7/30/2009 5:21:23 PM, John Busco said:
Nice summary. Thanks, Paul.

Minor correction: he is Bill Dally,
www.nvidia.com/object/bio_dally.html



at 7/30/2009 7:03:12 PM, Sumit Gupta said:
Here are some links about CUDA and EDA:

www.ece.tamu.edu/~sunil/papers/gpu-mctime-acisc08.pdf

www.ece.tamu.edu/~pli/publications/ICCAD08_GpuPG.pdf

www.ece.tamu.edu/~sunil/papers/gpu-fs-dac08.pdf

www.nascentric.com/omegasim_gx.html

portal.acm.org/citation.cfm?id=1542275.1542357&coll=ACM&dl=ACM&CFID=46384459&CFTOKEN=29518040

www.home.agilent.com/agilent/product.jspx?cc=US&lc=eng&ckey=1297143&nid=-34278.0.00&id=1297143



at 7/31/2009 8:26:31 AM, Mike Ehlert said:
"...getting the software right continues to be the biggest problem in software."
Profound words. MS has bug fixes on top of bug fixes. While we are as hungry as anyone for increased computing power at lower energy, we do have to get it right, and get it right the first time it is released.

One would do well to make the software help the programmer with these tasks, which would be much quicker and more secure than waiting to see how a new generation of programmers deals with these problems. Let's all try to eliminate the blue screen of death and other errors.



©1997-2010 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.
Use of this Web site is subject to its Terms of Use | Privacy Policy