EDN Executive Editor Ron Wilson explores how IC design teams really work: the struggle for power efficiency and performance, wrestling with semiconductor processes and design methodologies, the challenges of global design teams. How do we somehow herd architecture, IP, design and verification into a successful tape-out?
Mar 28 2007 9:43AM
Exploring the Multicore Expo in Santa Clara yesterday, I happened on a vendor called PeakStream, which provides a programming environment for all sorts of multicore architectures, from nVidia and ATI graphics processors to Intel multicore CPU chips. The company’s target is traditional high-performance computing, in which algorithms exhibit a great deal of data parallelism, and consequently most of the critical numerical operations can be treated as vector arithmetic rather than scalar arithmetic. PeakStream allows programmers to express their algorithms in C as vector operations, and the company’s tools more or less automatically map these vector programs onto whatever parallel hardware the system actually has. This hides the complexity of parallelization, thread mapping and inter-thread communications from the programmer unless she is unhappy with the performance of the resulting package.
For data-reduction applications in areas such as investment analysis or tomography, this seems a more than adequate solution. Programming is simple, the vector metaphor gives the programmer a natural way to look at the system behavior without looking at the system complexity, and the performance should be adequate. Much faster than single-threaded is almost always fast enough.
But the encounter set me to wondering about applications in the embedded control world, where there are hard deadlines, and where tasks tend to exhibit little or no data parallelism. And that in turn led me to think back to one of the darker recesses of my mostly mis-invested education, where I was supposed to have learned about state-space analysis of systems.
In state-space representation, all systems exhibit a large degree of data parallelism. The more state variables the system has, and the higher the order of the state equations, the more data parallelism exists. So in principle, a real-time control system could indeed behave much like a traditional data-parallel computing task. Just think of the system as a huge discrete-time state machine whose transitions are governed by a very large but relatively simple matrix equation.
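To make that concrete, here is a minimal sketch of one discrete-time update in the usual linear form x[k+1] = A·x[k] + B·u[k]. The linear form, the function name `step`, and the small 2-state matrices are assumptions chosen for illustration, not anything taken from a particular control system. The point to notice is that each entry of the new state vector is computed from the old state alone, so every row could in principle run on a different core.

```python
def step(A, B, x, u):
    """One discrete-time update: x[k+1] = A x[k] + B u[k].

    Each output entry depends only on the previous state x and input u,
    so the rows are independent of one another (data parallel).
    """
    return [
        sum(a * xj for a, xj in zip(row_a, x))
        + sum(b * uj for b, uj in zip(row_b, u))
        for row_a, row_b in zip(A, B)
    ]


# Hypothetical 2-state system; the values are illustrative only.
A = [[1.0, 0.5],
     [0.0, 1.0]]
B = [[0.0],
     [0.5]]

x = [0.0, 0.0]
for _ in range(3):          # three time steps with a constant input of 1.0
    x = step(A, B, x, [1.0])
```

With thousands of state variables instead of two, the list comprehension in `step` is exactly the kind of row-by-row work a vector programming environment could spread across cores.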
Such a representation might just be a way to make large numbers of computing cores useful in embedded applications. There are of course problems to be solved. The big one that comes to mind is the degree of intertask communication that might be required during, say, a matrix multiplication. But there are already algorithms for partitioning and scheduling matrix operations to minimize this. And it should be possible in most systems to map the state variables onto the matrix in a way that leaves little communication beyond near-neighbor exchanges anyway.
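The near-neighbor idea can be sketched in a few lines. Assume, purely for illustration, that the state variables have been ordered so that the state matrix is tridiagonal, and that contiguous blocks of rows are handed to each core. The helper names `partition` and `remote_reads` and the 8-state, 4-core setup are hypothetical. The sketch only counts which other cores each core would have to read from during one update; it does not model a real scheduler or interconnect.

```python
def partition(n, cores):
    """Split row indices 0..n-1 into contiguous blocks, one per core."""
    base, extra = divmod(n, cores)
    blocks, start = [], 0
    for c in range(cores):
        size = base + (1 if c < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def remote_reads(A, blocks):
    """For each core, list the other cores whose state entries it must read."""
    owner = {i: c for c, blk in enumerate(blocks) for i in blk}
    needs = []
    for c, blk in enumerate(blocks):
        sources = set()
        for i in blk:
            for j, a in enumerate(A[i]):
                # A nonzero A[i][j] means row i reads state variable j.
                if a != 0 and owner[j] != c:
                    sources.add(owner[j])
        needs.append(sorted(sources))
    return needs


# Hypothetical 8-state system whose state matrix is tridiagonal.
n = 8
A = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

blocks = partition(n, 4)
needs = remote_reads(A, blocks)
```

Under these assumptions each core reads only from the cores on either side of it. A dense matrix run through the same functions would show every core reading from every other, which is the communication problem in a nutshell.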
Anyone out there working on this?