Instigating a Platform Tug Of War: General-Purpose GPUs, Alternatives and Another Perspective
What if you need a more general-purpose co-processor for your next x86 CPU-based design? In that case, I can point you in at least two directions (aside, of course, from generic DSPs and FPGAs): Aspex Semiconductor and ClearSpeed Technology (I had a brief but very interesting chat with the latter company at the fall 2005 Intel Developer Forum). As I point out in the print article, AGEIA believes there's a tangible role for its chips to play in dedicated physics processing. And an Israeli startup named Alseek is even working on a hardware-accelerated AI (artificial intelligence) chip.
I'll close this post series with some additional insight from David Blythe, Microsoft's software architect for Windows Graphics & Gaming Technologies. Asked to position the MMX, 3DNow!, SSE and other SIMD instruction set enhancements that CPUs have acquired against the GPU as an alternative vehicle for implementing various functions, he responded: "So far the feeling's been, at least on the CPU side, that it's interesting. And a big advantage of it is that it can fit in with the sort of regular scalar processing that is more tightly integrated, but….it's not a huge amount of acceleration compared with what you could get on a high-end GPU….The latest GPUs that are coming out will have 48 four-vector floating point units that are available compared to the one four-vector, or for practical purposes two-vector, unit that might be on a CPU….The levels of performance are different. So the GPU guys have decided to invest all their gates in compute and not in caches and communications stuff. It's more limited in what kind of things you can do but when you find something that maps well into it, it's really good at it.
"So the expectation is that the GPU guys will become more general, and the CPU guys will try to increase their parallelism and try to find tasks in which they can take advantage of more gates. And….my opinion is that just dropping down arbitrarily large numbers of identical Pentium 4 or AMD cores isn't going to be the best way to do this, because rather than data parallelism, what optimally gets mapped onto that is control level parallelism and that's a harder problem to solve. It's more general, but trying to find a systematic way to expose that and make use of it is difficult. And so I think the round of game consoles that are coming out and the first round of multi-core CPUs are going to tell us about how this is going to pan out. Now on the GPU side they want to generalize in order to be able to do more and more kinds of problems, but I think the important thing for them is to not give up on the data parallelism and….destroy the big benefits that they can get from that".
Further discussing the potential for GPUs to implement functions beyond graphics in the future: "The real key is, can I do a bunch of operations in parallel, is the first step. Am I doing similar kinds of processing?….What I want to do is separate the zero elements from the non-zero elements and only process the non-zero ones….The second one is, does it work well with the memory system….the GPUs tend to have much higher-bandwidth memory systems but in order to make them achieve those sorts of bandwidths, the accesses need to be fairly localized, fairly coherent. If every processing calculation that's happening in parallel goes to a random part of memory, then the memory system isn't going to work as efficiently, and that's where CPUs come in….The caches on CPUs are much better at handling this sort of random access to memory as long as the accesses all fit in the cache. The GPUs don't have these big caches, and so there's a requirement that in order to get these operations to go efficiently that the memory accesses need to be coherent".
Finally, asked why so much of the GPGPU development to date is based on OpenGL: "I'm probably in a unique position to be able to answer that, given the amount of time I spent working on OpenGL". [editor note: see here for some background on that comment] "As far as I'm concerned, they're really similar in terms of what the capabilities are, and in some ways it's necessarily so because the hardware vendors aren't going out and designing different functionality for OpenGL and different functionality for DirectX. We each may choose to expose it in slightly different ways, and there's always some phase difference about when releases come out. But I think they're fundamentally the same. And there are some cultural things: because OpenGL runs on Linux and lots of universities use Linux, there's an opportunity to have more people use OpenGL on the university side, but we've had plenty of interaction with lots of people that are doing GPGPU stuff on DirectX. There really isn't much difference between the two of them".
To Blythe's API comments, I'll add that at the recent day-long GPGPU session at the 2005 SIGGRAPH conference, the most commonly voiced frustration among presenters was the lack of visibility into low-level hardware details from GPU suppliers (for perhaps obvious competitive reasons). I'm encouraged, therefore, that as part of its recent X1000 family announcement, ATI pledged to publish both detailed product specifications and a 'thin' GPGPU-optimized abstraction layer. If Nvidia reciprocates, and especially if the abstraction layers can be made at least reasonably compatible between the two suppliers, GPGPU development will inevitably accelerate.