Instigating a Platform Tug Of War: General-Purpose GPUs, Going Beyond Graphics
This blog post references my cover story, 'Instigating a Platform Tug Of War: Graphics Vendors Hunger for CPU Suppliers' Turf' in the October 13, 2005 edition of EDN.
Take a look at the print article's Figure 2c (link is to a PDF). Now consider the fast bidirectional PCI Express bus linking the CPU and GPU. Raise your hands….how many of you are thinking 'general-purpose coprocessor' or 'DSP' right now? Congratulations; I commend you on your vision.
Why are GPUs so much more powerful than CPUs, at roughly similar die sizes, when crunching through math-intensive calculations? Although, as the print article points out, the contenders are slowly but steadily encroaching on each other's turf (GPUs, among other advances, are now software-programmable and, in their latest-generation iterations, even support shader instruction branching), the fact remains that a GPU is, at its heart, an application-tuned streaming media processor. This means, for example, that GPUs don't need large on-chip caches and can instead devote that silicon area to additional computational circuitry.
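To make the 'streaming' distinction concrete, here's a minimal sketch in C (the function names are mine, purely for illustration) of the sort of per-element workload GPUs are tuned for: every output depends only on its own inputs, so the hardware can stream operands past dozens of parallel ALUs rather than relying on a big cache.

```c
#include <stddef.h>

/* Hypothetical per-element kernel of the sort GPUs are tuned for:
 * each output depends only on its own inputs, so the hardware can
 * stream operands past many parallel ALUs with no large cache needed. */
static float shade(float a, float b)
{
    return a * 0.75f + b * 0.25f;   /* e.g., a simple blend */
}

/* On a CPU, the same work is a sequential (or modestly SIMD) loop;
 * a GPU effectively runs one iteration per fragment, in parallel. */
void blend_buffers(const float *src0, const float *src1,
                   float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = shade(src0[i], src1[i]);
}
```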
Don't underestimate the importance of this differentiation….it's why I put Figure 2a (link is to a PDF), the die shot of the dual-core Pentium 4 CPU, in the article. See all those sections of the CPU that look like regularly repeating farmers' fields? Most of that's cache. Another important memory-related difference between CPUs and GPUs is that whereas the CPU reaches main memory over a complex multi-DIMM link (in the Intel case, also passing through an intermediary DRAM controller in the core logic chipset's 'north bridge'), a GPU has a much simpler (therefore faster, and wider) point-to-point processor-to-memory hookup. ATI's recently introduced X1800XT GPU, for example, touts a 256-bit-wide frame buffer interface running at 750 MHz to double-data-rate GDDR memory, translating to 1.5 Gbps of per-pin peak bandwidth.
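The arithmetic behind those numbers is simple: a double-data-rate interface transfers on both clock edges, so 750 MHz becomes 1.5 Gbps per pin, and 256 pins work out to roughly 48 GB/s of aggregate peak frame buffer bandwidth. A quick back-of-the-envelope sketch (the figures are the ones cited above; the code is merely a calculator):

```c
#include <stdio.h>

/* Back-of-the-envelope peak-bandwidth arithmetic for a DDR memory
 * interface; the figures match the X1800XT numbers cited above. */
int main(void)
{
    double clock_mhz      = 750.0;   /* frame buffer clock            */
    double xfers_per_clk  = 2.0;     /* DDR: transfers on both edges  */
    double bus_width_pins = 256.0;   /* interface width               */

    double gbps_per_pin   = clock_mhz * xfers_per_clk / 1000.0;      /* 1.5 Gbps */
    double total_gbytes_s = gbps_per_pin * bus_width_pins / 8.0;     /* ~48 GB/s */

    printf("per-pin: %.1f Gbps, aggregate: %.0f GB/s\n",
           gbps_per_pin, total_gbytes_s);
    return 0;
}
```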
Before you get too excited, though, realize that the 'G' in GPU stands for 'graphics'; for the foreseeable future these chips will remain graphics- and imaging-optimized devices (Nvidia's Chief Scientist David Kirk was quite adamant on this point when I asked him about it at the conclusion of his Hot Chips keynote address), and that fact will limit their general-purpose applicability. To that point, after you check out the IEEE Computer and IEEE Computer Graphics and Applications article links in this blog post, make sure you read through Stanford's paper published at the Eurographics Graphics Hardware 2004 conference. GPUs' inability to do rapid, efficient random accesses to and from texture memory (a capability that graphics applications don't generally need), coupled with a deficit of on-chip cache that would otherwise hide the long access latency, is just one example of what I'm talking about here.
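To make the access-pattern point concrete, here's a hypothetical 'gather' loop of the kind that trips up these devices (again, the names are mine): each read lands at a data-dependent address, so without a sizable cache every lookup pays the full memory latency, whereas the earlier blend loop marches through memory strictly in order.

```c
#include <stddef.h>

/* A hypothetical 'gather' kernel: each fetch address depends on data,
 * so accesses scatter across memory. A large CPU cache hides much of
 * the resulting latency; a cache-light GPU of this era largely cannot. */
void gather(const float *table, const unsigned *index,
            float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = table[index[i]];   /* data-dependent, effectively random, read */
}
```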
GPGPU (quick aside: the GPGPU site is an outstanding resource that I highly recommend as a jumping-off point for anyone interested in further research on this topic) reminds me a lot of the C-on-FPGA (aka reconfigurable computing) experimentation that I spent my first eight years at EDN following. In both cases, there's tremendous potential for performance improvement and for cost and energy-consumption savings. In both cases, though, a substantial 'paradigm shift' (I hate that term, but nothing else comes to mind at the moment) will have to occur for the potential to be fully realized; until then, it's a struggle to force-fit software originally architected to run on one silicon platform (a CPU) onto a different platform (as software on the GPU, or as logic gates on an FPGA). As a result, in both cases the development activity today remains predominantly the bailiwick of academia, although if you re-read David Kirk's Hot Chips comments at the beginning of the print article, it's clear that sooner or later the GPU vendors expect the concept to go mainstream (or, said another way, are dependent upon its doing so). And to that point, one key difference between FPGAs and GPUs is that the latter technology has already 'crossed the chasm'; i.e., shader-based GPUs, poised and ready to leverage beyond-graphics applications, are now pervasive in PCs.
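One small illustration of that force-fitting, sketched in plain C with names of my own invention: summing an array is a trivial serial accumulation on a CPU, but to map it onto a data-parallel device it has to be recast as a multi-pass reduction (on 2005-era hardware, each pass would actually be expressed as a full-screen shader draw rather than a C loop).

```c
#include <stddef.h>

/* The CPU-natural way: one running accumulator, inherently serial. */
float sum_serial(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* The GPU-natural way (sketched here as C): repeatedly halve the array
 * by adding pairs in independent, parallel-friendly passes until one
 * element remains. Assumes n is a power of two; x is overwritten. */
float sum_reduction(float *x, size_t n)
{
    for (size_t stride = n / 2; stride >= 1; stride /= 2)
        for (size_t i = 0; i < stride; i++)   /* each iteration independent */
            x[i] += x[i + stride];
    return x[0];
}
```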