Instigating a platform tug of war: Graphics vendors hunger for CPU suppliers' turf
In his keynote at mid-August's IEEE Hot Chips conference (www.hotchips.org), Nvidia's chief scientist David Kirk made a startling public confession, validating a trend that graphics-industry observers had long suggested was under way. Kirk stated: "As pixel/vertex/triangle growth slows and plateaus ..." (Reference 1). Translation: The increasingly complex treadmill of 3-D application GUIs (graphical user interfaces) is losing steam, although Kirk was quick to point out that other performance-consuming graphics factors, such as the number of color samples per pixel (antialiasing) and the number of calculations per pixel and per vertex, will take up some of the slack.
What factors are driving the graphics-complexity slowdown? With the exception of Windows XP Media Center Edition-based systems and like-minded computers tethered to HDTV displays in living rooms and dens, most Macs and PCs drive monitors with 17-in. and smaller diagonal viewing dimensions (references 2 and 3). Due in part to the raster-based approach of implementing icons, cursors, text, and other GDI (Graphics Device Interface) elements available in current Windows operating systems, you rarely encounter resolutions beyond 1024×768 pixels. (The Windows OS, unlike the more advanced vector-based-graphics approach in Mac OS X, doesn't enable device-independent, high-quality display scaling.) Multidisplay-PC configurations haven't widely caught on. And, although incremental quality improvements require an exponentially greater number of transistors running at increasingly higher frequencies, as well as exponentially more complex software driving those transistors, those improvements are less and less noticeable to users. This scenario is particularly true with so-called first-person shooters and other fast-action genres that dominate gaming.

A pessimist might interpret Kirk's words as a foretelling of ill fortunes for graphics vendors. An optimist, however, might consider that, during that same presentation, Kirk forecast unbounded growth in demand for general-purpose, programmable, 32-bit Gflops per pixel and, more generally, that GPUs (graphics-processing units) will become general-purpose parallel processors. Nvidia's prediction isn't merely a delusional pipe dream; recent historical developments and current platform-architecture trends both support the premise. Although the display and operating-system limitations constitute part of the reason for the graphics slowdown, another key factor is that GPUs can now process polygon data and other graphics-related traffic faster than the CPU can provide it (Figure 1). If the GPU can in the future operate at higher levels of scene abstraction, it can dodge the CPU bottleneck. David Blythe, Microsoft's software architect for Windows Graphics & Gaming Technologies, strongly supports such an approach. In his presentation at July's DirectX Meltdown 2005 conference (www.microsoft.com/directx), he flatly stated that "games are CPU-limited" and that multicore CPUs are "not a panacea" (Reference 4). In addition to making games multithreaded, Blythe exhorted his audience to "offload to the GPU," citing five reasons that this shift now makes sense:
shader models are increasingly expressive;
memory datapaths support iterative calculations;
better data transfers to and from the CPU are now possible;
data-amplification support exists, for example, with geometry shaders; and
high-level shader languages are evolving to support new capabilities and abstractions.
GPUs are transforming into more generic coprocessors, and the evolution of APIs (application-programming interfaces) that enable operating systems and applications to tap into GPU capabilities is setting the pace. As this transformation continues, GPUs will potentially also be able to wrest away other functions that you currently implement in software running on CPUs. However, CPU suppliers are highly motivated to keep their own upgrade treadmills smoothly running and won't allow the GPU vendors' aspirations to go unchallenged (see sidebar, "The CPU perspective"). Other specialty-function processors, too, are hoping for their turns in the limelight and view both the CPU and the GPU as competitors. And this tug of war isn't restricted to PCs; it will potentially play out in any system that includes a display.
Opening doors to change
Historically, CPUs and "graphics accelerators," as they were originally called, were quite different devices and, as such, were symbiotic partners. The CPU is software-driven and is therefore infinitely flexible in the kinds of functions it can perform. CPUs also, in the words of Computer Science Professor Emeritus H. Norton Riley, historically implement a "model, deeply rooted in the von Neumann tradition ... [which] sees a program in terms of an orderly execution of instructions as set forth by the program. The programmer defines the order in which operations will take place, and the program counter follows this order as the control executes the instructions" (Reference 5). Graphics accelerators, in contrast, have long simultaneously operated on multiple pieces of information. (As Nvidia's Vice President of Technical Marketing Tony Tamasi put it in a recent presentation: "Graphics is embarrassingly parallel.") Historically, however, graphics accelerators were hard-wired state machines that took in graphics primitives and spat out rendered pixels (Reference 6).
Both CPUs and GPUs have evolved in recent years, however, and in directions that diminish their distinctions and redefine their relationship, putting them on a collision course. Beginning with SIMD (single-instruction, multiple-data) instruction sets, such as Intel's MMX (multimedia extensions) and subsequent iterations of SSE (streaming SIMD extensions), AMD's 3DNow!, and the PowerPC's AltiVec, CPUs could simultaneously apply a common instruction to multiple pieces of data (Reference 7). Superscalar CPUs, which could concurrently process multiple independent instructions, came next. Intel's Hyper-Threading feature took parallelism to the next level, enabling limited parallel execution of multiple instruction threads, and the multicore CPUs now emerging from numerous suppliers extrapolate this capability in a more generic form (Figure 2).
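To make the SIMD idea concrete, consider the minimal sketch below (illustrative C using SSE intrinsics, not code from any of the vendors mentioned), in which a single instruction adds four pairs of single-precision floating-point values at once:

```c
#include <stdio.h>
#include <xmmintrin.h> /* SSE intrinsics */

int main(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float sum[4];

    /* One SSE add operates on all four lanes at once: a single
       instruction applied to multiple pieces of data. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(sum, _mm_add_ps(va, vb));

    for (int i = 0; i < 4; i++)
        printf("%.1f\n", sum[i]);
    return 0;
}
```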
And what of graphics accelerators? Consider this quote from Kirk's introduction to the seminal graphics reference guide GPU Gems: "We have entered the era of programmable GPUs. The graphics-hardware pipeline, which had not previously changed significantly in 20 years, was broken down to its component, hard-wired elements and rebuilt out of programmable, parallel-pipelined processors. In a hard-wired pipeline, triangle vertices are transformed and lit; triangles are rasterized; and pixels are shaded with diffuse lighting, specular exponentiation, fog blending, and frame-buffer blending."

Kirk continues, "In a programmable pipeline, each of these operations is abstracted to its component memory accesses and mathematical operations. A programmer can still write a program that calculates the same results as a hard-wired pipeline ... but the opportunity presented is so much greater" (Reference 8).

Another critical piece of the platform transformation, beyond the evolution of the CPU and GPU, is the enhancement of the bus that interconnects them. Multiple contending peripherals shared the bandwidth of PCI, and, in today's terms, it was as slow as molasses—32 bits and 33 MHz—for 1.05-Gbps peak unidirectional bandwidth. The graphics-tailored AGP (accelerated-graphics-port) bus's fastest widely deployed variant, 8×, delivered 16.8-Gbps peak bandwidth but in only one direction—from the CPU to the GPU. Data flowing from the GPU back to the CPU traversed the AGP bus at much slower, 1× AGP (2.1 Gbps, or twice that of PCI) peak speeds, and upstream AGP traffic also underwent snooping for coherency and used PCI semantics.
With PCI Express Version 1, each four-signal "lane" supports simultaneous, bidirectional 2-Gbps data transfers (2.5-Gbps raw bandwidth minus 8b/10b-encoding overhead) in each direction; in other words, each lane supports an aggregate of 4 Gbps of peak bandwidth. Common PCI Express implementations in today's PCs devote a 16-lane connection to the graphics subsystem, thereby delivering 32 Gbps (4 Gbytes/sec) of peak bandwidth in each direction between the CPU and the GPU. In the past, it might have made no sense to send information to the GPU for intermediate processing, because, even though the GPU might process the data faster than the CPU could, passing the results back to the CPU would incur unacceptable latency. The move to PCI Express clearly relieves this AGP upstream bottleneck. And at the late-August Intel Developer Forum (www.intel.com/idf), presenters paved a path for the upcoming Version 2 PCI Express specification, which doubles the per-lane bandwidth yet again.
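Minor rounding differences aside, these peak figures follow from straightforward arithmetic. The back-of-the-envelope sketch below (plain C; the 8b/10b overhead is modeled simply as a factor of 0.8) reproduces them:

```c
#include <stdio.h>

int main(void)
{
    /* Peak unidirectional bandwidths cited in the text, in bits/sec. */
    double pci       = 32 * 33e6;      /* 32 bits x 33 MHz: ~1.05 Gbps, shared by all peripherals */
    double agp8x     = 32 * 66e6 * 8;  /* AGP 8x: 32 bits, 8 transfers per 66-MHz clock, downstream only */
    double pcie_lane = 2.5e9 * 0.8;    /* 2.5 Gbps raw, 8b/10b encoding: 2 Gbps per direction */
    double pcie_x16  = pcie_lane * 16; /* 16-lane graphics link: 32 Gbps per direction */

    printf("PCI:                 %5.2f Gbps\n", pci / 1e9);
    printf("AGP 8x:              %5.2f Gbps\n", agp8x / 1e9);
    printf("PCIe x1 (each way):  %5.2f Gbps\n", pcie_lane / 1e9);
    printf("PCIe x16 (each way): %5.2f Gbps\n", pcie_x16 / 1e9);
    return 0;
}
```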
The first shift
When DVD-ROM drives—primarily for playing video DVDs—began appearing in PCs a few years ago, GPU vendors saw their first solid opportunity to break out of the graphics-only box. The appeal was particularly strong with laptops, whose CPUs were comparatively underpowered versus those of desktop systems and whose battery-powered operation made energy consumption a greater concern. Hard-wired MPEG-2 decoding is more energy-efficient than the software-centric approach. Thus, beginning with the concluding color-space-conversion step and later broadening to earlier-stage operations, such as the iDCT (inverse discrete-cosine transform) and motion compensation, GPUs took over most of the 480-line-resolution DVD-decoding burden. (Nowadays, this takeover includes 720- and 1080-line-resolution HDTV, as well.) The desktop-versus-laptop differentiation continues to this day. The graphics core inside Intel's 945G desktop core-logic chip set, for example, handles color-space-conversion, iDCT, and motion-compensation duties, and the mobile-tuned 945GM devotes extra transistors to also tackle the variable-length lossless-decoding task.
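To give a sense of the simplest of these offloaded stages, here is a minimal, hedged sketch of the concluding color-space-conversion step: one BT.601 YCbCr-to-RGB conversion using the customary fixed-point coefficients. The function name is illustrative; production decoders convert entire planes at once using SIMD or shader hardware rather than scalar C.

```c
#include <stdint.h>

/* One BT.601 YCbCr-to-RGB conversion with the customary 8.8 fixed-point
   coefficients: R = 1.164(Y-16) + 1.596(Cr-128), and so on. */
static uint8_t clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y - 16, d = cb - 128, e = cr - 128;

    *r = clamp8((298 * c + 409 * e + 128) >> 8);
    *g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp8((298 * c + 516 * d + 128) >> 8);
}
```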
Initially, DVD-playback software had to interrogate the graphics subsystem to determine which GPU it contained and then include numerous GPU-specific routines reflecting each chip's hardware-acceleration capabilities. For example, early Nvidia chips had fewer MPEG-2 features than their ATI Technologies counterparts. (However, this gap has closed in recent years. Nvidia GeForce 6xxx and 7xxx chips, for example, contain three dedicated video engines for MPEG-2 decoding, motion estimation, and video processing. They also take advantage of the chips' shader processors for video functions.) Integrated graphics cores within core-logic chip sets also tend to have fewer features than stand-alone GPUs. This development and maintenance quagmire eased when Microsoft unveiled its DirectX VA (video-acceleration) API in late 2000. API support for video-decoding functions is one area in which Microsoft's operating systems continue to have a solid lead over Apple's Mac OS X. (This lead is surprising, too, given the Mac's historic strength in multimedia applications.) The only Mac OS application that currently taps into the video-decoding features of systems' graphics chips is Apple's own DVD Player program. This scenario is the reason that, for example, high-resolution MPEG-2 and -4 playback at full frame sizes and without dropping frames requires a high-end, dual-CPU Apple Power Mac G5 system. On the other hand, almost any Windows-based system for sale today can smoothly display high-resolution MPEG-2 streams without breaking a sweat.
With the last few generations of ATI, Nvidia, and competitors' GPUs, and with their evolution beyond hard-wired video-decoding pipelines to more generic shader-based processors, video-format support has broadened beyond MPEG-2. First is WMV9 (Windows Media Video 9) or, in SMPTE (Society of Motion Picture and Television Engineers) terminology, VC-1. The latest iteration of DirectX VA, with supportive silicon and drivers, dramatically reduces the CPU burden when playing back a high-resolution WMV9 clip (Figure 3). The most recent format to receive graphics-vendor attention is MPEG-4 AVC (advanced video coding), also known as MPEG-4 Part 10 and H.264. ATI's late-September Avivo announcement brands the ability of the company's upcoming Radeon X1300, X1600, and X1800 GPUs to both decode and encode MPEG-4 (up to AVC), WMV9, MPEG-2, and DivX video in a GPU-accelerated manner. Video-encoding acceleration employs the DirectShow API, just as MPEG-2-encoder chips' drivers do. Nvidia, as part of its GeForce 7800 GTX introduction in June, publicly stated that it hoped to have MPEG-4 AVC support in place in time for this year's holiday buying season, in partnership with companies such as CyberLink and InterVideo.
At the Spring 2004 Intel Developer Forum, Pinnacle Systems (now a division of Avid) co-delivered two presentations—one with ATI and the other with ATI and Intel—that highlighted key advantages of PCI Express over AGP in high-definition video-editing applications (Figure 4). A typical scenario involves the editing and merging, on the GPU, of multiple-source video streams. (These video streams are both compressed and uncompressed; the uncompressed streams require as much as 250 Mbytes/sec of bus bandwidth per stream.) The video streams reside on the system hard drive, in main memory, and on a connected high-definition video camcorder. The GPU sends the resultant final product back to the CPU for archiving. AGP's limited upstream bandwidth is a bottleneck—one that PCI Express removes—in this final step. GPU-accelerated video encoding is attractive in such a scenario. It's also attractive in PVR (personal-video-recorder) applications and when you're transcoding and streaming video over the LAN (local-area network) or WAN (wide-area network) to a network appliance that doesn't support the source video's attributes. DirectX VA is currently a video-decoding-centric API. However, Microsoft's Blythe says, "There will definitely be future support [for API-enabled video encoding]."
The next frontier
Simplistically speaking, GPUs have been processing images as long as they've been handling MPEG-2 decoding. Proper playback of DVDs requires that the GPU scale the 480-line video (standard or wide-screen aspect ratio) to the resolution and dimensions of the output device. It also requires that the GPU deinterlace the video to match the progressive-scan characteristics of the display. Initially, deinterlacing employed relatively crude "bob" and "weave" algorithms, but, as the capabilities of both CPUs and GPUs have improved over time, the deinterlacing algorithms have become increasingly sophisticated (Reference 9). With the latest-generation GeForce 7800 GTX GPU and its shaders' performance potential, for example, Nvidia now claims to tackle spatial-temporal deinterlacing of high-definition MPEG-2 sources, such as high-definition video and HDTV. Both ATI's and Nvidia's product literature mentions other image-processing functions the chips can perform, such as postdecoding deblocking and suppression of other lossy-compression artifacts, which are particularly attractive with low-bit-rate streaming video; random broadcast-noise removal; color intensification; and more.
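For reference, the "bob" and "weave" techniques are simple enough to sketch in a few lines of illustrative C (the function names are ours; real GPU deinterlacers layer motion-adaptive and spatial-temporal filtering on top of these basics):

```c
#include <stdint.h>
#include <string.h>

/* "Weave" interleaves the two fields of a frame line by line; "bob" takes
   one field and fills the missing lines by averaging the lines above and
   below. Both operate here on 8-bit luma planes; each field holds
   height/2 lines of width pixels. */

void weave(const uint8_t *top_field, const uint8_t *bottom_field,
           uint8_t *frame, int width, int height)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *src = (y & 1) ? bottom_field : top_field;
        memcpy(frame + y * width, src + (y / 2) * width, width);
    }
}

void bob(const uint8_t *field, uint8_t *frame, int width, int height)
{
    for (int y = 0; y < height; y++) {
        if ((y & 1) == 0) {                 /* line present in the field */
            memcpy(frame + y * width, field + (y / 2) * width, width);
        } else {                            /* missing line: interpolate */
            const uint8_t *above = field + (y / 2) * width;
            const uint8_t *below = (y / 2 + 1 < height / 2)
                                       ? above + width : above;
            for (int x = 0; x < width; x++)
                frame[y * width + x] =
                    (uint8_t)((above[x] + below[x] + 1) / 2);
        }
    }
}
```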
Creative Labs, in May 2003, introduced a visionary graphics card based on subsidiary 3Dlabs' VP500SE GPU: the Graphics Blaster Picture Perfect, a card that—as is usually the case with visionary products—hasn't been particularly successful in the marketplace. According to the vendor, the card includes a suite of software from ArcSoft. The suite includes PhotoImpression, which offers a variety of advanced image-editing features and special-effects filters running directly on the Graphics Blaster Picture Perfect. It also includes Panorama Maker, which combines horizontal, vertical, or tiled sets of images to create panoramic photos. Panorama Maker uses the VPU (visual-processing unit) on the Graphics Blaster Picture Perfect to increase the stitching speed. Finally, the suite incorporates PhotoPrinter, which allows multiple image printing on one page and multiple-page printing at one time (Reference 10).
The Graphics Blaster Picture Perfect lacks the support of a leading image-editing program, such as Adobe Photoshop. More generally, only the customized ArcSoft programs can access Graphics Blaster's acceleration features. Broader industry adoption requires API support within the Windows operating system, which Microsoft has yet to deliver. With regard to API support for still- and video-image-editing acceleration on the GPU, Blythe says, "Those are big, big, interesting things to us. And they seem like pretty natural things. Once you've got a programmable pixel-processing pipeline, video and still images are just blocks of pixels, as well, so why not? The way that I think of those is that they're just another stage, at least in the video-decode path. After I produce these images that have been deblocked, scaled, color-corrected, and all that kind of stuff ... well, I can do more processing on them, and it's just a buffer of pixels that, for all practical purposes, you can think of as being like a texture map for 3-D graphics. And you can do an arbitrary amount of pixel-shader processing on it."
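Blythe's "buffer of pixels" observation is easy to illustrate. The hedged sketch below applies a 3×3 sharpening convolution to an 8-bit luma plane in plain C; on a GPU, the same arithmetic would run in a pixel shader with the decoded frame bound as a texture. The function name and kernel choice are ours, purely for illustration.

```c
#include <stdint.h>

/* Sharpen an 8-bit luma plane with a 3x3 convolution kernel.
   Border pixels are left untouched for brevity. */
void sharpen(const uint8_t *src, uint8_t *dst, int width, int height)
{
    static const int k[3][3] = { {  0, -1,  0 },
                                 { -1,  5, -1 },
                                 {  0, -1,  0 } };

    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            int acc = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++)
                    acc += k[j + 1][i + 1] * src[(y + j) * width + (x + i)];
            dst[y * width + x] =
                (uint8_t)(acc < 0 ? 0 : (acc > 255 ? 255 : acc));
        }
    }
}
```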
Unlike Apple, which provides the Core Image library, Microsoft so far has no comparable offering, according to Blythe. "We've been focusing on the high-level programming-language capability to be able to express these kinds of operations in the programmable shading languages, without providing any sort of packaged set of operations. But we're definitely looking at those sorts of technologies. And we see this as an opportunity to broaden the use of the GPU for other kinds of applications beyond video decode and playing games." As Blythe's comments point out, although Apple may lack Microsoft's third-party-accessible API for video decoding on GPUs, Apple has rolled out an image-editing API, Core Image, in its latest Mac OS X 10.4 Tiger operating-system release, which it began shipping in late April. Approximately 30 applications already support Core Image, according to Wiley Hodges, Apple's senior product-line manager for developer products. Apple also offers the Core Video API. "In Tiger, Core Video is basically Core Image applied to a sequence. So, there's not really a fundamental difference," explains Hodges.
According to the Core Image portion of Apple's Web site, the company bundles 100 Core Image units with Tiger, along with Core Image Fun House, a front-end demonstration program. Hardware acceleration requires a pixel-shader-based GPU, but the Web site explains that for computers without a programmable GPU, Core Image dynamically optimizes for the CPU, automatically tuning for Velocity Engine—that is, AltiVec support—and multiple processors as appropriate (Reference 11). Says Hodges, "For a developer, Core Image allows someone who might not be particularly expert in the field of image processing to add that capability into an application that might otherwise be too expensive or difficult to take advantage of it. For people who do know what they're doing, it's a great way to provide access to a standard set of effects and filters." According to Hodges, effects are written in a high-level language that's based on a subset of the OpenGL shader language, and Apple will provide just-in-time compilation of the effects for the target hardware on which an application is executing.
"Effects can run in the CPU or the GPU," Hodges says, "and one of the interesting things is that, while it's true that GPUs have enormous amounts of parallel capability, they are not always good at every possible kind of effect or transformation. And so we can look at what something requires and execute it where it's going to most optimally execute." When doing Core Image compilation, does Apple take into account 24-bit floating-point support in the ATI GPUs now in Macs versus the 32-bit floating-point capabilities in the last three generations of Nvidia chips (5xxx, 6xxx, and 7xxx)? "We do [take them into account], but there's no effect on the ultimate end result," says Hodges. "It means that, for the same level of quality, you might not get the same performance on ATI as on Nvidia. We definitely see better performance of complex Core Image effects on the high-end Nvidia cards right now."
Even in the absence of Windows API support, several video-software developers have concluded that the benefits of GPU acceleration are compelling enough that they've taken the early-adopter plunge and implemented some degree of support. According to Giles Baker, senior product manager at Adobe, "Taking advantage of the increasing power of GPU processors is a key area of focus for Adobe's image-editing and video products, both now and in the future." He claims that, as the power of GPUs continues to increase more rapidly than that of host CPUs, more and more of Adobe's users are simply upgrading their graphics cards to increase the overall performance of their systems. Doing so provides a more cost-effective way of keeping systems up to date.
"In Adobe Premiere Pro [and Premiere Elements], we use the GPU directly to provide a number of accelerated effects that are available only if you have a capable graphics card," says Baker. Fortunately, he adds, most cards available today support these effects, so most people benefit from the features. "As HD production becomes more and more widespread, the GPU is a key technology that we can use to help move the enormous amount of data that is needed to create HD-resolution video content. As operating systems evolve, they provide lower level access to the GPU, which we can leverage to achieve levels of performance that were previously available only when using dedicated hardware," he says. With regard to GPU support in Photoshop, Baker comments that none of the plug-ins that ship with Photoshop is currently GPU-accelerated. However, he adds, "In the future, we see a real opportunity to take advantage of GPU acceleration in all our products." Feel free to read between the lines.
Video editing doesn't just involve video, of course. Videographers also often add 2-D- and 3-D-graphics-based scene transitions, text, and other "eye candy" to a clip before burning it to DVD or another archival-and-playback format. Because the graphics subsystem finds use in displaying graphics information on the computer monitor, it's logical to employ that same GPU to speed the rendering of graphical effects. These effects merge with the video in system RAM and on the hard-disk drive, so they benefit from the high upstream bandwidth that PCI Express delivers. According to Adobe's Baker, Adobe After Effects 6.5 uses OpenGL to accelerate on-screen playback of motion-graphics projects during production. "By offloading some of the processing directly to the GPU, we gain more responsive performance with less waiting. This leads to a more creative design experience that encourages experimentation," he says.
Baker continues, "Since Adobe products are all about creating content productively, this is a huge advantage for our users. OpenGL is an evolving standard, and we plan to take advantage of new capabilities in OpenGL as the technology develops." The Mac OS counterpart of After Effects, Apple's Motion, also harnesses the GPU by means of OpenGL. According to Apple's Web site, Motion is the first motion-graphics software with GPU-accelerated, 32-bit floating-point rendering for true film quality. The documentation further says that this 32-bit floating-point rendering produces fine color accuracy, eliminates banding artifacts, and even improves quality when rendering to 8-bit formats. Finally, it claims that you get great detail, quality, and range of color that automatically scales with new generations of GPUs (Reference 12). Hodges points out and the Apple Web site confirms that installing Motion 2, the latest version of the software, requires a shader-based GPU.
Games beyond graphics
Beyond rendering and image processing, what's the next application frontier in which the GPU might be able to wrest control away from the CPU, or where might yet another processor with tailored functions emerge? Blythe's Meltdown presentation points developers in three GPU-leveraging directions: skinning, noting that simple 1-to-4-bone, GPU-based skinning is already pervasive; morphing, such as GPU-calculated facial animation; and simulation of particle systems and fluids. Echoing comments Kirk made in his Hot Chips keynote speech, Blythe says, "The place where we're going, again staying close to the games side of things, is that we've got this way of putting pixels on the screen, and, over time, we got to a point where I could get enough pixels on the screen. So I then wanted to start increasing the quality of the pixels." As a result, Blythe asserts, Microsoft's next areas of focus were finding ways, for example, to better shade the pixels, have better lighting models, or do better texture-mapping operations. Now, the company is also starting to work toward getting better geometric models for characters; better animations; and better special effects, such as water, fire, and foliage. "Fundamentally, what I'm trying to do is get better looking visuals on the screen. And those are the places where the 'big bang,' or the most cost-effective improvements are going to happen," he says.
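Returning to the first of Blythe's three directions: the "1-to-4-bone" skinning he calls pervasive is classic linear-blend skinning, in which each vertex is transformed by as many as four bone matrices and the results are mixed by per-vertex weights. The following is a minimal, illustrative C sketch (the type and function names are ours; real implementations run this per-vertex work in a vertex shader):

```c
/* Linear-blend skinning of one vertex against up to four bones.
   Each bone transform is a row-major 3x4 affine matrix. */
typedef struct { float m[3][4]; } bone_xform;

void skin_vertex(const float in[3], const int bone_idx[4],
                 const float weight[4], const bone_xform *bones,
                 float out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = 0.0f;

    for (int b = 0; b < 4; b++) {
        const float (*m)[4] = bones[bone_idx[b]].m;
        for (int r = 0; r < 3; r++) {
            float t = m[r][0] * in[0] + m[r][1] * in[1] +
                      m[r][2] * in[2] + m[r][3]; /* affine transform */
            out[r] += weight[b] * t;             /* blend by weight  */
        }
    }
}
```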
Microsoft is also looking at options for increasing the quality of the geometric complexity of characters so that they, for example, have better silhouette edges or cast better shadows. After image-quality improvements, Blythe claims that the next set of factors that are going to contribute to the quality of the end-user experience encompasses the ability to do better kinds of physical simulations, such as creating better looking fluids, and the ability to do destructible environments in which there's a true level of realism to how the objects interact. These tasks require untraditional graphics computations, such as solving both ordinary and partial differential equations and linear systems. Thus, says Blythe, it makes sense for Microsoft to look at where best to solve these kinds of problems. "If I can express them as data-parallel kinds of problems, then the GPU starts to look interesting as a place to do them. If they have more of this sort of squirrelly control-level parallelism to them, then the CPU might be the most appropriate place. And then there's the physics processor. I can't say much about it, because there's a question of how close is it to a CPU and how close is it to a GPU in terms of the kinds of processing," says Blythe.
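To see why Blythe's "data-parallel kinds of problems" map so naturally onto a GPU, consider a minimal, hedged sketch of a particle-system update: a forward-Euler integration step under gravity, in which each particle's new state depends only on its own old state. (The names are illustrative; fluids and interacting particles add coupling terms, but the per-element structure is similar.)

```c
#include <stddef.h>

typedef struct {
    float x, y, z;    /* position */
    float vx, vy, vz; /* velocity */
} particle;

/* Advance every particle by one time step dt under gravity. Each
   iteration is independent of the others, so the loop maps directly
   onto a GPU's many parallel shader processors (or onto CPU SIMD lanes). */
void step_particles(particle *p, size_t count, float dt)
{
    const float g = -9.81f; /* gravity along y */

    for (size_t i = 0; i < count; i++) {
        p[i].vy += g * dt;       /* integrate acceleration into velocity */
        p[i].x  += p[i].vx * dt; /* integrate velocity into position     */
        p[i].y  += p[i].vy * dt;
        p[i].z  += p[i].vz * dt;
    }
}
```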
The GPU Gems book series is chock-full of ideas for simplifying the rendering, animation, and elemental interaction of objects. Such objects might include trees and their branches and leaves; wind-blown blades of grass in a field; a head of flowing hair; water and other fluids; fire; and atmospheric effects, such as fog and smoke. The word "simplifying" is critical here; rendering smoke by separately generating each particle and calculating its interactions with all other present particles, for example, is prohibitively expensive even for today's leading-edge CPUs and GPUs. So, visually comparable, but arithmetically easier, approaches are necessary. Chapter 6 of GPU Gems, for example, discusses a fire effect that the Nvidia team developed for a demo called Vulcan at the launch of the GeForce FX 5900 Ultra GPU (Figure 5). "When we started working on the demo, we first tried two solutions that looked promising: fully procedural flames and screen-space, 2-D, distortion-based flames," writes chapter author Hubert Nguyen.
Nguyen states that the fully procedural approach consumed little memory yet created an appealing flame effect. However, to produce well-defined flames, the demo had to display thousands of particles, and processing all of those vertices and pixels put a heavy load on the CPU and the GPU. The 2-D distortion-based flames used a GPU-generated perturbation function that altered a flame shape to give it a realistic motion. The distortion involved making several render-to-texture passes and shifting 2-D texture coordinates. Although it consumed more memory than the particle-system technique—because the demo had to allocate several render targets—the chapter claims that the effect is perfect for creating candlelike flames. However, the screen-aligned nature of the effect made it sensitive to motion in general and to the camera viewing angle. (Top and bottom views required constraints, and moving toward and away from the camera sometimes didn't work well in 2-D.) Integrating smoke was also a challenge, according to the book.
"Both procedural techniques have strong advantages, but they didn't meet our goal of creating a believable raging fire with smoke in a real-time, user-controllable environment," writes Nguyen. Thus, the team turned to video-textured sprites—that is, video-based footage—to make the fire more realistic. "Although full procedural and physically based flame generation is clearly the wave of the future, some cutting-edge movies, such as The Lord of the Rings, still use special effects composed of sprite-based flames" (Reference 13). In reference to Nguyen's wave-of-the-future comment, CPUs and GPUs will inevitably evolve to better manage the complex processing necessary for full-blown, real-time particle animation as well as for other computationally intensive operations, such as real-time ray-tracing-based lighting. For example, Microsoft revealed during its Hot Chips 2005 presentation that it had added extensions to the variant of DirectX 9 running on the upcoming Xbox 360 game console to support the execution of particle-physics calculations on the system's ATI-designed GPU.
However, officials at companies such as Ageia, with its upcoming PhysX processor, believe that a compelling need will still exist for a dedicated physics processor. One example would be a game character that needs not just to be present alongside, but also to interact with and deform a particle-based object—for example, when walking through a fog bank. "While dual-core machines can demonstrate physics effects impossible on single-CPU systems, the PhysX processor brings a realism and quality of effect impossible in software alone," says Suneil Mishra, Ageia's director of software-product marketing. "While dual-core processors can potentially handle hundreds of real-time objects interacting as opposed to dozens with a single-core CPU, Ageia's PhysX processor offers tens of thousands of fluid particles and rigid-body objects concurrently. The leap in performance and quality enables a completely different level of realism and immersion for gamers, both via effects and game-play physics."
Mishra points out that GPUs have focused recently on mimicking physical effects, such as fire, water, hair, or cloth, with clever visual imagery that has limited dynamic interaction or motion. He further points out that, although convincing, these cases are limited, as are other GPU forays into coding small parts of the physics-simulation pipeline. Ageia's physics processor, he claims, handles thousands of real physical objects interacting throughout an environment, allowing gamers to experience true physical reality across and throughout dynamic game levels. Here, things not only look right, but also act and feel right. "While we fully expect GPUs to continue to grow in performance, allowing them to take on more of the most basic physics operations, support for advanced physics effects and interaction will continue to remain beyond GPUs, simply due to their chip architecture," says Mishra. Ageia's PhysX processor aims to accelerate physics-simulation algorithms. The memory and floating-point-bandwidth requirements alone cause even the most programmable GPU to fail under the load of tens of thousands of interactive physics objects, according to Mishra.
The CPU perspective
Talk to PC-microprocessor vendors AMD and Intel about the looming GPU (graphics-processing-unit) competitive threat, and you get somewhat divergent perspectives. "I've seen the same sort of thing in servers with TCP/IP-offload hardware," said AMD's Steve Demski, product manager for the microprocessor business unit's server and workstation business segment, at the early-August Siggraph conference in Los Angeles. "I think it's a good thing. Anything that can be done better, faster, and cheaper in the GPU ... that's fine. That just frees up the general-purpose CPU to do different things."
But would a transfer of power for some functions from the CPU to the GPU negatively impact AMD's CPU business in the future? "Look at the operating system," Demski responded. "Operating systems are multithreaded today. They will run better on a dual-core CPU." He pointed out that you will still have multiple applications: background tasks, pop-up blockers, virus protection, and more. "There are plenty of applications wanting that CPU, wanting even the dual-core CPU. If you can offload some of the more graphics-intensive things to the GPU, you're just going to speed up the overall system," he added.
"Our company focuses on what the customer needs and what the end-user experience is, and if that means working with better graphics cards and graphics-card companies, we will work with all of them," chimed in AMD public-relations manager Scott Carroll. "We don't compete with those guys; we compete with Intel," he added. Carroll pointed out that AMD removes the system bottlenecks as much as possible—for example, as the company has done with the Opteron direct-connect architecture.
What's Intel's opinion on the role of the GPU? Regarding Intel's support for video formats beyond MPEG-2 in core-logic chip sets, Patrick Smith, a graphics architect for the company, says, "At some point, we'll need to implement more and more of the decode architecture, primarily to support emerging usage models." By "emerging usage models," Smith is referring to the Viiv-branded digital home, for example, which combines a central hub, the digital server, with comparatively "dumb" media-playback devices scattered around the home. How does Intel reconcile the need for increasing amounts of dedicated video-decoding circuitry with the increasingly powerful CPUs on Intel's road map? "CPU horsepower alone isn't enough, no matter how many cores you throw at the problem," he says.
Smith is less sanguine, though, about the need for a dedicated physics processor. "Where we're doing physics today, in the CPU, there's ample compute power. I'm not sure we need to offload to yet another device and burden the system with even more cost," he suggests. He also points out the thermal issues of yet another high-powered processor in the system. The bottom line is that mainstream applications don't currently push the system and CPU hard enough to justify a hardware hand-off, according to Smith. He further admits that the graphics-chip companies are simultaneously competitors and partners; as partners, they were instrumental in co-defining PCI Express, for example. In contrasting AMD's and Intel's perspectives, keep in mind that Intel offers graphics cores in some of its core-logic chip sets, whereas AMD relies on semiconductor partners for both core logic and graphics support.
Although the GPU in the future may absorb functions that currently take place on the CPU, could the opposite scenario play out? Could the CPU retake tasks that the GPU currently handles? Looking at the big picture, consider first the significant horsepower upgrades on both AMD's and Intel's published CPU road maps for the remainder of this decade. The processor vendors can unleash significantly larger marketing budgets than the GPU suppliers to convince customers to buy those CPUs.
Next, focus your attention on video encoding. Although the first few generations of Windows MCE (Media Center Edition)-based PCs included powerful CPUs and GPUs, they also contained dedicated MPEG-2-encoder chips to handle the PVR (personal-video-recorder) function. Microsoft wanted to ensure that, no matter what else was taking place on an MCE system at a given time, the consumer wouldn't experience dropped video frames or other recorded-television blunders. In mid-August, however, CyberLink announced that its MPEG-2 software-encoder plug-in for Windows XP MCE 2005 had obtained approval from Microsoft and claimed that it drastically reduces costs for tuner-card manufacturers by avoiding reliance on hardware chip sets when recording TV content with MPEG-2-video and -audio quality (Reference A).
Finally, look at graphics. Remember that, before the advent in 1996 of 3dfx's Voodoo 1 chip and the all-important Quake game that took advantage of it, 3-D-graphics acceleration was limited to a narrow niche of high-end workstations, along with specialized visualization applications, such as flight simulators. In the small amount of mainstream 3-D software that existed at the time, software running on the CPU fully rendered graphics primitives to pixels before their subsequent hand-off to the 2-D-graphics chip or, if the CPU also handled 2-D rendering, the RAMDAC. Now consider that, whereas the graphics core in Intel's current 945G chip set is shader-based, it hardware-accelerates only pixel-shader operations. (Engineers also sometimes refer to these operations as fragment-shader operations.) Vertex-shader code is software-emulated on the CPU. If 3-D doesn't expand beyond its rabid but minuscule gaming niche and if Kirk's predictions of slowing pixel, vertex, and triangle growth in that niche come to pass, will the processing burden shift back to the CPU, fueled by unrelenting bill-of-materials cost-reduction pressure?
This forecast may seem outlandish at first glance. But consider that, although neither Nvidia nor Sony comments on the rumor, industry insiders believe that the addition of Nvidia's GPU (likely a kissing cousin of the GeForce 7800 GTX) to Sony's PlayStation 3 was a late-stage-development decision. According to industry gossip, as Sony originally structured the PlayStation 3, it would completely handle graphics operations with its Cell processor. If the tales are true, then Sony's engineers were too optimistic this time around. But the fact that they even seriously considered a CPU-only approach to graphics says a lot about how the CPU-versus-GPU tug of war may evolve over the next few years.