Feature
Balancing in three dimensions
Graphics-chip suppliers walking a tightrope to success are encountering numerous obstacles that may cause their downfall. The best-positioned companies have diverse, flexible product lines that help you negotiate your own unique system-balancing acts.
By Brian Dipert, Technical Editor -- EDN, 4/27/2000
John Latta, president of graphics-consulting company 4th Wave, sums the situation up well: Graphics-chip architects face "a difficult dynamic," he says. They must "design to a perceived need, a shallow market demand, and no cost adder. These are the complexities of a technology in search of a market." Short-term skepticism aside, 3-D-graphics capability pervades all new desktop and workstation computers and is increasingly muscling its way into notebook platforms as well. It's only a matter of time before someone figures out how to "three-dimensionalize" the graphical user interface or finds a more mainstream killer application for the technology than Id Software's
To survive until the application base expands, however, the graphics companies will have to be nimble, accurately forecasting what their customers will require 18 months to two years in the future and quickly responding to the inevitable variations from the plan that still occur along the way. They'll have to contend with application developers (struggling with profitability and time-to-market concerns of their own), who, in attempting to target the largest customer base, are loath to support leading-edge or proprietary features. They'll have to dance an uneasy tango with the microprocessor vendors, which are also trying to suck graphics functions into both their own chip sets as hardware and their host CPUs as software. And they'll have to diversify their product lines as much as they can, so that a downturn in one segment of the market doesn't lead to their demise.
What's the use?
The three main 3-D graphics applications (game playing, digital-content development, and visual simulation) may appear similar at first glance, but they have widely varying requirements. Traditionally, first-person-shooter and other action games have dominated the stand-alone and PC arcade genre. In these environments, high frame rate has historically been the most important feature (Reference 1).
Players won't tolerate getting blasted by an alien or crashing into a barrier because the screen updates lag behind the human reflex-response time. As a result, games-targeted-graphics vendors concentrate on the all-important pixel-fill rate and take short cuts in nearly every other aspect of hardware and software design. Image-quality enhancements are a low priority. The vendors use a few large polygons to construct 3-D objects; as your character sprints down a dark hallway in search of bad guys, you wouldn't notice the fine detail on things you pass and blast. The performance-is-everything drumbeat is becoming more muted with time, however. The average size of each computer monitor isn't growing drastically, meaning that higher resolution displays don't automatically gobble up increases in graphics chips' pixel-fill rates. The vendors must find other reasons, such as true color, more accurate texture blending and multitexture application, and antialiasing, to justify both your initial purchase and subsequent upgrades to their fastest, newest, most expensive chips. At 30 to 60 frames/sec (a hotly debated and individual-specific threshold), the human eye can't detect—or therefore react to—frame-to-frame differences representing, for example, character movement. Note, though, that the longer the frame-change delay, the more it impacts the overall system response time to user input. Also, the peak frame frequency is less meaningful than the sustained average (or better yet, worst-case) specification. Game players quickly notice update rates that dip, no matter how briefly, below the players' detection threshold. Such a dip occurs, for example, when large amounts of new texture information must transfer to the graphics subsystem as a character enters a room. Games are also diversifying. 
Although an action or simulation program might require less than ideal quality, a slower but more immersive environment, such as an adventure, a fantasy, or a role-playing title, values accurate color, fine detail, and sharp object edges. Consider the reasons behind the success of games such as Contrast the needs of game players with those of digital-content developers, such as CAD engineers and special-effects artists. In these cases, computer For example, if you were looking at the front of a 3-D representation of my house, all polygons that represent the back and, to some extent, the sides of the house would be invisible to you. By analyzing the direction of the back-facing polygons' The third major 3-D-graphics application, visual simulation for use in flight simulators, military training equipment, and the like was the first consumer of 3-D-graphics technology, and today it continues to push the state-of-the-art. Like arcade games, simulation systems require high sustained frame rates, as well as (from an overall system perspective) near-immediate real-time response to users' input actions. Like digital-content creation, the 3-D simulation environment incorporates myriad polygons to ensure fine detail and spans a range of polygon sizes. Unlike games and digital-content creation, though, visual simulation does not assume that 50% of polygons are back-facing. Consider, for example, the percentage of total polygons visible in a downward-looking aerial view from an airplane. Visual-simulation applications, especially with dynamically created worlds, also cannot assume that polygons combine in tidy, fast-rendering triangle and fan structures. Although important differences exist in the major 3-D applications, each leverages developments made in the others. Gaming-targeted chips are incorporating quality features that first appeared in content-creation systems, for example. And the simulation world is adopting technology born in arcades. 
Years ago, the US Army commissioned the development of a custom version of Atari's
Partitioning the pipeline
If Gordon Moore needed a case study to testify on the accuracy and power of his law, 3-D graphics would be a compelling candidate. Technology that appears first in high-end simulation systems costing tens or hundreds of thousands of dollars migrates to workstation graphics chips costing hundreds or thousands of dollars and eventually to less-than-$100 mainstream-PC graphics, all within a few short years. Triangle-setup and -rasterizing functions are making their way onto core logic, such as Intel's i810, and host CPUs, such as Intel's upcoming Timna. Meanwhile, the stand-alone-graphics-chip suppliers are making
Nevertheless, by accelerating T&L in hardware, the mainstream chip suppliers are threatening the place that high-end vendors hold in the small but lucrative workstation-graphics market. Those vendors have for years been accelerating T&L in hardware. Display quality, robust application-programming-interface (API) support, and the need to pass stringent application certification pace the newcomers' progress.
On the other hand, the newcomers are now in a battle for the much larger volume PC world with the host-CPU providers. One of the most obvious applications of floating-point single-instruction-multiple-data (SIMD) instruction sets, such as AMD's 3DNow!, IBM and Motorola's AltiVec, Intel's Streaming SIMD Extensions (SSE), and Sun's Visual Instruction Set (VIS), is to boost the performance of software-emulated T&L algorithms. Graphics applications have historically done this task in software under the DirectX API, although OpenGL has for some time incorporated optional hardware acceleration. Last summer's DirectX Version 7 for the first time supported optional hardware-accelerated versions of these functions, but lacking other compelling uses for SIMD and high clock rates, the CPU manufacturers won't give up their turf without a fight.
Look, for example, at Intel's new Willamette processor architecture. The second-generation, floating-point SIMD engine now supports double-precision, floating-point and 128-bit-integer operations, and the ALU runs at twice the core operating frequency.
Issuing the first challenges to the host-CPU monopoly on T&L, Nvidia and S3 last fall launched their single-chip hardware-T&L-enhanced GeForce 256 and Savage2000 graphics accelerators, respectively (Reference 3). With Quadro, which closely followed GeForce, Nvidia took a page from Intel's marketing book. Nvidia manufactures both chips from the same die, but Quadro is significantly more expensive than GeForce, runs at a higher clock speed, and has antialiased-point and -line support in its drivers. When you look at benchmark results for Savage2000, remember that the device's drivers, by press time, did not yet support hardware T&L because the device's polygon-clipping unit is not fully functioning. ArtX, which ATI recently acquired, also accelerates T&L in hardware, and, in partnership with Acer Labs, brought hardware T&L to mainstream core logic with integrated graphics. The partnership's first product targets low-performance Socket 7-based CPUs, an ideal environment for hardware-T&L acceleration, and its 128-bit main-memory interface alleviates unified-memory-architecture (UMA) bandwidth concerns.
From a purely technical perspective, the graphics vendors' claims about the appropriateness of hardware-accelerated T&L have a lot of merit and hark back to the arguments that DVD and high-definition-TV (HDTV) decoding-silicon providers also make (Reference 4). Fortunately for the graphics suppliers, 3-D graphics is a less "upper-bounded" task than DVD or HDTV decoding, meaning that the upper limits of 3-D graphics' requirements are less fixed.
Continued user demands for higher resolution and greater image quality delay the inevitability that the host CPU will subsume the 3-D graphics function into software. Fixed-function, frequently executed tasks are, for cost, speed, and power reasons, well-suited for dedicated hardware; as a result, you don't have to waste the host CPU on these menial tasks. The CPU is then free for other more appropriate duties, such as accurately modeling object physics effects; implementing more-realistic-character artificial intelligence, animation, kinetics, and kinematics; managing object databases; implementing audio synthesis; and the like. For example, when Epic's
Where's the beef?
Reality, however, often differs from theory. The fundamental problem facing trendsetters, such as Nvidia, is that most content developers want to sell as many copies of each game as they can. So they write the game to the installed base of hardware. Think back to what was the hottest selling PC configuration approximately 18 months ago. Even if the content developers
Hardware T&L integration is a given for most stand-alone, next-generation graphics chips. Incorporating it, for example, is ATI's new architecture, the Charisma Engine, which ATI announced last month at the Game Developers Conference. In non-PC applications in which the host CPU is low-performance for cost reasons, the hardware-T&L-accelerated approach makes sense. Economics is also one of the key reasons that Microsoft's
Look closely at some of the benchmarking on GeForce using today's games, and you'll notice that the chip's benefits become apparent only across a narrow sliver of the system-performance spectrum (see sidebar "On your bench(mark), get set..."). Couple the graphics accelerator with a CPU that runs too slowly, and the remainder of the system will starve graphics-subsystem performance by handing it even unprocessed polygons and other data more slowly than its maximum processing rate.
Couple the graphics accelerator with a high-performance CPU, particularly one with floating-point SIMD support, such as a K6-2, an Athlon, or a Pentium III versus a Pentium II or a Celeron, and graphics runs no faster in hardware-accelerated versus software-emulated mode. If you throw, for example, complex or too many light sources at the graphics chip, it may render the scene more slowly than the host CPU would otherwise do by itself. Object collisions can also be more difficult to detect if the host CPU doesn't have fast access to the transformed polygon coordinates. Note that GeForce performs much better on synthetic benchmarks, which more accurately reflect next-generation applications' per-frame polygon counts and other features. Reflecting this fact, Nvidia's marketing pitch touts the long-term investment wisdom of purchasing a GeForce-based graphics subsystem. Unless the application uses hardware-T&L or esoteric graphics features, such as spherical-environment mapping, GeForce performs only about at the level of higher end variants of the TNT2, Nvidia's previous architecture, particularly at low display resolutions. The primary limitation of GeForce is that it offers only one texture pipeline for each corresponding pixel pipeline, whereas the TNT architecture supported a 2-to-1 texture-to-pixel-pipeline ratio. GeForce integrates four parallel pixel pipelines to TNT's two. However, both trilinear-filtering and multitexturing applications, such as the layering of shadow on light map on water on blood on color pattern on rough surface, that games use results in unused pixel pipelines. Even in bilinear-filtering and single-texture-per-pixel applications, the chip's frame-buffer bandwidth can artificially constrain the maximum pixel-fill rate. Nvidia is returning to a 2-to-1 texture-to-pixel pipeline ratio and preserving GeForce's four-pixel-pipeline architecture with its new GeForce2, which is now available for sampling. 
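One rough way to see why the texture-to-pixel-pipeline ratio matters is to count how many pixels a chip can finish per clock once an application layers several textures on each pixel. The sketch below is a deliberately simplified model, not a description of how any vendor's hardware actually schedules work: it assumes that each pixel pipeline needs one pass per group of textures it can apply at once, and it ignores frame-buffer bandwidth. The function name and the dual-texture workload are illustrative assumptions; the pipeline counts come from the text above.

```python
import math

def pixels_per_clock(pixel_pipes, textures_per_pipe, texture_layers):
    """Simplified peak throughput (an assumed model): each pipeline needs
    ceil(layers / textures_per_pipe) passes to finish one pixel."""
    passes = math.ceil(texture_layers / textures_per_pipe)
    return pixel_pipes / passes

# Pipeline counts as described in the article; two texture layers assumed.
tnt = pixels_per_clock(pixel_pipes=2, textures_per_pipe=2, texture_layers=2)
geforce = pixels_per_clock(pixel_pipes=4, textures_per_pipe=1, texture_layers=2)
geforce2 = pixels_per_clock(pixel_pipes=4, textures_per_pipe=2, texture_layers=2)

print(f"TNT: {tnt}, GeForce: {geforce}, GeForce2: {geforce2} pixels/clock")
```

Under this model, a dual-textured workload leaves the first-generation GeForce no faster per clock than the two-pipeline TNT design, whereas restoring the 2-to-1 ratio doubles per-clock throughput.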
ATI takes the evolution one step further, offering a 3-to-1 texture-to-pixel-pipeline ratio on its newest devices. With GeForce2, Nvidia will also convert to a 0.18-µm manufacturing process with an anticipated 50 to 100% higher clock-rate capability than that achievable on GeForce's 0.22-µm lithography. The combination of dual texture-per-pixel pipelines and higher clock rate will, Nvidia claims, give GeForce2 three to four times better peak performance, depending on clock rate, than the first-generation GeForce in advanced filtering and multitexture applications. Part of this performance comes from pipeline tuning like that in the TNT-to-TNT2 redesign. Nvidia believes that this tuning will give GeForce2 10 to 15% higher performance than GeForce even at the same clock rate. Intel and its CPU competitors are pushing the content-development community to create scalable game engines that will dynamically adapt themselves to the characteristics of each platform they run on. The companies advocate a multiresolution-mesh approach that, by default, assumes a high-polygon-count model akin to the one that artists create when developing their characters. Multiresolution-mesh techniques automatically decrease the number of polygons for lower end systems in a visually pleasing manner that maintains a minimum required frame rate (Figure 4). As system-scalable techniques become more common, they'll probably help the graphics vendors' cause. Artists will no longer have a reason to create polygon-deficient worlds. In fact, manually reducing polygon count while preserving a reasonable-quality representation of each 3-D object frequently takes up a disproportionate percentage of the time spent in today's game development. Workstation-graphics supplier 3Dlabs has come up with an interesting approach to resolving the CPU-versus-graphics tug of war. 
Its PowerThreads drivers use hardware to accelerate or software to emulate each OpenGL API call, depending on the graphics subsystem's capabilities and on how much processing power it and the host CPU each have available.
Although all hardware-T&L engines essentially perform the same floating-point vector-arithmetic functions, important differences exist between them. One disparity involves the amount of internal precision the calculations employ; high precision is less important for mainstream 3-D applications but more important in high-detail CAD work. Some T&L engines are hard-wired to a specific API, whereas others are programmable and therefore flexible enough to use on multiple APIs. Although a hard-wired approach may be acceptable for a slow-evolving API, such as OpenGL, new DirectX revisions appear yearly, making an easily evolving alternative more appealing. On the other hand, the more programmable the T&L engine, the more it overlaps with the similar function implemented on the highly flexible host CPU. T&L engines also differentiate themselves by the variety of light-source types that they accelerate in hardware.
Communication breakdown
Regardless of what portions of the 3-D pipeline the graphics subsystem handles, an appropriate interface channel must exist between it and the host CPU and its respective core logic. Early PCs used ISA to accomplish this task; PCI brought higher bandwidth potential and other access enhancements. As network, hard-drive, and other traffic constrained PCI bandwidth, the need for a dedicated CPU-to-graphics bus became more critical (references 5 and 6). Nowadays, as a result, most PCs and even Macs use variants of the Accelerated Graphics Port (AGP), whereas some high-end workstations employ proprietary alternatives, such as Sun's multiple-controller-capable Ultra Port Architecture (UPA).
Successive revisions of AGP multiply the 32-bit, 66-MHz data channel's peak data bandwidth, from 266 Mbytes/sec with AGP 1× to 1066 Mbytes/sec with AGP 4×, using single-, double-, and quad-data-rate techniques. AGP variants also add other important features, such as sideband addressing, which, like the Direct Rambus DRAM (DRDRAM) interface, gives address signals their own dedicated pins and allows the average data-channel bandwidth to more closely approximate its theoretical potential. Pipelining lets the CPU queue multiple requests at a time. Fast Writes mode bypasses main memory, enabling direct transfer of information between the host CPU and the graphics architecture. In doing so, Fast Writes sidesteps one common system bottleneck that advanced main-memory architectures, such as DRDRAM, also attempt to solve (Reference 7). Each data transfer down AGP normally requires two CPU front-side bus transfers and three memory accesses (Figure 5). This spring's Intel Developer Forum marked the unveiling of the "Beyond AGP 4× Initiative" (www.beyondagp4x.org). "Virtual-AGP" connections within core-logic chips that include graphics functions can also run at greater-than-AGP-4× speeds.
Marketing hype aside, is all this bandwidth really necessary? Ask five graphics vendors this question, and you'll get five responses. The answer depends first on how many peak polygons per frame the CPU supplies to the graphics subsystem. Other significant swing factors are textures and how the graphics subsystem manages them. Intel envisioned, when it first came up with AGP, that the graphics chip's local memory, if any, would only find use as the frame buffer. All textures would load in main-system memory, and the graphics accelerator with a small on-chip cache would fetch them as needed over AGP, a technique called AGP Texturing mode. Intel's i740 and i740-derived i810 core-logic chip sets work in this way.
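The peak AGP figures quoted above follow directly from the bus width, the clock rate, and the number of transfers per clock. The short calculation below reproduces them as a sanity check. It assumes that the nominal 66-MHz clock is 66.67 MHz (200/3), which is needed to land on the quoted numbers, and the function name is illustrative.

```python
def agp_peak_mbytes_per_sec(transfers_per_clock, bus_bits=32, clock_mhz=200 / 3):
    """Peak bandwidth = bytes per transfer x clock rate x transfers per clock.
    clock_mhz defaults to 66.67 MHz, an assumed value for the nominal 66 MHz."""
    return (bus_bits / 8) * clock_mhz * transfers_per_clock

# Single-, double-, and quad-data-rate modes.
for mode in (1, 2, 4):
    print(f"AGP {mode}x: {int(agp_peak_mbytes_per_sec(mode))} Mbytes/sec")
```

These are theoretical ceilings; the sustained rate depends on how well features such as sideband addressing and pipelining keep the data channel busy.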
Low-end i810 configurations even use main-system memory for the frame buffer in a return to the UMA concept of days past (Reference 8). Intel's upcoming i815 (code-named Solano) chip set will also embed a graphics controller, but instead of using extra pins to address an optional frame buffer, the chip set will support an AGP expansion bus should the system manufacturer or end user want to disable the graphics built into the core logic. In contrast to the AGP Texturing technique, some high-end workstation graphics chips not only have a huge local-memory-derived texture cache but also dedicate pins to its interface—unwilling to share the cache's bandwidth with that of the local frame buffer. A vocal supporter of local texture caching is 3dfx; its chips don't support AGP Texturing mode. Part of this decision derives from 3dfx's significant presence in the retail upgrade market and subsequent support of PCI, for which direct texture fetching is impossible. Part of the reason is 3dfx's restricted, 16-bit-color and low-resolution-texture support—until the introduction of the VSA-100 chip architecture—and the corresponding decreased demand on local-memory density. And DRAM's unusually low prices in recent years support the cause of local texture caching advocates, too. Most mainstream-graphics vendors support an approach between that of Intel and of 3dfx, retaining the ability both to locally cache textures and to fetch them from main memory, depending on what an application requests. As the amount of texture data, which tracks the number of polygons per scene, increases and as the maximum resolution and, therefore, size of each texture set also grows, AGP-performance headroom diminishes. Perhaps the biggest consumer of AGP bandwidth, should Intel's vision come to pass, is uncompressed HDTV video content sent to the graphics subsystem as a texture. Note too that video textures, by virtue of their constantly changing status, are inappropriate for caching. 
An interesting approach to using local memory comes from 3Dlabs with its onboard memory-management unit (MMU). The MMU works independently of the API to download over AGP and cache only the level-of-detail multum-in-parvo (MIP) map, or resolution-dependent version of the texture, that the MMU needs at the time. How can you reduce the amount of AGP traffic, aside from locally caching textures within the graphics subsystem? One approach compresses the texture data in a lossy manner when creating the content for the graphics subsystem to subsequently decompress and display. A number of vendor-proprietary schemes exist for doing this compression. For example, 3dfx just converted its FXT1 approach along with its Glide API to open-source status. However, two years ago, Microsoft licensed an S3-developed texture-compression algorithm, S3TC, for DirectX 6. Thus, S3's competitors prefer to refer to this algorithm as DXTC. S3TC claims visually lossless 4-to-1 compression in most cases, and the industry is slowly converting to this standard (Figure 6). Bump mapping, another data-reduction technique, creates the illusion of a 3-D surface using a 2-D texture (Figure 7). Numerous bump-mapping approaches, such as embossed, dot-product, and environment-mapped techniques, exist, but, as with texture compression, the industry is slowly but surely converting—at least in the Direct world—to a single Microsoft-blessed standard. When graphics chips began adding hardware support for the triangle-setup function a few years ago, the data the CPU sends down AGP significantly decreased. Conversely, should the graphics chip perform hardware T&L and therefore cull any resulting back-facing polygons, the amount of polygon data sent down AGP will be higher than if the host CPU does the T&L and, therefore, the culling. Local-vertex caching somewhat alleviates this added AGP burden, whereas six-texture cubic environmental mapping has the opposite effect. 
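To put the texture-related savings above in rough numbers, the sketch below estimates the storage that one texture and its full MIP chain occupy, with and without a fixed compression ratio. The 256×256 texture size, the 16 bits per texel, and the function names are assumptions chosen for illustration. The 4-to-1 ratio is the figure that S3TC claims, and real compressed sizes depend on the format.

```python
def mip_chain_texels(size):
    """Total texels in a square, power-of-two texture plus every smaller
    MIP level down to 1x1."""
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total

def texture_bytes(size, bits_per_texel=16, compression_ratio=1):
    """Storage for a full MIP chain, optionally reduced by a fixed ratio."""
    return mip_chain_texels(size) * bits_per_texel / 8 / compression_ratio

uncompressed = texture_bytes(256)                    # assumed 256x256, 16-bit
compressed = texture_bytes(256, compression_ratio=4) # ratio claimed for S3TC

print(f"MIP chain texels: {mip_chain_texels(256)}")
print(f"Uncompressed: {uncompressed:.0f} bytes, compressed: {compressed:.0f} bytes")
```

The full MIP chain costs only about one-third more than the base level, which is why fetching just the level of detail that a scene needs, as the 3Dlabs MMU does, can cut AGP traffic so sharply.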
Graphics techniques that first appear in high-end systems migrate down to the mainstream more quickly than you might expect. Sun Microsystems has taken compressed AGP traffic to the next step, developing a visually lossless geometry-compression technology, which the Java 3-D API supports. Microsoft's Talisman initiative encompassed a number of data-reducing concepts. The graphics subsystem didn't rerender polygons that remained the same from one frame to the next, and it applied affine transformations to those whose orientations had changed only slightly. This concept is one that ATI's newest architecture, which supports hardware keyframing (using software algorithms to describe movement), revisits (Reference 2).
Other developments in graphics-research laboratories that promise to transform your future system-design balancing act include Bezier patches and nonuniform rational B-splines (NURBS), which provide higher levels of abstraction than polygons for representing 3-D surfaces. When the use of these types of parametric models becomes mainstream, it will not only speed the creation of detailed 3-D objects and shrink the amount of data needed to describe them but also exacerbate the graphics-chip-versus-host-CPU tug of war over who gets to process them. Now that you know about the polygon and texture data that flow
What makes the pixel-fill rate so important? Increased frame rate is one factor. Higher resolution also multiplies the amount of information transferred during each frame; a 1024×768-pixel XGA display has more than 2.5 times as many pixels as a 640×480-pixel VGA alternative. Make each of those pixels and therefore the textures defining them 32-bit true color, and you further increase the required bandwidth over a 16-bit display. And increase the per-pixel precision of the depth buffer, from 16 to 24 or even 32 bits, and required bandwidth further expands.
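The scaling described above is easy to quantify. The sketch below computes the XGA-to-VGA pixel ratio and a lower bound on frame-buffer traffic, assuming exactly one color write and one depth write per pixel per frame at 60 frames/sec. Those assumptions and the function names are illustrative. Real traffic is several times higher once overdraw, depth reads, and texture fetches enter the picture.

```python
def pixel_count(width, height):
    return width * height

def min_frame_buffer_mbytes_per_sec(width, height, color_bits, depth_bits, fps):
    """Lower bound (assumed model): one color write and one depth write
    per pixel per frame, with no overdraw, blending, or texture reads."""
    bytes_per_pixel = (color_bits + depth_bits) / 8
    return pixel_count(width, height) * bytes_per_pixel * fps / 1e6

ratio = pixel_count(1024, 768) / pixel_count(640, 480)
vga_16 = min_frame_buffer_mbytes_per_sec(640, 480, 16, 16, fps=60)
xga_32 = min_frame_buffer_mbytes_per_sec(1024, 768, 32, 32, fps=60)

print(f"XGA/VGA pixel ratio: {ratio:.2f}")
print(f"VGA, 16-bit color and depth: {vga_16:.1f} Mbytes/sec")
print(f"XGA, 32-bit color and depth: {xga_32:.1f} Mbytes/sec")
```

Even in this best case, moving from 16-bit VGA to 32-bit XGA raises the minimum traffic roughly fivefold.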
Fortunately, a floating-point (versus integer) depth buffer or the alternative W-buffer technique, which provides a more linear representation of pixel distance than the nonlinear Z-buffer approach, defers the need to migrate to a higher precision depth buffer. Finer resolution depth precision implies a more complex scene with numerous objects overlapping each other. Thus, for each object, you read back the appropriate current pixels' values, modifying them with incoming pixel details (based on relative Z-values and alpha, or translucency, information) and writing them back, all before you can display the scene. Multitexturing consumes memory bandwidth both to fetch the MIP map information and to read back, modify, and write the appropriate pixel contents in the absence of multiple parallel-texture pipelines within the graphics chip. Such behavior reveals one advantage of 32-bit color, which experiences less accuracy degradation through multiple pixel-modification passes than the lower precision, 16-bit alternative. Advanced texture-blending techniques, such as trilinear and anisotropic filtering, also can degrade performance. And LCD shutter glasses double the required frame update of a monitor. Is it any wonder, then, that graphics—more than any other application—pushes commodity-DRAM speed? Specialized graphics-memory variants both offload the graphics logic from some of its processing and lower the required logic-to-memory bandwidth and have in some cases achieved reasonable success. Synchronous-graphics RAM, for example, supplements SDRAM's features with write-per-bit and block-write capabilities. And even more esoteric architectures, such as Mitsubishi's 3D-RAM, embed arithmetic-logic units and other processing elements to reduce or eliminate the common read-modify-write and other functions (Reference 9). Just as vendors assume best-case conditions when specifying polygon geometry and rendering speeds, they play similar specmanship games with fill rate. 
One or a few large polygons per frame are the norm, with little to no depth complexity or shading and no advanced texture-manipulation, true-color, or other quality features turned on.
In another example of workstation graphics technology migrating down to mainstream computing, 3dfx has recently been touting the T-Buffer capability of its VSA-100 scalable graphics architecture, an approach analogous to the accumulation buffers common today in high-end graphics. T-Buffer enables one or multiple parallel-operating VSA-100 chips to render multiple versions of a frame, a capability you can use with several quality-improvement techniques. Perhaps the most compelling technique, and the one that requires no explicit application or API support, is full-scene antialiasing (FSAA). A single VSA-100 will support two-sample FSAA, and multiple-chip versions can implement the even higher-quality four-sample FSAA variant. Antialiasing, the smoothing of jagged edges at color transitions, becomes more important as the number of polygons increases. The FSAA approach that 3dfx employs involves rendering each incoming polygon to slightly different pixel locations in each of the multiple frame buffers, then combining them before displaying them on the screen. Some vendors use
Calculated cloning
Subdividing the rendering and rasterizing functions across multiple parallel-operating graphics chips is a viable short-term strategy for increasing a vendor's product-line flexibility, although this subdivision might contradict the long-term trend toward single-chip integration. Workstation-graphics vendors, such as 3Dlabs and Intense3D, take the next step: offering separate geometry chips. The multichip balancing act is trickier than it might seem at first glance, however. One technique, which ATI Technologies' alternate-frame rendering (AFR) exemplifies, is to subdivide the graphics task on a frame-by-frame basis.
While one chip is sending its front-buffer information to the screen (and perhaps rendering the next frame in its back buffers), the other chip is independently processing the subsequent frame or series of frames, and so on. This approach reduces the amount of per-pin local-memory bandwidth necessary to achieve high frame rates. However, it doesn't resolve any frame-rate bottlenecks that front-end polygon transfer or processing limitations cause. The technique is also somewhat inefficient in its use of local memory, because each chip ends up with its own texture cache, probably sharing a great deal of redundant information with its neighbor. The opposite extreme, which 3dfx's Voodoo2 scan-line-interleaving (SLI) architecture exemplifies, subdivides each frame on a line-by-line resolution and partitions the per-frame processing among the multiple graphics chips on the board or in the system. SLI gives similar fill-rate relief to each graphics chip, and, theoretically, it also cleanly allocates the per-frame rendering task. However, practical limitations constrain SLI's effectiveness. Because most polygons, especially in low-triangle-count gaming environments, span multiple pixels and therefore multiple scan lines, a great deal of redundant processing occurs. Other graphics architectures, therefore, such as 3dfx's VSA-100-based Voodoo5 boards as well as many workstation-class products, are choosing an interim middle-ground approach. Each graphics accelerator handles a group of contiguous scan lines, which the vendor selects based on an average polygon-size prediction. Region-based, or deferred-rendering, accelerators take a different approach to solving the fill-rate problem (Reference 10). These architectures employ extensive internal caching to process all the polygons that define each pixel or small cluster of pixels before writing them to the frame buffer for display. 
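The multichip partitioning schemes above differ mainly in how they map work to chips, and a few lines of code make the trade-off concrete. The sketch below is a hypothetical model rather than any vendor's implementation: it assigns frames or scan lines to chips under each scheme and counts how many chips must handle a polygon that covers a given range of scan lines. The two-chip, 480-line configuration and the equal-height bands are assumptions. The article notes only that vendors size the bands according to an average polygon-size prediction.

```python
def afr_chip(frame, chips):
    """Alternate-frame rendering: whole frames rotate among the chips."""
    return frame % chips

def sli_chip(line, chips):
    """Scan-line interleaving: adjacent scan lines go to different chips."""
    return line % chips

def band_chip(line, chips, height):
    """Contiguous bands: each chip owns a block of adjacent scan lines
    (equal-height bands assumed here)."""
    return line * chips // height

def chips_touched(first_line, last_line, assign):
    """Number of chips that must process a polygon covering these lines."""
    return len({assign(y) for y in range(first_line, last_line + 1)})

CHIPS, HEIGHT = 2, 480  # assumed board and display configuration

# A polygon covering scan lines 100 through 131.
sli_count = chips_touched(100, 131, lambda y: sli_chip(y, CHIPS))
band_count = chips_touched(100, 131, lambda y: band_chip(y, CHIPS, HEIGHT))

print(f"Chips handling the polygon: SLI={sli_count}, bands={band_count}")
```

With line interleaving, nearly every polygon lands on every chip, so each chip repeats the per-polygon setup work. With contiguous bands, only polygons that straddle a band boundary are duplicated.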
Traditionally, API incompatibilities—specifically, the inability to read back depth-buffer data after writing it—have limited the use of the region-based technique in PCs, although such limitations are less of a problem in other platforms. However, region-based-accelerator advocate GigaPixel claims to have licked the API problem, and Microsoft seems to agree: GigaPixel's technology nearly became the graphics foundation of Microsoft's Yet another approach to solving the frame-buffer-bandwidth issue is to embed the memory array on the same die as the graphics logic with a wide, fast bus interconnecting them. To date, this approach has seen limited success; MediaQ, NeoMagic, and Trident, for example, have implemented it for mobile graphics accelerators, in which 3-D performance is typically less critical than for a desktop PC and embedded memory's low power consumption is equally valuable. Historically, embedded DRAM has been more expensive than the multiple-chip alternative, and the process modifications necessary to incorporate it have encumbered logic performance. But companies such as Bitboys Oy and PixelFusion continue to extol the benefits of the approach, along with AGP Texturing, to limit local-memory-density requirements. Others will most likely join these early adopters as more foundries ramp up embedded-DRAM capability. PixelFusion's architecture is intriguing for another reason as well: Its generic media-processor array supports functions other than graphics, and the driver can dynamically tune the percentage of available processing power devoted to front- versus back-end graphics tasks.
ACKNOWLEDGMENT Simulation-system vendor Quantum3D gave me an interesting end-user perspective on the importance of various graphics-technology features in meeting the company's application needs. At the Platform '99 and Platform 2000 conferences, presentations by Neil Trevett, vice president of marketing at 3Dlabs, were the inspiration for this article. And both Bert McComas from Inquest and Peter Glaskowsky from MicroDesign Resources were invaluable sources of information and perspective. Thanks also to the vendors that supplied hardware and software for the benchmarking project, particularly Kingston Technology and NEC for coming up with now-rare PC800 Rambus in-line memory modules.

Of all the bizarre technologies in electronics, 3-D graphics has got to be one of the strangest. Where else do you find dozens of chip companies (plus, in some cases, additional board manufacturers) chasing after only three significant opportunities: PCs (and related workstations), home- and arcade-game consoles, and visual-simulation systems? Within the biggest of these, PCs and workstations, only two relatively small sets of users (game players and digital-content creators) really need robust 3-D graphics, in spite of what the marketers might say. And you, the system designers, know this fact, which is why you're relentlessly driving the chip and board suppliers to deliver higher quality and faster performance (so the system specs will look good on your products' ads) at low to no cost.
