

Design Feature: December 8, 1994

Draw workstation graphics into mainstream PCs

Rhett Saugier,
AT&T Microelectronics

As PC-display resolutions increase and color depths bloom, higher refresh-rate monitors are becoming more common. A graphics design based on the traditional VGA architecture cannot begin to handle the data-transfer bandwidth required for such high-resolution, "true-color," high-refresh-rate displays.

Equipping a PC with a workstation's graphics architecture can circumvent VGA's inherent performance bottleneck. A workstation typically has a dual-ported, video-RAM (VRAM) frame buffer connected between a high-performance graphics accelerator (usually an ASIC) and a high-speed RAMDAC (a mixed-signal device containing a RAM look-up table, which converts digital pixel codes into analog signals for a monitor). Where a VGA graphics board feeds pixels through an 8-bit port, workstations' graphics accelerators and the RAMDACs have dedicated 32- or 64-bit interfaces. Until recently, however, this workstation-like configuration was too costly for PCs.

Now semiconductor manufacturers offer standard-product graphics controllers and RAMDACs that support a workstation-like architecture while maintaining backwards compatibility with VGA.

This article describes how to design a high-performance, true-color graphics subsystem using such off-the-shelf ICs. It explains how to address architectural, operational, electrical, timing, and cost-performance trade-offs previously unencountered in mainstream graphics-system design.

Even though these standard ICs open up new possibilities for high-end PC and low- to mid-range workstation designers, the devices also raise several new design problems.

The Weitek Power 9100 controller, for example, supports several elements of workstation-graphics design that provide the bandwidth needed for high-performance graphics. Fig 1 illustrates a typical high-performance graphics-board design for a PC that uses the controller and VRAM.

A high-end graphics controller offloads graphics functions such as line drawing, pattern filling, and bit-block transfers (bitblts) from the system CPU. This off-loading leaves the host processor free to run user applications and reduces the amount of graphics data that has to move across the system interface. The controller in Fig 1 combines a workstation-style, pipelined graphics accelerator with a high-speed, frame-buffer controller. (An optional video coprocessor, the P9130, performs sophisticated color operations in hardware.)

VGA runs out of gas before getting to workstation class

VGA's primary performance limitation is its frame-buffer I/O bandwidth. At 800×600-pixel resolution, a true-color (24-bits/pixel) frame comprises nearly 1.4 Mbytes of data. To refresh the screen at the 75-Hz rate, which the Video Electronics Standards Association (VESA) specifies for flicker-free display, the graphics system must have a data-transfer bandwidth of 150 Mbytes/sec.

Writing 800×600-pixel frames to the frame buffer at the 30-frames/sec standard rate for full-motion video takes an additional 42 Mbytes/sec of bandwidth, not including overhead. High-resolution true-color PC graphics and multimedia can easily require more than 200 Mbytes/sec of data-transfer capacity into and out of a VGA frame buffer.
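The arithmetic behind these figures can be reproduced with a short Python sketch. The function names are illustrative only. Treating the 150-Mbytes/sec refresh figure as the roughly 50-MHz pixel rate (quoted later in the article for 800×600 pixels at 75 Hz) multiplied by 3 bytes/pixel is an assumption about how the number was derived; counting visible pixels alone gives a lower value.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one frame in bytes."""
    return width * height * bits_per_pixel // 8


def stream_bandwidth(bytes_per_frame, frames_per_sec):
    """Bytes per second needed to move whole frames at a given rate."""
    return bytes_per_frame * frames_per_sec


frame = frame_bytes(800, 600, 24)              # "nearly 1.4 Mbytes"
visible_refresh = stream_bandwidth(frame, 75)  # visible pixels only
video_write = stream_bandwidth(frame, 30)      # full-motion video writes

# Assumption: the display channel runs at the full pixel clock,
# blanking intervals included ("about 50 MHz" in the article).
PIXEL_RATE_HZ = 50_000_000
refresh_with_blanking = PIXEL_RATE_HZ * 24 // 8

print(frame, visible_refresh, video_write, refresh_with_blanking)
```

Under these assumptions, the video-write figure comes out slightly above the 42 Mbytes/sec quoted, which is consistent with rounding the frame size to 1.4 Mbytes before multiplying.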

A typical high-end PC demonstrates how this design can give a PC workstation-like graphics performance. The example is a 66-MHz Pentium PC--a Gateway P5-66--having 16 Mbytes of RAM, a 256-kbyte cache, a Western Digital Caviar 2540 hard disk with an IDE controller, and a 2-kbyte disk cache. For this particular PC, the controller's performance exceeds 32 Graphics Winmarks at 1280×1024 pixels, 8 bits/pixel, 72-Hz refresh rate. In contrast, a VGA-equipped PC, having otherwise similar hardware, attains approximately 4 Graphics Winmarks. (Ed note: Ziff-Davis's Winbench 4.0 runs on a Weitek Power 9100 Driver video board with 4 Mbytes of VRAM using Weitek's 9100_08.DRV driver running under MS-DOS 6.20 and Windows 3.10.)

But even with an accelerated controller, true color can place a significant burden on the system bus. Although you can compress typical video data, this kind of data cannot use the drawing capabilities of an intelligent controller. Graphics controllers' built-in drawing abilities speed up only MS-Windows two-dimensional graphics. True-color video (quarter-VGA resolution) at 30 frames/sec requires about 4 Mbytes/sec of real-time data-transfer bandwidth. The data alone consume about half of the ISA bus's maximum theoretical bandwidth and require twice the ISA bus's maximum DMA block-transfer rate.


Local-bus connection for video

To get around ISA's bottlenecks, you need to put the graphics board on a high-speed local bus. Nearly every new high-end PC graphics controller has a glueless 32-bit local-bus interface to the VESA Local (VL) Bus or the Peripheral Component Interconnect (PCI) bus. The system interface on the controller supports a 44-Mbyte/sec local-bus channel for a 33-MHz bus clock (this peculiar ratio arises because each 32-bit word takes three clock cycles--that is, each clock cycle transfers 4/3 byte). The design example conforms to the PCI specification, although the VL Bus works equally well.
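As a quick check of the 44-Mbytes/sec figure, the following Python sketch computes the local-bus throughput from the bus clock, the word width, and the number of clocks per word. The function name is hypothetical; the numbers are the ones given in the paragraph above.

```python
def bus_bandwidth(clock_hz, bytes_per_transfer, clocks_per_transfer):
    """Sustained bytes per second when each transfer occupies several clocks."""
    return clock_hz * bytes_per_transfer // clocks_per_transfer


# 33-MHz bus clock, 32-bit (4-byte) words, three clocks per word
local_bus = bus_bandwidth(33_000_000, 4, 3)
print(local_bus / 1e6, "Mbytes/sec")
```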


Backwards compatibility first


To support older VGA and Super-VGA (SVGA) applications, VRAM-based PC graphics controllers have an on-chip SVGA unit tied to an 8-bit SVGA pixel port on the RAMDAC. The SVGA unit is independent of the main graphics engine, and the chip switches between the two graphics units under software control. Although the SVGA unit can support fairly high screen resolutions (up to 1280×1024 pixels in the case of the controller in Fig 1) and color depths up to 24 bits/pixel, the 8-bit port is, in general, ill suited for true-color modes.

The 8-bit port comes up short because loading each 16- or 24-bit pixel into the RAMDAC takes multiple clock cycles. At 800×600-pixel resolution (75-Hz refresh), the required pixel rate is about 50 MHz. Because each 24-bit pixel requires three bus cycles for loading through the 8-bit SVGA port, the controller would have to drive the SVGA pixel lines at 150 MHz. The SVGA-port toggle rate for 24-bit/pixel true color climbs to over 200 MHz at 1024×768-pixel resolution.
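The toggle-rate arithmetic can be sketched in Python as follows. The function names are illustrative, and the 75-MHz pixel rate used for 1024×768 pixels is an assumption (the article states only that the result exceeds 200 MHz).

```python
import math


def port_cycles_per_pixel(bits_per_pixel, port_bits=8):
    """Number of port loads needed to deliver one pixel."""
    return math.ceil(bits_per_pixel / port_bits)


def port_toggle_rate(pixel_rate_hz, bits_per_pixel, port_bits=8):
    """Rate at which the pixel port must be clocked to sustain the pixel rate."""
    return pixel_rate_hz * port_cycles_per_pixel(bits_per_pixel, port_bits)


svga_800 = port_toggle_rate(50_000_000, 24)   # 8-bit port, 800x600 true color
# Assumption: roughly 75 MHz pixel rate at 1024x768
svga_1024 = port_toggle_rate(75_000_000, 24)
# Same 800x600 mode through a 32-bit pixel port: one load per pixel
wide_800 = port_toggle_rate(50_000_000, 24, port_bits=32)

print(svga_800, svga_1024, wide_800)
```

With a 32-bit port, the same true-color mode needs only one load per pixel, which is the motivation for the wide pixel path.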

Consequently, the graphics controller almost always bypasses the VGA bottleneck in high-resolution, true-color modes, switching to the wide pixel path linking the RAMDAC directly to the VRAM frame buffer. Today, the typical frame-buffer-to-RAMDAC bus (pixel bus) in VRAM designs is 32 bits wide. A RAMDAC with a 32-bit pixel port, such as the ATT21C505 in this design, can handle a variety of pixel formats. The RAMDAC can accept an entire 24-bit pixel or multiplex two 16-bit pixels, four 8-bit pixels, eight 4-bit pixels, or 32 1-bit (black-and-white) pixels from the frame buffer in just one load cycle. The wide pixel port not only increases the effective display bandwidth but also reduces the VRAM's shift-clock rate required for multiplexed pixels.
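To illustrate how one 32-bit load can carry several pixels, here is a minimal Python sketch that splits a pixel-port word into individual pixel codes. The function is hypothetical, and the field ordering (least-significant field displayed first) is an assumption; the actual ordering depends on the RAMDAC's configuration and should be taken from its data sheet.

```python
def unpack_pixels(word, bits_per_pixel):
    """Split one 32-bit pixel-port load into individual pixel codes."""
    if bits_per_pixel not in (1, 4, 8, 16, 24):
        raise ValueError("unsupported pixel depth")
    word &= 0xFFFFFFFF
    if bits_per_pixel == 24:
        # One true-color pixel per load; the top 8 bits are ignored here.
        return [word & 0xFFFFFF]
    count = 32 // bits_per_pixel
    mask = (1 << bits_per_pixel) - 1
    # Assumption: the least-significant field is displayed first.
    return [(word >> (i * bits_per_pixel)) & mask for i in range(count)]


print([hex(p) for p in unpack_pixels(0x44332211, 8)])
print([hex(p) for p in unpack_pixels(0xBBBBAAAA, 16)])
```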

As the repository for all data that appear on screen, the frame buffer has to have the I/O bandwidth of the controller and the RAMDAC combined. You could use either DRAM or VRAM for the frame buffer's memory. A DRAM frame buffer has to divide its I/O bandwidth between frame updates and display accesses--frame updates from the system's processor and display accesses to the monitor. VRAM, on the other hand, taps the memory's full internal bandwidth on two ports at once; VRAM streams pixel data out to the RAMDAC while the controller simultaneously writes new data into the frame buffer.

Despite the relatively high cost of VRAM chips, a VRAM design offers some distinct advantages over simply using an extremely wide (64- or 128-bit) DRAM architecture. A wide-DRAM architecture can use relatively slow devices because it passes its entire 64- or 128-bit output to its equally wide RAMDAC in one cycle, achieving a high effective pixel-write rate (writing to a wide-DRAM architecture takes place at modest rates). But, because in a VRAM-based design the high-speed pixel data doesn't have to pass through the controller, the VRAM controller's die can be smaller, dissipate less power, and be housed in a lower-pin-count (ie, smaller and cheaper) package.


Frame-buffer interleaving


Although an ordinary VRAM frame buffer provides more than twice the I/O bandwidth of DRAM, interleaving the VRAM increases the frame buffer's bandwidth even more--without incurring the cost of higher speed VRAM. An interleaved frame buffer comprises two banks of chips, having all the even-numbered data words stored in one bank and all the odd-numbered words in the other (Fig 2).

Because graphics-data accesses are generally sequential, interleaving cuts the memory chips' apparent cycle time in half by allowing the controller to read or write a word in one bank while the next memory location is precharging.


In this design example, the 2-Mbyte frame buffer comprises eight 60-nsec, 256-kbyte×8-bit VRAM chips (four in each bank). The two banks share common address and data lines but have separate write enables, output enables, column-address strobes, and shift-register control lines (Fig 3).



Advanced controllers support frame-buffer interleaving. When writing to the VRAMs, the controller automatically addresses the two banks alternately, writing even-numbered data words to bank 0 and odd-numbered words to bank 1. Fig 4 illustrates the controller's write cycle. If the next cycle is not a sequential write to the opposite bank, the controller extends the write operation--which usually takes only one clock cycle--by adding a second "finish write" cycle.
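The bank-selection rule and the "finish write" penalty can be modeled with a small Python sketch. This is a simplified model for illustration, not the controller's documented timing: it assumes one clock per write, one extra clock whenever the following access is not a write to the next sequential word, and it treats the end of a burst as a non-sequential case. The function names are hypothetical.

```python
def bank_of(word_address):
    """Even-numbered words live in bank 0, odd-numbered words in bank 1."""
    return word_address & 1


def write_clocks(word_addresses):
    """Count clocks for a series of writes under the simplified rule above."""
    clocks = 0
    for i, addr in enumerate(word_addresses):
        clocks += 1  # the write cycle itself
        is_last = i + 1 == len(word_addresses)
        next_is_sequential = (not is_last) and word_addresses[i + 1] == addr + 1
        if not next_is_sequential:
            clocks += 1  # "finish write" cycle
    return clocks


sequential = list(range(8))    # banks alternate 0, 1, 0, 1, ...
same_bank = [0, 2, 4, 6]       # every write lands in bank 0

print(write_clocks(sequential), write_clocks(same_bank))
```

In this model, a sequential burst pays the extra cycle only once, at its end, whereas writes that keep hitting the same bank pay it every time.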

Table 1--Bandwidth requirements

Bits per pixel    Number of pixels    Required pixel-bus bandwidth (Mbytes/sec)
24                800 × 600           150
16                1024 × 768          158
8                 1600 × 1280         <200


The controller also has to account for the interleaved frame buffer when prompting the VRAM's shift registers to feed data to the RAMDAC. The controller has to generate the two opposite-phase VRAM shift clocks (SC0 and SC1) and serial output-enable signals (SE[0] and SE[3]) needed to present the data to the RAMDAC's pixel port in the proper order. Fig 5 shows the timing of the signals that the controller generates to control the VRAM's shift registers.

Although interleaving doubles the potential pixel-bus bandwidth for a given memory speed--for this 50-MHz VRAM, from 200 Mbytes/sec to 400 Mbytes/sec--in practice, the frame buffer's size limits the data-transfer rate actually needed in this design. Table 1 shows the practical bandwidths needed to service a 2-Mbyte frame buffer.

Thus, the only immediate practical benefits of frame-buffer interleaving on the RAMDAC side of the VRAM are that interleaving relaxes VRAM serial-port timing and reduces pin count. In theory, the bandwidth derived from interleaving and a 32-bit pixel port opens a migration path to a 4-Mbyte frame buffer. If byte-wide 4-Mbit VRAMs (512 kbyte×8 bit) replaced the 2-Mbit chips (256 kbyte×8 bit) used here, an otherwise identical hardware design would support 1024×768-pixel true color and 16-bit color at 1600×1280-pixel resolution. However, to further increase frame-buffer bandwidth, memory manufacturers favor wider organizations over deeper ones in 4-Mbit VRAM devices. So, unfortunately, 512-kbyte×8-bit VRAMs are not available.
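The relationship between serial-port bandwidth and frame-buffer capacity can be checked with the following Python sketch. The helper names are illustrative, and taking 1 Mbyte as 2^20 bytes is an assumption about the article's units.

```python
MBYTE = 1024 * 1024  # assumption: 1 Mbyte = 2**20 bytes


def serial_port_bandwidth(shift_clock_hz, bus_bytes=4, banks=1):
    """Peak bytes per second on the pixel bus for a given shift-clock rate."""
    return shift_clock_hz * bus_bytes * banks


def fits(width, height, bits_per_pixel, buffer_bytes):
    """True if one frame of the given mode fits in the frame buffer."""
    return width * height * bits_per_pixel // 8 <= buffer_bytes


single = serial_port_bandwidth(50_000_000)                # one bank
interleaved = serial_port_bandwidth(50_000_000, banks=2)  # two banks

modes = [(800, 600, 24), (1024, 768, 16), (1600, 1280, 8),
         (1024, 768, 24), (1600, 1280, 16)]
for w, h, bpp in modes:
    print(w, h, bpp, fits(w, h, bpp, 2 * MBYTE), fits(w, h, bpp, 4 * MBYTE))
```

Under these assumptions, the first three modes fit in 2 Mbytes, and the last two need the 4-Mbyte buffer, which matches the migration path described above.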

Even without interleaving, the combination of a VRAM's frame buffer, a wide pixel bus, and a separate VGA pixel path makes the video clocking in a VRAM-based PC display board fundamentally more complicated than in a traditional VGA design. Specifically, the video channel uses one of three different clocking scenarios, depending on the color depth.


The first clocking scenario is for VGA. In VGA modes, the controller first reads the pixel data from the frame buffer's random-access port. Then the controller's SVGA unit feeds that data to the RAMDAC's 8-bit VGA pixel port (V[7:0]). The controller operates from the main pixel clock (VCLK, Fig 6) and generates a signal (VIDOUTCLK) that serves as both the pixel clock and the load clock for the RAMDAC. The RAMDAC's LOAD pin controls its input latch. Typically, you tie VIDOUTCLK to both the LOAD and PCLK0 input pins on the RAMDAC, as Fig 6 shows. Note that in the standard nonmultiplexed (8-bit/pixel) VGA mode, the ATT21C505 RAMDAC derives its pixel clock from the LOAD input, ignoring PCLK0.

Video clocking is more complex in higher performance modes. In these modes, because the display data moves across the main 32-bit pixel bus, the VRAM's serial-port operation must synchronize with the RAMDAC. The controller has to maintain the proper phase relationship between the memory-control signals (SC and SE) and the load clock so that the RAMDAC's pixel port latches each pixel only when the data are valid (Fig 5).

Typically, 32-bit RAMDACs for VRAM-based designs have an on-chip clock divider. The ATT21C505 in this design automatically generates a shift clock corresponding to the color/multiplexing mode, which the device's configuration register selects. To accommodate all three clocking scenarios requisite in this type of graphics-board design, both the RAMDAC and the controller have dual pixel-clock inputs selected via the devices' respective configuration registers.

Graphic design: one bottleneck after another

The title "primary performance-limiting factor" in PC graphics just migrates from one bottleneck to another.

The problem used to be the system bus; now local buses provide plenty of host-interface bandwidth. A few years ago, controllers couldn't move pixels fast enough. Now high-speed intelligent controllers with wide internal and I/O buses are available. The 8-bit VGA pixel port limited the color depth that RAMDACs could display at high resolution; currently, pixel-port widths are moving to 32 and 64 bits--and beyond.

Today, the problem is memory. The disagreeable trade-off between frame-buffer size and cost stands in the way of higher performance PC graphics. For example, the next step up from this design requires increasing the frame-buffer size to 4 Mbytes--large enough for 24-bit/pixel true color at 1024×768-pixel resolution (1280×1024 with pixel packing). This article's controller and RAMDAC can support a 4-Mbyte design. But, unfortunately, a 4-Mbyte version of this particular design would require 512-kbyte×8-bit VRAMs.

The new VRAM chips actually in the offing--256-kbyte×16-bit VRAMs with their serial ports organized into a 64-bit pixel bus--differ significantly from the VRAMs in this design. However, such VRAMs could use the controller from this design along with a 64-bit RAMDAC such as the ATT20C311 or ATT20C511.

The real barrier keeping higher resolution, high-performance, true-color graphics from the mainstream PC market is the price of VRAM. At about $45 per megabyte, 4 Mbytes of VRAM costs more than twice as much as a controller and an ATT21C505 or ATT20C311/511 RAMDAC combined. For the time being, a high-performance graphics board capable of 24-bit graphics at 1024×768 pixels and above may be more expensive than the high-volume PC market can bear.

In the simplest case--the nonmultiplexed, 24- or 16-bit/pixel modes--the controller can drive the video clocking directly. In this design, however, the controller takes the main pixel clock from the clock synthesizer and generates the shift clocks (SC[0:1]), serial enables (SE[0] and SE[3]), and the load clock (VIDOUTCLK). The design doesn't need a dedicated RAMDAC pixel clock because the ATT21C505 uses the load clock as its pixel clock in nonmultiplexed modes.


Clocking multiple-pixel loads

Whenever you pack multiple pixels into each 32-bit RAMDAC load--ie, in the 2:1 multiplexed, 16-bit/pixel mode and the 4:1 multiplexed, 8-bit/pixel mode--the load rate is obviously only a fraction of the DACPIXCLK's pixel-clock rate. In these cases, you need divided clocks to drive the VRAM's shift registers and the RAMDAC's LOAD pin. The RAMDAC also needs an undivided pixel clock for timing the video output (but the controller does not).

The solution to supplying all of the necessary clock signals and keeping the frame-buffer serial port synchronized to the RAMDAC is to make the RAMDAC master of the display-data clocking. In multiplexed modes, the RAMDAC takes the main pixel clock from the clock synthesizer and generates a master shift-clock signal (SCLK). The controller uses this master shift-clock signal to synchronize both the VRAM's shift clocks and its serial enables to the RAMDAC's timing.

The frequency of SCLK depends on the color mode. In the 2:1 multiplexed, 16-bit/pixel mode, only one data load occurs for every two pixel clocks; therefore, SCLK runs at one-half the pixel-clock frequency. At 8 bits/pixel (4:1 multiplexed), SCLK runs at one-quarter the pixel-clock frequency.

The hardware has to generate the proper sync and blank timing signals for each mode's monitor. The controller uses the divided pixel clock to generate the sync and blank video-timing signals and also to control reloading the VRAM's shift registers.

In this design, the controller's built-in clock divider generates an internal index clock CRTC_CLK (not shown in Fig 6) from the RAMDAC's SCLK signal. The host system must program the controller's video-control registers with the appropriate sync and blank timing as a function of CRTC_CLK. The driver software must account for the fact that each CRTC_CLK represents multiple pixels in multiplexed color modes. However, you do not need to adjust for pixel multiplexing when configuring the controller's shift-register reload parameters. As long as the controller's internal clock is not divided down from the input (SCLK), each clock cycle corresponds to a full 32-bit data word in any of the multiplex modes.
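The adjustment that driver software has to make can be sketched in Python. This is a hypothetical illustration: the function name, the dictionary keys, and the horizontal-timing values are invented for the example and are not taken from the controller's register set or from any monitor-timing standard. The sketch also assumes that CRTC_CLK runs at the SCLK rate, so that one CRTC_CLK period corresponds to one 32-bit load.

```python
def pixels_to_crtc_clocks(pixels, mux_ratio):
    """Convert a horizontal timing value in pixels to CRTC_CLK periods."""
    if pixels % mux_ratio:
        raise ValueError("timing value must be a multiple of the mux ratio")
    return pixels // mux_ratio


# Hypothetical horizontal timing for a 1280-pixel line, in pixels.
timing_pixels = {"active": 1280, "front_porch": 32, "sync": 160, "back_porch": 224}

# 4:1 multiplexed, 8-bit/pixel mode: four pixels per CRTC_CLK.
timing_clocks = {name: pixels_to_crtc_clocks(value, 4)
                 for name, value in timing_pixels.items()}

print(timing_clocks)
```

Rejecting values that are not a multiple of the multiplex ratio is a design choice of the sketch; a real driver might instead round to the nearest programmable value.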

Multiple-pixel transfers make supporting resolutions greater than 1280×1024 possible, even though the video rates required are well above 100 MHz. Although multiplexing reduces the toggle rates on the pixel lines and most clock lines, the RAMDAC still has to clock the pixels internally at the full video rate: 170 MHz for 1600×1280 pixels (60-Hz refresh) and 135 MHz for 1280×1024 pixels (at 72 Hz). Transmission-line effects make driving a 110-MHz, TTL-pixel clock from the clock synthesizer across a copper trace to the RAMDAC difficult.

Even if you were to design a transmission line to carry the full-speed pixel clock, the extreme edge rates (2000 to 4000 V/µsec) would generate considerable EMI. Consequently, the RAMDAC must double a lower frequency input signal to generate the video pixel clock internally. Suppliers of high-performance RAMDACs integrate a clock doubler onto their RAMDAC chips. The clock doubler can generate the 1280×1024-pixel video rate of 135 MHz, for example, from a 67.5-MHz input clock.

High-performance RAMDACs derive the proper SCLK and LOAD clock frequencies from their internal pixel-clock rate, not the clock rate on their input pin; that is, for 1600×1280-pixel resolution (170-MHz operation) at 8 bits/pixel (4:1 multiplexing), PCLK1 would be 85 MHz, and LOAD and SCLK would run at 42.5 MHz--or one-fourth the internal pixel-clock rate.
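The clock relationships in this example can be written out as a short Python sketch. The function and the dictionary keys are illustrative names; the frequencies are the ones quoted in the text.

```python
def clock_plan(pixel_clock_hz, mux_ratio, doubler=True):
    """Derive the input, LOAD, and SCLK rates from the internal pixel clock."""
    # With the doubler enabled, the input pin runs at half the pixel rate.
    input_clock = pixel_clock_hz / 2 if doubler else pixel_clock_hz
    # LOAD and SCLK divide the internal pixel clock, not the input clock.
    load_clock = pixel_clock_hz / mux_ratio
    return {"pclk_in": input_clock, "load": load_clock, "sclk": load_clock}


# 1600x1280 pixels, 8 bits/pixel, 4:1 multiplexing, 170-MHz internal pixel clock
plan_1600 = clock_plan(170e6, 4)
# 1280x1024 pixels at 135 MHz, same multiplexing
plan_1280 = clock_plan(135e6, 4)

print(plan_1600)
print(plan_1280)
```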

Some RAMDACs double the clock using the falling edge of the input signal. This scheme works fine as long as the input signal exhibits a stable 50% duty cycle. But if the duty cycle deviates from 50%, the deviation shows up on the CRT as alternating columns of wider and narrower pixels. The RAMDAC here has a PLL clock doubler, which obviates the need for an accurate 50% duty cycle.
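A rough Python model shows why duty-cycle error matters for an edge-based doubler. It assumes an idealized doubler in which every input edge, rising or falling, starts a new pixel, so the high and low phases of the input clock become alternating pixel widths. The function name and the 45% duty cycle are illustrative choices.

```python
def edge_doubled_pixel_widths(input_clock_hz, duty_cycle):
    """Return (high-phase, low-phase) pixel widths in seconds."""
    period = 1.0 / input_clock_hz
    high = period * duty_cycle
    low = period - high
    return high, low


# 67.5-MHz input clock (135-MHz pixel rate) with a 45% duty cycle
narrow_ns, wide_ns = (w * 1e9 for w in edge_doubled_pixel_widths(67.5e6, 0.45))
print(round(narrow_ns, 2), round(wide_ns, 2))
```

In this model, a 45% duty cycle makes alternate pixels differ in width by about 20%, whereas a PLL-based doubler regenerates evenly spaced edges regardless of the input duty cycle.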

Even though a PLL clock doubler ensures that all the pixels are always the same width, you must take certain precautions when switching into and out of the clock-doubled modes.

The host computer can easily accomplish this sequence of operations during one vertical-blank period--much less time than needed to resynchronize the monitor.

Although the RAMDAC clock doubler helps moderate EMI problems that occur with high-performance graphics design, take additional steps to ensure a clean video signal and FCC EMI compliance. Noise control begins with the pc board. A four-layer board with separate power and ground planes yields quieter signals and supplies (as well as less spectral content in emitted frequency bands) than a one- or two-layer board. Route signals on the board's outside layers.


Separate the power plane into digital and analog areas, connected by a ferrite bead (Fig 7). The ferrite bead filters out high-frequency currents and should exhibit resistance starting at a frequency higher than the maximum signal frequency on the board but lower than the second harmonic of that frequency. Any of three ferrites--a Fair-Rite 2743001111, a Ferroxcube VK20019-4B, or a Philips 431202036690, each of which provides a resistance of approximately 75 Ohm at 100 MHz--is appropriate for this design.


In lower speed designs with pixel clocks of 110 MHz or less, two 0.1-µF capacitors (one for every three pins) decouple the RAMDAC's VCC pins. For this high-performance graphics board, however, use one 0.1-µF capacitor for every two pins (a total of three) and place an additional 0.01-µF capacitor in parallel with each 0.1-µF component to shunt the high-frequency harmonics to ground (Fig 8). A 10-µF capacitor filters out the lower frequencies. (Use chip capacitors because of their significantly lower lead inductance.)

A comparably robust network of capacitors must decouple the RAMDAC's VCC pins. To ensure clean clock signals free of high-frequency noise components, decouple the clock synthesizer's VDD pins and the RAMDAC's PLL's supply pins with pi filters. Filter the RAMDAC's COMP pin with a series 15 Ohm resistor to smooth out noise (the COMP pin allows compensating for the internal parasitic capacitance of the RAMDAC's FETs with an external 0.1-µF capacitor).


The edge rates on the pixel bus, clock lines, and sync and blank lines can play havoc with any signals routed nearby. Keep these lines away from the RGB analog outputs--and each other. All high-speed lines, both analog and digital, should be as short as possible. To this end, locate the RAMDAC adjacent to the video connector (between the video connector and the host-interface connector) to minimize circuitry between the RAMDAC and the board's power-supply pins.



You can place the RAMDAC over the analog power plane somewhere close to the digital/analog separation; or, place it astride the digital/analog separation so that the pixel inputs are over the digital-supply plane. Placing the RAMDAC over the digital/analog separation reduces coupling into the analog plane.

The main consideration in routing the clock signals is preventing noise from coupling onto the pixel lines. Keep clock-signal traces as short as possible and do not run them parallel to the pixel lines or any other high-speed signals. Putting the clock and pixel lines on different planes is best, but they can be on the same plane if you shield them with a ground trace on each side. Route no high-speed signal under the RAMDAC itself.

Because this design has so many digital signals, high-slew-rate edges may feed noise to the DAC outputs despite all of the precautions listed previously. The only solution is to smooth the fastest edges using series resistors. Slow all address, data, and control lines from the controller (including the VGA-pixel lines) as well as all clock lines with 33 Ohm resistors located very close to the controller or clock generator (Fig 8).


Approaching an ideal transmission line

Ideally, the analog RGB video signals would run down a perfectly matched 75 Ohm coax transmission line the entire way from the DAC's outputs to the monitor. However, you can avoid the expense and inconvenience of this ideal solution. Load resistors match the outputs to the impedance of the termination (a 75 Ohm monitor). Place these load resistors as close as possible to the DAC's output pins. You can add series ferrite beads (52 Ohm) to the analog video signal to filter out any high-frequency signals coupled onto the DAC's outputs or reflected from the monitor. In addition, a separate video ground-return trace on the ground layer of the pc board, running directly to the ground of the host connector, prevents the analog video return current from interacting with components on the board.


Acknowledgment


Thanks to Robert Embry, who has seven years' technical-sales experience with Weitek, for reviewing this article and contributing technical information.


Rhett Saugier is graphics applications manager for AT&T Microelectronics' Application Specific Standard Products Division in Allentown, PA. He holds a BSEE from San Diego State University and has 10 years' experience as a marketing and applications engineer for various analog-IC vendors, including Texas Instruments and Brooktree. Saugier has been with AT&T Microelectronics since 1990.




Copyright © 1995 EDN Magazine. EDN is a registered trademark of Reed Properties Inc, used under license.