EDN Access

 

October 9, 1997


Advanced DRAM puts you in the fast lane

Brian Dipert, Technical Editor

Direct RDRAM, which targets 1999 and later PCs, is now public. Is the race for PC main memory over, or are double-data-rate SDRAM and SLDRAM still in the running? And is there room for other DRAM architectures in the diverse embedded-application base?

Beginning most noticeably with the emergence of the Intel486 CPU, a growing disparity has developed between the performance of DRAM and the bandwidth needs of PCs and embedded systems. On-CPU clock multiplication widens the gap between memory capability and system needs even further than external clock frequencies would indicate. But an array of new DRAM architectures offers you a number of options for relieving the memory-performance bottleneck.

One common system-level technique to overcome DRAM's limitations is to increase the size and complexity of levels 1 and 2 (L1 and L2) SRAM caches. Beyond the diminishing returns and higher cost of this approach, the processor's cache does nothing to improve the performance of other subsystems that access DRAM. In today's PCs, for example, bus traffic also flows across the PCI bus between DRAM and SCSI, IDE, ISA, USB, and, soon, Firewire-based peripherals. These data streams include network traffic and video and high-fidelity audio inputs from digital-video disk (DVD) and hard-disk drives.

Intel's (Santa Clara, CA) accelerated-graphics-port (AGP) architecture, which is part of the PC 98 specification, essentially eliminates CPU-to-graphics-controller traffic over the PCI bus but may increase data flow between the frame buffer and main memory (Figure 1). This data includes 3-D texture maps cached in main memory and used by the graphics controller to define pixel values. According to Intel's estimates, these texture maps can be as large as 32 Mbytes. AGP may also mark the re-emergence of the unified memory architecture (UMA). For users who rarely run 3-D-graphics applications, for example, manufacturers may ship PCs with smaller, 2-D-optimized frame buffers. These systems will dynamically allocate additional frame-buffer space out of the larger main memory on an as-needed basis to store alpha (opaqueness) and Z (depth) 3-D information.

Because caching around the DRAM is an insufficient response to increasing memory-subsystem bandwidth demands, how can DRAM manufacturers improve their devices' inherent performance? (See box "The fill frequency.") Combinations of the following techniques, each with its trade-offs, commonly find use in squeezing maximum bandwidth from the DRAM subsystem:

  • Widen the data bus within the DRAM, between the DRAM and the memory controller, or both;
  • Increase the memory-bus clock frequency, and pass more information through each data pin on each clock (on both the rising and the falling edges, for example);
  • Convert the sense-amp array into a true on-DRAM SRAM cache, and add simple cache-controller and other logic on the memory to decouple internal SRAM and DRAM operations;
  • Subdivide each DRAM component's array into multiple smaller blocks, divide the components into multiple banks, or both. A well-designed memory controller can then "hide" sense-amp page-miss cycles, block or bank precharge, and refresh by concurrently accessing data in other locations within the array. Smaller banks also tend to enable faster accesses because of reduced internal bit-line loading;
  • Reduce the bus overhead needed to control the DRAM and to specify which locations to access;
  • Provide a separate control bus, thereby eliminating impacts to data-bus bandwidth;
  • Access multiple internal locations in parallel, and multiplex among them at the device data-I/O pins; and
  • Shrink the differential between I/O high- and low-voltage levels to minimize transition time, and carefully match impedances of the drivers, the loads, and the transmission lines between them. In addition to improving performance, low-voltage-swing outputs reduce dynamic-power consumption and decrease the magnitude of EMI effects. One potential down side to lower swing threshold levels, however, is reduced noise immunity.

Intel predicts that main-memory subsystems will need to provide as much as 1.6 Gbytes/sec of peak bandwidth by early 1999, when a range of higher performance PCs will appear. Similarly, the first-generation AGP specification, which uses a 32-bit, 66-MHz bus, calls for 266-Mbyte/sec maximum bandwidth between the chip set and graphics controller. You can assume that AGP-equipped PCs will have similarly stringent bandwidth needs between the graphics controller and frame buffer. AGP's 2X mode transfers data on both rising and falling edges of the 66-MHz clock, translating to 533-Mbyte/sec bandwidth. The future 4X mode will achieve 1.1 Gbytes/sec, probably by using higher clock frequencies. In comparison, 66-MHz synchronous DRAM (SDRAM) on a 64-bit system bus delivers only 533-Mbyte/sec maximum performance. Sustained bandwidth is lower because of address, control, and non-page-access-cycle overhead.
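The peak figures quoted above all follow from the same arithmetic: bus width times clock frequency times transfers per clock. The following Python sketch reproduces them. The function name is illustrative, and the sketch assumes that the nominal 66-MHz clock has a 15-nsec period (66.67 MHz), which is consistent with the 266- and 533-Mbyte/sec figures.

```python
def peak_bandwidth_mbytes(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in Mbytes/sec, with 1 Mbyte taken as 10**6 bytes."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock

# Assumption: the nominal "66-MHz" clock has a 15-nsec period.
CLOCK_66_MHZ = 1000 / 15

agp_1x = peak_bandwidth_mbytes(32, CLOCK_66_MHZ)                         # one transfer per clock
agp_2x = peak_bandwidth_mbytes(32, CLOCK_66_MHZ, transfers_per_clock=2)  # both clock edges
sdram_66 = peak_bandwidth_mbytes(64, CLOCK_66_MHZ)                       # 64-bit system bus

for name, value in [("AGP 1X", agp_1x), ("AGP 2X", agp_2x), ("66-MHz SDRAM", sdram_66)]:
    print(f"{name}: {value:.0f} Mbytes/sec peak")
```

These are peak numbers only. As the text notes, sustained bandwidth is lower once address, control, and non-page-access cycles are counted.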

Workstations and embedded systems--such as CD-ROM and disk drives, digital set-top boxes, and imaging and communications equipment--will soon have similar bandwidth needs if they don't already. Embedded-system designers are also interested in cost-effective alternatives to the fast SRAM that they traditionally use. In evaluating advanced DRAM architectures to use in your next-generation designs, consider both technical attributes of the memories and related economic variables. These factors include the potential for the memory's use in high-volume applications and the number of vendors shipping or planning to include the memory in their product lines.

The DRAM market, valued at approximately $25 billion according to Walt Lahti, principal analyst at InStat (Scottsdale, AZ), is one of the key technology-development engines in the roughly $120 billion semiconductor-IC industry. With PCs (and, therefore, Intel) directly influencing approximately two-thirds of the overall DRAM market, you should not ignore PC trends. Also, you should consider the presence and diversity of chip-set and integrated-processor memory-controller support, as well as whether realistic product-sampling and production schedules exist at necessary densities, bus widths, speeds, voltages, and electrical interfaces to meet your requirements. Finally, carefully examine the assumptions that various vendors' benchmark claims make for suitability to your application.

Main-memory options

Main-memory accesses predominantly consist of frequent but short code-read bursts for cache-line fills, although growing data traffic is beginning to alter this generalization. Today's systems have high cache-hit percentages. Also, accesses scattered throughout the available memory space result from software that was developed with modular languages, such as C++, and that runs on multitasking OSs. These factors create a strong possibility that each DRAM access won't be to an already-open row page stored in the DRAM's sense-amp array. These factors also increase the probability that the µP won't subsequently need some of the locations it accesses during a cache-line fill.

Therefore, optimizing random-access latency is important for main memory, at least as important as the time necessary for subsequent accesses within an open page. Many high-end PCs still use extended-data-out (EDO) DRAM because, at 66 MHz, SDRAM can require an extra wait state for initial accesses, although subsequent fetches within a burst take one fewer wait state. The resultant clock profiles for a four-access burst cache-line fill using 60-nsec fast-page mode (FPM) DRAM, EDO DRAM, and 66-MHz SDRAM show limited bandwidth differences (Table 1). The table also shows L2 cache for comparison. Estimates for the resultant system-performance gain average only approximately 2%, with 5% at the high end.
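To see how a clock profile translates into burst bandwidth, consider the short Python sketch below. The profiles are illustrative assumptions, not the values in Table 1: the EDO profile is a commonly cited one, and the SDRAM profile is derived from it by adding one clock to the first access and removing one from each later access, as the paragraph above describes. The sketch also assumes a 32-byte cache line moved as four 64-bit transfers on a 15-nsec clock.

```python
CLOCK_NS = 15.0   # assumed period of the nominal 66-MHz bus clock
LINE_BYTES = 32   # assumed cache line: four 64-bit transfers

# Hypothetical clock profiles (clocks per access in a four-access burst).
profiles = {
    "FPM": (5, 3, 3, 3),
    "EDO": (5, 2, 2, 2),
    "SDRAM": (6, 1, 1, 1),  # one extra clock up front, one fewer per later access
}

def burst_clocks(profile):
    """Total bus clocks needed to complete the burst."""
    return sum(profile)

def burst_bandwidth_mbytes(profile):
    """Bandwidth over one cache-line fill, in Mbytes/sec."""
    return LINE_BYTES / (burst_clocks(profile) * CLOCK_NS) * 1000

for name, profile in profiles.items():
    print(f"{name}: {burst_clocks(profile)} clocks, "
          f"{burst_bandwidth_mbytes(profile):.0f} Mbytes/sec per fill")
```

Because the first access dominates a four-access burst, a slower lead-off clock offsets part of the gain from faster subsequent accesses.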

The SDRAM performance improvement over EDO should be more significant at 100 MHz, and the zero-wait-state burst accesses also will prove valuable for accelerating long multimedia-data transfers. Pipelined and speculative prefetch µPs, such as Intel's Pentium II CPU, allow the memory controller to exploit SDRAM's multibank architecture for maximum efficiency. These combined trends should effectively conclude the era of EDO, at least for future generations of PCs.

In spite of SDRAM's limited performance improvement at 66 MHz, some PC manufacturers use it both as a marketing differentiator, because DRAM companies now provide it to them at little to no price premium, and as a way to gain experience for the upcoming 100-MHz SDRAM generation. According to several DRAM vendors, price parity between EDO and SDRAM has been delayed not necessarily by extra silicon cost but by yield. A 66-MHz SDRAM is roughly timing-equivalent to a 50-nsec EDO, and no demand exists for a product that doesn't yield to 66 MHz. Hitting these speed targets with near-100% yield and cost-effective die size usually requires at least a 0.4-µm process lithography.

Unclear system-performance benefits are the least of SDRAM's challenges. Manufacturer-to-manufacturer variations in power-up and initialization sequences, I/O impedance, output-buffer strength, column-address-strobe (CAS#) latencies, and clock-to-data and other timings wreak havoc with system engineers attempting to purchase product and with end users attempting to upgrade their systems. The situation is better than it was in the early days. (SDRAM definition began in the late 1980s!) Back then, some SDRAMs used a single-bank internal architecture, whereas others used a two-bank approach. Early SDRAMs also offered level- and edge-triggered row-address-strobe (RAS#) alternatives and had variations in supported burst lengths.

Incompatibilities are still so bad, however, that Intel has issued a document, commonly called "PC-66," with commands, modes, and specs more stringent than those in the JEDEC documentation. Intel requires DRAM manufacturers to supply devices that meet PC-66 to ensure compatibility with Intel's chip-set and motherboard designs. Intel has also developed a similar document for upcoming 100-MHz SDRAMs, which promise to be even more challenging. Remember that at 100 MHz, the total clock period will be 10 nsec with additional deductions for rise and fall times, chip-to-chip and signal-to-signal skew, and signal-propagation delay. Some of the more critical aspects of the PC-66 and PC-100 SDRAM specifications include setup, hold, and clock-to-output timings (Table 2).
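A simple budget calculation shows why 100 MHz is so much harder than 66 MHz. The Python sketch below subtracts the delays that a read must absorb from the clock period. Every delay value is a hypothetical placeholder, not a number from Table 2 or from Intel's documents, and the function and parameter names are illustrative.

```python
def read_timing_slack_ns(period_ns, clock_to_output_ns, flight_ns, setup_ns, skew_ns):
    """Margin left in one clock period after all read-path delays are deducted."""
    return period_ns - (clock_to_output_ns + flight_ns + setup_ns + skew_ns)

# Hypothetical delays, chosen only to illustrate the arithmetic.
delays = dict(clock_to_output_ns=6.0, flight_ns=1.0, setup_ns=2.0, skew_ns=0.5)

slack_66 = read_timing_slack_ns(15.0, **delays)   # 15-nsec period at 66 MHz
slack_100 = read_timing_slack_ns(10.0, **delays)  # 10-nsec period at 100 MHz

print(f"66 MHz: {slack_66} nsec of margin; 100 MHz: {slack_100} nsec of margin")
```

With the same assumed delays, shortening the period by 5 nsec removes 5 nsec of margin, so any manufacturer-to-manufacturer variation that was tolerable at 66 MHz can consume the entire budget at 100 MHz.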

21cs2At 66 MHz, the SDRAM in each slot of a standard two-bank system configuration has less than 15 nsec to respond to write and read requests (Figure 2). Taking reads (whose timings are usually more stringent than those of writes) as an example, the clock generated by the core logic propagates to the memories, causing output-data transitions that must then travel back and meet the memory controller's setup specifications before the next clock rising edge. This process, called "flight time," is complicated by the unequal input loading of control, address, and data signals, as well as by trace-length differences caused by board layout and module and component pinouts, all of which contribute to signal skew. I/O impedance and output-buffer drive variations among DRAM manufacturers are especially a problem for the data bus because of its tight clock-to-data timing, but these variations can cause push-outs on any signals.

Some memory modules include delay-locked loops (DLLs) or PLLs to factor out clock-propagation delays between the controller and the memory-module socket. These circuits ease read-timing requirements but increase cost and power consumption and complicate power management. Register-buffered modules, an option most attractive in memory-intensive servers and workstations, present a consistent load to the core logic, regardless of the number and density of memory chips. However, the registered data-bus delays, complicated by bidirectional data-flow requirements, further increase the random-access latency.

Bill Johnson, marketing director at Smart Modular Technologies, believes that PC OEMs hoping to avoid incompatibility problems are increasingly standardizing on a few memory and module suppliers and requiring customers to purchase memory upgrades directly from OEMs. Modules now contain serial-presence-detection EEPROMs (JEDEC standard 21-C) that provide more information than the previous parallel-presence-detection pins. Johnson predicts that some PC manufacturers, by means of the BIOS, could use this information to alert their support organizations to the presence of invalid memory modules. For end users, having to purchase memory upgrades directly from the manufacturer would reduce the number of available supply options and could result in higher prices.

One other issue confronting SDRAM and other more traditional main-memory options is granularity, which is the minimum base-system memory density and the minimum density-upgrade increment. As this article went to press, a number of Korean and Japanese DRAM vendors were intentionally shifting focus from 16-Mbit-DRAM production to the upcoming 64-Mbit generation. The main reasons for doing so include gaining more experience with new sub-0.3-µm manufacturing lines, disengaging from the intensely price-competitive 16-Mbit market, and attempting to use supply to move customer demand to the less crowded and more financially attractive 64-Mbit density. However, even with a ×16 DRAM component-bus width, the minimum system granularity using 64-Mbit DRAM is 32 Mbytes for one ×64 system bank or 64 Mbytes for the more common dual-bank configuration (Table 3). Four-bank DRAM subsystems, such as those anticipated for the PC-100 SDRAM generation, will have even higher granularity.
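The granularity arithmetic is straightforward: the number of chips needed to fill the system bus, multiplied by the capacity of each chip. The following Python sketch, with illustrative function and parameter names, reproduces the 32- and 64-Mbyte figures above and shows how narrower chips raise the minimum.

```python
def bank_granularity_mbytes(chip_mbits, chip_width, bus_width=64, banks=1):
    """Minimum memory increment, in Mbytes, for a bus filled with identical chips."""
    chips_per_bank = bus_width // chip_width   # chips needed to span the bus
    return banks * chips_per_bank * chip_mbits // 8

print(bank_granularity_mbytes(64, 16))            # one x64 bank of 64-Mbit x16 chips
print(bank_granularity_mbytes(64, 16, banks=2))   # dual-bank configuration
print(bank_granularity_mbytes(64, 8))             # narrower x8 chips
print(bank_granularity_mbytes(16, 16))            # previous 16-Mbit generation
```

The same function shows why the shift from 16- to 64-Mbit parts quadruples the minimum increment at a given chip width.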

Even in an era in which OSs and applications consume tens of megabytes, this level of system granularity may be unacceptable, especially for entry-level PCs. This fact is one of several reasons that some DRAM manufacturers plan to offer an incremental 128-Mbit DRAM generation between the traditional 64- and 256-Mbit densities. Other incentives include the ability to preserve much of the manufacturing equipment and methodologies that vendors use with 64 Mbits, as well as the ability to fit the die into existing packages. Granularity is a critical issue for embedded systems, whose density requirements often lag several generations behind those of PCs. One option for embedded-system designers is migration to other memory technologies, such as SRAM and flash memory. Another possibility is to integrate DRAM on ASICs. This option is also attractive to ASIC manufacturers as they strive to create demand for the gate counts that their leading-edge manufacturing can deliver.

Granularity, as it relates to chip- and system-bus width, can also significantly affect power consumption. Consider a 32-Mbyte, 64-bit data-bus system requirement. Using 64-Mbit SDRAMs that are each ×16, you need four chips, all of which consume active power each time the system accesses the DRAM bus. With an alternative ×16 Direct RDRAM or SLDRAM (formerly, SyncLink) system bus, you'll still need four chips, but only one will be active each DRAM-bus access. Even though this ×16 access will run at much higher clock frequencies to achieve equivalent bandwidth, the lower total capacitance due to the narrower bus will tend to balance out the dynamic output power (P = CV²f). Fewer active chips mean lower average internal power consumption per chip, and the end result may be lower total power consumption. These differences decrease in significance as system memory-density requirements grow.
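The balancing effect described above can be sketched numerically. The Python example below applies P = CV²f to a 64-pin bus and to a 16-pin bus running four times faster for equal bandwidth. The per-pin capacitance and the 3.3V swing are hypothetical values, and the 1V swing is taken as the difference between the 1.5 and 2.5V Rambus signaling levels mentioned later in this article. The model deliberately ignores termination current and internal chip power.

```python
import math

def dynamic_power_watts(pins, cap_farads, swing_volts, freq_hz):
    """Dynamic output power, P = C * V^2 * f, summed over all switching pins."""
    return pins * cap_farads * swing_volts ** 2 * freq_hz

# Hypothetical values chosen only for illustration.
CAP_PER_PIN = 10e-12          # assumed 10 pF of load per data pin
WIDE_FREQ = 66.67e6           # 64-bit bus at the nominal 66-MHz clock
NARROW_FREQ = 4 * WIDE_FREQ   # 16-bit bus run four times faster for equal bandwidth

wide = dynamic_power_watts(64, CAP_PER_PIN, 3.3, WIDE_FREQ)
narrow_same_swing = dynamic_power_watts(16, CAP_PER_PIN, 3.3, NARROW_FREQ)
narrow_low_swing = dynamic_power_watts(16, CAP_PER_PIN, 1.0, NARROW_FREQ)

print(f"64-bit bus:              {wide:.3f} W")
print(f"16-bit bus, same swing:  {narrow_same_swing:.3f} W")
print(f"16-bit bus, 1V swing:    {narrow_low_swing:.3f} W")
```

Under these assumptions, one-quarter of the pins at four times the frequency yields the same dynamic output power, and a lower voltage swing reduces it further because power scales with the square of the swing.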

DDR SDRAM

The 2X mode of Intel's AGP specification gains its added bandwidth by passing data on both rising and falling edges of the 66-MHz clock. Newer generations of synchronous cache SRAM, such as those on some workstation-level CPUs and in the L2-cache bus of the upcoming Intel Slot II processor module, use similar techniques; so does double-data-rate (DDR) SDRAM, or SDRAM-II (Figure 3). Samsung proposed DDR to JEDEC in December 1996, and several DDR standards are nearing approval. Many memory suppliers are considering supporting DDR beginning at the 64-Mbit density. Samsung, for example, plans 64-Mbit DDR production in mid-1998. Some companies, in fact, hope to design standard and DDR SDRAMs on the same silicon to minimize risk and improve demand flexibility. One example might be a ×16 DDR option on the same die as a ×32 standard SDRAM.

DDR supporters make impressive bandwidth claims at high data-transfer frequencies and wide data-bus widths (Table 4). Realistically, however, this potential will be limited by address- and control-cycle overhead and by the fact that address cycles will use only the rising edge of the memory clock, giving them half the bandwidth of data cycles. For this reason, most DRAM manufacturers planning to offer DDR will bypass the 66/133-MHz option and go directly to the 100/200-MHz version. The 66/133-MHz DDR would arrive after Intel's planned conversion to a 100-MHz local-bus frequency and would also offer uncertain performance advantages over standard 100-MHz SDRAM.

These high frequencies may require differential clocks, dedicated data strobes, and migration from low-voltage TTL (LVTTL) to low-swing stub-series terminated logic (SSTL), at least on data and clock signals. Planned conversions from a two- to a four-internal-bank configuration, in combination with memory-controller designs that exploit this configuration's performance capability, will help hide bank random-access, precharge, and refresh latency. DDR will use an internal data bus twice as wide as the external bus, accessing two locations at the same time to achieve performance targets. This wider internal bus may increase die size and cost relative to a standard SDRAM; however, Direct RDRAM and SLDRAM will be even wider.

Some DRAM manufacturers have limited confidence at this early stage that customers designing high-volume upgradable systems will be able to reliably use the 133/266- or 150/300-MHz speeds, even with SSTL-3 or -2 interface levels. This lack of confidence leads to worry that 100/200-MHz DDR will be a "point" product, which is a big concern for memory, chip-set, and system manufacturers making difficult time, money, and resource-allocation decisions. Minimal-chip designs with point-to-point connections between the DRAM and controller and with little to no upgrade requirement, such as graphics frame-buffer memory, are more straightforward applications for the higher DDR speeds. S3 Corp (Santa Clara, CA) is one of several graphics vendors working on DDR JEDEC standardization.

Because DDR is conceptually so similar to standard SDRAM, it uses much of SDRAM's test, assembly, and board- and module-manufacturing infrastructure. DDR is also an open architecture that promises to be a simple transition for both memory and chip-set manufacturers. However, Intel states that it plans no support of DDR in any upcoming chip sets. Even without Intel's involvement, which may change in the future, DDR may find sufficient interest to ensure at least some market success. Potential DDR adopters include other PC µP suppliers, such as AMD (Sunnyvale, CA); non-Intel chip-set manufacturers, such as Via Technology (Fremont, CA); and workstation and server companies.

Workstations are attractive because they are just beginning to migrate from EDO to SDRAM. Workstations and servers also tend to have longer development and product life cycles than do PCs, and they use more proprietary and complex DRAM-subsystem architectures to achieve performance and density requirements. From the burst-EDO and "SDRAM-lite" lessons, however, you might infer that DDR without Intel's blessing could be nothing more than a niche product. Fundamental DDR characteristics, such as clocking schemes, voltage levels, and pinouts, are still under vigorous debate in JEDEC, and multiple conflicting and incompatible "standards," if they appear, will only cause further confusion and slow adoption.

A few other DDR concerns are worth mentioning. The conversion from LVTTL to SSTL interface levels might cause end-user frustration during memory upgrades. Also, DDR will contain internal PLLs to remove memory-clock skew. This fact could complicate system power-management functions, because PLLs tend to have slow, unpredictable response to power transitions and input-clock suspension and resumption. In addition, DDR may use a unidirectional (read-only) or bidirectional data strobe, or "echo clock," traveling with each 8- or 16-bit data word to synchronize cycles between the memory and controller (Figure 3). These data-strobe signals (DS), along with the differential clock and 2.5V SSTL-2 supply voltage under consideration for 133/266- and 150/300-MHz speeds, take up pins reserved for error-correction-code (ECC) functions on today's DIMMs and create the potential for further module incompatibility. The ×32 or larger external data bus needed for reasonable system granularity at 64-Mbit densities and above will increase die size and cost. With multiple outputs transitioning simultaneously, the data bus may also cause so much noise and burn so much power that it would overwhelm any potential performance improvement.

Enhanced Memory Systems' Enhanced DRAM (EDRAM) architecture includes a number of features designed to maximize sustainable memory bandwidth in systems with small or nonexistent caches. Today's parts combine a 25-nsec, random-access, 4-Mbit DRAM array with a 10-nsec, 2- or 8-kbit SRAM cache on one chip. The wide, 2-kbit internal interface between DRAM and SRAM arrays also helps boost performance over standard DRAMs. Interface options for the original EDRAM included static-column and FPM. Enhanced Memory Systems slightly modified standard pinouts and functions to maximize the effectiveness of the integrated cache. Chip select (S#) enables the memory controller to access the SRAM while precharging the DRAM. Write/read (W/R#) control avoids closing the SRAM page when writing to the DRAM array, and internal logic updates both DRAM and SRAM, if necessary, to maintain coherency. Finally, a dedicated refresh pin (F#) allows hidden refresh without a CAS#-before-RAS# cycle, which would close the open SRAM page.
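A first-order way to estimate the benefit of the on-chip cache is to weight the two access times by the cache-hit rate. The Python sketch below uses the 10- and 25-nsec figures from the paragraph above. The hit rates are hypothetical, and the model ignores precharge, refresh, and write behavior.

```python
SRAM_HIT_NS = 10.0    # access time when the data is in the on-chip SRAM cache
DRAM_MISS_NS = 25.0   # random-access time of the DRAM array

def average_access_ns(hit_rate):
    """Hit-rate-weighted average access time for a simplified EDRAM model."""
    return hit_rate * SRAM_HIT_NS + (1.0 - hit_rate) * DRAM_MISS_NS

for hit_rate in (0.0, 0.5, 0.8, 1.0):   # hypothetical hit rates
    print(f"hit rate {hit_rate:.0%}: {average_access_ns(hit_rate):.1f} nsec average")
```

Even in this simple model, the average access time moves steadily toward SRAM speed as the hit rate rises, which is why the partitioning and pinout features that keep the SRAM page open matter.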

Recently, EDRAM evolved to a multibank architecture with EDO and burst-EDO options. Along with the performance improvement that the new interfaces provide, multibank EDRAM subdivides the DRAM and SRAM into four internal banks. This evolution continues the trend toward hiding precharge and refresh and retains the pinout enhancements in the original EDRAM. Multiple banks further improve sustainable bandwidth by increasing the probability that randomly desired code or data will already be in the partitioned SRAM cache. Reference designs, documentation, and timing-analysis tools, available on Enhanced Memory Systems' Web site, minimize the effort needed to interface EDRAM to a variety of PC and embedded processors.

The company plans to offer its first SDRAM-interface (Enhanced SDRAM, or ESDRAM) 16-Mbit devices for sampling by year-end. The devices will be as wide as ×16 and as fast as 133 MHz and contain two DRAM and two 4-kbit cache banks. ESDRAMs will be fully pinout-compatible with SDRAMs. They will still support the ability to hide DRAM refresh concurrently with cache accesses by means of software-command enhancements beyond the standard SDRAM suite. The company also plans 64-Mbit and DDR variations of ESDRAM for 1998. VLSI Technology's (San Jose, CA) Polaris core-logic chip set for Alpha µPs will support ESDRAM.

EDRAM's success in graphics and embedded-system applications and as a fast-SRAM alternative originates in its ability to provide higher performance than standard DRAM while preserving some compatibility with standard DRAM interfaces. As an evolutionary alternative to DDR for SDRAM designs, ESDRAM will be subject to the same power, timing, granularity, and other concerns. However, although the lack of extensive supply sources will tend to keep ESDRAM prices higher than more mainstream alternatives, it may also prove to be a hidden blessing. Fewer supply sources mean less potential for functional and specification variability, and the smaller system-memory densities common in embedded-system designs also tend to increase design margin. ESDRAM also has good potential to meet Intel's PC-100 SDRAM timing and functional requirements, although the fact that ESDRAM is entering the 16-Mbit density as many other DRAM vendors make the transition to 64 Mbits may impact its PC success.

Direct RDRAM

The Rambus DRAM (RDRAM) evolved from research in the late 1980s on how to cost-effectively maximize performance of generic DRAM. First-generation Base RDRAMs, beginning at 4-Mbit densities and then moving to 16- or 18-Mbit densities (NEC also offers an 8-Mbit part), used multiple banks; a wide, 64- or 72-bit internal bus; and a 64-to-8- or 72-to-9-bit output multiplexer to deliver 500- to 600-Mbyte/sec bandwidth at bursts as long as 256 bytes. RDRAMs also use current-driven open-drain outputs with matched transmission-line termination resistors to transfer data on both edges of the 250- to 300-MHz clock. The clock travels from chip to chip in a loop with an oscillator on one end, a termination pullup resistor on the other, and the memory controller in the middle, and each chip resynchronizes the clock with an onboard PLL. This technique treats the clock in the same way as it treats a data signal, so clock and data always travel in the same direction at the same speed. The clock-loop center is on the controller, giving one timing point to synchronize operations to the primary clock source, the µP clock. Low-swing Rambus-signaling levels (RSLs) are 1.5 and 2.5V, with the reference voltage at 2V and the termination voltage at 2.5V (Figure 4).

RDRAMs interface with the Rambus memory controller in a packet-based fashion. Base and Concurrent RDRAMs transfer address, data, and control information across the Rambus channel via a common set of eight or nine pins that are synchronized with the clock by careful impedance and trace-length matching. After the memory controller initiates a read or write request, Base RDRAMs respond with an acknowledge (ACK) packet if the desired addresses are in the sense-amp cache and the transaction can proceed. Base RDRAMs respond with a no-acknowledge (NACK) packet if the locations must be loaded to the sense amps (which the memory automatically does), indicating that the memory controller should retry the transaction later. During this delay, the memory controller can initiate a request to another RDRAM, if possible, to better exploit available bandwidth. The Rambus-controller design includes a demultiplexer to convert the high-speed, 8- or 9-bit channel back to its 64- or 72-bit lower frequency alternative within the system controller.
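The ACK/NACK handshake can be pictured with a toy model. The Python sketch below is a deliberately simplified illustration, not the Rambus protocol itself: the class and function names are hypothetical, the device tracks only one open row, and the row load that follows a NACK is assumed to finish before the controller retries.

```python
class BaseRdramModel:
    """Toy Base RDRAM: ACKs a request only if the row is already in the sense amps."""

    def __init__(self):
        self.open_row = None

    def request(self, row):
        if row == self.open_row:
            return "ACK"
        # On a miss, the memory loads the row into the sense amps by itself
        # and tells the controller to retry later.
        self.open_row = row
        return "NACK"

def access(device, row, max_attempts=3):
    """Retry a request until the device ACKs; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if device.request(row) == "ACK":
            return attempt
    raise RuntimeError("device never acknowledged the request")

rdram = BaseRdramModel()
print(access(rdram, row=5))   # miss: NACK, then ACK on the retry
print(access(rdram, row=5))   # hit: ACK on the first attempt
```

In a real system, the controller would use the time between the NACK and the retry to issue a request to another RDRAM rather than waiting idle.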

Reliable RDRAM operation requires careful pc-board layout, especially in multichip designs, to keep the total channel length as short as possible and to eliminate differences in trace length and impedance between signals. Achieving this goal ensures that signal-to-signal skew is as small as possible. The maximum number of chips per Rambus channel is 32. This restriction is one of the key Rambus-architecture concerns for high-end servers and workstations, although you can overcome it by adding channels to the memory controller, which increases pin count, or by providing a channel-to-channel bridge chip, which impacts initial access latency. The close chip-to-chip placement, combined with frequency-driven high dynamic-power consumption, also creates thermal-dissipation challenges that did not exist with SIMMs and DIMMs. However, by enabling signal-propagation delays between the memory and controller to span multiple clock periods, the Direct RDRAM specification allows the channel (and therefore spacing between chips) to be substantially longer than that of previous-generation RDRAMs.

The second-generation Concurrent RDRAM protocol makes better use of the bus bandwidth than does Base RDRAM, and the protocol forms the basis of today's third-generation Direct RDRAM. The Rambus controller can now initiate as many pipelined transactions within one RDRAM as there are banks, creating the potential for unlimited zero-wait-state bursts. The Concurrent protocol also eliminates the complex multitransaction ACK/NACK handshake, requires fewer internal registers and counters, and more effectively uses the available serial-communications channel.

AGP adds sideband control signals to maximize data throughput on the main chip-set-to-graphics controller bus (Figure 1b). The Direct RDRAM subsystem uses similar techniques, giving address and control signals their own separate 8-bit channels (Figure 5). The data channel widens from 8 or 9 bits to 16 or 18 bits, and the clock frequency increases to 400 MHz for 800-Mbps maximum data throughput on each pin. The result is the 1.6-Gbyte/sec bandwidth that Intel's 1999 high-end-system predictions require. According to Abid Ahmad, Intel program manager for graphics enabling, Intel's simulations predict that the Direct RDRAM enhancements will result in 95 to 98% use of data-bus bandwidth potential. Additional Direct RDRAM enhancements include a differentially driven clock, a redesigned PLL with more power-optimized modes, self-refresh in power-down, and lower swing I/O levels centered on 1.4V for compatibility with advanced controller-logic process lithographies (Figure 4).
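The 1.6-Gbyte/sec figure and the effect of the predicted bus use can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given above; the variable names are illustrative.

```python
DATA_PINS = 16          # data-channel width, excluding the optional parity bits
CLOCK_MHZ = 400         # Direct RDRAM channel clock
EDGES_PER_CLOCK = 2     # data moves on both clock edges, giving 800 Mbps per pin

peak_mbytes = DATA_PINS / 8 * CLOCK_MHZ * EDGES_PER_CLOCK

# Range of data-bus use that Intel's simulations predict.
low_mbytes = peak_mbytes * 0.95
high_mbytes = peak_mbytes * 0.98

print(f"peak: {peak_mbytes:.0f} Mbytes/sec")
print(f"predicted sustained range: {low_mbytes:.0f} to {high_mbytes:.0f} Mbytes/sec")
```

If the simulations hold, the sustained data rate would stay above 1.5 Gbytes/sec.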

Direct RDRAMs incorporate an internal 128-bit bus, and, at the 64-Mbit density, they will include four internal DRAM banks. The Direct RDRAM specification allows for a range of sense-amp and DRAM array sizes as well as any number of banks. You can determine the ideal number of banks for a chip density and application by examining the anticipated row- and column-access delays within the chip, Rambus-channel clock-frequency target, and maximum commonly occurring burst length. Too few banks will "starve" the channel during long access bursts, but too many banks will make the memory unnecessarily expensive.
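One simplified way to reason about bank count is to compare how long a bank stays busy after an access with how long one burst occupies the channel. The Python sketch below assumes that consecutive bursts go to different banks and that each bank is unavailable for a fixed cycle time after it is accessed. The 80-nsec bank cycle time and the burst lengths are hypothetical, and the function name is illustrative.

```python
import math

def banks_to_saturate_channel(bank_cycle_ns, burst_bytes, channel_mbytes_per_sec):
    """Banks needed so a new burst can start as soon as the previous one ends."""
    burst_time_ns = burst_bytes * 1000 / channel_mbytes_per_sec
    return math.ceil(bank_cycle_ns / burst_time_ns)

# Hypothetical 80-nsec bank cycle on a 1600-Mbyte/sec channel.
print(banks_to_saturate_channel(80, burst_bytes=32, channel_mbytes_per_sec=1600))
print(banks_to_saturate_channel(80, burst_bytes=64, channel_mbytes_per_sec=1600))
```

In this model, banks beyond the computed number add cost without adding bandwidth, and fewer banks leave the channel idle while it waits for a bank to become available.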

A 16-bit system interface may at first seem a counterintuitive means of increasing performance, considering the ever-widening CPU data-bus trends. However, this approach delivers several advantages over a traditional 64-bit or wider alternative, starting with the previously described power, noise, EMI, and system-memory granularity. Using fewer signal, supply voltage, and ground pins reduces package and die costs for both the controller and the memory, simplifies pc-board layout, and minimizes real estate. Concurrent RDRAM has 31 such pins, and Direct RDRAM has approximately 76, compared with approximately 140 for SDRAM and approximately 160 for SDRAM-II (see box "Narrow-bus, high-bandwidth memories").

A low incremental pin count per additional channel, combined with predictable clock-frequency improvements as process lithographies continue to shrink, gives Direct RDRAM performance head room. These factors, along with the technology's lower risk by using concepts already proven in Concurrent RDRAM silicon, were key influences on Intel's decision to support Direct RDRAM, according to Ahmad.

He says, "Intel's approach is that solutions are realistic when they exist." Subodh Topraini, vice president of marketing for Rambus, reports that the company has completed its first Direct RDRAM design, and simulations show that the I/O buffers have performance capability greater than 1 Gbps.

Many Direct RDRAM concerns come from DRAM and chip-set manufacturers and deal with economic and political issues, not necessarily technical shortcomings. Rambus is a memory-design company, not a memory manufacturer, and it makes its money by collecting royalties from the DRAM and chip-set vendors that license and produce its products. Memory companies worry about loss of innovation, differentiation, and control in setting future standards, as well as sacrificing the perceived benefit of multiple vendors' perspectives in tackling system challenges. Some memory companies also predict low initial yields to the 400-MHz clock specification, as well as higher memory and core-logic (and, therefore, system) prices due to the royalty payments and the possibility of Intel's re-entry into the DRAM market. Logic companies also see the potential for Intel to translate its Rambus alliance into further success in the PC chip-set arena. These logic companies even extrapolate to the prediction that Intel will integrate a Direct RDRAM controller directly on a future µP. NEC, with its V830R/AV, has already integrated a Concurrent RDRAM interface on a CPU. Regardless, more than 20 memory and core-logic companies, including the top 10 DRAM manufacturers, have taken Rambus licenses.

Some non-PC-system companies also worry that Rambus and Intel will not consider their needs when making Direct RDRAM architecture decisions for the PC. For example, high-reliability servers and communications equipment can use a ×72 or ×80 system-memory bus to spread data and ECC bits among multiple chips, enabling a well-behaved system shutdown if an entire memory chip malfunctions (a hard error). Because the Rambus channel accesses all information from one memory, a hard error in that memory chip could cause a system crash. Individual RDRAMs do not have mean-time-between-failure rates inherently inferior to those of standard DRAMs, however. Failure rates depend on how often the DRAM is accessed, so efficient processor caching helps. After several years of moving away from ECC for cost reasons, PCs with SDRAM are again using it, primarily in response to large per-chip densities and higher clock frequencies. Both of these factors increase the probability of an occasional soft bit-level error due to alpha particles or noise.

SLDRAM

SLDRAM-architecture definition efforts, which had been slowly progressing for several years, accelerated early this year in response to the Intel/Rambus announcement at the International Solid State Circuits Conference in February. SLDRAM developed from two previous IEEE high-speed bus standards: the 1596 Scalable Coherent Interface (SCI) and the 1596.4 RamLink, an SCI subset that removed multiprocessor and other features that the IEEE committee judged unnecessary for the target applications. SLDRAM further modified the point-to-point RamLink interface by optimizing for multichip DRAM arrays, a maximum 64-byte burst length for high-end CPU cache-line fills, and a 3-to-1 average read/write-access ratio.

Because Direct RDRAM and SLDRAM are similar, at least at a high level, many of Direct RDRAM's strengths and shortcomings apply equally to SLDRAM (Figure 5 and Figure 6). One difference between the two competing approaches involves the output-buffer structure. RDRAM uses an open-drain output with pullup-resistor termination at the end of the Rambus channel. SLDRAM will use a push-pull, low-voltage-swing, full-CMOS output that is conceptually similar to SSTL. SLDRAM-bus termination will consist of pullup resistors plus series-stub resistors on each memory module.

Farhad Tabrizi, SLDRAM Consortium chairman and director of strategic marketing at Hyundai, claims that push-pull outputs use less power than open-drain outputs, especially at high frequencies. (You can reach the Consortium at www.sldram.com.) The actual power use depends somewhat on access profile, bus loading, and other assumptions. Push-pull outputs consume dynamic power with each output transition, whereas open-drain outputs draw extra output current through the chip only when pulling the outputs to a logic-low voltage. RDRAM vendors may need to use larger open-drain pulldown transistors to overcome the passive-only pullup termination, but SLDRAM will consume constant power across the series-stub resistors. Differences in termination impedance also determine which DRAM has higher average power burn.

Some SLDRAM supporters also feel that push-pull outputs in combination with series-stub resistors, by presenting a lower impedance signal load, may be more tolerant of transmission-line reflections, such as those on the longer traces of heavily loaded systems. As an additional concession to servers, workstations, and other high-chip-count applications, the SLDRAM memory controller can measure the round-trip signal delay and voltage-level differences between it and the various SLDRAMs. The controller can then tune the SLDRAMs' input threshold voltages, output-driver strengths, dc-offset voltages, and turn-on timing characteristics to minimize or eliminate skew due to trace-length and chip-to-chip variations in today's SDRAM.

These techniques, assisted by SLDRAM's push-pull output structure, are common in today's high-speed chip and board testers. The memory controller initially calibrates the SLDRAMs on system power-up but may periodically recalibrate during normal system operation to account for high-temperature electrical and timing fluctuations. Low-end memory controllers might simply access all memories at the slowest chip's speed, whereas more elaborate controllers can dynamically control different memory regions and allocate available resources among functions according to their performance needs. Regardless of the technique, the chosen level of complexity resides within the controller rather than the memories themselves, which keeps total cost as low as possible. SLDRAM output-buffer simulations indicate performance head room to beyond 1.2 GHz.

Clock-distribution schemes between SLDRAM and Direct RDRAM also differ. Both architectures move from a standard one-signal to a two-signal differential clock to cancel out common-mode noise and reduce reliance on threshold-voltage levels. But whereas Direct RDRAM retains the round-robin clock loop, SLDRAM uses a more traditional "tree" distribution scheme. To account for clock skew, SLDRAM also uses the data-strobe concept first seen in DDR, with two bidirectional data clocks. Tabrizi feels that this scheme has fewer clock-circuitry requirements in SLDRAM than in RDRAM, enabling the use of a simpler--or even eliminating the use of--digital PLL in SLDRAM, which might lower cost and improve power management. He also points to the less stringent pc-board-layout requirements of this data-strobe approach in conjunction with the controller-to-memory calibration technique. The SLDRAM scheme may also enable faster response and fewer data-bus "dead" cycles when switching access from one chip to another. Finally, the SLDRAM clock technique also bypasses the potential for Rambus patent-infringement problems.

Almost every DRAM manufacturer participates in the SLDRAM Consortium, but to varying degrees: Some contribute only money and a meeting representative, whereas others dedicate small engineering teams to the effort. Hyundai and Mitsubishi are creating an SLDRAM conceptual test chip due for completion late this year, consisting primarily of I/O drivers and current-, voltage-, and timing-adjustment circuits; IBM Microelectronics is developing a companion evaluation module and system-board design. In parallel, Mosaid Technologies and Micron are each working on 64-Mbit, full-chip designs, with Mosaid's scheduled for completion mid-1998 and initial silicon coming from a Siemens factory. However, the performance targets for this initial design specify only a 200-MHz clock, resulting in half the bandwidth per 16-bit channel that Intel's 1999 high-end predictions require. SLDRAM supporters are optimistic that faster chips will follow six months later. Because the Consortium is still resolving SLDRAM architecture details, the amount of memory-design experience that vendors can use, which strongly influences schedule confidence, is unclear. Beyond the fundamental memory-design challenges, SLDRAM Consortium members must also develop functionally and electrically compatible memory controllers across a range of logic processes.

What about data?

Data-focused DRAM's requirements fundamentally differ from code's in a few key areas. Data transfers are often longer than cache-line fills, and the locality of a data reference (the probability that if one data access is in the DRAM's sense-amp cache, the next one will be too) is improved. These characteristics mean that fast sequential access within a long series is important. Additionally, data-access profiles tend to be more balanced in their read and write percentages. Read-modify-write functions are more common for uses such as pixel updates and block-level ECC (which data applications often allow), making fast context switches between reads and writes crucial. However, you cannot ignore random-access latency. In a graphics application, for example, color, depth, and opaqueness information for a pixel may physically reside in different regions of the frame buffer, and simultaneous rendering and drawing operations may access different areas of the memory.

From an economic standpoint, the unit volume that data DRAM represents is significantly smaller than that for code (Table 5). For example, compare the required graphics frame-buffer sizes at various resolutions, color depths, and 2/3-D graphics parameter sets with the amount of main memory shipped in an average PC. Table 5's data is valid if the graphics controller uses direct-mapped pixel values. Look-up-table-based pixel mapping proportionally reduces frame-buffer size but also limits the maximum number of simultaneously displayed colors. Beyond pixel data, graphics controllers sometimes store frequently accessed information--such as fonts, menus, cursors, and texture maps--in the frame buffer, which increases density requirements in the process.
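
As a rough check on the 2-D entries in Table 5, you can compute a direct-mapped frame buffer's size from the resolution and color depth alone. The Python sketch below is illustrative only: the function name is hypothetical, it covers pixel data only (no fonts, cursors, or texture maps), and it assumes 1 kbyte = 1000 bytes, which is what the table's smaller entries appear to use.

```python
def frame_buffer_bytes(width, height, color_depth_bits):
    """Bytes needed to hold one direct-mapped 2-D frame (pixel data only)."""
    return width * height * color_depth_bits // 8


# Print the 2-D sizes for two of the resolutions listed in Table 5.
for width, height in [(640, 480), (800, 600)]:
    for depth in (8, 16, 24):
        # Assumption: decimal kbytes (1 kbyte = 1000 bytes).
        kbytes = frame_buffer_bytes(width, height, depth) / 1000
        print(f"{width}x{height} at {depth} bits: {kbytes:.1f} kbytes")
```

For 640×480 pixels at 8 bits per pixel, the function returns 307,200 bytes, which matches the table's 307.2-kbyte entry.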

Also note that the frame-buffer sizes in Table 5 often do not line up neatly with the granularity options in Table 3. This fact was one of the early motivations for UMA, which conceptually made more efficient use of available memory by combining code and graphics in one large subsystem. Some graphics-card and PC companies provide multiple frame buffers on their high-performance graphics products. These buffers require multiples of the memory densities that Table 5 shows for a parameter set but allow complete and simultaneous rendering of one or more frames while the RAMDAC outputs another frame to the monitor.

Although data-DRAM applications may not set the price and volume trends for the overall DRAM market, many have faster than average product development and obsolescence cycles. Many data-DRAM applications also push the state of the art in performance compared with code. (Think of 3-D graphics boards or communications data buffers for an example.) In converting the density numbers in Table 5 to bandwidth equivalents, remember that the common 72-Hz noninterlaced-display refresh rate includes the time needed to realign the scan beam. Frame-buffer peak bandwidth is as much as 50% higher than the display refresh rate would otherwise indicate. The refresh-rate-based bandwidth number also omits drawing bandwidth, a critical factor when you consider single-port memories.
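
To turn a frame-buffer size into a display-refresh bandwidth estimate, multiply the bytes per frame by the refresh rate and then allow for the beam-realignment time. The sketch below is one hedged way to do that in Python. The function names are hypothetical, the 50% overhead is the upper bound quoted above rather than a measured value, and the result excludes drawing bandwidth.

```python
def refresh_bandwidth(width, height, color_depth_bits, refresh_hz):
    """Average bytes/sec the display refresh reads from the frame buffer."""
    bytes_per_frame = width * height * color_depth_bits / 8
    return bytes_per_frame * refresh_hz


def peak_refresh_bandwidth(average_bytes_per_sec, blanking_overhead=0.5):
    """Peak bytes/sec while pixels are actually being scanned out.

    blanking_overhead=0.5 is the "as much as 50%" upper bound; real
    displays may need less.
    """
    return average_bytes_per_sec * (1 + blanking_overhead)


average = refresh_bandwidth(1024, 768, 16, 72)
peak = peak_refresh_bandwidth(average)
print(f"average: {average / 1e6:.1f} Mbytes/sec, peak: up to {peak / 1e6:.1f} Mbytes/sec")
```

Drawing traffic comes on top of these figures, which is why the refresh number alone understates what a single-port memory must deliver.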

Lower volumes, higher performance, and faster product development and obsolescence increase an engineer's willingness to consider alternative memory approaches. It's no surprise, then, that many more viable DRAM products exist in data-DRAM applications. Mosel Vitelic, for example, ships EDO DRAMs in 1-, 2-, and 4-Mbit densities and ×4, ×8, and ×16 interface options, with row- and column-access times as fast as 30 and 12 nsec, respectively. IBM Microelectronics, Integrated Silicon Solutions, Silicon Magic, and several other companies also offer fast EDO DRAMs.

MoSys supplies the Multibank DRAM (MDRAM) architecture, which offers 32 internal DRAM banks, each with a corresponding 128-byte sense-amp array, per megabyte. MDRAM was one of the first architectures with such advanced features as a DDR data bus, low-voltage-swing outputs, and an embedded PLL. A large number of smaller blocks give density flexibility that's valuable in frame-buffer designs; MoSys offers parts in 0.5-, 0.75-, 1-, 1.25-, and 2-Mbyte densities. MDRAM operates as fast as 166 MHz with a dual-edge clock and has row- and column-access times of 32 and 14 nsec, respectively. MDRAM foundry sources include Integrated Device Technology (IDT), Oki, and Siemens. Unlike EDRAM, MDRAM's proprietary interface and protocol require custom memory-controller designs. MoSys recently announced synchronous-graphics RAMs (SGRAMs) based on the MDRAM architecture, as well as MCache L2 SRAM cache-replacement chips. IDT also plans to use fast DRAM to replace SRAM with its Fusion architecture in some applications. Finally, MoSys is developing an advanced MDRAM-based Rambus-interface memory.

SGRAM is a multisourced variation of SDRAM with added block-write and write-per-bit (masked-write) functions. Because of lower granularity requirements of frame-buffer memory, an 8-Mbit density with a ×32 interface is the most common SGRAM configuration. However, because the number of chips in graphics is less than the number of chips in main memory and the connection between the chips and the memory controller is point-to-point, the interface can run as fast as 133 MHz, and DDR enhancements are due next year. SGRAM tends to be more expensive than SDRAM at a given density because of lower comparative product volumes, the limited number of vendors, higher performance requirements, and the added silicon cost of the wider interface and graphics-optimized functions. However, many analysts and vendors predict that SGRAM will shortly become the highest volume frame-buffer memory option for PCs, surpassing EDO. Versions of 16-Mbit SGRAM should also appear in the market this year.

Base, Concurrent, and Direct RDRAMs can also act as high-performance data memories, although (like EDO DRAM, SDRAM, and standard MDRAM) they may not contain a full suite of graphics-optimized commands. Graphics vendors have varying opinions on the value of these functions, especially when considering the added cost that the functions incur. (Block write is especially silicon-intensive.) Because data transfers are generally longer than code bursts, the data transfers naturally increase the efficiency of the data bus. Therefore, Direct RDRAM's separate control bus may be unnecessary to achieve reasonable sustained data bandwidth. Microsoft (Redmond, WA) plans to support RDRAM in its upcoming Talisman architecture, and additional graphics-controller support comes from Chromatic Research (Sunnyvale, CA) and Cirrus Logic (Fremont, CA). The Nintendo 64 (Kyoto, Japan) video game uses two RDRAMs as a UMA for all code, graphics, audio, and miscellaneous data functions, and Gateway 2000 (North Sioux City, SD) and Micron use RDRAM in their DVD-equipped PCs.

Mitsubishi offers cache DRAM (CDRAM), which, as the name implies, integrates a full-featured 16-kbit SRAM cache, cache controller, and 4- or 16-Mbit DRAM array on the same die, with separate external address and control buses for each memory and a 128-bit internal datapath between the buses. The external interface is ×16, and the 16-Mbit version supports a burst mode. A future revision of the 4-Mbit version will also add burst mode.

Dual-port video RAM (VRAM) originally serviced the ultrahigh-end graphics market, such as workstations and video arcades. Today's dual-port alternatives include Samsung's window RAM (WRAM) and several other Mitsubishi devices. 3D-RAM, which is the result of Mitsubishi's and Sun Microsystems' (Palo Alto, CA) collaboration and is alternate-sourced by S-MOS Systems (San Jose, CA), combines 10 Mbits of four-bank DRAM, a 2-kbit SRAM cache (with a 256-bit interface between the DRAM and SRAM), a graphics-optimized ALU, and various buffers. On-chip arithmetic functions reduce external read-modify-write traffic to mostly writes, simplify controller design, and boost overall rendering performance. The 3DPro chip set combines 3D-RAM for pixels, CDRAM for texture maps, and a PCI-based graphics controller. The upcoming 16-Mbit, dual-port-graphics RAM (DGRAM), another extrapolation of the CDRAM concept, will provide four DRAM banks, a triple-port SRAM, and 143-MHz performance from each of two 16-bit external buses.

When evaluating memory alternatives for data use, keep in mind the "2-N rule." Most, but not all, multibank DRAMs require you to switch between internal banks for highest performance. For example, in a two-bank DRAM, sequential accesses to addresses 0, 1, 2, and 3 use internal interleaving and pipelining and therefore complete more quickly than a sequence of reads or writes to the same location or to consecutive even or odd locations. Because linear and interleaved burst sequences toggle the lowest order address or addresses, the 2-N rule is typically not a problem for code fetches. Some DRAMs with a true SRAM cache on board can operate under the "1-N rule" with no performance restrictions, regardless of address sequence.
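
The bank-switching effect is easy to see in a toy model. The Python sketch below is not any vendor's timing specification: it assumes that the low-order address bits select the bank, that an access to a different bank than the previous one overlaps with it and costs one cycle, and that a back-to-back access to the same bank costs two cycles. Both cycle counts are hypothetical placeholders.

```python
def burst_cycles(addresses, num_banks=2, same_bank_cycles=2, other_bank_cycles=1):
    """Toy cost model for a multibank DRAM.

    Assumptions (illustrative only): address modulo num_banks selects the
    bank; switching banks lets accesses overlap (other_bank_cycles), and
    hitting the same bank twice in a row stalls (same_bank_cycles).
    """
    total = 0
    previous_bank = None
    for address in addresses:
        bank = address % num_banks
        if bank == previous_bank:
            total += same_bank_cycles
        else:
            total += other_bank_cycles
        previous_bank = bank
    return total


print(burst_cycles([0, 1, 2, 3]))  # alternates banks on every access
print(burst_cycles([0, 2, 4, 6]))  # every access lands in bank 0
```

With these placeholder costs, the sequential pattern takes four cycles and the even-address pattern takes seven. The exact ratio depends on the real device's timing, but the ordering is the point of the rule.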


References

  1. Levy, Markus, "The dynamics of DRAM technology multiply, complicate design options," EDN, Jan 5, 1995, pg 46.
  2. Levy, Markus, "Souped-up memories boost system performance," EDN, Jan 4, 1996, pg 38.
  3. Przybylski, Steven, New DRAM Technologies: A Comprehensive Analysis of the New Architectures, MicroDesign Resources, Sebastopol, CA, 1997.

Acknowledgments

I'd like to recognize the contributions of Terry Lee at Micron Technology, Bob Fusco at Oki Semiconductor, Billy Garrett at Rambus, and Kevin Patrick at Mosaid Technologies. A special thanks also goes to Steven Przybylski of The Verdande Group for writing a box and contributing reference information.


At a glance
  • Increasing system-performance needs outstrip the capabilities of traditional DRAM architectures.

  • DRAM is moving beyond its conservative, predictable, multisourced roots to an era of greater controversy, increased innovation, and more product options.

  • Some advanced DRAMs have evolved from previous-generation roots, whereas others take a more revolutionary approach.

  • Requirements that drive memory selection differ between code and data designs and between PCs and embedded systems.

The fill frequency

Steven Przybylski, The Verdande Group

To fully understand system-performance needs and memory-performance capability, consider both bandwidth and granularity. These two topics are related because both measure memory-subsystem performance. System designs trade off one against the other: You can generate more bandwidth by widening the device bus, but doing so increases granularity.

The ratio of a DRAM's bandwidth to its granularity is analogous to the ratio of the rate of the water flowing into a container to the volume of that container. This ratio, being the inverse of the time it takes to fill or empty the DRAM or container, is the frequency with which other subsystems can access the DRAM to retrieve or store new information. This fill frequency (FF) of a DRAM subsystem is a measure of how much bandwidth is available per bit stored.

Commodity DRAMs have low FFs. For example, a 64-Mbit EDO DRAM with a ×4 bus operating at 33 MHz has a peak FF of 2.06 Hz. In general, the commodity DRAMs used in main memory have FFs of 5 to 50 Hz. Because it is impossible to fill a memory subsystem with information more quickly than you can fill one DRAM in that memory subsystem, the FF of a memory subsystem is necessarily less than or equal to the FFs of the DRAMs used to construct it.
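
The fill-frequency arithmetic is simple enough to check in a few lines. The following Python sketch reproduces the 64-Mbit EDO example; the function name is hypothetical, and the calculation assumes 1 Mbit = 10^6 bits and one transfer per pin per clock, which is the combination that yields the quoted figure.

```python
def fill_frequency_hz(bus_width_bits, transfer_rate_hz, capacity_bits):
    """Peak fill frequency: peak bandwidth (bits/sec) divided by capacity (bits)."""
    peak_bandwidth_bps = bus_width_bits * transfer_rate_hz
    return peak_bandwidth_bps / capacity_bits


# 64-Mbit EDO DRAM with a x4 bus at 33 MHz.
# Assumption: 1 Mbit = 10**6 bits, one transfer per pin per clock.
ff = fill_frequency_hz(bus_width_bits=4, transfer_rate_hz=33e6, capacity_bits=64e6)
print(f"{ff:.2f} Hz")  # prints 2.06 Hz
```

Doubling the bus width or the transfer rate doubles the result, and quadrupling the capacity at a fixed width and rate cuts it to one-quarter.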

The system market also imposes bandwidth and granularity constraints on the memory-subsystem designer. A system with too low a peak bandwidth or too high an entry-level memory size will be unsuccessful. The ratio of the market-imposed minimum bandwidth constraint to the maximum granularity constraint is also an FF. A memory subsystem with an FF substantially below the market requirement will be unsuccessful, because the bandwidth is too low, the granularity is too high, or both. Because a DRAM's FF is necessarily greater than or equal to the FF of all memory subsystems that contain it, DRAM with an FF below the market requirement is unusable, even in that market's entry-level configurations.

Figure A shows the FFs for various fast-page-mode (FPM) density and bus-width configurations over the past two decades, as well as the market-required FFs for a number of systems. Until recently, market requirements were roughly constant over time. Granularity requirements have increased at roughly the rate that the price of memory decreases, about 26% annually. Memory-system bandwidth requirements have increased at a slightly lower rate, largely because cache sizes have increased dramatically over the past two decades. Increasing cache sizes result in increasing cache hit rates and corresponding decreases in the percentage of memory references that reach main memory.

In the coming years, however, market-required FFs will increase. Continued cache-size growth will net diminishing returns as compared to the past. Furthermore, many new multimedia applications are stream-oriented and do not cache well. Consequently, increased end-user multimedia-performance expectations will translate directly into increased main-memory bandwidth demands. Also, the rapid shift from 2-D to 3-D graphics means that the demand for frame-buffer bandwidth is growing rapidly. Graphics applications are especially challenging for memory-system designers because the market demands a small frame buffer to generate a lot of bandwidth.

From one DRAM-density generation to another, the FF falls by almost a factor of 4 if the DRAM-bus width remains constant. In the early 1980s, the FFs of the ×1 page-mode DRAMs were substantially greater than system requirements. However, with each successive generation, DRAM FFs declined dramatically. Historically, the simplest and most cost-effective response to unacceptably-low DRAM FFs has been to increase device-bus width, and so since 1985 the width of DRAMs common in entry-level systems has increased from ×1 to ×16. However, at the 64-Mbit level, even a ×16 FPM device would be inadequate. New approaches to delivering necessary DRAM FFs, discussed in the main article, reflect the industry's attempts to meet its customers' performance needs as cost-effectively as possible, and the new approaches involve both technical and business trade-offs.

In future 1- and 4-Gbit devices, the rest of the system will have to extract at least 4 and 16 Gbps, respectively, from one device to maintain FFs at current levels. Even more bandwidth will be necessary if FF requirements grow. Peak per-chip bandwidths of 4 Gbps will be possible by increasing both pin frequency and device-bus width beyond current levels. However, the power needed to develop this bandwidth will be high. Beyond 5 to 10 Gbps, point-to-point connections between DRAMs and controllers will likely be necessary, and any movement away from bus structures will require new DRAM and system architectures. In addition, by the time vendors will be able to cost-effectively build 4-Gbit DRAMs, large classes of embedded and general-purpose computing systems will require substantially less density than one DRAM can provide. A subsequent migration to embedded DRAMs will make merged-DRAM logic devices increasingly important and common.
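
You can run the fill-frequency relationship in reverse to size the interface of a future device. The Python sketch below is a back-of-the-envelope illustration: the 4-Hz target is the value implied by extracting 4 Gbps from a 1-Gbit device, 1 Gbit is taken as 10^9 bits, and the 800-Mbps per-pin rate is a purely hypothetical figure chosen to show the pin-count calculation.

```python
import math


def required_bandwidth_bps(fill_frequency_hz, capacity_bits):
    """Bandwidth one device must deliver to sustain a target fill frequency."""
    return fill_frequency_hz * capacity_bits


def pins_needed(bandwidth_bps, per_pin_rate_bps):
    """Data pins needed at a given per-pin transfer rate, rounded up."""
    return math.ceil(bandwidth_bps / per_pin_rate_bps)


GBIT = 1e9                # assumption: 1 Gbit = 10**9 bits
TARGET_FF_HZ = 4.0        # implied by 4 Gbps from a 1-Gbit device
PER_PIN_RATE_BPS = 800e6  # hypothetical per-pin rate, for illustration only

for capacity_gbits in (1, 4):
    bandwidth = required_bandwidth_bps(TARGET_FF_HZ, capacity_gbits * GBIT)
    pins = pins_needed(bandwidth, PER_PIN_RATE_BPS)
    print(f"{capacity_gbits} Gbit: {bandwidth / 1e9:.0f} Gbps, {pins} data pins")
```

At a fixed per-pin rate, the required pin count scales with density, which is one way to see why both pin frequency and device-bus width have to rise.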

Steven Przybylski is a principal consultant with The Verdande Group (San Jose, CA, www.verdande.com).

Narrow-bus, high-bandwidth memories

Julie Cates, Rambus Inc

Memory design is at a crossroads. As memory systems move beyond conventional synchronous DRAMs to narrow-bus, high-bandwidth devices, many aspects of memory-system design must change (Figure A). Space, power, and cost continue to hold center stage, but now an added challenge exists. Vendors must deliver on the promise of narrow-bus interfaces and higher performance at low cost without blocking end-user access to affordable, industry-standard memory modules that are supported by standard motherboard form factors.

Although the benefits of a narrow-bus memory-system interface are clear, the switch to narrow-bus memory-system design requires a new approach to both component and system design. You measure memory systems by their performance, cost, and manufacturability. Seemingly subtle differences in DRAM-device characteristics and interconnection can greatly impact overall system speed and cost. Key to high-speed operation is the quality of the signals, the relationship between the clock and data, and the topology of the data and control buses. As a result, memory-system design must change in several areas.

Overall system design and operation must drive component specifications. You must reconcile component-timing and clock-control mechanisms with the controller-interface, board-design, and DRAM-interface specs.

Clocking must be more precise to minimize skew between the clock and data. DRAM vendors typically accomplish this precision through PLL technology that synchronizes the movement of the clock and data signals. An advantage of the narrow bus is that as memory expands, the timing relationship among signal pins does not change. If the pins connect to a common bus, the loading on each pin can be the same. This commonality allows a narrow-bus design to operate at five to 10 times the frequency of current memory systems.

You must more carefully control ac-timing specs. As DRAM-bus cycles decline from 30 nsec to less than 10 nsec, flight and setup-and-hold times are more critical. Because of the need for DRAMs to work with a particular controller interface and within standard pc-board environments (with their associated capacitance and impedance levels), you must tune DRAM ac-timing specs for full-speed system operation.

You must match DRAM functionality to µP- and controller-access patterns. Memory systems are a shared resource that must manage and queue multiple requests from the CPU, graphics subsystem, and system I/O bus. DRAM components must support multiple transactions involving simultaneous row and column operations. The DRAM's internal core organization should match the processor's cache-fill requirements and realistic workloads. Because individual DRAM banks do not support multiple accesses, the memory system must support many smaller banks, rather than a few large banks, as well as concurrent transactions among DRAMs. A narrow bus that is one DRAM wide can present many banks for the system to access, allowing maximum bus use. An eight-DRAM memory system with eight-bank DRAMs, for example, presents 64 open pages to the memory controller.

The premise of the narrow bus is faster cycle times. As dynamic power increases with load capacitance, frequency, and the square of voltage, the obvious way to manage power is to reduce voltage and capacitance as the frequency increases. Narrow-bus signals, therefore, should support voltage swings of typically 1V or less, as well as capacitance of a few picofarads per pin. Further, system-power management should take advantage of the narrow bus: Only one memory device needs to be active at a time.
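
The power trade-off follows from the usual first-order relationship in which dynamic power per pin is proportional to C × V² × f. The Python sketch below applies that relationship with the switching-activity factor omitted. The 1V swing and few-picofarad load come from the guidelines above; the 400-MHz rate and the wide-bus comparison values (50 pF, 3.3V, 66 MHz) are hypothetical numbers chosen only to show the scaling.

```python
def dynamic_power_watts(load_capacitance_f, voltage_swing_v, frequency_hz):
    """First-order dynamic power per pin: C * V^2 * f (activity factor omitted)."""
    return load_capacitance_f * voltage_swing_v ** 2 * frequency_hz


# Hypothetical conventional pin: heavy load, full 3.3V swing, modest clock.
wide_bus_pin = dynamic_power_watts(50e-12, 3.3, 66e6)

# Narrow-bus pin: a few picofarads and a 1V swing; the rate is an assumed value.
narrow_bus_pin = dynamic_power_watts(3e-12, 1.0, 400e6)

print(f"conventional pin: {wide_bus_pin * 1e3:.1f} mW")
print(f"narrow-bus pin:   {narrow_bus_pin * 1e3:.1f} mW")
```

Because voltage enters as a square, cutting the swing from 3.3V to 1V alone reduces power by roughly a factor of 11, which leaves room for the higher frequency.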

To make DRAMs interchangeable across vendors, the ac specs and test measurements must be precise and consistent across suppliers. DRAMs of any generation are fabricated on processes of varying geometries and parameters. The test-program coverage must ensure functional and proper ac operation.

Although a narrow-bus topology has the advantages of small board area and single-DRAM upgrades, high-end desktop systems and servers already support more than 100 Mbytes of memory and will need to support more than 1 Gbyte in the next two years. Using 256-Mbit technology, 32 DRAM components will provide 1 Gbyte of storage. When multigigabyte configurations become necessary, a memory controller that can drive multiple buses will become critical.

Narrow-bus memory systems are also attractive for mobile systems with space constraints, such as portable computers. But high-speed data transfer can reduce power consumption only if the DRAMs can idle at a low-power state and quickly move into active states in less than 1 msec. A high-bandwidth DRAM should be able to spend a greater percentage of time in low-power states, thereby reducing overall power consumption.

Julie Cates is the product-marketing manager at Rambus Inc (Mountain View, CA).

Table 1--DRAM and L2-cache clock profiles

Memory | No. of clocks (66 MHz) | Clock profile (66 MHz) | No. of clocks (100 MHz) | Clock profile (100 MHz)
60-nsec fast-page-mode DRAM | 14 | 5-3-3-3 | 22 | 7-5-5-5
60-nsec EDO DRAM | 11 | 5-2-2-2 | 19 | 7-4-4-4
66-MHz synchronous DRAM | 9 | 6-1-1-1 | NA | NA
100-MHz synchronous DRAM | NA | NA | 9 | 6-1-1-1
L2 cache | 5 | 2-1-1-1 | 5 | 2-1-1-1

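
Each clock profile in Table 1 lists the clocks for the four transfers of a burst, and the "No. of clocks" column is simply their sum. The short Python sketch below makes that relationship explicit and converts the total to time; the function names are hypothetical.

```python
def total_clocks(profile):
    """Sum a burst profile such as '5-3-3-3' into a total clock count."""
    return sum(int(clocks) for clocks in profile.split("-"))


def burst_time_nsec(profile, clock_mhz):
    """Time for the whole burst at the given bus-clock frequency."""
    return total_clocks(profile) * 1000.0 / clock_mhz


for name, profile, clock_mhz in [
    ("60-nsec fast-page-mode DRAM", "5-3-3-3", 66),
    ("60-nsec EDO DRAM", "5-2-2-2", 66),
    ("66-MHz synchronous DRAM", "6-1-1-1", 66),
    ("L2 cache", "2-1-1-1", 66),
]:
    print(f"{name}: {total_clocks(profile)} clocks, "
          f"{burst_time_nsec(profile, clock_mhz):.0f} nsec")
```
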
Table 2--Key Intel PC-66 and PC-100 specifications

SDRAM characteristic | PC-66 specification | PC-100 specification
Row-address-to-column-address delay | Two (preferred) or three cycles | Two (preferred) or three cycles
CAS# latency | Two (preferred) or three cycles | Two (preferred) or three cycles
Precharge | Two (preferred) or three cycles | Two (preferred) or three cycles
Signal-setup time before CLK | 3 nsec | 2 nsec
Signal-hold time after CLK | 1.5 nsec | 1 nsec
Clock-to-data delay | 9 nsec | 6 nsec
Data hold after CLK | 3 nsec | 3 nsec

Table 3--System-memory granularity
Memory-
bus width
Memory density (per chip) System-bus
wi
dth
(one bank)
4 Mbits 16 Mbits 64 Mbits 128 Mbits 256 Mbits
×1 16 Mbytes 64 Mbyte 256 Mbytes 512 Mbytes 1 Gbytes ×32
32 Mbytes 128 Mbytes 512 Mbytes 1 Gbytes 2 Gbytes ×64
64 Mbytes 256 Mbytes 1 Gbyte 2 Gbytes 4 Gbytes ×128
×4 4 Mbytes 16 Mbytes 64 Mbytes 128 Mbytes 256 Mbytes ×32
8 Mbytes 32 Mbytes 128 Mbytes 256 Mbytes 512 Mbytes ×64
16 Mbytes 64 Mbytes 256 Mbytes 512 Mbytes 1 Gbyte ×128
×8 2 Mbytes 8 Mbytes 32 Mbytes 64 Mbytes 128 Mbytes ×32
4 Mbytes 16 Mbytes 64 Mbytes 128 Mbytes 256 Mbytes ×64
8 Mbytes 32 Mbytes 128 Mbytes 256 Mbytes 512 Mbytes ×128
×16 1 Mbyte 4 Mbytes 16 Mbytes 32 Mbytes 64 Mbytes ×32
2 Mbytes 8 Mbytes 32 Mbytes 64 Mbytes 128 Mbytes ×64
4 Mbytes 16 Mbytes 64 Mbytes 128 Mbytes 256 Mbytes ×128
×32 0.5 Mbyte 2 Mbytes 8 Mbytes 16 Mbytes 32 Mbytes ×32
1 Mbyte 4 Mbytes 16 Mbytes 32 Mbytes 64 Mbytes ×64
2 Mbytes 8 Mbytes 32 Mbytes 64 Mbytes 128 Mbytes ×128
Memory-
bus width
Memory density (per chip) System-bus
width

(two banks)
4 Mbits 16 Mbits 64 Mbits 128 Mbits 256 Mbits
×1 32 Mbytes 128 Mbytes 512 Mbytes 1 Gbyte 2 Gbytes ×32
64 Mbytes 256 Mbytes 1 Gbyte 2 Gbytes 4 Gbytes ×64
128 Mbytes 512 Mbytes 2 Gbytes 4 Gbytes 8 Gbytes ×128
×4 8 Mbytes 32 Mbytes 128 Mbytes 256 Mbytes 512 Mbytes ×32
16 Mbytes 64 Mbytes 256 Mbytes 512 Mbytes 1 Gbyte ×64
32 Mbytes 128 Mbytes 512 Mbytes 1 Gbyte 2 Gbytes ×128
×8 4 Mbytes 16 Mbytes 64 Mbytes 128 Mbytes 256 Mbytes ×32
8 Mbytes 32 Mbytes 128 Mbytes 256 Mbytes 512 Mbytes ×64
16 Mbytes 64 Mbytes 256 Mbytes 512 Mbytes 1 Gbyte ×128
×16 2 Mbytes 8 Mbytes 32 Mbytes 64 Mbytes 128 Mbytes ×32
4 Mbytes 16 Mbytes 64 Mbytes 128 Mbytes 256 Mbytes ×64
8 Mbytes 32 Mbytes 128 Mbytes 256 Mbytes 512 Mbytes ×128
×32 1 Mbyte 4 Mbytes 16 Mbytes 32 Mbytes 64 Mbytes ×32
2 Mbytes 8 Mbytes 32 Mbytes 64 Mbytes 128 Mbytes ×64
4 Mbytes 16 Mbytes 64 Mbytes 128 Mbytes 256 Mbytes ×128
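
Every entry in Table 3 follows from one relationship: the number of chips needed to span the system bus, times the number of banks, times the per-chip density. The Python sketch below is one way to express it; the function and parameter names are hypothetical.

```python
def granularity_mbytes(chip_density_mbits, chip_width_bits, system_bus_bits, banks=1):
    """Minimum memory increment, in Mbytes, for a given chip and bus organization."""
    chips_per_bank = system_bus_bits // chip_width_bits  # chips needed to fill the bus
    return chips_per_bank * banks * chip_density_mbits / 8  # 8 bits per byte


# A few spot checks against Table 3.
print(granularity_mbytes(4, 1, 32))            # x1 chips, 4 Mbits, x32 bus, one bank
print(granularity_mbytes(64, 16, 64))          # x16 chips, 64 Mbits, x64 bus, one bank
print(granularity_mbytes(4, 32, 32, banks=2))  # x32 chips, 4 Mbits, x32 bus, two banks
```

The calculation shows why wider chips reduce granularity: fewer of them are needed to fill the bus.
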
Table 4--Double-data-rate bandwidth

Columns 2 through 5 give the memory-clock frequency (address and control/data)

System-bus width | 66/133 MHz | 100/200 MHz | 133/266 MHz | 150/300 MHz
×32 | 533 Mbytes/sec | 800 Mbytes/sec | 1.1 Gbytes/sec | 1.2 Gbytes/sec
×64 | 1.1 Gbytes/sec | 1.6 Gbytes/sec | 2.1 Gbytes/sec | 2.4 Gbytes/sec
×128 | 2.1 Gbytes/sec | 3.2 Gbytes/sec | 4.3 Gbytes/sec | 4.8 Gbytes/sec

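
The peak figures in Table 4 come from multiplying the bus width in bytes by the data rate, which for a double-data-rate interface is twice the address-and-control clock. The Python sketch below shows the calculation; the function name is hypothetical, and the results are peak rates that ignore latency and bus turnaround.

```python
def ddr_peak_mbytes_per_sec(system_bus_bits, clock_mhz):
    """Peak DDR bandwidth: two transfers per clock across the full bus width."""
    data_rate_mhz = 2 * clock_mhz  # data toggles on both clock edges
    return system_bus_bits / 8 * data_rate_mhz


for bus_bits in (32, 64, 128):
    for clock_mhz in (100, 150):
        bandwidth = ddr_peak_mbytes_per_sec(bus_bits, clock_mhz)
        print(f"x{bus_bits} at {clock_mhz}/{2 * clock_mhz} MHz: {bandwidth:.0f} Mbytes/sec")
```
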
Table 5--Frame-buffer sizes
Graphics mode                 Resolution (pixels)                                                   Color depth (bits)
                              640×480       800×600       1024×768      1280×1024     1600×1200
2-D                           307.2 kbytes  480 kbytes    786.5 kbytes  1.4 Mbytes    2 Mbytes      8
                              614.4 kbytes  960 kbytes    1.6 Mbytes    2.7 Mbytes    3.9 Mbytes    16
                              921.6 kbytes  1.5 Mbytes    2.4 Mbytes    4 Mbytes      5.8 Mbytes    24
3-D (8-bit Z, 0-bit alpha)    614.4 kbytes  960 kbytes    1.6 Mbytes    2.7 Mbytes    3.9 Mbytes    8
                              1.3 Mbytes    2 Mbytes      3.2 Mbytes    5.3 Mbytes    7.9 Mbytes    16
                              1.9 Mbytes    2.9 Mbytes    4.8 Mbytes    7.9 Mbytes    11.6 Mbytes   24
3-D (8-bit Z, 8-bit alpha)    921.6 kbytes  1.5 Mbytes    2.4 Mbytes    4 Mbytes      5.8 Mbytes    8
                              1.9 Mbytes    2.9 Mbytes    4.8 Mbytes    7.9 Mbytes    11.6 Mbytes   16
                              2.8 Mbytes    4.4 Mbytes    7.1 Mbytes    11.8 Mbytes   17.3 Mbytes   24
3-D (16-bit Z, 8-bit alpha)   1.3 Mbytes    2 Mbytes      3.2 Mbytes    5.3 Mbytes    7.9 Mbytes    8
                              2.5 Mbytes    3.9 Mbytes    6.3 Mbytes    10.5 Mbytes   15.4 Mbytes   16
                              3.7 Mbytes    5.8 Mbytes    9.5 Mbytes    15.8 Mbytes   23.1 Mbytes   24
3-D (24-bit Z, 8-bit alpha)   1.6 Mbytes    2.4 Mbytes    4 Mbytes      6.6 Mbytes    9.6 Mbytes    8
                              3.1 Mbytes    4.8 Mbytes    7.9 Mbytes    13.2 Mbytes   19.2 Mbytes   16
                              4.7 Mbytes    7.2 Mbytes    11.8 Mbytes   19.7 Mbytes   28.8 Mbytes   24
3-D (32-bit Z, 8-bit alpha)   1.9 Mbytes    2.9 Mbytes    4.8 Mbytes    7.9 Mbytes    11.6 Mbytes   8
                              3.7 Mbytes    5.8 Mbytes    9.5 Mbytes    15.8 Mbytes   23.1 Mbytes   16
                              5.6 Mbytes    8.9 Mbytes    14.2 Mbytes   23.6 Mbytes   34.6 Mbytes   24
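The 2-D rows of Table 5 can be checked with a short calculation: pixel count multiplied by color depth in bytes. The Python sketch below covers only those 2-D rows; it does not model the additional Z and alpha storage in the 3-D rows. The function names are illustrative, and the rounding rule (decimal units, rounded up to one decimal place) is an assumption inferred from the printed values rather than something the table states.

```python
RESOLUTIONS = [(640, 480), (800, 600), (1024, 768), (1280, 1024), (1600, 1200)]
COLOR_DEPTHS_BITS = [8, 16, 24]


def frame_buffer_bytes(width, height, color_depth_bits):
    """Bytes needed to hold one 2-D frame at the given color depth."""
    return width * height * color_depth_bits // 8


def format_size(num_bytes):
    """Format a size in decimal kbytes or Mbytes, rounded up to one decimal.

    Assumption: rounding up is inferred from the table's printed values.
    """
    if num_bytes >= 1_000_000:
        tenths = -(-num_bytes // 100_000)  # integer ceiling, tenths of a Mbyte
        unit = "Mbytes"
    else:
        tenths = -(-num_bytes // 100)  # integer ceiling, tenths of a kbyte
        unit = "kbytes"
    whole, frac = divmod(tenths, 10)
    value = f"{whole}.{frac}" if frac else f"{whole}"
    return f"{value} {unit}"


for depth in COLOR_DEPTHS_BITS:
    sizes = [format_size(frame_buffer_bytes(w, h, depth)) for w, h in RESOLUTIONS]
    print(f"{depth}-bit color: " + ", ".join(sizes))
```

Integer arithmetic is used for the rounding so that exact values such as 480,000 bytes print as "480 kbytes" without floating-point surprises.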
For more information...
When you contact any of the following manufacturers directly, please let them know you read about their products on EDN's website.
Alliance Semiconductor Corp
San Jose, CA
1-408-383-4900
fax 1-408-383-4999
www.alsc.com
Enhanced Memory Systems Inc
Colorado Springs, CO
1-719-481-7000
fax 1-719-488-9095
www.csn.net/ramtron/enhanced
Fujitsu Microelectronics Inc
San Jose, CA
1-408-922-9000
fax 1-408-432-9044
www.fujitsumicro.com
Hitachi America Ltd
Brisbane, CA
1-415-244-7848
fax 1-415-583-4207
www.hitachi.com
Hyundai Electronics America
San Jose, CA
1-408-232-8000
fax 1-408-232-8125
www.hea.com
IBM Microelectronics Corp
Armonk, NY
fax 1-415-855-4121
www.chips.ibm.com
Integrated Device Technology Inc
Santa Clara, CA
1-408-727-6116
fax 1-408-492-8674
www.idt.com
Integrated Silicon Solutions Inc
Santa Clara, CA
1-800-379-4774
fax 1-408-588-0806
www.issiusa.com
LG Semicon
San Jose, CA
1-408-432-5000
fax 1-408-432-6067
www.lg.co.kr
Matsushita Electric Corp Of America
Secaucus, NJ
1-201-348-7000
fax 1-201-392-4792
www.panasonic.com
Micron Quantum Devices Inc
Boise, ID
1-208-368-3900
fax 1-208-368-4431
www.micron.com
Mitsubishi Electronics America Inc
Sunnyvale, CA
1-408-730-5900
fax 1-408-732-9382
www.mitsubishichips.com
Mosaid Technologies Inc
Kanata, ON, Canada
1-613-836-3134
fax 1-613-831-0796
www.mosaid.com
Mosel Vitelic Inc
San Jose, CA
1-408-433-6000
fax 1-408-433-0952
www.moselvitelic.com
MoSys Inc
Sunnyvale, CA
1-408-731-1800
fax 1-408-731-1893
www.mosys.com
NEC Electronics Inc
Santa Clara, CA
1-408-986-1020
fax 1-408-588-6374
www.nec.com
Nippon Steel Semiconductor Corp
Santa Clara, CA
1-408-524-8000
fax 1-408-524-8040
Oki Semiconductor
Sunnyvale, CA
1-408-720-1900
fax 1-408-720-1918
www.okisemi.com
Rambus Inc
Mountain View, CA
1-415-903-3800
fax 1-415-965-1528
www.rambus.com
Samsung Semiconductor
San Jose, CA
1-408-954-7000
fax 1-408-954-7870
www.samsung.com
Sharp Electronics Corp
Camas, WA
1-360-834-2500
fax 1-360-834-8903
www.sharpmeg.com
Siemens Components Inc
Cupertino, CA
1-408-777-4500
fax 1-408-777-4988
www.siemens.com
Silicon Magic
Santa Clara, CA
1-408-969-3000
fax 1-408-588-2080
www.simagic.com
Smart Modular Technologies
Fremont, CA
1-510-623-1231
fax 1-510-623-1434
www.smartm.com
Texas Instruments Inc
Austin, TX
1-800-477-8924 x4500
www.ti.com
Toshiba America Electronic Components Inc
Irvine, CA
1-714-455-2000
fax 1-714-859-3963
www.toshiba.com
 

Brian Dipert, Technical Editor

You can reach Brian Dipert at 1-916-454-5242, fax 1-916-454-5101, edndipert@worldnet.att.net, http://members.aol.com/bdipert.




Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.