Feature
Which way? DRAM buyers and sellers at the crossroads
DRAM suppliers might desire a one-size-fits-all future, but divergent application needs don't comply. DRAM users might prefer a custom chip, but its cost would be excessive. Can each side secure sufficient concessions to seal the deal without selling its soul in the process?
By Brian Dipert, Technical Editor -- EDN, 4/24/2003

In the more than three years since EDN last devoted a cover story to DRAM, some aspects of this memory technology have changed dramatically, whereas others have changed little (Reference 1). The industry has largely decided the Direct RDRAM-versus-DDR SDRAM tussle for the future of PC main memory in favor of the JEDEC-sanctioned competitor: DDR SDRAM. Intel has thrown in the towel on Rambus, or at least that's what Intel's now saying, with nothing but DDR SDRAM on its chip-set roadmap. Long-touted DDR-II technology is very late to the party, however, forcing the PC industry to a 400-Mbps/pin DDR-I speed bin that it initially viewed as nothing more than a low-volume, short-lived distraction. Ironically, the slothlike pace of industry-standards bodies is a key factor that motivated Intel to partner with Rambus in the first place. Meanwhile, Rambus gamely fights on, along with its latest core-logic partner, SiS, espousing claims of high-end PC success yet to come.
The memory-density granularity issue that has perpetually plagued non-PC main-memory applications has now spread to the PC. Recent market transitions from DOS- to Windows NT-based operating systems—that is, from Windows 98 and ME to Windows 2000 and XP—have translated to a moderate increase in base-system memory densities, but a dearth of memory-gobbling applications hasn't extended the trend (Figure 1). Ever-increasing system-performance needs and the ever-wider system buses crafted to address those needs have only accentuated the granularity problem and are forcing the DRAM suppliers to develop the 32-bit interface memories they long resisted (see sidebar "Wide not?").
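To put a number on the granularity problem, the short Python sketch below computes the smallest memory increment that fills one rank of a data bus. The 256-Mbit chip density and the x8, x16, and x32 chip widths are illustrative assumptions, not figures from this article, and the function name is hypothetical.

```python
def min_memory_mbytes(bus_width_bits: int, chip_width_bits: int,
                      chip_density_mbits: int) -> float:
    """Smallest memory size, in Mbytes, that fills one rank of the bus."""
    if bus_width_bits % chip_width_bits:
        raise ValueError("chip width must divide the bus width evenly")
    chips_per_rank = bus_width_bits // chip_width_bits
    # Each chip contributes its full density; 8 Mbits make 1 Mbyte.
    return chips_per_rank * chip_density_mbits / 8


# Assumed example: 256-Mbit chips on a 64-bit bus.
for chip_width in (8, 16, 32):
    size = min_memory_mbytes(64, chip_width, 256)
    print(f"x{chip_width} chips on a 64-bit bus: {size:.0f} Mbytes minimum")
```

Under these assumptions, wider chips cut the minimum memory size in proportion, which is why a 32-bit interface eases the granularity problem on a wide bus.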
Application-defined density growth caps, perpetually dismal financial fortunes, and a consequent paucity of R&D and infrastructure investments are causing the DRAM manufacturers to fall behind the rest of the industry's Moore's Law-driven transistor-integration trends. (Where is that long-promised 1-Gbit DRAM?) Low prices are good news if cost is your only chip criterion. But a single DRAM "shoe" no longer fits on every application "foot." If a DRAM manufacturer is developing a device that is optimized for cost, the vendor will choose small, slow array transistors and the fewest possible array banks and will specify conservative figures for the chip's voltage, current, ac timings, and other specifications to ensure maximum manufacturing yield.
Do you want a chip with fast burst-transfer performance, fast sustained-transfer performance, or both? Then, the vendor needs to employ larger, more power-hungry transistors, sense amplifiers, and other circuits; subdivide the array into multiple banks to enable coincident operations; take advantage of wide internal buses; and maybe even include an embedded-SRAM cache. And what if low power consumption is your primary focus? The vendor might again choose to subdivide the array into numerous banks—in this case, for fine-grained control of storage transistor, sense amplifier and other circuits' active-versus-standby modes. In most other respects, however, the part would differ greatly from its performance-optimized counterpart, with low switching and leakage current, rather than high speed, the primary focus.
What if your application needs a special feature, such as the write-per-bit and block-write modes still found in a few graphics-tailored memories? John Montrym, Nvidia's chief architect, comments, "I've been in graphics-controller design long enough to recall when some of these features made sense. Today, however, our controllers must operate with as wide a range of memories as possible, including the non-bleeding-edge variety. We hesitate to create logic that takes advantage of exotic features. The payoff for such logic is relatively low. Furthermore, when graphics requirements change, these memory features are often no longer effective. We've found that it's best to focus on data-transfer efficiency. Let the memory store bits cheaply, the controller access those bits rapidly, and the graphics chip perform the graphics manipulation." (See Reference 2.)
Montrym's perspective is common but by no means universal. When, for example, you attempt to quickly migrate a historically SRAM-based design to DRAM, that DRAM must provide an SRAM-like interface, contain self-refresh circuitry, and, more generally, act in all possible respects like the memory it's replacing. Sometimes, too, minor plastic surgery won't cure the DRAM patient of its slow random-access disfigurement; it may require more extensive alterations.
Double trouble
When Intel's CPUs moved from the Pentium III to the Pentium 4 generation, and the devices' front-side buses' data rates consequently accelerated from 133 to 400 Mbps per pin (a quadruple-data-rate, 100-MHz bus), the company incorporated the dual-16-bit-Rambus-channel architecture, which it previously employed in the Pentium III's i840 workstation core logic, into the Pentium 4's mainstream i850 and high-end i860 chip sets. This doubling of the DRAM subsystem peak bandwidth to 3.2 Gbytes/sec exactly matched the Pentium 4 front-side-bus peak transfer rate. Subsequently announced i845 chip sets supported a single DDR-I SDRAM channel operating as fast as 133 MHz, translating to 266 Mbps/pin peak transfer rates or, for a 64-bit channel, 2.13-Gbyte/sec peak speeds. Although the mismatch between system and DDR-I SDRAM buses looked ugly on paper and although memory-intensive benchmarks also consistently gave the performance nod to DRDRAM-based systems, most users noticed little to no difference in system speeds. DDR-I SDRAM's lower cost per bit secured its success.
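The peak figures in this section all come from the same product: per-pin data rate times bus width times channel count. The Python sketch below reproduces them. The 800-Mbps/pin rate for the Rambus channels (PC800 parts) and the 64-bit width of the front-side bus are assumptions that this paragraph does not state, and the function name is hypothetical.

```python
def peak_gbytes_per_sec(mbps_per_pin: float, width_bits: int,
                        channels: int = 1) -> float:
    """Peak transfer rate in Gbytes/sec (1 Gbyte = 1000 Mbytes)."""
    return mbps_per_pin * width_bits * channels / 8 / 1000


# Pentium 4 front-side bus: 400 Mbps/pin, assumed 64 bits wide.
p4_fsb = peak_gbytes_per_sec(400, 64)
# Two 16-bit Rambus channels, assumed to run at 800 Mbps/pin (PC800).
dual_rdram = peak_gbytes_per_sec(800, 16, channels=2)
# One 64-bit DDR-I channel at 266 Mbps/pin.
ddr266 = peak_gbytes_per_sec(266, 64)

print(f"Pentium 4 front-side bus: {p4_fsb:.2f} Gbytes/sec")
print(f"Dual-channel RDRAM:       {dual_rdram:.2f} Gbytes/sec")
print(f"Single-channel DDR266:    {ddr266:.2f} Gbytes/sec")
```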
Beginning at 2 GHz, Intel's Pentium 4 CPUs migrated to a 533-Mbps/pin bus running at 133 MHz, further accentuating the performance gap between 266-Mbps/pin DDR-I SDRAM and CPU front-side bus speeds. DDR333-I SDRAM, with 2.66-Gbyte/sec peak transfer rates for a 64-bit bus, narrowed but didn't close the gap. Dual PC-1066 DRDRAM channels, delivering 4.26-Gbyte/sec peak speeds, did, however, keep pace with the CPU, persuading Intel to roll out its i850E chip set. Now, however, Intel's latest Pentium 4 and Xeon CPUs, based on 0.09-micron-process lithographies, are running at 800-Mbps/pin front-side-bus transfer rates. What will Intel do, with Direct RDRAM no longer its PC poster child but DDR-II technology still not in high-volume production, to ensure that the DRAM subsystem doesn't drag down overall system performance?
Intel's next-generation Canterwood and Springdale core-logic chip sets, due out this summer, adopt the dual-channel DDR-SDRAM approach now in the company's E7205 "Granite Bay" workstation chip set. Intel's stance on the 400-Mbps/pin DDR-I SDRAM conspicuously evolved from a lukewarm "evaluating-it" stance to a definitive "supporting-it" declaration at this spring's Developer Forum. Two 64-bit channels of DDR400-I SDRAM bloat the core logic's north-bridge pin count, but they also deliver 6.4-Gbyte/sec peak data-transfer rates. To circumvent the performance gap between memory and front-side bus speeds and the more significant gap between memory and internal clock rates, Intel is also further increasing the amount of on-chip cache memory (Figure 2). The upcoming Nocona version of Xeon and the next "Prescott" iteration of the Pentium 4 will boost the L2 cache size from 512 kbytes to 1 Mbyte. Next year's Gallatin variant of Xeon will add a 4-Mbyte L3 cache.
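One way to see which memory configurations keep pace with each front-side-bus generation is to express memory peak bandwidth as a fraction of bus peak bandwidth. The Python sketch below does so for the pairings this article discusses. The 64-bit front-side-bus width is an assumption, and the function names are hypothetical.

```python
FSB_WIDTH_BITS = 64  # assumption: the article does not state the bus width


def peak_gbytes_per_sec(mbps_per_pin: float, width_bits: int,
                        channels: int = 1) -> float:
    """Peak transfer rate in Gbytes/sec (1 Gbyte = 1000 Mbytes)."""
    return mbps_per_pin * width_bits * channels / 8 / 1000


def coverage(fsb_mbps: float, mem_mbps: float, mem_width_bits: int,
             mem_channels: int) -> float:
    """Memory peak bandwidth as a fraction of front-side-bus peak bandwidth."""
    memory = peak_gbytes_per_sec(mem_mbps, mem_width_bits, mem_channels)
    bus = peak_gbytes_per_sec(fsb_mbps, FSB_WIDTH_BITS)
    return memory / bus


# (front-side-bus Mbps/pin, memory description, Mbps/pin, width, channels)
pairings = [
    (400, "single-channel DDR266", 266, 64, 1),
    (533, "single-channel DDR333", 333, 64, 1),
    (533, "dual-channel PC1066 RDRAM", 1066, 16, 2),
    (800, "dual-channel DDR400", 400, 64, 2),
]

for fsb, name, rate, width, channels in pairings:
    ratio = coverage(fsb, rate, width, channels)
    print(f"{fsb}-Mbps/pin bus with {name}: {ratio:.0%} of bus peak")
```

A ratio of 100% means only that the peak numbers match; it says nothing about latency or sustained throughput.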
Industry analysts have differing opinions on the success of DDR400-I and its consequent impact on the DDR-II rollout. The word "peak" is key when describing memory-bus speeds. DDR SDRAMs and Direct RDRAMs do indeed deliver impressive peak transfer rates when you're streaming sequential data from their small sense-amp arrays. Behind those pseudocaches, though, lie conventional low-cost and, therefore, slow DRAM arrays with approximately 60-nsec random-access times. Consider, too, the performance-sapping multiclock overhead of the multiplexed address bus, which you encounter each time you access an inactive DRAM page.
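A deliberately crude model shows how far sustained throughput can fall below the peak figure when accesses are random. The Python sketch below charges the full 60-nsec random-access time to every burst. The 32-byte burst (four transfers on a 64-bit bus), the 3.2-Gbyte/sec single-channel peak, and the absence of any overlap between banks are all simplifying assumptions.

```python
def effective_gbytes_per_sec(burst_bytes: int, peak_gbytes_per_sec: float,
                             access_ns: float) -> float:
    """Throughput when each burst pays a full random-access penalty.

    One Gbyte/sec equals one byte/nsec, so bytes divided by nsec
    gives Gbytes/sec directly.
    """
    transfer_ns = burst_bytes / peak_gbytes_per_sec
    return burst_bytes / (access_ns + transfer_ns)


# Assumed case: 32-byte bursts from a 3.2-Gbyte/sec channel, 60-nsec access.
random_rate = effective_gbytes_per_sec(32, 3.2, 60)
print(f"Random 32-byte bursts: {random_rate:.2f} Gbytes/sec of a 3.2 peak")
```

Real controllers hide part of this penalty by keeping pages open and interleaving banks, so treat the result as a pessimistic bound rather than a prediction.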
The DDR400-I SDRAMs that Intel is initially specifying for Canterwood and Springdale are of the so-called 3-4-4 variety. This nomenclature refers to the three-clock CAS (column-address-strobe)-to-data, four-clock RAS (row-address-strobe)-to-CAS, and four-clock RAS precharge latencies. To squeeze this speed from the DDR-I SDRAM architecture, Intel raised the parts' core and I/O voltages to 2.6V and tightened their tolerances to ±0.1V; slower DDR-I parts run at 2.5 ±0.2V. Mainstream 2.5-3-3 or faster DDR333-I parts, whose higher yields translate to lower cost, may deliver equivalent or better performance than their faster-clocking but longer-latency DDR400-I counterparts in some applications!
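Converting the clock-count latencies to nanoseconds shows why a slower-clocking part can win. The Python sketch below assumes that the clock runs at half the per-pin data rate and that a page miss costs precharge plus RAS-to-CAS plus CAS latency; both are simplifications, and the function name is hypothetical.

```python
def timings_ns(mbps_per_pin: float, cas: float, trcd: float,
               trp: float) -> dict:
    """Convert clock-count latencies to nanoseconds for a DDR part."""
    clock_mhz = mbps_per_pin / 2      # DDR: two transfers per clock
    period_ns = 1000 / clock_mhz
    return {
        "cas": cas * period_ns,
        "trcd": trcd * period_ns,
        "trp": trp * period_ns,
        # Assumed page-miss cost: precharge, then activate, then read.
        "page_miss": (trp + trcd + cas) * period_ns,
    }


ddr400 = timings_ns(400, cas=3, trcd=4, trp=4)      # 3-4-4
ddr333 = timings_ns(333, cas=2.5, trcd=3, trp=3)    # 2.5-3-3

for name, t in (("DDR400 3-4-4", ddr400), ("DDR333 2.5-3-3", ddr333)):
    print(f"{name}: CAS {t['cas']:.1f} nsec, "
          f"page miss {t['page_miss']:.1f} nsec")
```

Under these assumptions the two parts have nearly identical CAS latency in nanoseconds, and the DDR333 part reaches data sooner after a page miss.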
As the desktop-PC market matures and its growth slows, companies searching for high-profit-margin niches are increasingly targeting "overclockers," computer users who push systems beyond their rated voltages and clock speeds in search of higher graphics frame rates and other improved performance metrics. Corsair's XMS, Kingston Technology's HyperX, and Mushkin's High Performance Black modules have more aggressive timing specifications than those of commodity devices and demonstrate the perceived importance of low memory latency in defining overall system performance. Kingston, for example, sells the 333-Mbps/pin PC2700 2-2-2; 370-Mbps/pin PC3000 2-2-2; 400-Mbps/pin PC3200 2-2-2; and 434-Mbps/pin PC3500 2-3-3 DIMMs, which contain aluminum heat spreaders for optimal heat diffusion.
As for AMD's Athlon CPUs, although the company's original 200-Mbps/pin front-side-bus-transfer rate exceeded the capability of Intel's Pentium III front-side bus, Intel stole the speed crown with its Pentium 4 and never looked back. Latest generation Athlon XP chips employ a 333-Mbps/pin transfer bus and recent chip-set-partnership announcements suggest that AMD will soon migrate to a 400-Mbps/pin bus. AMD's upcoming Athlon64 CPUs, like Transmeta's Crusoe for SDR and DDR SDRAM, Hewlett-Packard's EV7 Alpha for Direct RDRAM, and Intel's never-released Timna for Direct RDRAM, embed the DRAM controller within the microprocessor. In doing so, they eliminate data-transfer delays through external core logic and thus minimize latency effects between the DRAM and the microprocessor.
The theory of evolution
However, the memory supplier and the user community eventually reach a point at which they cannot cost-effectively squeeze more speed from an architecture not originally intended to run at that speed. As Nvidia's Montrym states, "Ultimately, there is an intersection in the two curves of clock speed and design experience where the desired clock rate overtakes design experience. At that inflection point, we need to migrate to a next-generation memory." How does the upcoming DDR-II architecture squeeze more performance (or, in other words, better yields at the same performance) at equivalent or lower power consumption from a cost-focused DRAM array (Figure 3 and Table 1)?
One key improvement in DDR-II over DDR-I is the migration to lower 1.8V core and SSTL I/O voltages. Also, to reduce impedance and trace length, DDR-II parts will move from TSOP to BGA packaging. (Some DDR-I chips already employ BGA packaging for the same reasons.) DDR-II memories will simultaneously fetch four sequential bits of information for each data bit, employing the so-called 4n prefetch mode versus 2n in DDR-I and 1n in SDR SDRAM. This feature trades off slightly larger periphery circuitry for the ability to retain the slow, area-optimized DRAM array. Beginning at the 1-Gbit generation, DDR-II parts will also employ an eight-bank array, versus four banks on lower-density devices and on DDR-I chips.
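The benefit of a deeper prefetch is easiest to see as the rate at which the DRAM array itself must cycle to sustain a given per-pin data rate. The Python sketch below divides the data rate by the prefetch depth. The speed grades chosen for the examples are illustrative, and the names are hypothetical.

```python
# Prefetch depth per generation, as described in the text.
PREFETCH = {"SDR": 1, "DDR-I": 2, "DDR-II": 4}


def array_rate_mhz(mbps_per_pin: float, generation: str) -> float:
    """Rate at which the array must deliver data to sustain the pin rate."""
    return mbps_per_pin / PREFETCH[generation]


# Illustrative speed grades for each generation.
examples = [("SDR", 133), ("DDR-I", 400), ("DDR-II", 400), ("DDR-II", 533)]

for generation, rate in examples:
    core = array_rate_mhz(rate, generation)
    print(f"{generation} at {rate} Mbps/pin: array cycles at {core:.0f} MHz")
```

At 400 Mbps/pin, the 4n-prefetch part asks half as much of its array as the 2n-prefetch part does, which is how the slow, area-optimized array survives.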
For ease of system integration, especially in heavily loaded high-end workstation and server designs, DDR-II parts will support optional differential data strobes, along with read-plus-one write-cycle latency. (DDR-I uses a fixed one-cycle write latency.) DDR-II parts will also feature posted CAS, the ability to issue read and write commands to the memory before latching in the column address. Memory controllers can, via issued commands, enable and configure the values of each memory's on-die termination resistors and can calibrate each chip's output-driver strength. Although DDR-II DIMMs have the same dimensions as their DDR-I predecessors, their altered pinouts enable more robust power delivery, and they migrate from 184 to 240 contacts to comprehend almost three times the previous number of grounds along with the addition of other signals. The DIMM contact pitch therefore shrinks, although SODIMM pinouts and pitches for DDR-I and -II are identical. Note too that each motherboard trace can connect to only two unbuffered DIMMs in the DDR-II generation, versus four DIMMs with DDR-I.
Was Intel's turnaround embrace of DDR400-I a result of its faster than expected migration to an 800-Mbps/pin front-side bus, of slower than expected DDR-II production-volume ramps, or some combination of the two factors? No one's saying for sure, and nobody disputes the marketing value of a DRAM interface whose peak transfer rate matches up with the CPU. But the ambiguity surrounding DDR400-I's true performance potential makes it a good bet that DDR400-II with its predicted 3-3-3 and 2.5-3-3 speeds and subsequent 533-MHz and faster variants will be the industry's long-term DDR SDRAM approach.
New packages, new modules, new chip designs, new testers, and new test flows, however, translate to much short-term uncertainty in a market that shipped, according to World Semiconductor Trade Statistics estimates, almost 600 billion bits of memory last year. Kentron hopes to take advantage of that uncertainty with its QBM (quad-band-memory) technology, which the company has been promoting for many years. QBM employs STMicroelectronics' FET switch arrays and IDT's clock circuits to rapidly switch between the outputs of multiple SDR or DDR-I SDRAMs, delivering peak performance comparable with that of DDR-II at a small incremental module cost over DDR-I (Figure 4). Via Technologies plans to optionally support dual-channel QBM modules with its Apollo PT600 chip set; the company hopes that this experiment is more successful than its past provisions for NEC's now-defunct Virtual Channel DRAMs.
Rambus, too, aspires to benefit from the turbulence of the DDR-I to DDR-II transition. Partner SiS's R658 chip set, like Intel's i850, employs a dual-channel RDRAM interface, but, unlike the i850, it comprehends PC1200 memories in addition to PC800 and PC1066 variants. The upcoming SiS R659, with anticipated third-quarter availability, will include four Rambus channels, delivering an aggregate 9.6-Gbyte/sec peak bandwidth (four 16-bit channels at 1200 Mbps/pin) when you couple it with PC1200 devices and requiring a much lower total interface pin count than the equivalent-bandwidth multichannel DDR SDRAM alternative.
Giga-graphics
Those of you who closely follow the graphics business may find it somewhat strange to be reading about an industry struggle toward high-volume implementation of DDR SDRAMs attached to 64-bit memory channels and running on 200-MHz clocks. After all, ATI Technologies' latest Radeon 9800 Pro graphics boards employ 340-MHz DDR SDRAMs attached to 256-bit buses, and Nvidia's GeForce FX 5800 Ultra harnesses 500-MHz DDR SDRAMs on 128-bit buses. What's so different about PC main memory and PC graphics, leading to the performance disparity?
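The size of that disparity falls out of the clock rates and bus widths quoted above. The Python sketch below computes peak bandwidth for each case, assuming two transfers per clock for every DDR interface. The function name is hypothetical.

```python
def ddr_peak_gbytes_per_sec(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak DDR bandwidth in Gbytes/sec, two transfers per clock."""
    return clock_mhz * 2 * bus_width_bits / 8 / 1000


main_memory = ddr_peak_gbytes_per_sec(200, 64)      # one DDR400 channel
radeon_9800_pro = ddr_peak_gbytes_per_sec(340, 256)
geforce_fx_5800_ultra = ddr_peak_gbytes_per_sec(500, 128)

print(f"PC main memory, one channel: {main_memory:.2f} Gbytes/sec")
print(f"Radeon 9800 Pro:             {radeon_9800_pro:.2f} Gbytes/sec")
print(f"GeForce FX 5800 Ultra:       {geforce_fx_5800_ultra:.2f} Gbytes/sec")
```

By this measure the graphics boards enjoy roughly five to seven times the peak bandwidth of a single main-memory channel.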
Graphics chips connect to their frame buffers over 2.5-in. or shorter, point-to-point, single-load buses, versus the much longer, multidrop buses that main memory uses. Graphics chips also don't have to contend with module-connector impedance. And perhaps most important, the standardization process is much simpler with graphics chips. Most graphics-card vendors, who specialize in distribution and marketing, not engineering, take chip vendors' reference designs directly to production without alterations. With each product generation, the graphics vendor secures a supply deal with one or a few memory vendors for a custom-tested product variant that might have altered impedances, operating temperatures, and voltages, along with narrower tolerances for these parameters than its PC-main-memory counterpart.
Sometimes, graphics-and-memory-vendor partnerships result in custom-tailored DRAMs. Look, for example, at the low-power GDDR2-M SDRAMs that Elpida developed and ATI Technologies uses in its Mobility Radeon 9600 (Reference 3). Compared with DDR-II SDRAMs, GDDR2-M chips' on-die termination pulls the data bus to ground via an NMOS transistor to eliminate dc current draw instead of to the midpoint of the data-bus high and low levels with conventional memories. GDDR2-M chips comprehend only length-of-four bursts, a sequential-burst sequence, a fixed-burst start address and length-of-one write latency. They also support data-inversion capability to reduce cycle-to-cycle external bus transitions. During reads, one data-mask signal pairs with each 8 data bits, and, during writes, each 32 data bits couple with a data-inversion-mask signal (Figure 5).
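The idea behind data inversion can be sketched generically: if sending a word would flip more than half of the data lines relative to the previous cycle, send its complement and assert a flag instead. The Python sketch below illustrates that general technique only. It is not GDDR2-M's actual encoding rule, which this article does not describe; it ignores transitions on the flag line itself; and the 8-bit group width and all function names are assumptions.

```python
def count_ones(value: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def encode_dbi(prev_bus: int, data: int, width: int = 8):
    """Return (bus_value, inverted_flag) that minimizes line transitions."""
    mask = (1 << width) - 1
    data &= mask
    toggles = count_ones((prev_bus ^ data) & mask)
    if toggles > width // 2:
        return (~data) & mask, True   # complement flips fewer lines
    return data, False


def decode_dbi(bus: int, inverted: bool, width: int = 8) -> int:
    """Recover the original data word from the bus value and flag."""
    mask = (1 << width) - 1
    return (~bus) & mask if inverted else bus & mask


def bus_toggles(words, use_dbi: bool, width: int = 8) -> int:
    """Total data-line transitions for a sequence, starting from all zeros."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for word in words:
        bus = encode_dbi(prev, word, width)[0] if use_dbi else word & mask
        total += count_ones(prev ^ bus)
        prev = bus
    return total


words = [0x00, 0xFF, 0x01, 0xFE]  # a worst-case-style test pattern
print("Transitions without inversion:", bus_toggles(words, use_dbi=False))
print("Transitions with inversion:   ", bus_toggles(words, use_dbi=True))
```

The test pattern is chosen to flatter the technique; on typical data the savings are smaller.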
As the graphics business becomes increasingly competitive, however, cost pressures escalate and multivendor and multicustomer standardized product variants become more appealing. Cooperation among memory suppliers and graphics vendors has resulted in the GDDR3 specifications, which target memories running at clock rates as high as 800 MHz. Instead of DDR SDRAMs' current-based SSTL buses, GDDR3 employs voltage-based pseudo-open-drain outputs. The idle state of the unidirectional, single-ended strobe signals is VDDQ to simplify their distribution. Like DDR-II SDRAMs, though, GDDR3 chips center data within the clock window for writes and align data on clock edges for reads.
With its Yellowstone technology, Rambus redoubles its efforts at capturing graphics and other consumer applications that, with the Nintendo 64 and Playstation 2, were the company's first success stories (Figure 6). Yellowstone comprehends low-voltage and -power differential signaling; the ability to transfer 8 bits of data on each clock edge; and FlexPhase precision-data transfer to an accuracy of 2.5 psec, which accounts for pc-board routing, packaging, and on-chip clock skew and thereby enables low-cost system manufacturing.
Initial variants of Yellowstone will run at 3.2-Gbps/pin data rates, and the company plans rates as high as 6.4 Gbps/pin. Elpida, Sony, and Toshiba have all licensed Yellowstone. Sony and Toshiba have also licensed the high-speed, parallel Redwood logic interface, and it's a safe bet that the technology will appear on a future Playstation game console. Compared with DRAM chips, RDRAM chips' excessive random-access latencies have historically limited the chips' success in non-PC main-memory applications, and the currently available Yellowstone documentation reveals no information about whether Rambus has addressed this issue in the Yellowstone generation. The company demonstrated its first logic-process-based Yellowstone-interface test chips last summer.
The road less traveled by
Only a small percentage of you design PCs or their graphics subsystems. Why, then, does most of this article discuss DRAMs for PCs and graphics? The reason is that these markets are the two largest consumers of DRAMs, and the chips that sell into these markets, therefore, tend to be the lowest priced options. Price is often the most important factor you consider when selecting a memory. (More accurately, it often represents the top 10 factors you consider!)
Sometimes, though, there's a showstopper-feature omission or discrepancy that precludes your use of a conventional DRAM. Perhaps you need faster random accesses to match up with nonsequential data-read and -write traffic. Alternatively, you may need the DRAM to run at lower average power consumption or to offer a demultiplexed address bus and other conventional control signals. Fear not: Help is on the way. As PC-market growth flattens, DRAM vendors must pay more than lip service to emerging applications to fill their fabs.
Stay tuned this year for an article on both high-speed and low-power SRAM-replacement DRAMs. Until then, if your interest is in high-speed memories, take a look at Enhanced Memory's ESRAMs, Samsung and Toshiba's FCRAMs, and Infineon and Micron's RLDRAMs. If power consumption is your concern, consider PSRAMs, JEDEC-specified, low-power SDRAMs with partial refresh, temperature-compensated refresh, and deep-power-down modes, and Fujitsu's FCRAMs.
| For more information... | ||
| When you contact any of the following manufacturers directly, please let them know you read about their products in EDN. | ||
| AMD (Advanced Micro Devices) www.amd.com | ATI Technologies www.ati.com | Avnet www.avnet.com |
| Cisco Systems www.cisco.com | Corsair www.corsairmemory.com | Denali Software www.denali.com |
| Elpida Memory www.elpida.com | Enhanced Memory Systems www.edram.com | Fujitsu www.fujitsu.com |
| Hewlett Packard www.hp.com | IDT (Integrated Device Technology) www.idt.com | Infineon Technologies www.infineon.com |
| Inquest Market Research www.inqst.com | Intel www.intel.com | Kentron Technologies www.kentrontech.com |
| Kingston Technology www.kingston.com | Micron Technology www.micron.com | Mushkin www.mushkin.com |
| Nvidia www.nvidia.com | Rambus www.rambus.com | Samsung Electronics www.samsungelectronics.com |
| Semico Research www.semico.com | SiS (Silicon Integrated Systems) www.sis.com.tw | Sony Corp www.sony.com |
| STMicroelectronics www.st.com | Toshiba www.toshiba.com | Transmeta www.transmeta.com |
| Via Technologies www.viatech.com.tw | Xilinx www.xilinx.com | WSTS (World Semiconductor Trade Statistics) www.wsts.org |
| Author Information |
Technical editor Brian Dipert's PC800 DRDRAM-powered and VC820 motherboard-based desktop PC died a few months ago after many years of faithful service, when its power supply self-destructed. Now, he's off to build its replacement, based on DDR400-I memory, a HyperThreaded 3.06-GHz Pentium 4 processor, and Intel's D875PPB2 motherboard. You can reach him at 1-916-454-5242, bdipert@edn.com.
| References |
|
| Acknowledgment | ||
| Denali Software's memory reports, published monthly, were an invaluable resource. Equally helpful were the DRAM presentations from Denali's MemCon conference and Inquest Market Research's Platform conference, along with documents supplied by Micron Technology. | ||