Memory Hierarchy Design - Part 3. Memory technology and optimizations
John L. Hennessy, Stanford University, and David A. Patterson, University of California, Berkeley - October 8, 2012
This excerpt comprises:
- Part 1: Basics of Memory Hierarchies, which looked at the key issues surrounding memory hierarchies and set the stage for subsequent installments addressing cache design, memory optimization, and design approaches.
- Part 2: Ten advanced optimizations of cache performance.
- This installment, which examines innovations in main memory that offer improved system performance.
... the one single development that put computers on their feet was the invention of a reliable form of memory, namely, the core memory. … Its cost was reasonable, it was reliable and, because it was reliable, it could in due course be made large. [p. 209]
Maurice Wilkes, Memoirs of a Computer Pioneer (1985)
2.3 Memory Technology and Optimizations
Main memory is the next level down in the hierarchy. Main memory satisfies the demands of caches and serves as the I/O interface, as it is the destination of input as well as the source for output. Performance measures of main memory emphasize both latency and bandwidth. Traditionally, main memory latency (which affects the cache miss penalty) is the primary concern of the cache, while main memory bandwidth is the primary concern of multiprocessors and I/O.
Although caches benefit from low-latency memory, it is generally easier to improve memory bandwidth with new organizations than it is to reduce latency. The popularity of multilevel caches and their larger block sizes make main memory bandwidth important to caches as well. In fact, cache designers increase block size to take advantage of the high memory bandwidth.
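The link between block size and bandwidth can be illustrated with a first-order model in which the miss penalty is a fixed latency plus the time to transfer the block. The sketch below assumes that simple model; the function name and the numbers (50 ns latency, 8 bytes per ns of bandwidth) are hypothetical and are not taken from the text.

```python
def miss_penalty_ns(latency_ns: float, block_bytes: int, bytes_per_ns: float) -> float:
    """First-order miss penalty: fixed latency plus block transfer time."""
    return latency_ns + block_bytes / bytes_per_ns


# Hypothetical memory: 50 ns latency, 8 bytes/ns of bandwidth.
for block_bytes in (32, 64, 128):
    penalty = miss_penalty_ns(50.0, block_bytes, 8.0)
    # The cost per byte fetched falls as the block grows, because the
    # fixed latency is amortized over more data.
    print(block_bytes, penalty, penalty / block_bytes)
```

Under these assumptions, doubling the block size adds only the extra transfer time to the penalty, which is why higher bandwidth makes larger blocks attractive.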
The previous sections describe what can be done with cache organization to reduce this processor–DRAM performance gap, but simply making caches larger or adding more levels of caches cannot eliminate the gap. Innovations in main memory are needed as well.
In the past, the innovation was how to organize the many DRAM chips that made up the main memory, such as multiple memory banks. Higher bandwidth is available using memory banks, by making memory and its bus wider, or by doing both. Ironically, as capacity per memory chip increases, there are fewer chips in the same-sized memory system, reducing possibilities for wider memory systems with the same capacity.
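One common way to exploit multiple banks is low-order interleaving, in which consecutive word addresses are spread across the banks in turn. The text does not specify a mapping, so the sketch below is an assumption; the function name and the choice of four banks are hypothetical.

```python
from typing import Tuple


def bank_and_offset(word_address: int, num_banks: int) -> Tuple[int, int]:
    """Map a word address to (bank index, word index within that bank).

    Assumes low-order interleaving: the bank is chosen by the address
    modulo the number of banks.
    """
    return word_address % num_banks, word_address // num_banks


# Sequential addresses rotate through the banks, so neighboring words
# can be accessed in different banks at the same time.
banks_touched = [bank_and_offset(address, 4)[0] for address in range(8)]
print(banks_touched)
```

With this mapping, a sequential block fetch touches every bank before returning to the first, which is what lets the banks overlap their accesses.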
To allow memory systems to keep up with the bandwidth demands of modern processors, memory innovations started happening inside the DRAM chips themselves. This section describes the technology inside the memory chips and those innovative, internal organizations. Before describing the technologies and options, let’s go over the performance metrics.
With the introduction of burst transfer memories, now widely used in both Flash and DRAM, memory latency is quoted using two measures: access time and cycle time. Access time is the time between when a read is requested and when the desired word arrives, and cycle time is the minimum time between unrelated requests to memory.
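The difference between the two measures shows up when requests arrive back to back. The sketch below assumes a single bank with no overlap between requests: a new request can start only once per cycle time, and each word arrives one access time after its request starts. The function name and the example timings are hypothetical.

```python
def time_for_reads_ns(n_reads: int, access_ns: float, cycle_ns: float) -> float:
    """Time until the last of n back-to-back unrelated reads returns its word.

    Assumes one bank and no overlap: requests start cycle_ns apart, and
    each word arrives access_ns after its own request starts.
    """
    if n_reads <= 0:
        return 0.0
    return (n_reads - 1) * cycle_ns + access_ns


# Hypothetical part: 30 ns access time, 50 ns cycle time.
print(time_for_reads_ns(1, 30.0, 50.0))  # one read is bounded by access time
print(time_for_reads_ns(4, 30.0, 50.0))  # a stream is dominated by cycle time
```

In this model, access time determines the latency of an isolated read, while cycle time limits the sustained request rate.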
Virtually all computers since 1975 have used DRAMs for main memory and SRAMs for cache, with one to three levels integrated onto the processor chip with the CPU. In personal mobile devices (PMDs), the memory technology often balances power and speed, with higher-end systems using fast, high-bandwidth memory technology.
The first letter of SRAM stands for static. The dynamic nature of the circuits in DRAM requires data to be written back after being read, hence the difference between the access time and the cycle time as well as the need to refresh. SRAMs don’t need to refresh, so the access time is very close to the cycle time. SRAMs typically use six transistors per bit to prevent the information from being disturbed when read. SRAM needs only minimal power to retain the charge in standby mode.
In earlier times, most desktop and server systems used SRAM chips for their primary, secondary, or tertiary caches; today, all three levels of caches are integrated onto the processor chip. Currently, the largest on-chip, third-level caches are 12 MB, while the memory system for such a processor is likely to have 4 to 16 GB of DRAM. The access times for large, third-level, on-chip caches are typically two to four times that of a second-level cache, yet such a third-level cache is still three to five times faster than accessing DRAM.
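Taken together, the two ratios give a rough range for how much slower DRAM is than a second-level cache. The sketch below assumes the reading that the third-level cache is two to four times slower than the second level and three to five times faster than DRAM; the function name is hypothetical, and the result is only as precise as those quoted ranges.

```python
def dram_vs_l2_range(l3_over_l2=(2, 4), dram_over_l3=(3, 5)):
    """Combine the two ratio ranges into a DRAM-to-L2 access time range.

    Each argument is a (low, high) pair of access time ratios.
    """
    low = l3_over_l2[0] * dram_over_l3[0]
    high = l3_over_l2[1] * dram_over_l3[1]
    return low, high


# Multiplying the low ends and the high ends bounds the combined ratio.
print(dram_vs_l2_range())
```

Under this reading, a DRAM access costs roughly 6 to 20 times as much as a second-level cache access.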