Memory Hierarchy Design - Part 5. Crosscutting issues and the memory design of the ARM Cortex-A8
John L. Hennessy, Stanford University, and David A. Patterson, University of California, Berkeley - October 23, 2012
This excerpt comprises:
- Part 1, Basics of Memory Hierarchies, which looked at the key issues surrounding memory hierarchies and set the stage for subsequent installments addressing cache design, memory optimization, and design approaches.
- Memory Hierarchy Design - Part 2. Ten advanced optimizations of cache performance, which reviewed each of those ten optimizations.
- Memory Hierarchy Design - Part 3. Memory technology and optimizations, which examined innovations in main memory that offer improved system performance.
- Memory Hierarchy Design - Part 4. Virtual memory and virtual machines, which examined architecture support for protecting processes from each other via virtual memory and the role of virtual machines.
- This installment, which puts it all together with a look at crosscutting issues for memory hierarchy design and reviews the memory design of the ARM Cortex-A8.
2.5 Crosscutting Issues: The Design of Memory Hierarchies
This section describes three topics discussed in other chapters that are fundamental to memory hierarchies.
Protection and Instruction Set Architecture
Protection is a joint effort of architecture and operating systems, but architects had to modify some awkward details of existing instruction set architectures when virtual memory became popular. For example, to support virtual memory in the IBM 370, architects had to change the successful IBM 360 instruction set architecture that had been announced just 6 years before. Similar adjustments are being made today to accommodate virtual machines.
For example, the 80x86 instruction POPF loads the flags register from the top of the stack in memory. One of the flags is the Interrupt Enable (IE) flag. Until recent changes to support virtualization, running POPF in user mode did not trap; it simply changed all the flags except IE. In system mode, it does change the IE flag. Since a guest OS runs in user mode inside a VM, this was a problem: the guest OS expects IE to have changed, but the flag silently keeps its old value. Extensions of the 80x86 architecture to support virtualization eliminated this problem.
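The POPF behavior described above can be sketched as a small model. This is only an illustration under simplifying assumptions: the function name `popf` and its parameters are hypothetical, the flags register is reduced to a plain integer, and privilege is reduced to a single user/system switch. The bit positions used for the interrupt and carry flags are the ones defined for the x86 flags register.

```python
# Simplified model of the pre-virtualization POPF behavior described in the text.
# Assumption: privilege is modeled as one boolean; real hardware has more states.

IE_BIT = 1 << 9   # Interrupt Enable flag (bit 9 of the x86 flags register)
CF_BIT = 1 << 0   # Carry flag, used here as an example of an ordinary flag


def popf(current_flags, stack_value, user_mode):
    """Return the flags register after POPF pops `stack_value` (hypothetical helper)."""
    if not user_mode:
        # System mode: every flag, including IE, is loaded from the stack.
        return stack_value
    # User mode: no trap is raised; IE silently keeps its old value
    # while all other flags are loaded from the stack.
    return (stack_value & ~IE_BIT) | (current_flags & IE_BIT)


# A guest OS tries to disable interrupts by popping a value with IE cleared.
flags_before = IE_BIT          # interrupts currently enabled
popped = CF_BIT                # IE cleared, carry set

in_system_mode = popf(flags_before, popped, user_mode=False)
in_user_mode = popf(flags_before, popped, user_mode=True)

print(bool(in_system_mode & IE_BIT))  # IE cleared, as the guest OS expects
print(bool(in_user_mode & IE_BIT))    # IE still set, and no trap told the VMM
```

Because the user-mode case neither traps nor changes IE, a VMM in this model gets no chance to emulate the instruction on the guest's behalf, which is the problem the text describes.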
Historically, IBM mainframe hardware and VMM took three steps to improve performance of virtual machines:
- Reduce the cost of processor virtualization.
- Reduce interrupt overhead cost due to the virtualization.
- Reduce interrupt cost by steering interrupts to the proper VM without invoking the VMM.
IBM is still the gold standard of virtual machine technology. For example, an IBM mainframe ran thousands of Linux VMs in 2000, while Xen ran 25 VMs in 2004 [Clark et al. 2004]. Recent versions of Intel and AMD chipsets have added special instructions to support devices in a VM, to mask interrupts at lower levels from each VM, and to steer interrupts to the appropriate VM.
Coherency of Cached Data
Data can be found in memory and in the cache. As long as the processor is the sole component changing or reading the data and the cache stands between the processor and memory, there is little danger of the processor seeing an old or stale copy. As we will see, multiple processors and I/O devices create the opportunity for copies to become inconsistent and for a reader to see the wrong copy.
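A minimal sketch of how an I/O device can leave the processor and memory holding different copies follows. It assumes a write-back cache and a device that reads and writes memory directly; the class names, the `flush` and `invalidate` operations, and the address are all hypothetical and chosen only for illustration.

```python
# Toy model: one processor cache in front of memory, plus an I/O device that
# bypasses the cache. Assumption: the cache is write-back (not stated in the text).

class Memory:
    def __init__(self):
        self.cells = {}


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified in the cache but not yet in memory

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory.cells.get(addr, 0)
        return self.lines[addr]               # hit: memory is not consulted

    def write(self, addr, value):
        self.lines[addr] = value              # update the cached copy only
        self.dirty.add(addr)

    def flush(self, addr):
        if addr in self.dirty:                # write the cached copy back
            self.memory.cells[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        self.lines.pop(addr, None)            # drop the cached copy
        self.dirty.discard(addr)


def demo():
    mem = Memory()
    mem.cells[0x100] = 1
    cache = WriteBackCache(mem)

    first = cache.read(0x100)        # processor caches the value 1
    mem.cells[0x100] = 2             # device input writes memory directly
    stale = cache.read(0x100)        # processor still sees the old copy
    cache.invalidate(0x100)
    fresh = cache.read(0x100)        # after invalidation the new value is fetched

    cache.write(0x100, 3)            # processor updates only its cache
    device_sees = mem.cells[0x100]   # device output would read the old value
    cache.flush(0x100)
    after_flush = mem.cells[0x100]   # memory now matches the cache
    return first, stale, fresh, device_sees, after_flush


print(demo())
```

In this model the two inconsistent cases are the second read (the processor sees a stale cached value after device input) and the device's view of memory before the flush (the device sees a stale memory value after a processor write).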
The frequency of the cache coherency problem is different for multiprocessors than for I/O. Multiple data copies are a rare event for I/O (one to be avoided whenever possible), but a program running on multiple processors will want to have copies of the same data in several caches. Performance of a multiprocessor program depends on the performance of the system when sharing data.