Memory Hierarchy Design - Part 5. Crosscutting issues and the memory design of the ARM Cortex-A8

John L. Hennessy, Stanford University, and David A. Patterson, University of California, Berkeley -October 23, 2012

Editor's Note: Demand for increasing functionality and performance in systems designs continues to drive the need for more memory even as hardware engineers balance the dynamics of system capability, power, and cost against the growing performance gap between processor and memory. Architectures based on memory hierarchy address these issues, and what better source for the details of this approach than an excerpt on the subject from the seminal book on Computer Architecture by John Hennessy and David Patterson.
This excerpt comprises:
Adapted from "Computer Architecture, Fifth Edition: A Quantitative Approach" by John Hennessy and David Patterson (Morgan Kaufmann)

2.5 Crosscutting Issues: The Design of Memory Hierarchies
This section describes three topics discussed in other chapters that are fundamental to memory hierarchies.

Protection and Instruction Set Architecture
Protection is a joint effort of architecture and operating systems, but architects had to modify some awkward details of existing instruction set architectures when virtual memory became popular. For example, to support virtual memory in the IBM 370, architects had to change the successful IBM 360 instruction set architecture that had been announced just 6 years before. Similar adjustments are being made today to accommodate virtual machines.

For example, the 80x86 instruction POPF loads the flag registers from the top of the stack in memory. One of the flags is the Interrupt Enable (IE) flag. Until recent changes to support virtualization, running the POPF instruction in user mode, rather than trapping it, simply changed all the flags except IE. In system mode, it does change the IE flag. Since a guest OS runs in user mode inside a VM, this was a problem, as it would expect to see a changed IE. Extensions of the 80x86 architecture to support virtualization eliminated this problem.

Historically, IBM mainframe hardware and VMM took three steps to improve performance of virtual machines:

  1. Reduce the cost of processor virtualization.
  2. Reduce interrupt overhead cost due to the virtualization.
  3. Reduce interrupt cost by steering interrupts to the proper VM without invoking VMM.

IBM is still the gold standard of virtual machine technology. For example, an IBM mainframe ran thousands of Linux VMs in 2000, while Xen ran 25 VMs in 2004 [Clark et al. 2004]. Recent versions of Intel and AMD chipsets have added special instructions to support devices in a VM, to mask interrupts at lower levels from each VM, and to steer interrupts to the appropriate VM.

Coherency of Cached Data
Data can be found in memory and in the cache. As long as the processor is the sole component changing or reading the data and the cache stands between the processor and memory, there is little danger in the processor seeing the old or stale copy. As we will see, multiple processors and I/O devices raise the opportunity for copies to be inconsistent and to read the wrong copy.

The frequency of the cache coherency problem is different for multiprocessors than I/O. Multiple data copies are a rare event for I/O—one to be avoided whenever possible - but a program running on multiple processors will want to have copies of the same data in several caches. Performance of a multiprocessor program depends on the performance of the system when sharing data.

Next: Title-1

Loading comments...

Write a Comment

To comment please Log In