IBM/Motorola PowerPC
Serving as a base for a family of RISC chips, the PowerPC derives its core architecture from the performance-optimized-with-enhanced-RISC (POWER) architecture. The instruction set and the 32 general-purpose registers, each 32 bits wide, support multiple microarchitecture implementations that include the 32-bit 603e, 604e, 740, 750, and embedded processors (Motorola's MPC 50x, MPC8x0, MPC82x, and IBM's 400 series).
The PowerPC 750 contains seven parallel-operating execution units: two integer units, a branch-processing unit, a load/store unit (LSU), a floating-point unit (FPU), a condition-register unit, and an L2-cache-interface unit. (The 740 is the lower cost version of the 750 and lacks the L2-cache-interface unit.) This CPU can fetch as many as four instructions per cycle. The 750 processes branches as they enter the instruction buffer and can decode and dispatch two nonbranches in one cycle. Completion logic keeps track of the outstanding instructions and retires them in order.
The PowerPC 750 µP uses static or dynamic branch prediction to improve the accuracy of instruction prefetching. For static prediction, the branch-operation codes provide hints to predict whether a branch is taken or not. For dynamic prediction, the CPU uses a 512-entry branch-history table and a 64-entry branch-target instruction cache. The CPU permits speculative execution down a predicted path beyond one unresolved branch.
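As a rough illustration of how a branch-history table can drive dynamic prediction, the following Python sketch models a table of saturating counters. Only the 512-entry size comes from the description above; the 2-bit counters, the indexing function, and the names `BranchHistoryTable` and `simulate_loop` are assumptions made for this example, not a description of the 750's actual logic.

```python
# Sketch of a dynamic branch predictor built from saturating counters.
# The 512-entry size comes from the text; the 2-bit counters and the
# index function are illustrative assumptions.

BHT_ENTRIES = 512


class BranchHistoryTable:
    def __init__(self, entries=BHT_ENTRIES):
        self.entries = entries
        # Counter values run from 0 to 3; start at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, branch_address):
        # Instructions are 4 bytes wide, so drop the two low address bits.
        return (branch_address >> 2) % self.entries

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter one step toward the actual outcome, saturating."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def simulate_loop(iterations):
    """Count correct predictions for a single loop-closing branch."""
    bht = BranchHistoryTable()
    address = 0x1000  # hypothetical branch address
    correct = 0
    for i in range(iterations):
        taken = i < iterations - 1  # taken on every pass except the last
        if bht.predict(address) == taken:
            correct += 1
        bht.update(address, taken)
    return correct
```

In this model, a loop branch mispredicts on its first pass, while the counter warms up, and again on the final pass, when the loop exits.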
The 750 has separate 32-kbyte instruction and data caches. Both eight-way, set-associative, lockable caches provide byte-level parity checking. A locked cache typically supplies data on a hit, but cache lines are not replaced on a miss. The 750 contains an on-chip L2-cache controller and backside L2 bus, which improves system performance by reducing system-bus traffic. The L2-cache controller includes 8192 tag entries, which support 256 kbytes, 512 kbytes, or 1 Mbyte of external, two-way, set-associative, unified L2 cache. The L2 cache uses standard, commodity SRAMs. The nonblocking L2 cache supports hit-under-miss mode and can simultaneously service as many as four requests. The L2-cache bus can operate at various speeds relative to the processor frequency.
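The arithmetic behind a set-associative cache can be sketched in a few lines of Python. The 32-kbyte size and eight-way associativity come from the description above; the 32-byte line size is an assumption for this example, and the function names are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (number of sets, offset bits, index bits) for a cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits


def split_address(address, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# 32-kbyte, eight-way L1 cache with an assumed 32-byte line.
sets, offset_bits, index_bits = cache_geometry(32 * 1024, 8, 32)
tag, index, offset = split_address(0x12345678, offset_bits, index_bits)
```

Under these assumptions, the cache has 128 sets, and an address selects one set of eight candidate lines before the tag comparison decides hit or miss.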
The PowerPC 604e contains seven independent execution units: two single-cycle integer units, a multiple-cycle integer unit, a branch-processing unit, an LSU, an FPU, and a condition-register unit. Instructions execute out of order, and execution results can be immediately available to subsequent instructions through the use of rename registers. The completion unit commits, or "retires," results to floating-point or general-purpose registers. The unit retires as many as four instructions per clock cycle in order, ensuring a precise exception model.
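The combination of out-of-order execution and in-order retirement can be modeled with a simple queue. The sketch below is a simplified Python illustration, not the 604e's completion logic: the retire width of four comes from the description above, while the data structures and the name `retire_cycle` are assumptions.

```python
from collections import deque

RETIRE_WIDTH = 4  # from the text: as many as four instructions per cycle


def retire_cycle(completion_queue, finished):
    """Retire finished instructions from the head of the queue, in order.

    completion_queue: deque of instruction ids in program order.
    finished: set of ids whose execution has completed, in any order.
    Returns the list of ids retired during this cycle.
    """
    retired = []
    while (completion_queue
           and len(retired) < RETIRE_WIDTH
           and completion_queue[0] in finished):
        retired.append(completion_queue.popleft())
    return retired


queue = deque([0, 1, 2, 3, 4, 5])
finished = {1, 2, 3}                      # younger instructions finished first
stalled = retire_cycle(queue, finished)   # nothing retires: instruction 0 is pending
finished.add(0)
first = retire_cycle(queue, finished)     # the four oldest retire together
```

Because nothing retires past an unfinished older instruction, an exception can always be reported with every earlier instruction complete and no later one committed, which is the property a precise exception model needs.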
The PowerPC 604e µP uses dynamic branch prediction to improve the accuracy of instruction prefetching. This feature and the ability to speculatively execute through two unresolved branches minimize pipeline stalls. The 604e has separate 32-kbyte, four-way, set-associative instruction and data caches, both of which provide byte-level parity checking.
The 604e and 750 have separate memory-management units (MMUs) for instructions and data. The MMUs support as many as 4 petabytes of virtual memory and 4 Gbytes of physical memory. Access privileges and memory protection are controlled on 128-kbyte to 256-Mbyte blocks and 4-kbyte pages. Translation-lookaside buffers (TLBs) with 128 entries efficiently translate addresses by storing the most recently used page translations.
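The sizes quoted above translate directly into address widths and page counts. The following Python sketch works through that arithmetic; the constants come from the description above, and the helper names are hypothetical.

```python
KB = 1 << 10
GB = 1 << 30
PB = 1 << 50

VIRTUAL_BYTES = 4 * PB    # 4 petabytes of virtual memory
PHYSICAL_BYTES = 4 * GB   # 4 Gbytes of physical memory
PAGE_BYTES = 4 * KB       # 4-kbyte pages
TLB_ENTRIES = 128         # entries in one TLB


def address_bits(space_bytes):
    """Number of address bits needed to cover a power-of-two address space."""
    return (space_bytes - 1).bit_length()


def split_virtual_address(virtual_address):
    """Split a virtual address into (virtual page number, byte offset)."""
    offset_bits = address_bits(PAGE_BYTES)
    return virtual_address >> offset_bits, virtual_address & (PAGE_BYTES - 1)


virtual_bits = address_bits(VIRTUAL_BYTES)     # 52-bit virtual address
physical_bits = address_bits(PHYSICAL_BYTES)   # 32-bit physical address
virtual_pages = VIRTUAL_BYTES // PAGE_BYTES    # 2**40 virtual pages
physical_pages = PHYSICAL_BYTES // PAGE_BYTES  # 2**20 physical pages
tlb_reach_bytes = TLB_ENTRIES * PAGE_BYTES     # memory covered by one full TLB
```

One 128-entry TLB holding only 4-kbyte page translations covers 512 kbytes at a time, which suggests why larger block translations are useful for big, contiguous regions.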
The 604e and 750 support 64-bit data and 32-bit address buses. The interface protocol allows multiple masters to access system resources through a central arbiter. The PowerPC 604e works in multiprocessor systems, and its snooping requires no additional bus cycles. The 604e's on-chip snooping logic maintains cache coherency in multiprocessor systems. The 750 supports snooping but is optimized for uniprocessor systems; it does not support data sharing among caches in different processors. The buses on the 604e and 750 are compatible both electrically and in protocol. A common chip set supports both processors.
The 603e comprises five parallel execution units: integer execution, floating point, branch, system, and load/store. With a four-stage pipeline (fetch, dispatch, execute, and complete), the 603 can achieve three instructions per clock cycle. During the fetch stage, the 603 uses a six-instruction prefetch queue to hold pending instructions. Unlike other PowerPC derivatives, the 603 supports only static branch prediction. However, the architecture supports out-of-order execution and in-order retirement, similar to other PowerPC devices.
The embedded PowerPC processors include IBM's 400 series and Motorola's MPC500 and MPC800 families and devices. Compared with other PowerPC devices, these devices have similar, but fewer, execution units. IBM's 403Gx embedded controllers have a five-stage pipeline and can dispatch as many as two instructions per cycle. These devices implement static branch prediction and branch folding and have a four-instruction prefetch queue. Integrated caches of varying sizes are two-way set-associative and are implemented as fetch-through instruction caches and write-back data caches. (The 403GCX data cache does not provide write-through operation.) The 403Gx processors do not provide hardware support for maintaining cache coherency during DMA and external bus-master operations or in a multiprocessor configuration.
The PowerPC 403GC and 403GCX include an MMU featuring a fully associative TLB. Each entry provides translation for a memory page, which can be one of several sizes for efficient system-memory use. Memory components attach directly to the 403 devices with a programmable-memory interface on the processor's bus-interface unit. The DRAM controller includes the address multiplexer, eliminating the need for an external address multiplexer. The DRAM controller supports external bus masters. You can use software programming to tune the timing for the interface control signals.
The PowerPC 401GF implements a three-stage pipeline and supports hardware multiply and divide and unaligned loads and stores. The CPU uses operand forwarding and static branch prediction to increase performance. The 401GF's cache controllers implement critical data forwarding, fill-first handling of cache misses, and nonblocking flush operations.
Motorola's MPC500 and MPC800 families, although targeting different applications, have the same basic CPU architecture. (However, the new MPC8260 PowerQUICC II is an upgrade of the MPC860 and contains a PowerPC EC603e core.) Both families integrate a fixed-point unit (FXU), an LSU, two register files, and a sequencer unit; the MPC500 family also adds an FPU. The FPU includes single- and double-precision multiply-add instructions. The sequencer unit contains a branch processor featuring static branch prediction and branch-folding capability during execution (zero-cycle branch execution time) and runtime reordering of loads and stores.
The MPC500 and MPC800 devices use an InterModule Bus, developed for Motorola's 683xx devices, as a backplane to connect all system modules. Both families include a system-integration unit (SIU) that enables simple integration with external memories, other CPUs, and peripherals. The SIU for the MPC505 and MPC509 differs from the one in the 800 family devices and in the MPC555. The 505 and 509 SIUs have separate data and instruction buses; the 800 and 555 devices combine these buses outside the SIU. The 800 family has both instruction and data caches and an MMU. The caches are two-way set-associative and can be locked line by line.
Special instructions: Motorola has expanded the PowerPC architecture with its AltiVec technology: 162 new instructions along with a 128-bit vector-execution unit that performs single-instruction multiple-data operations concurrently with the integer units and FPUs. AltiVec supports 16-way parallelism for 8-bit integers and characters, eight-way parallelism for 16-bit integers, and four-way parallelism for 32-bit integers and IEEE floating-point numbers. AltiVec also includes a separate register file with 32 128-bit-wide registers.
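The parallelism figures follow from dividing the 128-bit vector width by the element size. The Python sketch below shows that arithmetic and emulates one lane-wise add. It is a plain software model with hypothetical function names, not AltiVec code, and it assumes wraparound (modulo) arithmetic in each lane.

```python
VECTOR_BITS = 128  # width of one vector register, from the text


def lanes(element_bits):
    """Number of elements of the given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def vector_add(a, b, element_bits):
    """Lane-wise add of two vectors of unsigned integers, wrapping per lane."""
    n = lanes(element_bits)
    if len(a) != n or len(b) != n:
        raise ValueError(f"expected {n} elements per vector")
    mask = (1 << element_bits) - 1  # keeps each result inside its lane
    return [(x + y) & mask for x, y in zip(a, b)]


byte_lanes = lanes(8)       # 16-way parallelism for 8-bit elements
halfword_lanes = lanes(16)  # eight-way for 16-bit elements
word_lanes = lanes(32)      # four-way for 32-bit elements

# One operation updates all 16 byte lanes; 255 + 1 wraps to 0 in each lane.
wrapped = vector_add([255] * 16, [1] * 16, 8)
```

A single vector instruction does the work that this model performs in a loop over every lane.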
Development tools: The PowerPC families have a large third-party tool-supplier base. IBM also offers development tools for all its PowerPC embedded processors. These tools include a C/C++ compiler; a RISCWatch debugger with in-circuit emulation; a ROM monitor; RTOS-aware debugging; and real-time, noninvasive trace capability.
Second sources: Mitsubishi is a second source for IBM's embedded PowerPC µPs.
Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.