EDN Access

 

September 25, 1997


64-BIT


Digital Semiconductor Alpha

Digital's Alpha, the fastest µP available, has a 64-bit load/store architecture with 32 integer and 32 floating-point registers. The Alpha chips include the second-generation, performance-focused 21164 and the cost-reduced 21164PC.

The 9.6-million-transistor 21164 has a seven-stage integer pipeline that contains two integer units. It also includes a nine-stage FPU that can simultaneously issue an add and a multiply in one cycle. The chips have direct-mapped, 16-kbyte instruction and data caches. The write buffer holds as many as four 32-byte blocks for pending writes. The 21164 features a 96-kbyte, on-chip, Level 2, write-back cache. The chip also supports as much as 64 Mbytes of off-chip, Level 3 cache. To help increase memory-load efficiency, the 21164 contains merge logic that looks ahead to see if the processor is making more than one reference to the same cache block. The merge logic can allow as many as 20 load instructions to operate simultaneously.
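As a rough illustration of what "direct-mapped" means for the 16-kbyte Level 1 caches, the following Python sketch splits an address into tag, index, and offset and models hits and conflict misses. The cache size comes from the text; the 32-byte line size is an assumption borrowed from the write-buffer block size, and the names `split_address` and `DirectMappedCache` are hypothetical.

```python
# Sketch of direct-mapped cache addressing.
# CACHE_BYTES comes from the article; LINE_BYTES = 32 is an assumption.
CACHE_BYTES = 16 * 1024
LINE_BYTES = 32
NUM_LINES = CACHE_BYTES // LINE_BYTES  # 512 lines


def split_address(addr):
    """Return (tag, index, offset) for a direct-mapped cache."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // CACHE_BYTES
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        # One tag per line; None marks an invalid (empty) line.
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag  # evict whatever occupied this line
        return False
```

Because each address maps to exactly one line, two addresses that lie a multiple of 16 kbytes apart evict each other. That conflict behavior is the usual cost of a direct-mapped design's fast lookup.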

Digital based the 21164PC on the 21164, minus the on-chip, Level 2 cache. It contains a Level 1, 16-kbyte write-through, direct-mapped instruction cache and an 8-kbyte, direct-mapped, write-back data cache. Its interface includes control for 512 kbytes to 4 Mbytes of off-chip Level 2 cache.

Digital will soon be sampling a third-generation design, the Alpha 21264, which will contain 64-kbyte, two-way set-associative instruction and data caches and support a system bus operating as fast as 333 MHz. The 21264 will map four instructions per cycle and execute them out of order in four integer-execution units and two floating-point pipes. Designed for speeds of 500 MHz and greater, it will perform at an estimated 30 SPECint95 and 50 SPECfp95.

Digital's FX!32 translation software enables Alpha systems to run nonported Windows (Win32) applications. During first-time application runs, Alpha emulates x86 instructions and logs executables. When a user closes an application, FX!32 transparently translates logged instructions into Alpha binaries. Subsequent runs use the Alpha binaries in place of x86 instructions, speeding execution to 50 to 70% of native Alpha performance.
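The emulate, log, and translate cycle described above can be pictured with a toy model. This Python sketch is not FX!32's interface or internal design; the class and method names are hypothetical, and a "block" here is just a label standing in for a region of x86 code.

```python
# Toy model of the run / log / translate cycle.
# All names are hypothetical; this is not FX!32's actual design.
class TranslatingRuntime:
    def __init__(self):
        self.translated = set()  # blocks that already have native images
        self.logged = set()      # blocks seen under emulation this session

    def run_block(self, block):
        """Execute one block and report which path was used."""
        if block in self.translated:
            return "native"
        self.logged.add(block)   # remember it for later translation
        return "emulated"

    def close_application(self):
        """After the application exits, translate everything that was logged."""
        self.translated |= self.logged
        self.logged.clear()
```

In this model, code that the user never exercised stays untranslated and falls back to emulation the first time it runs, which is one plausible reason that translated applications fall short of native speed.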

Special instructions:

All instructions are 32 bits long and comprise branch, load/store, integer, and floating-point operations, as well as CALL_PAL (Privileged Architecture Library) types. CALL_PAL instructions vector to a software library that atomically performs both privileged and unprivileged functions, such as handling interrupts and exceptions and maintaining translation-look-aside buffers. This instruction class allows Alpha to accommodate VAX-specific hardware characteristics. The 21164PC and, after it, the 21264 implement motion-video instructions (MVI), which improve motion estimation for real-time videoconferencing and MPEG-2 decoding without hardware accelerators.

Special peripherals:

Digital provides the 21171 and 21172 core-logic chip sets for the Alpha 21164. These chip sets include an interface to a 32- or 64-bit PCI local bus and control paths and datapaths to the µP. They also support an external third-level back cache (Bcache). The 21171 comprises the 21171-BA data switch (DSW) chip in a 208-pin PQFP and the 21171-CA control, I/O-interface, and address (CIA) chip in a 383-pin PGA package. The 21172 comprises the 21172-BA DSW in a 208-pin PQFP and the 21172-CA CIA in a 388-pin PBGA package.

VLSI offers the 21174 chip set for the Alpha 21164 and Alpha 21164PC. This single-chip device includes a host (CPU)-bus-master/slave interface, a PCI-bus master/slave interface, a DRAM interface, and an interrupt subsystem. With 128-bit host and memory data buses, the 21174 supports as much as 512 Mbytes of synchronous, extended-data-out, or fast-page-mode DRAM. The 21174 comes in a 504-pin BGA package.

Development tools:

Digital offers its Alpha Motherboards software-developer's kit (SDK) to support software and firmware development. The SDK contains source code, examples, and tools that provide a starting point for developing firmware for designs based on Alpha µPs. Binary files of the Alpha SRM Console, debugging-monitor, and Windows NT firmware and the Windows NT hardware-abstraction layer for supported Alpha motherboards are also available.

Second sources:

Digital has licensed Mitsubishi (Sunnyvale, CA) and Samsung (San Jose, CA) to build and market Alpha µPs.


MIPS R5000

The MIPS R5000, with applications ranging from arcade games to workstations, is the first low-end implementation of the MIPS IV instruction architecture. The superscalar R5000 implements an instruction-fetch, decode, execute, data-cache-read, and cache-write pipeline similar to that of the R4600/R4700. In addition, the R5000 provides a dual-issue mechanism to allow the device to issue a floating-point instruction simultaneously with any other instruction type.

The R5000 contains 32-kbyte instruction and data caches. A dual 48-entry, virtually indexed translation-look-aside buffer (TLB) also increases performance by allowing back-to-back TLB accesses. The TLB implementation is compatible with R4xxx implementations to ensure compatibility with system and user software. The R5000 includes a fast multiplier for floating point, a major improvement over MIPS III CPUs. The CPU also supports an interface to a synchronous L2 cache.

Special instructions:

The device is compatible with MIPS I, III, and IV. MIPS IV supports four multiply-accumulate/subtract floating-point instructions, useful in graphics and signal processing. It also supports conditional moves to reduce branch frequency, index-address modes (register plus register), and new addressing modes for floating-point operations, which compilers optimized for higher floating-point throughput require.

Special on- and off-chip peripherals:

NKK's Big and Little Dipper chip sets support the R5000. The Big Dipper features CPU-interface control, a Level 2 cache controller, a PCI-bus bridge, an external PCI-arbiter interface, and a memory controller. The Little Dipper, targeting embedded applications, interfaces to the CPU and features ROM, DRAM, and DMA controllers; a timer/counter; an interrupt controller; and a PCI arbiter and PCI interface. Galileo Technology also provides R5000-specific support chips for PCI, memory, and Level 2 caches. NEC is rumored to be planning to provide a similar support chip.

Development tools:

A range of third-party development tools is available for the MIPS RISC architecture. Wind River (Alameda, CA), Integrated Systems (Sunnyvale, CA), Accelerated Technologies (Mobile, AL), Green Hills Software (Santa Barbara, CA), and Microsoft (Redmond, WA) provide embedded operating-system support. Algorithmics, Cygnus (Mountain View, CA), Microsoft, Green Hills Software, Metrowerks (Austin, TX), Tasking (Dedham, MA), and Wind River offer development tool chains and compilers. Hewlett-Packard (Colorado Springs, CO) and Corelis (Cerritos, CA) offer debuggers and in-circuit emulators. (You can get further information on development tools for the Rxxxx and other MIPS µPs in the MIPS RISC Resource Catalog from the MIPS Group of Silicon Graphics or at www.mips.com.)

Algorithmics, Galileo Technology, and Cogent Computer Systems (Hubbardston, MA) provide evaluation boards for the R5000. Mentor Graphics (Wilsonville, OR) offers bus-functional models, Synopsys (Mountain View, CA) offers hardware models, and Simulation Technology offers coverification tools.

IDT supports the 79RV5000 with the company's 79S465 evaluation platform with the 79S500 daughtercard. (For more information, see the MIPS R4xxx.) The 79S500 includes 2 Mbytes of Level 2 cache.

Second sources:

QED designed the R5000 and licensed it to the MIPS Group, which, in turn, licensed it to IDT, NEC, Toshiba, and NKK.


MIPS R4xxx

Applications for the R4xxx µPs range from games to laser printers to workstations. Most R4xxx processors implement the MIPS III architecture instruction set, which provides 64-bit integer registers, 64-bit instructions, and 32- or 64-bit addressing for each privilege level. The first device to implement the MIPS III architecture was the high-end R4000/R4400. To reduce code size, MIPS and LSI Logic codesigned the MIPS16 application-specific extension. MIPS16 comprises new 16-bit instructions with a corresponding decoding block that the MIPS µP core integrates. Although most applications still need to run 32-bit code (MIPS16 supports a mixture of 32- and 16-bit code), MIPS Group claims that MIPS16 provides an overall memory savings as large as 40%. LSI Logic, with its TinyRISC TR4101, is the first MIPS licensee to implement the MIPS16 instruction extensions.

The superpipelined R4400 has an eight-stage pipeline: instruction fetch (first and second halves), register file (access), execute, data cache (first and second halves), tag check, and write back. The drawback of a long pipeline becomes obvious when the processor performs branches or memory references: Branches cause a latency of three internal clocks, and loads incur a two-cycle latency. However, superpipelining increases performance because each stage can run at twice the system clock.

The R4600/R4700 and derivatives represent the low-cost, low-power end of the R4xxx product family. These CPUs include an instruction-fetch, decode, execute, data-cache-read, and cache-write pipeline. They rely on increasing clock speed to raise performance. The pipeline includes a one-processor-clock delay slot for branch-type instructions. The R4700 includes a separate FPU and performs integer multiply and divide in the FPU. (The R4700 has no integer-multiply and -divide unit.) The R4700 does virtual-to-physical-address translation in parallel with its cache accesses. The caches are virtually indexed to speed access but physically tagged for addressing. The R4700 has a 96-entry, fully associative translation-look-aside buffer (TLB). The R4700 checks the cache entry's physical tag against the TLB's physical address. A four-entry write buffer eliminates write stalls.
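The virtually indexed, physically tagged lookup described above can be sketched in a few lines of Python. The page size, line size, and line count below are illustrative assumptions, not R4700 figures, and the `lookup` function and its arguments are hypothetical.

```python
# Sketch of a virtually indexed, physically tagged cache lookup.
# PAGE_BYTES, LINE_BYTES, and NUM_LINES are illustrative assumptions.
PAGE_BYTES = 4096
LINE_BYTES = 32
NUM_LINES = 128  # chosen so LINE_BYTES * NUM_LINES equals one page


def lookup(vaddr, tlb, cache_tags):
    """Return True on a cache hit.

    tlb maps virtual page number -> physical page number.
    cache_tags[i] holds the physical page number cached in line i, or None.
    """
    # Index the cache with virtual-address bits; no translation is needed.
    index = (vaddr // LINE_BYTES) % NUM_LINES
    stored_tag = cache_tags[index]
    # Translate the page number; hardware does this in parallel with indexing.
    ppn = tlb[vaddr // PAGE_BYTES]
    # Compare the line's physical tag with the TLB's physical address.
    return stored_tag == ppn
```

The point of the arrangement is that the cache read and the TLB read do not depend on each other, so only the final tag comparison has to wait for the translation.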

The R4300i, another R4xxx implementation, reduces its silicon overhead by overlapping the FPU and executing floating-point instructions in the integer ALU; this tactic simplifies the pipeline and eliminates the need for a separate floating-point datapath. To accommodate floating-point operations, the R4300i's integer unit contains 32 64-bit floating-point registers.

Power management:

The R4300i's power-management features include a mode that reduces the clock rate to one-fourth of normal and a power-down mode that writes CPU state to battery-backed RAM before turning off. The CPU has an instruction micro-TLB that caches the last two TLB entries and thus minimizes power dissipation. (The main TLB need not turn on.) Other circuit-design techniques also minimize power dissipation. The R4700 provides a wait instruction that disconnects the PLL from the CPU clock.

Special instructions:

MIPS R4xxx processors implement the MIPS III instruction set. MIPS16, an architectural extension that saves code space, implements new 16-bit instructions. MIPS III's new instructions include double-word loads, stores, shifts, and addition/subtraction. The R4xxx's on-chip FPU performs 32-bit, single-precision and 64-bit, double-precision, floating-point operations. It performs integer multiply and divide stepwise in bit pairs and single bits, respectively. The chip handles 32- and 64-bit multiplies and divides. It seamlessly uses 32-bit arithmetic results in 64-bit computations; you need not track operands and specify conversion. IDT's R4640/50 has an atomic multiply-add operation to perform multiply-accumulate operations. This instruction, which DSP algorithms use, multiplies two numbers and adds the product to the contents of the high and low registers.
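To make the multiply-add behavior concrete, here is a minimal Python sketch that treats the high and low registers as one 64-bit accumulator. It assumes signed 32-bit operands and wraparound on overflow; those details, and the function names, are assumptions for illustration rather than a statement of IDT's exact instruction semantics.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_add(hi, lo, rs, rt):
    """Return the new (hi, lo) after adding rs * rt to the 64-bit value hi:lo.

    Assumes signed 32-bit operands and wraparound on overflow.
    """
    acc = to_signed((hi << 32) | lo, 64)
    acc += to_signed(rs, 32) * to_signed(rt, 32)
    acc &= MASK64  # wrap to 64 bits
    return (acc >> 32) & MASK32, acc & MASK32
```

A dot product, the inner loop of many DSP filters, then reduces to one `multiply_add` call per pair of samples, with the running sum staying in the high and low registers between calls.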

Special on- and off-chip peripherals:

NKK supports its 4xxx products with the Big and Little Dipper chip sets. The Big Dipper features CPU-interface control, a Level 2 cache controller, a PCI-bus bridge, an external PCI-arbiter interface, and a memory controller. The Little Dipper, targeting embedded applications, interfaces to the CPU and features ROM, DRAM, and DMA controllers, a timer/counter, an interrupt controller, a PCI arbiter, and a PCI interface.

Development tools:

Cygnus (Mountain View, CA) and Tasking (Dedham, MA) provide C compilers, linkers, debuggers, and assemblers for the R4000. You can develop on a Sun, Solaris, or Windows NT/95 platform.

LSI Logic offers a TinyRISC evaluation board, the BDMR4101, for hardware evaluation. The board includes LSI's EV4101 evaluation chip with a 16-kbyte instruction cache, an 8-kbyte data cache, a timer, and a multiply-and-divide unit. To support the MiniRISC CW4011 Core, LSI has a board that contains the LR4500 evaluation chip. The chip includes the company's SerialICE support for on-chip debugging. The company's System Verification Environment (SVE) for ASIC development is available in Verilog and VHDL. LSI Logic also provides application-specific evaluation boards, such as the Integra Board for set-top-box development and the ATMIzer II for communication-product development.

IDT's 79S465 evaluation platform supports the 79RV4700 µP and operates at 50 MHz for the system and memory interfaces. The board features 4 Mbytes of noninterleaved DRAM, expandable to 64 Mbytes; 2 Mbytes of noninterleaved and cacheable flash; and 4 Mbytes of interleaved, zero-wait-state SRAM. The addition of the 79R440 daughtercard lets you use the 79S465 board to evaluate IDT's 79RV4650 and 79RV4640 µPs. IDT also offers software tools for its R4xxx processors. (See the MIPS R3xxx section for more information.)

Second sources:

The MIPS Group of Silicon Graphics designed and licenses the R4xxx processors to IDT, NEC, LSI Logic, NKK, and Toshiba. QED designed the R4600, R4700, R4640, and R4650 MIPS III processors and licenses them to IDT, Toshiba, and NKK. LSI Logic uses the R4xxx core to develop custom products.


Sun UltraSPARC

UltraSPARC-I and -II are silicon implementations of SPARC V9, a version of the scalable-processor architecture. SPARC V9 maintains upward binary compatibility with SPARC V8 and extends the architecture with support for 64-bit virtual addresses and integer data as large as 64 bits; 32 double-precision, floating-point registers (up from 16); and speculative loads, which don't take a fault if accessing an out-of-range variable. V9 also defines a hardware mechanism that uses compiler technologies that streamline the prefetching of data and instructions.

The superscalar processors have nine-stage pipelines in which the first two stages comprise the instruction fetch and decode. Sun adds three stages to the integer pipe to make it symmetrical with the floating-point pipe. This architecture simplifies pipeline synchronization and exception handling; it also eliminates the need for a floating-point queue. The CPU's pipeline encompasses two integer ALUs, five graphical FPUs, and a load/store unit. Sun also includes a 2-bit dynamic branch-prediction mechanism, which is part of its prefetch unit. As the 16-kbyte instruction cache fills, the CPU uses 2 extra bits per instruction to tag on information related to the branch prediction for that instruction.
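A 2-bit dynamic predictor of the kind mentioned above is commonly built as a saturating counter. The Python sketch below shows that textbook scheme; the state encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the initial state are assumptions, because the article gives only the counter width.

```python
# Sketch of a 2-bit saturating-counter branch predictor.
# The state encoding and initial state are assumptions.
class TwoBitPredictor:
    def __init__(self, state=1):
        self.state = state  # 0..3; values 2 and 3 predict "taken"

    def predict(self):
        """Return True if the branch is predicted taken."""
        return self.state >= 2

    def update(self, taken):
        """Move one step toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

The second bit provides hysteresis: a loop branch that is taken many times and then falls through once does not flip the prediction, so the next pass through the loop still starts with the right guess.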

UltraSPARC-I uses data buffers to isolate the Level 2 cache from the system bus. These buffers enable overlapping of system transactions and perform error detection and correction. The processor contains an on-chip, Level 2 cache controller, and the system bus can run at one-half to one-third the processor frequency. Sun claims that instructions and data can pass between the L2 cache and the CPU at 2.6 Gbytes/sec.

The UltraSPARC-IIi is the first member of Sun's new UltraSPARC i-Series family of integrated processors. Each functional area on the UltraSPARC-IIi maintains decentralized control, allowing activities to overlap. A decoupled prefetch and dispatch unit provides sustained performance of as many as four instructions per cycle. UltraSPARC-IIi can handle multiple outstanding memory requests--three loads and two stores vs one load or store for UltraSPARC-I. Sun offers the UltraSPARC-IIi as an individual component, on an integrated cache-based module, and on industry-standard boards.

Special instructions:

SPARC V9 adds several instructions to the V8 specification. The new instructions are conditional move, 64-bit integer multiply/divide, compare and swap, prefetch, and branch on register value. UltraSPARC-I adds the visual-instruction set of graphics instructions (not in SPARC V9). These instructions provide the most common operations to support Java, networking, and graphics acceleration.

Special on- and off-chip peripherals:

The UltraSPARC-IIi features three integrated interfaces: a memory controller for industry-standard DRAM access; an Ultra Port Architecture (UPA) controller supporting access as fast as 6.4 Gbps; and a 32-bit, 66-MHz, 3.3V PCI interface.

Sun's Advanced PCI Bridge (APB) chip supports the UltraSPARC-IIi. This I/O ASIC has a 32-bit, 66-MHz primary bus and two independent 5V, 32-bit, 33-MHz secondary buses with full prefetch support. You can attach as many as four APB chips to the UltraSPARC-IIi, enabling support for as many as 32 industry-standard PCI devices. The UltraSPARC Data Buffer (UDB) isolates the processor and its external cache from the main-system data bus, allowing the interface to operate at processor speed. A system-I/O chip, the SYSIO, bridges the UPA and I/O bus. Sun also offers two bridge chips for the Sbus (U2S chip) or the PCI (U2P chip). A crossbar switch, the buffered multiplexer, helps control traffic on the system's data buses.

Development tools:

UltraSPARC support is available from Sun's popular Solaris and Java OSs as well as an array of third-party RTOSs and development tools for embedded applications. Sun offers an architectural simulator that performs cycle- and instruction-accurate simulation.

Second sources:

There are no second sources for the UltraSPARC families.


MIPS R10000

The R10000 designers set out to accomplish what some consider two opposing goals: high performance and the ability to run unmodified binaries from previous-generation R4000 processors without performance degradation. The designers achieved these goals using out-of-order execution. To support out-of-order execution, the R10000 maintains an instruction-status table to determine the instructions waiting to graduate and to put the instructions in order. In addition, the R10000 speculatively executes instructions, such as those following a predicted branch. A completed instruction may never graduate because an exception or branch could invalidate the results. For this reason, an instruction may complete, but, until graduation, its results are tentative and may be discarded.
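The distinction between completing and graduating is easier to see in a small model. The Python sketch below keeps instructions in program order, lets them complete in any order, and graduates them only from the head of the list. The `StatusTable` class, its methods, and the fault handling are hypothetical simplifications, not the R10000's actual table format.

```python
from collections import deque


class StatusTable:
    """Hypothetical model of in-order graduation after out-of-order completion."""

    def __init__(self):
        self.entries = deque()  # instruction records in program order

    def issue(self, name):
        self.entries.append({"name": name, "done": False, "fault": False})

    def complete(self, name, fault=False):
        """Mark an instruction finished; this may happen in any order."""
        for entry in self.entries:
            if entry["name"] == name:
                entry["done"] = True
                entry["fault"] = fault

    def graduate(self):
        """Retire finished instructions from the head, in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            if self.entries[0]["fault"]:
                # Discard the faulting instruction and all younger results.
                self.entries.clear()
                break
            retired.append(self.entries.popleft()["name"])
        return retired
```

In this model, an instruction that finishes early simply waits in the table until everything ahead of it has graduated, and a fault at the head throws away the tentative results behind it.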

The R10000 can simultaneously dispatch instructions to five functional units: floating-point add, floating-point multiply, and load/store units and two integer units after instructions go through a two-stage instruction fetch-and-decode unit. The pipelined integer unit comprises six stages: fetch, decode, issue, execute, cache access, and write back. Floating-point instructions use a seventh stage that attaches to the integer pipe. The execution units can complete instructions and write results out of order.

The R10000 also facilitates designing tightly coupled multiprocessing systems. To accomplish this goal, the CPU has a 64-bit cluster-bus configuration that allows direct connection of four R10000 processors. Attaching the R10000 to an external agent, or cluster coordinator, creates a cluster bus that manages the flow of data within the cluster.

Power management:

The R10000 can power down any nonbusy functional units.

Special instructions:

The R10000 processors implement the MIPS IV instruction set.

Second sources:

The MIPS Group of Silicon Graphics licenses the R10000 to NEC and Toshiba.







Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.