Hitachi SuperH Series
The SuperH Series comprises the SH-1, SH-2, SH-3, and SH-4 series of RISC µPs, µCs, and ASIC cores. The SH-1, -2, and -3 employ a fetch, decode, execute, memory-access, and write-back-to-register pipeline. Hitachi built the devices around 25 32-bit registers that you access using load/store instructions. These registers comprise 16 general registers (the SH-3 has eight 32-bit shadow registers for context switching), five control registers, and four system registers. Depending on the chip, the interrupt latency can be as low as seven clock cycles. The chips use 32-bit datapaths to move data internally, but all versions use a flexible external bus width. Unlike most RISC families, the SuperH family also has devices with single-cycle mask ROM, one-time-programmable memory, and flash memory in densities as high as 256 kbytes.
Although devices in the SH series have a similar core, significant differences exist. The major differences between SH-1 and SH-2 are that the SH-2 features on-chip cache memory, higher speeds, and a 32X32-bit multiply-accumulate (MAC) unit. (The SH-1's MAC unit is 16X16 bits.) To build the SH-3, Hitachi added to the SH-2 a memory-management unit (MMU), a barrel shifter, and the ability for conditional-branch instructions to enable or disable the pipeline's delay slot. Disabling the delay slot, although decreasing performance, allows the processor to run more deterministically and reduces the effects of pipeline flushes.
The 200-MHz, two-way-superscalar SH-4 µP includes a 3-D graphics accelerator that Hitachi claims can perform at 1.2 Gflops. This µP has four 32X32-bit multipliers fed by two 128-bit buses; it also has four adders. You can load the multipliers with eight operands in one cycle; the µP then adds the results in the next cycle. This hardware performs rotations and transformations on 32-bit, single-precision, floating-point vectors.
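As a rough illustration of the data flow described above, the following sketch models one multiply step and one add step as a four-element inner product, then uses it to transform a vector by a 4X4 matrix. The function names are hypothetical, Python floats are double precision rather than single precision, and the code models only the arithmetic, not the SH-4's instructions or timing.

```python
def inner_product4(a, b):
    """Model the described data flow: 4 multiplies (8 operands), then a sum."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two 4-element vectors")
    products = [x * y for x, y in zip(a, b)]  # four multipliers, one step
    return sum(products)                      # adders combine the results


def transform4(matrix, vector):
    """Apply a 4x4 matrix to a 4-element vector, one inner product per row."""
    return [inner_product4(row, vector) for row in matrix]


# Example: translate the point (1, 2, 3) by (10, 20, 30) in homogeneous form.
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = [1.0, 2.0, 3.0, 1.0]
moved = transform4(translate, point)
```

Each row of the matrix consumes eight operands (four matrix elements and four vector elements), which matches the eight-operand load the text describes.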
SuperH processors use a 16-bit instruction word to achieve compact code. The instruction width limits the number of basic operation codes, handles only 16 general registers, and addresses only two operands. Additionally, only 12 bits are available for an immediate offset; jumps with immediate data must be in 2048-byte hops. However, the SH-3 supports FAR-relative branches to support position-independent code. Although these restrictions lead to more instructions per task, the overall result is significantly smaller code.
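To make the field budget concrete, the sketch below packs a two-register instruction into a 16-bit word. The layout (a 4-bit opcode, two 4-bit register fields, and a 4-bit function code) is an illustrative assumption and not an opcode map taken from Hitachi documentation; the helper names are hypothetical as well. It shows why 4-bit register fields cap the general registers at 16 and what range a signed 12-bit field can express.

```python
def pack_two_register(opcode, rn, rm, function):
    """Pack four 4-bit fields into one 16-bit instruction word.

    The layout (opcode | Rn | Rm | function) is an illustrative assumption,
    not an opcode map taken from Hitachi documentation.
    """
    fields = (("opcode", opcode), ("rn", rn), ("rm", rm), ("function", function))
    for name, value in fields:
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name} must fit in 4 bits")
    return (opcode << 12) | (rn << 8) | (rm << 4) | function


def unpack_two_register(word):
    """Split a 16-bit word back into its four 4-bit fields."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def sign_extend_12(field):
    """Interpret a 12-bit field as a signed two's-complement offset."""
    field &= 0xFFF
    return field - 0x1000 if field & 0x800 else field


word = pack_two_register(0x3, 1, 2, 0xC)
```

With only 4 bits per register field, a seventeenth register cannot be named, and a signed 12-bit field spans -2048 to +2047, so longer jumps need a different mechanism.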
The SH-1 µPs can operate from external memory or from on-chip program memory at a CPU frequency of 20 MHz. The 16-bit-wide external-memory bus can supply the CPU with instructions from SRAM or fast DRAM on each cycle. If the processor is operating from external memory, each data access to external memory may take an additional one to two cycles.
Instead of on-chip program memory, the SH-2 and SH-3 have a four-way, set-associative on-chip cache (4 kbytes for the SH-2 and 8 kbytes for the SH-3), a 32-bit-wide memory bus (for CPU-memory bandwidth as high as 60 MHz with a synchronous-DRAM interface), and a 32-bit divide unit (which, on the SH-2, replaces the first chip's bit-step-divide function). You can reconfigure the cache as a two-way, set-associative cache and 2 kbytes (SH-2) or 4 kbytes (SH-3) of user-configurable RAM. The external-memory bus supports multiprocessing; it has bus arbitration for multiple masters. The SH-3 also has a unique RTOS feature: If a task or thread crashes, the operating system can recover gracefully, and the errant task cannot corrupt other tasks or RTOS environments.
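The cache geometry above can be worked through with a short sketch. The 16-byte line size is an assumption, because the text does not state it, and the function names are hypothetical. The code computes how many sets a set-associative cache has and which set an address maps to.

```python
LINE_BYTES = 16  # assumption: the article does not state the line size


def cache_sets(cache_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(cache_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size must be a multiple of ways * line size")
    return sets


def set_index(address, sets, line_bytes=LINE_BYTES):
    """Set that an address maps to: drop the line offset, wrap on set count."""
    return (address // line_bytes) % sets


sh2_four_way = cache_sets(4 * 1024, ways=4)  # full 4-kbyte cache
sh2_two_way = cache_sets(2 * 1024, ways=2)   # 2 kbytes left as cache
```

Under the assumed line size, both configurations have the same number of sets, so reconfiguring the cache changes only how many ways hold data, not how an address selects a set.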
Power management: Sleep mode discontinues CPU processing but keeps peripherals active. Standby stops everything but maintains register and cache contents. The SH-2 and -3 provide several clock modes for reducing power; software can adjust the clock rate during program operation. The SH-3's unified cache has a special low-power design that dissipates only 100 mW in operation. The cache sense amps are energized for the cache set that hits while the other three sets stay switched off. The sense amps respond to only a 60-mV differential versus the full 3.3V swing.
Special instructions: A 16X16-bit MAC instruction (42-bit accumulator) in the SH-1 and a 32X32-bit MAC instruction (64-bit accumulator) in the SH-2 and SH-3 provide a fast DSP function. Although Hitachi classifies the architecture as load/store, some instructions reference memory. Delayed branch instructions minimize pipeline disruption. An instruction swaps upper and lower bytes. The SH-4 includes a set of 3-D, floating-point instructions. The SH-DSP, a version of the SH-2, supports 23 32-bit DSP instructions for zero-overhead looping and modulo addressing.
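A software model of the 16X16-bit multiply-accumulate with a 42-bit accumulator might look like the sketch below. Wrapping on overflow is an assumption, because the text does not say whether the hardware wraps or saturates, and the function names are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer into a signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac16(samples, coefficients, acc_bits=42):
    """Multiply-accumulate signed 16-bit pairs into an `acc_bits`-wide sum.

    Wrapping on overflow is an assumption; the source does not say whether
    the hardware wraps or saturates.
    """
    acc = 0
    for x, c in zip(samples, coefficients):
        product = to_signed(x, 16) * to_signed(c, 16)  # fits in 32 bits
        acc = to_signed(acc + product, acc_bits)
    return acc


dot = mac16([1, 2, 3], [4, 5, 6])
```

Because the product of two 16-bit values needs at most 32 bits, a 42-bit accumulator leaves 10 bits of headroom, so a long sum of products can build up before the accumulator overflows.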
Special on-chip peripherals: The SH-DSP contains a DSP as an "on-chip peripheral." This DSP unit shares the five-stage pipeline with the integer unit; the DSP is not a coprocessor. The CPU contains a fetch-and-decode unit, which manages the instruction stream for both the integer and DSP units, routing instructions to the appropriate unit (see EDN's 1998 "DSP-architecture directory," April 23, 1998, pg 54). Other, more conventional peripherals include memory controllers, a real-time clock, smart-card and serial codec interfaces, IrDA support, a floating-point-unit coprocessor, a hardware division unit, complex multifunction timers, a PCMCIA interface, and an LCD controller.
The SH-3 contains an MMU with a 128-entry translation-look-aside buffer (TLB). The TLB caches virtual-to-physical-address translations from user-created page tables in external memory, providing both data protection and virtual memory. Address translation employs a paging system that supports 1- or 4-kbyte pages. The MMU also handles multitasking by providing multiple virtual-memory modes. Thus, each process has its own virtual memory and cannot access the resources of another process or the OS kernel.
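The translation step can be sketched as a lookup keyed by process and virtual page number. `ToyTlb` is a hypothetical model, not the SH-3's actual TLB organization: the oldest-entry eviction and the use of a process identifier as part of the key are assumptions made only to show the idea of per-process address spaces.

```python
PAGE_SIZES = (1024, 4096)  # the 1- and 4-kbyte page sizes named above


def split_address(virtual_address, page_bytes):
    """Return (virtual page number, offset within the page)."""
    if page_bytes not in PAGE_SIZES:
        raise ValueError("unsupported page size")
    return divmod(virtual_address, page_bytes)


class ToyTlb:
    """Hypothetical TLB model keyed by (process id, virtual page number)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = {}

    def insert(self, pid, vpn, physical_page):
        if len(self.entries) >= self.capacity and (pid, vpn) not in self.entries:
            # Assumption: evict the oldest entry; the real policy is not given.
            self.entries.pop(next(iter(self.entries)))
        self.entries[(pid, vpn)] = physical_page

    def translate(self, pid, virtual_address, page_bytes=4096):
        vpn, offset = split_address(virtual_address, page_bytes)
        physical_page = self.entries.get((pid, vpn))
        if physical_page is None:
            raise LookupError("TLB miss: translation must come from the page table")
        return physical_page * page_bytes + offset


tlb = ToyTlb()
tlb.insert(pid=1, vpn=5, physical_page=0x80)
physical = tlb.translate(1, 5 * 4096 + 0x123)
```

A second process that presents the same virtual address finds no matching entry, which is how the model mirrors the isolation between processes described above.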
Development tools: Hitachi and a number of third-party vendors offer development-tool support for the SuperH. Hitachi, Green Hills Software (www.ghs.com), and Cygnus (www.cygnus.com) provide C and C++ compilers. Hitachi, HP (www.hp.com), Orion Instruments (www.yokogawa.com), and Sophia Systems (www.sophia.com) offer in-circuit emulators. Wind River (www.windriver.com), Accelerated Technology Inc (www.atinucleus.com), and Microsoft (www.microsoft.com) provide RTOSs. Other tools include assemblers, ROM emulators, integrated Windows-based development environments, debuggers, floating-point libraries, and networking libraries. Hitachi supports Windows CE development with the $10,000 D9000, a reconfigurable development platform.
Second sources: Seiko-Epson (www.seiko.com), VLSI, STMicroelectronics, and Sony (www.sel.sony.com) are licensees.
Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.