
April 23, 1998


EDN's 1998 DSP 16-BIT Architecture Directory


Zilog Z894xx

The accumulator-based Z894xx, originating from the Clarkspur core, provides an upward migration path for the Z893xx. Although the code is not binary-compatible, the Z894xx supports most of the Z893xx's instructions. The Z894xx has a four-stage pipeline that delivers single-cycle multiplies and pipelined multiply-accumulate (MAC) instructions. The hardware multiplier performs a 16×16-bit to 32-bit multiply and transfers the result to the 32-bit ALU (with 8 guard bits for the MAC) or reiterates the multiplication. The address pointers can simultaneously address the two data RAMs for loading data into the multiplier.
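As a rough illustration of why guard bits matter in a MAC, the following C sketch models a 16×16-bit multiply whose 32-bit products accumulate into a wider register. It is not Zilog code: the function names are hypothetical, and an int64_t stands in for the 32-bit accumulator plus 8 guard bits, so the model does not reproduce the hardware's exact overflow behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: sum of 16x16 -> 32-bit products, held in a
   register wider than 32 bits (int64_t stands in for 32 + 8 guard bits). */
int64_t mac_guarded(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Each product fits in 32 bits, as in the hardware multiplier. */
        int32_t product = (int32_t)x[i] * (int32_t)y[i];
        acc += product;
    }
    return acc;
}

/* Saturate the wide accumulator back to a 32-bit result. */
int32_t saturate32(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}
```

Four full-scale products already exceed the 32-bit range, which is the case the extra accumulator bits are meant to absorb; eight guard bits extend the range by a factor of 256.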

Zilog's Z894xx contains a bit-field unit (BFU) with a 32-bit barrel shifter that can manipulate 16- or 32-bit values. The shifter can shift or rotate a 32-bit operand left or right and place the result in the accumulator. In addition, the BFU can extract a source-bit field and mask and merge it with the specified destination contents.
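The extract and mask-and-merge operations can be sketched in portable C as follows. This is only a software model under assumptions: the function names, the 16-bit operand width, and the convention that bit 0 is the least significant bit are illustrative choices, not details taken from the Z894xx instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits of `src` starting at bit `pos` (bit 0 = LSB). */
uint16_t bf_extract(uint16_t src, unsigned pos, unsigned width)
{
    assert(width >= 1 && pos + width <= 16);
    uint32_t mask = ((uint32_t)1 << width) - 1u;
    return (uint16_t)(((uint32_t)src >> pos) & mask);
}

/* Mask and merge: place the low `width` bits of `field` into `dst`
   at bit `pos`, leaving every other bit of `dst` unchanged. */
uint16_t bf_merge(uint16_t dst, uint16_t field, unsigned pos, unsigned width)
{
    assert(width >= 1 && pos + width <= 16);
    uint32_t mask = (((uint32_t)1 << width) - 1u) << pos;
    return (uint16_t)((dst & ~mask) | (((uint32_t)field << pos) & mask));
}
```

A general-purpose processor needs several shift, AND, and OR operations for each of these steps, which is the work a dedicated bit-field unit takes over.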

The DSP implements a Harvard architecture, providing independent program- and data-memory spaces that the DSP accesses simultaneously through X and Y buses in parallel operations. The chip contains an internal-data (ID) bus and a multiplier-product (P) bus. The ID bus provides access to RAM, the stack, the program counter, the RAM pointer, and the data-address space. The 32-bit P bus provides access to the ALU, accumulator, multiplier outputs, and BFU. You can treat a 32-bit product register as two 16-bit registers. External interfaces include separate address and data buses for simultaneous access of external program and data memory.

The Z894xx provides three 12-bit register pointers for each RAM bank. The chip can automatically increment or decrement these pointers to implement circular buffers without software overhead. The Z894xx implements the same type of codec that the Z893xx devices include.
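To show what the pointer hardware saves, here is a minimal C sketch of the same wrap-around bookkeeping done in software. The structure and function names are hypothetical, and the model says nothing about how the Z894xx actually encodes buffer bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one circular-buffer pointer. */
typedef struct {
    int16_t *base;   /* start of the buffer */
    size_t   length; /* number of elements  */
    size_t   index;  /* current position    */
} circ_ptr;

/* Read the current element, then step the pointer by `step`
   (negative to decrement), wrapping at both ends of the buffer. */
int16_t circ_read_step(circ_ptr *p, int step)
{
    assert(p != NULL && p->length > 0);
    int16_t value = p->base[p->index];
    long len = (long)p->length;
    long next = ((long)p->index + step) % len;
    if (next < 0) {
        next += len; /* C's % can return a negative remainder */
    }
    p->index = (size_t)next;
    return value;
}
```

In software, the modulo and the sign correction run on every access; a pointer that wraps automatically removes that cost from an inner loop such as an FIR filter's delay line.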

Addressing modes

The Z894xx supports register, direct, indirect, indirect-with-bit-reversal (useful for some FFT algorithms), and immediate addressing.

Special instructions

The Z894xx performs conditional execution of certain instructions, as well as conditional branching. Unlike the Z893xx, the Z894xx performs repeat (hardware looping) and bit test and manipulation. Instructions can zero all bits in the flag except one of interest and store that value into the accumulator. You can also merge flags into the accumulator without overwriting previous bits.

Support

Zilog offers an emulator, an assembler, a linker, a C compiler, a simulator/debugger, and an evaluation board. Zilog also offers prototype packs to accommodate packaging options.


ZSP 16400 DSP

ZSP's 16400 is a superscalar, RISC-like architecture that issues as many as four instructions per cycle. Its most prominent features are the dual multiply-accumulate (MAC) units and dual ALUs. The 16400 performs two 16-bit MACs or one 32-bit MAC per cycle. Results flow into a 40-bit accumulator. This "C-friendly" architecture implements an "almost-orthogonal" instruction set, flexible addressing modes, software-stack support, and a larger register file than traditional DSPs.

The core of the 16400 comprises a five-stage pipeline: fetch and decode, instruction grouping, operand read, execute, and write results into the register file. One processing unit handles instruction scheduling, making it easier for ZSP to develop custom instructions without affecting the datapath. The pipeline-control unit performs instruction grouping and resolves data and resource dependencies. It then dispatches the instructions to the individual functional units, which operate in parallel. Pipeline control also performs result bypassing and interrupt processing. Result bypassing moves results from functional units directly back into the pipeline without going through the write-back stage. The 16400 provides multitasking support and uses a six-level interrupt structure with programmable priority levels; a high-priority event can interrupt all instructions.

The 16400's functional units share a register file of sixteen 16-bit registers. The register file serves as a source and destination for MAC operands. A 32-word instruction cache and a 32-word data cache are standard features. Looping hardware loads instructions into the cache; if a loop exceeds the 32-word cache size, the prefetcher fetches code ahead of execution to keep the cache filled. Separate 64-bit instruction and data buses feed these caches from a dual-port memory. With the dual-port SRAM, the device reads and writes memory during one clock cycle.

The 16400 uses static branch prediction; you should try to design your code so that program flow takes all backward branches. A mispredicted branch incurs a three- to five-cycle latency; this latency is only three cycles when the target instructions are still in the cache. The DSP core contains an integrated non-cycle-stealing DMA unit. This unit uses a dedicated 8-kbyte portion of the dual-ported SRAM, has its own buses, and operates independently of the core-processor operations. Two 16-bit timers are also a standard feature of the DSP core.

ZSP's DSP has hardware support for two nested looping constructs. It also provides the equivalent of eight 32-bit or sixteen 16-bit barrel shifters, because the DSP performs a single-cycle shift of as many as 16 bits on each of the core's 16 registers. You can also concatenate two 16-bit registers and perform single-cycle, 32-bit shifts on that register combination.

Addressing modes

The ZSP 16400 performs bit-reversed addressing in software using a bit-reversing structure; an instruction flips the bits because there is no addressing hardware. The 16400 has no dedicated circular buffer hardware; you can implement two circular buffers by using the data cache with two of the core's registers. The DSP has hardware support for indexed (immediate or register content), indirect, and register-to-register addressing modes.

Special instructions

The 16400 has add-compare-select and parallel-add and -subtract instructions for FFT and Viterbi; single-cycle bit-manipulation instructions; and specialized load instructions, which free programmers from the complexities of detailed pipeline management. Zero-overhead looping requires an "again" instruction to indicate the end of the loop and to perform the counter decrement.

Support

ZSP offers a functional compiler; an optimized version should be available in July. The compiler supports new C fixed-point data types and employs a variety of C-intrinsic functions. The company also offers an assembler, a linker, a debugger, a profiler, and a hardware-evaluation board. ZSP has emulation tools based on the JTAG interface and a $5000 evaluation board.




Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.