April 23, 1998
EDN's 1998 DSP 16-BIT Architecture Directory
Lucent Technologies DSP16000
You can also use the DAU to direct the multipliers' outputs to a 40-bit ALU with add-compare-select (ACS) capability; a 40-bit bit-manipulation unit (BMU); or a 40-bit, three-input adder/subtracter. The ALU supports 16-, 32-, and 40-bit operands and can perform specialized compare instructions to accelerate minimum and maximum operations. In addition, these compare instructions can store trace-back bits as a side effect to support Viterbi processing. The BMU performs operations such as insert, extract, and rotate bits. Because the DSP16000 lacks a barrel shifter, the BMU must also perform shifts, and these shifts take more than one cycle. The separate three-input adder/subtracter allows a 40-bit addition or subtraction in parallel with another operation that uses the ALU.

The DSP can simultaneously send the result of an arithmetic operation into an accumulator and into one of the multiplier's input registers. This feedback loop avoids an explicit move instruction when you use the result as an input to a subsequent multiply. The register file contains eight nonorthogonal, 40-bit accumulators, which minimize a programmer's need to swap values between memory and registers, reduce code size, and allow more efficient compiler implementations.

The DSP16000 can perform overflow saturation on the multiplier output and on the outputs of the three arithmetic units. Overflow saturation can also affect an accumulator value as the program transfers it to memory.

The DSP16000's code and data transfers rely on a modified Harvard architecture with separate 20-bit-address and 32-bit-data buses for the instruction/coefficient (X) and data (Y) memory spaces. The on-chip X and Y memories each have a 32-bit datapath to the X and Y registers, an essential ingredient in keeping the dual MAC units fed. Although the DSP can load the 32-bit X and Y registers in parallel with execution of one or two multiply operations, you must arrange the 16-bit operands as pairs in memory.
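As a rough illustration of why the operands must sit in pairs, the following Python sketch models two multiply-accumulate paths fed from 32-bit words that each hold two 16-bit operands. It is a behavioral model only, not DSP16000 code: the function names, the choice of which half-word feeds which multiplier, and the saturation applied after every accumulation are all assumptions made for the example.

```python
ACC_BITS = 40                          # accumulator width named in the article
ACC_MAX = (1 << (ACC_BITS - 1)) - 1    # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))       # smallest signed 40-bit value


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_pair(hi, lo):
    """Pack two signed 16-bit operands into one 32-bit word (assumed layout)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack_pair(word32):
    """Split a 32-bit word into its (high, low) signed 16-bit operands."""
    return to_signed16(word32 >> 16), to_signed16(word32)


def saturate40(value):
    """Clamp value to the signed 40-bit range instead of letting it wrap."""
    return max(ACC_MIN, min(ACC_MAX, value))


def dual_mac(x_words, y_words):
    """Sum of products over paired operands, using two modeled MAC paths.

    Each loop pass consumes one 32-bit X word and one 32-bit Y word, so two
    16 x 16 products are accumulated per pass (hypothetical model).
    """
    acc0 = acc1 = 0
    for x_word, y_word in zip(x_words, y_words):
        x_hi, x_lo = unpack_pair(x_word)
        y_hi, y_lo = unpack_pair(y_word)
        acc0 = saturate40(acc0 + x_hi * y_hi)   # first multiplier path
        acc1 = saturate40(acc1 + x_lo * y_lo)   # second multiplier path
    return saturate40(acc0 + acc1)


# Four taps packed into two word pairs: 1*5 + 2*6 + 3*7 + 4*8 = 70
x = [pack_pair(1, 2), pack_pair(3, 4)]
y = [pack_pair(5, 6), pack_pair(7, 8)]
print(dual_mac(x, y))
```

If the operands were not stored as adjacent pairs, each 32-bit fetch in this model would deliver only one useful operand, and the second multiplier path would sit idle.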
In other words, the data word at Address 0 goes into one multiplier, the data word at Address 2 goes into the other, and so on.

An on-chip cache holds as many as 31 32-bit instructions. You must use this cache with a do-loop instruction. When you use a do instruction, the cache-control circuitry loads the subsequent instructions into the cache as the pipeline executes them. Once the circuitry loads the loop into the cache, the core can execute the loop as many as 65,535 times without overhead. One drawback of this approach is that, during the first iteration of a loop, a one-cycle delay occurs whenever an instruction access competes with an X data access. The absence of nested looping (you can perform cache-based, zero-overhead looping on only the innermost loop) further aggravates this limitation. The core also supports as many as 127k words of on-chip RAM, 383k words of on-chip ROM, and 512k words of external memory.

Addressing modes

The DSP16000 supports register- and memory-direct, register-indirect, immediate, and compound addressing modes. Because the device offers no bit-reversed addressing, software must perform this task. The DSP16000 supports two concurrent circular buffers. Addressing modes are oriented toward pointer arithmetic.

Special instructions

To reduce code size, Lucent built the DSP16000 to support a mixture of 16- and 32-bit instructions. Because branches take three cycles, you can conditionally execute many instructions to avoid branch penalties. A redo instruction allows you to re-execute code that the do instruction has already loaded into the cache. The DSP16000's trace-back encoder accelerates Viterbi decoding and performs mode-controlled side effects for Viterbi compare instructions. The side effects allow the DAU to store, without overhead, the state information necessary for trace-back decoding. Additionally, you can use the compare instructions for determining the least common paths for Viterbi processing.
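The compare-with-side-effect idea can be sketched in software. The Python model below performs one add-compare-select step and shifts a decision bit into a trace-back word, which is the bookkeeping the text says the hardware handles without overhead. The function name, the argument layout, the smaller-metric-wins convention, and the tie-breaking rule are assumptions for illustration; the DSP16000's actual instruction semantics may differ.

```python
def acs_step(metric_a, metric_b, branch_a, branch_b, traceback):
    """One add-compare-select step with a trace-back side effect (model only).

    Adds each branch metric to its path metric, keeps the smaller sum as the
    survivor, and shifts a decision bit into the trace-back word:
    0 if path A survived, 1 if path B survived. Ties go to path A (assumed).
    """
    candidate_a = metric_a + branch_a
    candidate_b = metric_b + branch_b
    if candidate_b < candidate_a:
        return candidate_b, (traceback << 1) | 1
    return candidate_a, traceback << 1


# Path B wins here (7 + 3 = 10 beats 10 + 2 = 12), so the decision bit is 1.
survivor, history = acs_step(10, 7, 2, 3, 0)
print(survivor, bin(history))
```

A decoder written this way in plain software would spend extra instructions on the shift and OR for every state; folding that update into the compare is what removes the overhead.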
Other special instructions include rounding, negation, absolute value, and fixed arithmetic.

Support

The DSP16000's tool set features an architectural debugger that gives a designer a schematic view of the DSP's operations and data flow. This debugger allows you to view data flow through the DSP's multiple processing units while stepping through instruction execution. Lucent's goal in using this type of debugger is to allow you to spot underused parts of the architecture. A second tool allows you to see how hardware components and their associated software will interact before you fabricate a pc board based on the actual components. This tool also facilitates the design of custom integrated DSP chips. Lucent also has a system-level debugging tool that is analogous to an on-chip logic analyzer. The DSP16210 chip integrates this tool, which allows you to examine the DSP's operations in real time.

The price of the software tools, including a simulator, a compiler, an assembler, a linker, and a debugger, is $2000. The price of the hardware tools, including the in-circuit emulator and development board, is $5000 to $7000.
Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.