April 23, 1998

EDN's 1998 DSP 24-BIT Architecture Directory

Butterfly BDSP9124/9320
The BDSP9124's quad-port architecture includes two bidirectional data ports, a bidirectional acquisition port, and a bidirectional coefficient port. Its 24-bit-wide, multiport data-flow structure eliminates the need for external data multiplexing. This structure also allows single-port asynchronous or synchronous memories to serve each bus. The design's bidirectional nature is conducive to the development of recursive, single-processor systems that process algorithms by passing the data through the chip several times. With six onboard butterfly units and two 60-bit accumulators, the BDSP9124 architecture differs from single-multiply, accumulator-based DSPs. When performing its high-level instructions, each BDSP9124 moves more than 10 Gbps through its I/O ports.

The BDSP9320's memory-management unit provides more than 150 memory-address sequences and system synchronizations. With 20 address bits, the BDSP9320 directly addresses 1M words of memory, permitting very large arrays, 2-D arrays, or support for as many as 32 independent channels. The chip uses a circular-buffer technique with pointers for multiple-channel processing.

The BDSP9124/9320 supports cascaded; single-instruction, multiple-data; and parallel-processing structures. Two cascaded chips process a complex input stream twice as fast as one. Five cascaded chips perform a 1 million-point FFT at a sustained 50-MHz complex sample rate. The architecture is memory-latency-insensitive.

Addressing modes

The 9320 generates the address sequences that DSP-type algorithms need to access data memory. Additionally, the chip provides addressing sequences that access data in sequential order for block-data operations.

Special instructions

The architecture minimizes software programming by embedding 26 macro DSP instructions in the silicon. These macros include real and complex FIR-filter and radix-2, -4, and -16 operations. The instructions use the parallelism inherent in DSP algorithms.
This capability allows a 50-MHz BDSP9124 to perform radix-16 butterfly operations in 320 nsec and a 1k-point, complex, 24-bit FFT in 65 µsec. Three cascaded BDSP9124s perform the same operation in 21 µsec.

Support

Butterfly DSP provides PC-based software simulators and evaluation boards. The RTS9124 cycle-accurately simulates the BDSP9124's execution unit, including instruction set, data flow, block-floating-point arithmetic, and I/O data format; the RTS9124 sells for $1500. Butterfly also offers its RTS9320 simulator for the accompanying BDSP9320. You can use this simulator to generate the addressing sequences that the RTS9124 requires; it also sells for $1500. For $4900, you can buy Butterfly's simulation accelerator and target-system-development board. The company also provides C compilers.

Motorola DSP5600x
Like most other DSPs, the DSP56000 has a versatile external memory bus, standard bit-manipulation capabilities, and the ability to execute directly from external memory with single-instruction-cycle accesses. The chip has no on-chip program ROM, except for a small boot ROM on some versions. However, the DSP56000 can access external memory each instruction cycle without time penalties.

In the traditional sense, the DSP56000 is an accumulator-based machine because all math and logic operations go through the accumulator. However, the architecture does allow bit manipulation on registers and memory. It has a single-cycle MAC unit, but the unit has two 56-bit accumulators (8 guard bits); two sets of two 24-bit registers feed the unit. Before you use the data, you must load it into the MAC registers; however, the MAC takes only one cycle (two clocks) for a multiply and an accumulate. Other registers include control and addressing registers. The memory-mapped control registers are discrete but are addressed by memory location.

Like many other DSPs, the DSP56000 has two identical address generators that automatically access X and Y memories for MAC cycles. Each address generator has a 16-bit ALU and four sets of three registers: each of the four pointer registers has an associated offset register and modifier register. The modifier registers can specify the type of address-register arithmetic operations, or they can hold data. The modifier registers support FIFO buffers and bit-reversed addressing.

The processor combines 16-bit addressing with 24-bit words. It has three internal address- and data-bus pairs that allow an instruction fetch and two data accesses in one cycle, thereby avoiding the need for an on-chip cache. A fourth bus, the global data bus, is a simple 24-bit logic bus that transfers data to and from on-chip peripherals.
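
The FIFO-style (modulo) and bit-reversed address sequences that the address generators produce can be illustrated in software. The following Python sketch does not model the DSP56000's registers; the function names and parameters (`modulo_sequence`, `bit_reversed_sequence`, `base`, `length`, `step`, `bits`) are hypothetical and serve only to show the two access patterns.

```python
def modulo_sequence(base, length, step, count):
    """Return `count` addresses visited by a pointer that wraps
    inside a buffer of `length` words starting at `base`."""
    addresses = []
    offset = 0
    for _ in range(count):
        addresses.append(base + offset)
        offset = (offset + step) % length  # wrap around at the end of the buffer
    return addresses


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_sequence(base, bits):
    """Return the 2**bits addresses of a buffer in bit-reversed order,
    the order in which a radix-2 FFT reads or writes its data."""
    return [base + bit_reverse(i, bits) for i in range(1 << bits)]


if __name__ == "__main__":
    print(modulo_sequence(0x100, 4, 1, 6))
    print(bit_reversed_sequence(0, 3))
```

For an 8-point buffer, `bit_reversed_sequence(0, 3)` visits offsets 0, 4, 2, 6, 1, 5, 3, 7. A hardware address generator produces such sequences as a side effect of each memory access, so the program spends no instructions on index arithmetic.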
You can switch any of the internal address and data buses onto the external 16-bit address and 24-bit data bus; external devices can access internal memory via a bus request to the DSP. When the 56000 stores 56-bit values to 24-bit memory or registers, you can apply an optional 1-bit shift operation and saturate the value to ±1. Unlike those of many other DSPs, the DSP56000's X and Y memories have their own address spaces, which include on-chip RAM and ROM at the bottom addresses. An internal bus-switch unit handles transfers between internal buses and the single external bus. The bit-manipulation unit performs bit operations on memory values and on address, control, and data registers.

Addressing modes

The 56000 supports register-direct, memory-direct, register-indirect, immediate, and bit-reversed addressing.

Special instructions

The 56000 performs do/end-do, single- or block-instruction hardware looping, bit manipulation, compare, divide iteration, jump if bit clear/set, conditional jump to subroutine, and move program memory. It performs logic operations only on bits 24 through 47 of the accumulator; these bits represent the most significant part of the data.

Support

Motorola offers several low-cost DSP5600x evaluation boards as well as a 40-MHz application-development system. Third-party hardware tools are also available. The 56000 uses a proprietary debug interface, OnCE, in lieu of the standard JTAG interface. Motorola supplies a GNU C compiler and debugger, an assembler/linker, and a simulator. Third-party vendors supply data-acquisition and filter-design packages, as well as OS software.

Motorola DSP563xx
When the processor executes a single-cycle multiply-accumulate (MAC) operation, the first execution stage does the multiply, and the second stage does the accumulate. The register-based architecture of the 563xx uses an interlocking mechanism that automatically inserts a no-operation (NOP) instruction into the pipe to avoid stalls. This approach permits execution to "catch up" with data dependencies.

The 563xx is binary-code-compatible with the 56000, but the 563xx adds addressing modes, including address-register program-counter (PC)-relative addressing. This mode is useful for multitasking and position-independent code, which lets a programmer deliver and relocate object modules without relinking to the original code. Motorola expanded addressing on the 563xx to the full 24 bits, up from 16 bits on the 56000 family.

Unlike the DSP56000, which has a 16-location stack limit, the DSP563xx implements an overflow mechanism that extends the on-chip hardware stack into off-chip data memory. Although the mechanism prevents unrecoverable stack overflows, the chip takes a two-clock penalty when dumping stack entries externally.

The 563xx core integrates a six-channel DMA that operates concurrently with the core's execution units and has separate address and data buses. The DMA transfers data among memories (P, X, and Y) or between memory and peripherals or the external host buses (PCI or ISA).

You can convert the device's flexible program RAM to a mixture of program RAM and a 1024×24-bit, eight-way, fully associative instruction cache that you can lock at the "way" level. The instruction cache is useful for large programs that require partial storage in external memory. The cache uses a least-recently-used sector-replacement algorithm.

The DSP runs at 3.3V but has 5V-tolerant I/O. The static core operates from dc to 80 MHz and uses a PLL with a built-in prescaler that allows dynamic clock throttling.
For additional power savings, the core automatically powers down unused memories, peripherals, and core logic on every instruction.

Addressing modes

The 563xx supports register-direct, address-register-indirect, PC-relative, immediate, and absolute addressing.

Special instructions

The 563xx's barrel shifter supports multibit-shift instructions in both directions and by any number of bits. The shifter also supports instructions for bit-stream parsing and generation. The device can conditionally execute all ALU instructions, based on conditions that include zero, negative, and overflow. If the condition is false, the processor executes an NOP instead. The 563xx performs 16-bit arithmetic that is useful for handling various compression algorithms, such as LD-CELP (low-delay code-excited linear prediction). Normally, when using a 24-bit architecture for 16-bit arithmetic, performance degrades, because you have to round the 24-bit numbers in software.

Support

Motorola backs the 563xx family with a host of development tools. You can use an application-development system, the DSP5630ADS, to evaluate the chip and debug target systems. The device comes with an assembler, simulator software, and a C compiler. The 563xx's JTAG-based OnCE port allows you to examine all internal buses in real time and record the last 12 change-of-flow instructions. Domain Technologies (www.domaintec.com) and Sonitech (www.sonitech.com) both offer PC-based emulators that use the DSP563xx's OnCE port. Momentum Data Systems (www.mds.com) and Spectrum (www.spectrumsignal.com) offer 56301-based boards with a PCI-bus interface.
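
To illustrate the software rounding mentioned under Special instructions, the following Python sketch reduces a signed 24-bit value to 16 bits. It is only a sketch under stated assumptions: the function name `round_24_to_16` is hypothetical, and the round-half-up mode and the saturation at the positive limit are assumptions for illustration, not a description of the 563xx hardware.

```python
def round_24_to_16(value):
    """Round a signed 24-bit integer to a signed 16-bit integer.

    Assumed behavior (illustrative only): round half up, then
    saturate at the largest positive 16-bit value.
    """
    if not -(1 << 23) <= value < (1 << 23):
        raise ValueError("value does not fit in 24 bits")
    # Add half of the new least significant bit, then drop the low 8 bits.
    rounded = (value + 0x80) >> 8
    # Rounding the largest inputs upward would overflow 16 bits, so clamp.
    return min(rounded, 0x7FFF)


if __name__ == "__main__":
    for sample in (0x000080, 0x00007F, 0x7FFFFF, -0x800000):
        print(hex(sample), "->", round_24_to_16(sample))
```

On a processor without a native 16-bit mode, a program that needs 16-bit results would have to spend extra instructions on a step like this after each 24-bit operation, which is the overhead the directory entry refers to.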
Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.