April 23, 1998
EDN's 1998 DSP 16-BIT Architecture Directory
Texas Instruments TMS320C27xx
The CPU's instruction pipeline comprises eight independent phases, so as many as eight instructions can be active at once. During the first phase, the CPU fetches either one 32-bit instruction or two 16-bit instructions from program memory. Because not all reads and writes happen in the same phases of the pipeline, TI engineers designed in an atomic read-modify-write capability. This capability is a pipeline-protection mechanism that stalls instructions as needed to ensure that reads and writes to the same location happen in the correct order. This protection mechanism also applies to register conflicts. The control logic inserts inactive cycles between instructions that would otherwise conflict. You can reduce the number of these pipeline-protection cycles by placing other instructions between the conflicting instructions in your program.
The main functional units attached to the pipeline include the program-control logic, program-address-generation logic with a 22-bit program counter, an address-register arithmetic unit, the 32-bit ALU, a 16×16-bit single-cycle multiplier and multiply-accumulate (MAC) unit, and a 32-bit barrel shifter. The program-control logic queues and decodes the instructions that the pipeline's fetch phase reads from program memory. The instruction queue comprises enough registers to hold four 32-bit instructions or eight 16-bit instructions. The C27xx supports a mixture of 16- and 32-bit instructions, but most instructions are 16 bits. The ALU allows you to save results in a register or directly in data memory. The multiplier produces a 32-bit result, but the C27xx supports no guard bits, so it saturates any result exceeding the accumulator size. However, six overflow-counter bits enable the accumulator to support operations that require as much as 38 bits of dynamic range. The barrel shifter accepts a 16- or 32-bit value, but it can shift by no more than 16 bits per cycle.
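
The 38-bit figure follows from pairing a 32-bit accumulator with a 6-bit overflow counter. The text does not spell out how the two interact, so the following Python model is only a sketch under stated assumptions: the accumulator wraps rather than saturates, and the counter steps up on a positive overflow and down on a negative one. The names `accumulate`, `extended_value`, and `wrap_signed` are illustrative and are not TI mnemonics.

```python
ACC_BITS = 32  # accumulator width
OVC_BITS = 6   # overflow-counter width quoted in the text


def wrap_signed(value, bits):
    """Reinterpret value as a two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def accumulate(acc, ovc, product):
    """Add product to the accumulator and track overflows in ovc.

    Assumed behavior (not confirmed by the article): the accumulator wraps,
    and the counter goes up on a positive overflow, down on a negative one.
    """
    total = acc + product
    new_acc = wrap_signed(total, ACC_BITS)
    if total > new_acc:      # wrapped past the positive limit
        ovc += 1
    elif total < new_acc:    # wrapped past the negative limit
        ovc -= 1
    return new_acc, wrap_signed(ovc, OVC_BITS)


def extended_value(acc, ovc):
    """Rebuild the wide result from the accumulator and the counter."""
    return acc + (ovc << ACC_BITS)
```

Under these assumptions, summing three maximum-magnitude positive 16×16-bit products overflows the 32-bit accumulator once, and `extended_value` recovers the exact sum from the wrapped accumulator and the counter.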
The C27xx implements a modified Harvard architecture and uses separate 32-bit on-chip read and write buses that can each fetch or store two words per cycle. It also has a 32-bit program bus. The C27xx supports a software stack and has a dedicated 16-bit stack pointer; the stack must reside within the low 64 kbytes of data space. The CPU automatically saves all registers on the stack during an interrupt; a full context switch takes 160 nsec. However, the C27xx can complete a minimal context switch in 80 nsec.
The basic MAC cycle involves multiplying a data-memory value by a second value in the T register. The MAC instruction automatically loads the T register from program memory. When the repeat instruction precedes a MAC instruction, the C27xx executes the MAC instruction until the repeat count decrements to zero, ignoring all interrupts.
Addressing modes
The C27xx supports immediate, register, and direct addressing. It also supports register-indirect addressing, which uses the 16-bit value in one of eight auxiliary registers to access memory. The C27xx has a single data-page pointer that you can use for direct-addressing modes, in which the CPU addresses data memory in blocks of 64 words. The indirect-addressing mode supports a circular buffer. The C27xx does not support bit-reversed addressing.
Special instructions
The C27xx offers single-instruction repeat but does not support zero-overhead looping. Other special instructions include multiply and accumulate previous product, multiply and subtract previous product, and multiconditional branches. A variety of shift and rotate instructions support the barrel shifter. Other instructions enable and disable the JTAG emulator's access to the on-chip emulation registers.
Support
TI offers the usual suite of development tools, including a C compiler, an instruction-set simulator, a debugger, evaluation modules, and simulation models for ASIC design. TI also provides a code-translation tool that allows you to translate C2xx code into C27xx code.
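
To make the repeated MAC operation described earlier in this section concrete, here is a minimal behavioral sketch in Python. It is not C27xx code, and it rests on several assumptions: the function name `repeat_mac` and its arguments are hypothetical, the loop runs exactly `count` times (the article does not say whether the hardware executes the instruction `count` or `count + 1` times), and overflow handling and the pipelining of the previous product are ignored.

```python
def repeat_mac(data, coeffs, count):
    """Model a MAC instruction executed `count` times under a repeat.

    data   -- values read from data memory (hypothetical list)
    coeffs -- values the MAC loads into the T register from program memory
    """
    acc = 0
    for i in range(count):
        t = coeffs[i]        # T register loaded from program memory
        acc += data[i] * t   # multiply the data-memory value by T, accumulate
    return acc
```

For instance, `repeat_mac([1, 2, 3], [4, 5, 6], 3)` accumulates 1·4 + 2·5 + 3·6. Because the repeat ignores interrupts until the count reaches zero, a long repeat count presumably lengthens worst-case interrupt latency by about the same number of cycles.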
The C27xx contains a JTAG-emulation module that is new to TI's DSPs. On-chip emulation logic supports a debug-and-test DMA unit, a data-logging utility, a 32-bit counter for performance benchmarking, multiple debugging events, real-time mode operation, and interrupt-generation capabilities. The DMA unit allows the emulator to access registers and memory during unused cycles of the instruction pipeline without CPU intervention. Using the real-time operating mode, you can halt the main body of code (that is, when the code reaches a breakpoint), but the CPU still services interrupts. The CPU can break within one interrupt and still service others. TI combined this JTAG-emulation module with its real-time data-exchange (RTDX) capability. RTDX allows the CPU to export data at bandwidths as high as 300 kbytes/sec, although initial implementations approach only 8 kbytes/sec. In a motor-control application, for example, RTDX allows you to modify registers and instructions, stream data variables, or set and execute hardware breakpoints without affecting the motor's operation. RTDX communicates between the host computer and the DSP target using an emulator and a procedural library; this internal data-exchange library uses the scan-based emulator to move data on- and off-chip via the JTAG interface.
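
To put the two RTDX bandwidth figures in perspective, the following back-of-the-envelope Python sketch estimates how often a set of variables could be streamed to the host. It assumes that 1 kbyte is 1000 bytes, that each variable is one 16-bit word, and that there is no protocol overhead; none of these assumptions comes from the article, and the function name `max_sample_rate` is illustrative.

```python
def max_sample_rate(bandwidth_kbytes_per_sec, variables, bytes_per_variable=2):
    """Upper bound on samples per second for streaming `variables` values.

    Assumes 1 kbyte = 1000 bytes and no protocol overhead.
    """
    bytes_per_sec = bandwidth_kbytes_per_sec * 1000
    return bytes_per_sec / (variables * bytes_per_variable)
```

Under these assumptions, streaming four 16-bit variables is bounded at roughly 1000 samples/sec with the initial 8-kbyte/sec implementations and at roughly 37,500 samples/sec at the 300-kbyte/sec peak.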
Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.