April 23, 1998
EDN's 1998 DSP 16-BIT Architecture Directory
Texas Instruments TMS320C6x
The eight functional units on the C62x comprise two 16×16-bit multipliers and six 32-bit arithmetic units with a 40-bit ALU and a 40-bit barrel shifter. Each of the two functional-unit sets has its own bank of 16 32-bit registers but can access the other set's register bank through one data bus. Register access across the two sets supports only one read and one write operation per cycle. However, each functional-unit set can perform as many as four reads per cycle from a register in its own bank. You can also issue multiple writes to a register on the same instruction cycle, as long as the instructions have different latencies.

TI created the C67x, the floating-point implementation of the C6x, by adding floating-point capability to six of the eight functional units. The C67x instruction set is therefore a superset of the C62x fixed-point instruction set: All C62x instructions run unmodified on the C67x CPU.

Unlike most DSPs, the C6x does not support separate X- and Y-memory spaces. Instead, it provides a single data memory with two 32-bit paths for loading data from memory to the register banks. Two other 32-bit paths store register values to memory. A 32-bit address bus supports these datapaths. A 32-bit address bus also addresses the program memory, but the single program datapath is 256 bits wide. This width allows the CPU to fetch, but not necessarily execute, eight 32-bit instructions per cycle. TI calls this group of eight instructions a "fetch packet."

Keeping all eight functional units busy is the key to squeezing the highest performance from the C6x. In reality, data dependencies, instruction latencies, and resource conflicts limit optimal performance, so the CPU executes anywhere from one to eight instructions per cycle. A fetch packet can hold multiple execute packets in fully parallel, fully serial, or mixed parallel/serial combinations; therefore, eight serial instructions require the same code size as eight parallel instructions.
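The fetch-packet arithmetic above (one 256-bit program datapath carrying eight 32-bit instructions) can be illustrated with a small sketch. This is only a model for illustration, not TI code: the function names `pack_fetch_packet` and `split_fetch_packet` are hypothetical, and placing the first instruction in the lowest 32 bits is an assumption made here, not something the article states.

```python
from typing import List

FETCH_PACKET_BITS = 256   # width of the program datapath (from the article)
INSTRUCTION_BITS = 32     # width of one instruction (from the article)
SLOTS = FETCH_PACKET_BITS // INSTRUCTION_BITS  # instructions per fetch packet

WORD_MASK = (1 << INSTRUCTION_BITS) - 1


def pack_fetch_packet(words: List[int]) -> int:
    """Combine eight 32-bit words into one 256-bit integer.

    Assumption: the first word occupies the lowest 32 bits.
    """
    if len(words) != SLOTS:
        raise ValueError("a fetch packet holds exactly %d words" % SLOTS)
    packet = 0
    for slot, word in enumerate(words):
        if not 0 <= word <= WORD_MASK:
            raise ValueError("instruction word does not fit in 32 bits")
        packet |= word << (slot * INSTRUCTION_BITS)
    return packet


def split_fetch_packet(packet: int) -> List[int]:
    """Split a 256-bit integer back into eight 32-bit words."""
    if not 0 <= packet < (1 << FETCH_PACKET_BITS):
        raise ValueError("fetch packet does not fit in 256 bits")
    return [(packet >> (slot * INSTRUCTION_BITS)) & WORD_MASK
            for slot in range(SLOTS)]
```

Whatever mix of serial and parallel instructions the packet contains, it always occupies the same eight 32-bit slots, which is why code size does not depend on the degree of parallelism.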
The compiler and assembly optimizer play a big role in establishing the sequence of instructions for the C6x to execute. The programming tools link instructions in a fetch packet by the least significant bit of each instruction. If the bit is set, the C6x executes the instruction in parallel with the subsequent instruction. The assembly optimizer is responsible for dependency checking and for arranging parallelism among instructions. The code therefore executes as programmed on independent functional units, which eliminates the need for core features such as out-of-order execution or dependency-checking hardware.

The C6x lacks a dedicated multiply-accumulate (MAC) unit. Instead, it performs MAC operations by using separate multiply and add instructions. Although this requires two instruction cycles, the pipelined effect yields apparent single-cycle execution. Using this design approach, TI engineers simplified the C6x's functional units, which, in turn, allows the core to operate at 200 MHz.

Addressing modes

The C6x performs linear and circular addressing. However, unlike most DSPs that have dedicated address-generation units, the C6x calculates addresses using one or more of its functional units.

Special instructions

The processor can execute all instructions conditionally, a method for reducing branching and, therefore, keeping the 11-stage pipeline flowing.

Support

The development tools for PC ($2995) and Sun ($4995) host platforms include a C6x C compiler, an assembly optimizer, a simulator, a linker, and a debugger. TI also offers a hardware-emulation board that is compatible with the company's XDS510 JTAG emulator interface. The assembly optimizer simplifies assembly-language programming and automatically schedules and parallelizes instructions from serial, inline assembly code. The assembler reads straight-line code without regard to registers or functional units and does the resource assignment.
Deterministic operation allows the debugger to lock-step through the code. The debugger performs code profiling to determine the amount of time the processor spends in various portions of the code.
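The linking rule described earlier, in which a set least significant bit means "execute in parallel with the subsequent instruction," can be sketched as a small grouping routine. This is a simplified model rather than TI's tooling: the function name `execute_packets` is hypothetical, and the sketch assumes a parallel chain ends at the end of the fetch packet.

```python
from typing import List


def execute_packets(fetch_packet: List[int]) -> List[List[int]]:
    """Group the instruction words of one fetch packet into execute packets.

    Rule from the article: if an instruction's least significant bit is set,
    it executes in parallel with the subsequent instruction.
    Assumption of this sketch: a group never extends past the fetch packet.
    """
    packets = []   # completed execute packets
    current = []   # instructions chained so far
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:          # bit clear: the chain stops here
            packets.append(current)
            current = []
    if current:                     # close a chain still open at the end
        packets.append(current)
    return packets


# Usage: the same eight slots can hold one 8-wide packet or eight 1-wide ones.
fully_parallel = [1, 1, 1, 1, 1, 1, 1, 0]
fully_serial = [0, 0, 0, 0, 0, 0, 0, 0]
print(len(execute_packets(fully_parallel)))  # one execute packet
print(len(execute_packets(fully_serial)))    # eight execute packets
```

Because the grouping is encoded in a bit that every instruction already carries, the hardware only has to read the chain, not discover it, which matches the article's point that dependency checking moves into the tools.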
Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.