EDN Access

April 23, 1998


EDN's 1998 DSP 32-BIT Architecture Directory


Texas Instruments TMS320C4x

The C4x has seven internal buses and on-chip memories that help deliver single-cycle execution when walking through X and Y memories for a series of multiply-accumulate (MAC) operations. TI built the C4x around a five-port register file, and, rather than time-sharing a single bus system, the C4x features separate buses for program and two data fetches. Additionally, the C4x has a floating-point-unit multiplier, an ALU, and a barrel shifter for parallel operations. The C4x also performs fixed-point math based on a 24-bit-wide mantissa on the inputs.
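As a rough illustration of the MAC series mentioned above, the following Python sketch accumulates the products of two operand sequences. The names `mac`, `x`, and `y` are hypothetical; the two lists merely stand in for the X and Y memories, and the loop says nothing about bus timing or how the hardware schedules each step.

```python
def mac(x, y):
    """Return the sum of products of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("operand sequences must have the same length")
    acc = 0.0
    for xi, yi in zip(x, y):
        # One multiply-accumulate per operand pair. On a DSP with separate
        # data buses, both operands can be fetched in the same cycle.
        acc += xi * yi
    return acc


if __name__ == "__main__":
    print(mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A FIR filter tap loop has this shape: one sequence holds coefficients and the other holds recent samples.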

A 128-word cache enables the processor to deliver single-cycle pipelined execution and still use slower external memory. (It does not use the cache with internal memory.) Key inner routines fill the cache as they run. The CPU accesses an instruction from external memory and automatically loads the instruction into cache, which is divided into four 32-word segments or lines. The CPU uses a least recently used algorithm to select the cache segment for the new instructions. You can freeze a segment in the cache by setting cache-freeze bits in the CPU-status register.
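The cache behavior described above can be approximated with a small behavioral model. This is only a sketch under stated assumptions: the class name `InstructionCacheModel`, the per-word presence tracking, and the single `frozen` flag covering the whole cache are illustrative choices, not a description of TI's actual cache logic or status-register layout.

```python
from collections import OrderedDict

SEGMENT_WORDS = 32   # words per cache segment (from the text)
NUM_SEGMENTS = 4     # four segments, 128 words in total (from the text)


class InstructionCacheModel:
    """Hypothetical model of a 4 x 32-word instruction cache with LRU."""

    def __init__(self):
        # Maps segment tag -> set of word offsets already loaded.
        # Ordering is least recently used first.
        self.segments = OrderedDict()
        self.frozen = False   # assumption: one freeze flag for the whole cache
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Fetch one instruction word; return True on a cache hit."""
        tag, offset = divmod(address, SEGMENT_WORDS)
        words = self.segments.get(tag)
        hit = words is not None and offset in words
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        if self.frozen:
            return hit                      # a frozen cache is never modified
        if words is None:
            if len(self.segments) == NUM_SEGMENTS:
                self.segments.popitem(last=False)   # evict the LRU segment
            words = self.segments[tag] = set()
        words.add(offset)                   # load the word fetched on a miss
        self.segments.move_to_end(tag)      # mark segment most recently used
        return hit


if __name__ == "__main__":
    cache = InstructionCacheModel()
    for _ in range(3):                      # run a 16-word loop three times
        for addr in range(0x100, 0x110):
            cache.fetch(addr)
    print(cache.hits, cache.misses)
```

In this model, an inner loop misses only on its first pass, which matches the article's remark that key inner routines fill the cache as they run.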

Six 8-bit independent communications ports support point-to-point communications with networks of C40s and peripherals. (The C44 has only four ports.) Each port comprises eight data pins and four handshake signals. These ports free the 31-bit local and global external-memory buses for program or data accesses to the processor's 4G-word address space (C40 only). Program and data occupy a unified address space that you can configure according to your memory requirements. The local and global buses have different memory block assignments within each memory space. I/O can also use the external buses.

A six-channel DMA subsystem with its own address and data buses moves data between the communications ports and memory without altering the CPU's sequential threads. Such data movements do not overload the DSP with servicing overhead, although some data contention for memory may slow CPU execution.

Addressing modes

The C4x supports register-direct, paged-memory-direct, register-indirect, and immediate addressing, as well as circular addressing for single-sized circular buffers. The CPU applies bit-reversed addressing only in register-indirect mode.
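The index arithmetic behind circular and bit-reversed addressing can be sketched in a few lines of Python. The function names `circular_next` and `bit_reverse` are hypothetical, and the code models only the resulting address sequence, not the registers or alignment rules the hardware uses to produce it.

```python
def circular_next(index, step, length):
    """Advance an index by step, wrapping inside a buffer of given length."""
    return (index + step) % length


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index, as used in FFT reordering."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # shift in the lowest bit
        index >>= 1
    return result


if __name__ == "__main__":
    # Walk an 8-entry circular buffer in steps of 3, starting at 6.
    idx = 6
    for _ in range(4):
        print(idx, end=" ")
        idx = circular_next(idx, 3, 8)
    print()
    # Access order for an 8-point FFT.
    print([bit_reverse(i, 3) for i in range(8)])
```

Circular addressing suits delay lines and filter histories, whereas bit-reversed order lets an FFT read or write its data in place without a separate reordering pass.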

Special instructions

The C4x supports single-instruction or block-instruction zero-overhead hardware looping (block repeats are nestable but without automatic save and restore of status), standard and delayed branches, interlocked access for multiprocessing (load or store an integer or floating-point value and signal interlocked), conversion of floating point to integer and vice versa, reciprocal and reciprocal-square-root seeds, conversion to and from IEEE floating-point formats, and bit test. You can specify certain instructions to execute in parallel.
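A seed instruction typically supplies a coarse first estimate that software then refines. The Python sketch below shows one common way to do this, Newton-Raphson iteration, for the reciprocal case. Everything here is an assumption for illustration: `reciprocal_seed` is a hypothetical stand-in built from a linear fit, not the C4x's seed algorithm, and the iteration count depends on how accurate the real seed is.

```python
import math


def reciprocal_seed(a):
    """Hypothetical coarse estimate of 1/a (stand-in for a hardware seed)."""
    if a == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # abs(a) == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(a))
    # Linear fit to 1/mantissa on [0.5, 1); relative error is at most 1/17.
    estimate = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    return math.copysign(math.ldexp(estimate, -exponent), a)


def refine_reciprocal(a, iterations=4):
    """Refine the seed with Newton-Raphson steps x <- x * (2 - a * x)."""
    x = reciprocal_seed(a)
    for _ in range(iterations):
        # Each step roughly squares the relative error of the estimate.
        x = x * (2.0 - a * x)
    return x


if __name__ == "__main__":
    print(reciprocal_seed(3.0), refine_reciprocal(3.0))
```

Because each step uses only multiplies and a subtract, this kind of refinement maps well onto a processor that has a fast multiplier but no hardware divider.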

Support

The development system includes scan-based emulation via the C4x's JTAG test port. External hardware can use the JTAG port to control the processor and to set and monitor registers or memory. You can string multiple C4x chips on a JTAG circuit for parallel debugging. One processor breakpoint can halt execution in an array of C4x chips, and you can single-step them all in lock step. TI sells a C4x evaluation board with four processors that works with a number of host platforms. Software tools include a C compiler, a source-level debugger for parallel debugging, an assembler/linker, and a simulator. TI also offers an application library. Third-party support includes the Spox, Parallel C, Virtuoso, and Helios OSs, as well as a variety of hardware tools.




Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.