Feature
2002 DSP directory (continued)
By Robert Cravotta, Technical Editor -- EDN, 4/4/2002
Equator Technologies' MAP-BSP family
at a glance:
- The MAP-BSP-15-400 can perform 40 GOPS at 400 MHz.
- MAP-BSP devices are entirely programmable in C.
Equator's MAP-BSP family of VLIW/SIMD (very-long-instruction-word/single-instruction-multiple-data), merged-video DSPs and microprocessor devices target videocentric applications. All of the MAP-BSP devices are pin-compatible and can operate on 8-, 16-, 32-, 64-, and 128-bit-wide data. MAP-BSP devices integrate the processor with SDRAM and PCI interfaces and a flexible multimedia-I/O system that supports a single external bank of DRAM. Advanced features include compiler-managed register sets, task-optimized functional units, throughput-optimized data cache, and support for digital RGB.
Addressing modes: MAP-BSP devices support direct, indirect, and virtual addressing modes.
Special instructions or integral-peripheral functions: The MAP-BSP instruction set includes operations geared toward FIR and FFT functions. Interface support includes two ITU-656 video-in ports, two TCI-In transport-stream-in ports, an ITU-656 video-out port, and in/out general-purpose data ports. The hardware includes a variable-length encoder/decoder, used in MPEG-1, -2, and -4 encoding/decoding, and polyphase filtering for video scaling.
Support: Equator's iMMediaTools suite supports development for all of the MAP-BSP devices and includes a parallelizing C compiler, a linker, a source-level debugger, simulators, and standard libraries.
Hitachi Semiconductor's SH-DSP and SH3-DSP
at a glance:
- The SH7622 can perform 87 MIPS at 60 MHz.
- The SH7727 can perform 208 MIPS at 160 MHz.
Processors in the SH-DSP series (SH7615, SH7616, SH7622, and SH7065) combine a 32-bit RISC CPU and a 16-bit integer DSP unit into a single core. The DSP unit can execute single-cycle 16×16-integer multiplies and can multitask its operations. Hitachi's SH7616 is a CMOS single-chip microcontroller that integrates a 10/100-Mbps Ethernet controller supported by two 2-kbyte FIFOs and a multichannel DMA controller targeting Ethernet applications, such as network video/printers, network terminals, and management processors. The SH7065 integrates 256 kbytes of on-chip flash.
Processors in the SH3-DSP series (SH7727, SH7729) combine a 32-bit RISC CPU and 16-bit integer DSP unit into a multitasking core with a four-bus structure targeting Web/Smartphone, handheld PCs, Internet terminal/IP fax, digital still cameras, and security-terminal applications. SH3-DSP devices include 16 kbytes of X/Y RAM, 16 kbytes of cache (Ways 2 and 3 lockable), a bus-state controller for glueless connection to SDRAM, and on-chip JTAG and real-time-instruction trace-debugging modules. The SH7729R includes data protection and virtual memory.
Addressing modes: Devices support direct- and indirect-register, predecrement or postincrement indirect-register, indirect-register-with-displacement, indirect-indexed-register, indirect-global-base-register-with-displacement, indirect-indexed-global-base-register, indirect-program-counter-with-displacement, and program-counter-relative immediate addressing.
Special instructions or integral-peripheral functions: The SH-DSP and SH3-DSP use a 16- and 32-bit instruction set that supports one-cycle multiplication/addition, operand-unrelated parallel moves, conditional execution for DSP datapath instructions, multiprecision arithmetic in microcontroller instructions, and single-cycle exponent detection. DSP operations are all 32-bit instructions. The SH7622 SH-DSP core device includes high-speed on-chip USB.
SH3-DSP devices include a memory-management unit, a timer, a real-time clock, an interrupt controller, and a serial-communication interface. The SH7727 includes USB host and LCD controllers that support bus-master functions. The SH7729R includes infrared communication, an ADC, a DAC, and power management.
Support: Hitachi and third parties offer evaluation kits, emulators, companion chips, reference-design platforms, software board support, RTOSs, middleware, and application packages. Hitachi offers middleware for the SH-DSP and SH3-DSP covering telephony applications, including G.729, G.725 and G.723.
Improv Systems' Jazz
at a glance:
- Processor-to-processor data communication uses direct on-chip data memories.
- All processors attach to a single Q-bus that enables queuing of tasks.
The Jazz DSP, a configurable VLIW (very-long-instruction-word) processor architecture, incorporates features such as overlaid datapaths, a distributed-register system, code compression, and power management. The Jazz PSA (programmable system architecture) is customizable to provide accelerated execution of key application algorithms. The flexible DSP-core architecture facilitates design modifications without compromising the verification integrity and processor tool chain. It can scale from a single uniquely configured Jazz DSP processor core to a system-level platform implementation that consists of many processors in an interconnected structure.
Addressing modes: Supported addressing includes direct, indirect, indexed, immediate, displacement, bit reverse, bit-reverse index, vector index, and post-increment. A wrap mode provides support for circular buffers.
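A wrap mode of this kind is what keeps a delay line cheap to maintain: the pointer advances after each access and folds back to the start of the buffer instead of running past the end. The following Python sketch models that behavior under stated assumptions. The function names and the buffer layout are hypothetical illustrations, not Jazz instructions or Jazz Tool Suite APIs.

```python
def post_increment_wrap(address, step, base, length):
    """Return the next address after a post-increment with wraparound.

    Hypothetical model: the buffer occupies addresses base .. base+length-1,
    and the updated pointer always folds back into that range.
    """
    return base + (address - base + step) % length


def run_delay_line(samples, length):
    """Write samples into a circular buffer and return its final contents."""
    buffer = [0] * length
    pointer = 0  # the buffer base address is 0 in this sketch
    for sample in samples:
        buffer[pointer] = sample                               # access first
        pointer = post_increment_wrap(pointer, 1, 0, length)   # then update
    return buffer
```

Writing five samples into a four-entry buffer overwrites the oldest entry, so the buffer always holds the most recent four samples without any data being copied.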
Special instructions or integral-peripheral functions: Special instructions support single-cycle built-in library functions for common signal-processing data transforms. This platform architecture supports the implementation of multiple processors, nonvolatile instruction memory, configurable I/O interfaces, and hardware support for u-tasking. Special task-control instructions provide support for the unique u-task scheduling in the Jazz PSA.
Support: The Jazz Tool Suite uses a graphical design environment to support unique Jazz DSP processor-configuration development and includes an integrated development environment, a compiler, an assembler, an instruction-set simulator, a profiler, a debugger, and FPGA-emulation support. Improv's Rehearsal boards provide a near-real-time system for designers to run configurations of the Jazz PSA core to verify designer-defined DSPs and to run with other elements of the overall system. Improv's application-oriented platform-solution kits contain a collection of hardware and software components, such as custom Jazz DSPs, application software, and reference designs. Acappella is a family of application-optimized hardware/software for the voice-over-packet market.
Infineon Technologies' Carmel DSP 10xx and 20xx core
at a glance:
- The 10xx core can achieve 2800 MOPS with a single-cycle instruction rate at 200 MHz.
- The 20xx enables user-defined execution units for highest efficiency.
The 10xx 16-bit, fixed-point Carmel DSP core combines a predefined instruction set and CLIW (configurable-long-instruction-word) technology. The 10xx core targets wireless, wired, and consumer applications. User-defined CLIW instructions execute in one clock cycle without wait states or setup overhead, just like any predefined instruction, allowing software code to switch between execution units on a clock-by-clock basis. This ability is particularly useful for reducing the cycle count of inner loops, increasing the overall application code performance by a factor of two or more over software implementations that use only the predefined instruction set.
The second-generation, reconfigurable, 16-bit, fixed-point Carmel DSP 20xx core uses CLIW technology and adds PowerPlug technology to design optimized execution units for specific algorithms. The 20xx core targets wireless, broadband, multimedia, and consumer-electronics applications. The core includes six standard arithmetic units and accommodates as many as four additional PowerPlug modules to accelerate computationally intensive functions, such as MAC, Viterbi, image, and video. The combined effect is an increase in flexibility and higher efficiency than using predefined, general-purpose instruction sets.
Addressing modes: The Carmel DSP 10xx and 20xx cores support immediate address in the instruction, direct reference to operand registers, and indirect reference to an operand data memory. Addresses may be 16 or 32 bits, with linear, bit-reverse, and modulo (aligned and nonaligned) address modification, each with increment, decrement, and offset modifications.
Special instructions or integral-peripheral functions: CLIW technology allows user definition of 96-bit instructions, composed of as many as six individual parallel subinstructions, to increase the efficiency of the core in tight DSP loops. The 20xx core can also include as many as four PowerPlug modules, which are user-defined execution units targeting specific applications that seamlessly integrate into the regular tool chain. The core includes built-in support for Viterbi decoding as well as for minimum/maximum searches. The Carmel DSP is inherently modular, with extensive libraries of synthesizable system peripherals and memories, as well as software functions. You can integrate peripheral functions into the core via the Infineon Flexible Peripheral Interconnect bus.
Support: Third-party partners provide hardware- and software-development tools to aid in system-design and application-software development. Infineon also provides development-chip implementations of the core with large on-chip memory and multiple I/O options, as well as evaluation boards and hardware/software cosimulation models.
LSI Logic's LSI402ZX and LSI403LP
at a glance:
- The LSI402ZX can perform 800 MIPS at 200 MHz.
- The LSI403LP can perform 600 MIPS at 150 MHz.
The high-performance, 16-bit, fixed-point LSI402ZX DSP is based on the LSI Logic ZSP400 DSP core targeting voice-over-networks CPE/IADs (customer premise equipment/integrated-access devices), infrastructure, wireless-infrastructure, and audio applications. The low-power, 16-bit, fixed-point LSI403LP DSP is based on the LSI Logic ZSP400 DSP core targeting voice-over-networks CPE/IAD devices and audio applications. The ZSP400 architecture applies aspects of microprocessor design to the LSI402ZX and LSI403LP implementations. Both devices are software-compatible with all ZSP devices and implement a five-stage, four-way superscalar pipeline to process as many as 20 instructions at a time. The processor's execution unit contains two MACs and two ALUs. The LSI402ZX includes 62k words of instruction RAM and 62k words of data RAM. The LSI403LP provides 16k words of instruction RAM, 16k words of data RAM, and 16k words of instruction- or data-configurable memory. An eight-channel DMA controller, which transfers instructions or data to and from memory, supports both devices.
Addressing modes: The LSI402ZX provides two independently enabled circular buffers and supports reverse-carry addressing. Reverse-carry addressing is an alternative mode of indexing the base-address registers that speeds FFTs and similar operations that require you to modify the next load or next store address in a reverse-carry fashion.
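Reverse-carry addressing is easiest to see in a small model. In the Python sketch below, an addition propagates its carries from the most significant bit toward the least significant bit; repeatedly adding half the FFT length then visits the addresses in bit-reversed order. The function names are hypothetical, and the sketch makes no claim about how the ZSP400 implements the mode in hardware.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(address, increment, bits):
    """Add with carries running from the MSB toward the LSB.

    Modeled by reversing both operands, adding normally, discarding any
    carry out of the field, and reversing the sum back.
    """
    mask = (1 << bits) - 1
    total = (bit_reverse(address, bits) + bit_reverse(increment, bits)) & mask
    return bit_reverse(total, bits)


def bit_reversed_order(bits):
    """List the 2**bits addresses in the order an FFT reorder pass needs."""
    length = 1 << bits
    address = 0
    order = []
    for _ in range(length):
        order.append(address)
        address = reverse_carry_add(address, length >> 1, bits)
    return order
```

For an eight-point transform, `bit_reversed_order(3)` walks the addresses 0, 4, 2, 6, 1, 5, 3, 7, which is the familiar bit-reversed sequence.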
Special instructions or integral-peripheral functions: Both devices can perform a single-cycle add-compare-select for Viterbi decoding. They also support bit manipulation, 32-bit arithmetic, logic operations, and two-cycle complex-multiply instructions. Both devices include two high-speed TDM serial ports, a single 16-bit host-interface port, an external memory-interface unit, a four-pin (LSI403LP) or eight-pin (LSI402ZX) programmable I/O port, and an IEEE 1149.1 JTAG port for program downloading and debugging.
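For readers unfamiliar with the add-compare-select step, the following Python sketch shows the operation that a Viterbi decoder repeats for every state at every symbol. It assumes that a smaller path metric is better (a distance metric); the function name and the tie-breaking rule are illustrative choices, not a description of the ZSP instruction.

```python
def add_compare_select(metric_a, branch_a, metric_b, branch_b):
    """Perform one add-compare-select step.

    Adds each predecessor's path metric to its branch metric, compares the
    two candidates, and selects the smaller one. Returns the surviving
    metric and a decision bit (0 for path A, 1 for path B) that a traceback
    stage would record.
    """
    candidate_a = metric_a + branch_a
    candidate_b = metric_b + branch_b
    if candidate_a <= candidate_b:  # ties go to path A in this sketch
        return candidate_a, 0
    return candidate_b, 1
```

A software decoder without a dedicated instruction spends two additions, a comparison, and a conditional move on each state, which is why single-cycle hardware support matters in the inner loop.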
Support: LSI Logic and third-party tools support both devices. LSI provides a Gnu-based compiler, a linker, and an assembler, available for Windows and Solaris platforms. Green Hills Software offers a commercial tool chain, and Corelis offers JTAG debugging tools. Complementing the ZSP Solution Partners Program, ZOpen, LSI Logic's open-architecture-software framework, provides integration guidelines with supporting utilities, compliant third-party algorithms, and a methodology that standardizes application development. ZSP software-application partners provide ZOpen-compliant algorithms that you can integrate into your system designs.
Motorola's 56800 and 56800E
at a glance:
- The devices use a unified microcontroller/DSP architecture.
- The 56800E is an enhanced-core extension of the 56800.
Motorola's DSP56800 family integrates the performance and instruction set of a DSP with the control functions of an embedded microcontroller into a single core. These devices target applications that traditionally use 16-bit microcontrollers but also require DSP functions, such as point of sale, voice recognition, digital telephone-answering devices, and a variety of low-power applications. Motorola's 56800E family is an enhancement to Motorola's 56800 DSP architecture for applications that require more memory and greater performance. The 56800E core offers five times the performance (to 200 MIPS) at one-third the power consumption of the original core and double the microcontroller-code density. It offers expanded memory addressing to 4 Mbytes of program memory and 32 Mbytes of data memory. It includes 8-, 16-, and 32-bit data types; supports fast interrupts; and supports real-time debugging.
Addressing modes: The addressing modes cover register-direct, address-register-indirect, immediate, and absolute categories. Devices in the 56800E family support 19 total addressing modes across these categories.
Special instructions or integral-peripheral functions: The 56800 uses a bus structure that allows data to keep pace with the DSP while maintaining the peripheral set of a microcontroller. The 56800 peripherals include an interrupt controller, an external memory interface, general-purpose I/O, a scalable controller-area network, an ADC, a quadrature decoder, a PWM, serial interfaces, a quad timer module, JTAG support, and on-chip emulation.
Support: The Metrowerks CodeWarrior IDE tool set, as well as target development boards from Motorola for software development and companion daughtercards developed for market-specific applications, support the 56800 and 56800E. Other third-party tool developers and consultants support both device families.
Oak Technology's PM-44ix
at a glance:
- The four parallel-pipelined processors can perform 3700 MIPS/930 MMACs (million MACs) at 233 MHz.
- The PM-44ix supports as many as 16 color-ink-jet and 30 monochrome-laser copies per minute.
Oak Technology's iDSP family targets image-processing applications, such as imaging-enabled printers and multifunction peripherals. The iDSP provides designers with all the flexibility of a software-based image-processing option at the price and performance of fixed-function silicon. The PM-44ix contains four symmetric parallel-pipelined processors and employs the SIMD (single-instruction-multiple-data) parallel-processing architecture to take advantage of the parallelism inherent in image data.
Addressing modes: To maximize memory bandwidth, all memory accesses in the iDSP measure 32 bits. Specialized extraction and insertion units allow you to manipulate bit fields of any size within 32-bit registers.
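The combination of fixed 32-bit memory accesses and bit-field units can be illustrated in software. The Python sketch below extracts and inserts a field of arbitrary width within a 32-bit word; the function names and the offset convention (bit 0 is the least significant bit) are assumptions for illustration, not iDSP instruction definitions.

```python
MASK32 = 0xFFFFFFFF


def extract_field(word, offset, width):
    """Return the `width`-bit field that starts at bit `offset` of word."""
    return (word >> offset) & ((1 << width) - 1)


def insert_field(word, offset, width, value):
    """Return word with the field at `offset` replaced by value.

    Bits of value that do not fit in `width` bits are discarded.
    """
    field_mask = ((1 << width) - 1) << offset
    cleared = word & ~field_mask
    return (cleared | ((value << offset) & field_mask)) & MASK32
```

With helpers like these, four 8-bit pixels packed into one 32-bit word can be read with a single memory access and then picked apart in registers, for example `extract_field(word, 8, 8)` for the second pixel.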
Special instructions or integral-peripheral functions: The iDSP instruction set contains specialized instructions for manipulating image data and coordinating parallel processing.
Support: The iDSP programming environment includes an IDE, an image-processing library, and an evaluation board. Oak Technology's worldwide direct-sales and support organization supports the iDSP.
QuickLogic's QuickDSP
at a glance:
- QuickDSP is an integrated DSP and programmable-logic device.
- You can use the QuickDSP as a coprocessor or a preprocessor to other DSPs.
QuickLogic's QuickDSP family combines a DSP with the flexibility of programmable logic targeting voice-over-IP and imaging applications. This dedicated hardware option can achieve a fourfold improvement over traditional programmable logic for a range of functions, including floating-point arithmetic, FIR, IIR, adaptive filtering, FFTs, forward-error correction, and high-level data-link control. QuickLogic embeds a reprogrammable computational unit and RAM blocks into silicon to allow DSP-design engineers to implement complex algorithms and multiple-sample processing across single or multiple datapaths. Because the logic usage is efficient even for complex designs, design engineers can use smaller, less expensive devices with lower power consumption.
Addressing modes: You can configure the ECUs (embedded computational units) for eight arithmetic functions via a dynamically reprogrammable instruction-set sequencer. This flexibility lets designers reconfigure the ECU for algorithm-intensive applications, such as adaptive filtering.
Special instructions or integral-peripheral functions: The QuickDSP comes with 18 ECUs on the largest part (QL7180), an instruction-set sequencer, and multiple dual-port 2304-bit RAM modules. The number of RAM modules ranges from 12 to 36 blocks, for a total of as much as 82.9 kbits of RAM (36 blocks of 2304 bits each). The QuickDSP comes with four PLLs that create a master clock from a lower input-frequency clock. One of the four PLLs is multiplexed with the dedicated clock, and the remaining three connect to global clocks.
Support: The QuickDSP RDK (reference-design kit) combines the QuickDSP device with a hardware- and software-development platform, allowing users to implement and debug their DSP and programmable-logic designs. The QuickDSP RDK is capable of in-system operation, in which you can attach the main RDK assembly to a third-party programmable DSP-evaluation module or DSP starter kit via two interface connectors. In this implementation, the QuickDSP device directly connects to the bus of the third-party host-programmable DSP processor, and the DSP program or any associated debugging environment can control it. Design support from the Corporate Applications Group at QuickLogic is available to customers.
RC Module's NeuroMatrix NM640x
at a glance:
- The vector coprocessor can handle variable-length, 1- to 64-bit data.
- Variable-length data enables speed and precision trade-offs.
RC Module's NeuroMatrix NM6403 is a dual-core application-specific DSP processor based on the NeuroMatrix architecture targeting video-image processing and neural-network applications. It provides scalable performance, a programmable operand width of 1 to 64 bits, and operation as fast as 50 MHz. This flexibility allows designers to trade precision for performance to suit their applications. The NM6403 processor includes a 32/64-bit RISC processor and a 1- to 64-bit vector coprocessor that supports vector operations with elements of variable bit lengths (patent pending). Two identical programmable interfaces work with external memory, and two communication ports are hardware-compatible with TI's TMS320C4x, allowing you to build multiprocessor systems.
The vector coprocessor, which has an SIMD (single-instruction-multiple-data) architecture, works on packed integer data comprising 64-bit blocks in the form of variable 1- to 64-bit words. The device supports vector-matrix or matrix-matrix multiplication. The vector coprocessor's core looks like an array multiplier comprising cells that include a 1-bit memory (flip-flop) surrounded by several logical elements. You can combine the cells into several macrocells with two 64-bit programmable registers. These registers define the borders between rows and columns with macrocells. Each macrocell performs the multiplication on variable-input words using preloaded coefficients and accumulates the result from the macrocells in the column above it. The columns simultaneously calculate the results in one processor cycle. For 8-bit data and coefficients, the vector coprocessor performs 24 MAC operations with 21-bit results in one 20-nsec processor cycle. The number of MAC operations depends on the length and number of words packaged into a 64-bit block. The engine's configuration can change dynamically during calculations. An application can start with maximum precision and minimum performance and dynamically increase performance by reducing the data-word lengths. To avoid arithmetic overflow, the NM6403 uses two types of saturation functions with user-programmable saturation boundaries.
The RISC core (VLIW) has a five-stage pipeline that operates with 32- and 64-bit-wide instructions. Each instruction usually executes two operations. Two 64-bit interfaces support SRAM, DRAM, and EDO DRAM and comprise two separate address-generation units that can address as much as 16 Gbytes. Each interface supports two memory banks and can support a "shared-memory" mode. Two DMA coprocessors transfer data between high-speed I/O-communication ports and external memory.
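Saturation with programmable boundaries can be sketched in a few lines. The Python example below clamps a multiply-accumulate result to a caller-supplied range instead of letting it wrap around. It is a generic illustration: the function names are hypothetical, and the source does not describe the two saturation functions the NM6403 actually implements.

```python
def saturate(value, lower, upper):
    """Clamp value to the inclusive range lower .. upper."""
    if lower > upper:
        raise ValueError("lower boundary exceeds upper boundary")
    return max(lower, min(upper, value))


def saturating_mac(accumulator, data, coefficient, lower, upper):
    """Multiply-accumulate whose result is clamped rather than wrapped."""
    return saturate(accumulator + data * coefficient, lower, upper)


# Boundaries for a signed 21-bit result, assuming two's-complement range.
LOWER_21 = -(1 << 20)
UPPER_21 = (1 << 20) - 1
```

With these boundaries, a sum that would exceed 1,048,575 stays pinned at that value, which is usually less harmful in image data than the sign flip that wraparound produces.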
Addressing modes: The NM6403 supports 32-bit immediate, base, indexed, and relative addressing.
Special instructions or integral-peripheral functions: The NM6403 processor uses vector instructions to handle packets of as many as 32 64-bit data words. These instructions may define operations such as matrix-matrix, matrix-vector, or vector-vector multiplication, vector-vector addition/subtraction with saturation of results, block moving, and bit manipulation. The NM6403 has conditional branch, call, and return instructions.
Support: The NeuroMatrix Software Development Kit for PCs includes an ANSI X3J16/95-0029 preliminary-standard compatible C++ compiler, an assembler, an instruction-level simulator, a cycle-accurate simulator, a linker, a source-level debugger, a load/exchange library, and a set of application-specific vector-matrix libraries. RC Module offers PCI and CompactPCI evaluation/development boards. The vector-matrix library simplifies C-language programming for FFT, DCT, Sobel, and Hadamard Transform. RC Module also provides an NM6403 Verilog behavioral model for Sun host platforms for system-level simulation and a synthesizable core targeting Samsung and Fujitsu semiconductor technologies.
Sensory's RSC family
at a glance:
- RSC processors are specialized for speech recognition and synthesis.
- High-quality speech output is possible to as little as 5 kbps.
The RSC-3x and RSC-4x speech processors combine a microcontroller with advanced-speech-processing technology targeting high-quality speech recognition, speech and music synthesis, speaker verification, and record and playback. These devices feature a high-performance microcontroller with on-chip memories and a 24×24 hardware multiplier. The RSC-4x family also features a vector processor. Each device uses a neural network to perform speaker-independent speech recognition and achieve high-quality speech synthesis using both time- and frequency-domain-compression techniques, enabling them to provide high-quality speech output to as little as 5 kbps. In addition to providing the necessary horsepower to perform speech recognition and speech synthesis, the processors have sufficient cycles available for general-purpose product control for as many as 24 I/O lines.
Addressing modes: RSC devices support sequential addressing modes.
Special instructions or integral-peripheral functions: The RSC-3x and RSC-4x processors include an on-chip ADC, a DAC, and digital filters. The RSC-4x family also features twin DMA units, comparator blocks, a watchdog timer, and other product-control features.
Sensory's SC-6x family
at a glance:
- The SC-6x supports low-data-rate speech synthesis.
- Five bit rates for CX and MX support a range of speech-quality and memory requirements.
The SC-6x DSP family targets speech-synthesis applications. The Sensory speech algorithms support long-duration speech, compressed speech using 1-kbps MX, and higher-economy compression using 3-kbps CX. Five other fixed bit rates of CX and a variable range of MX bit rates are available to mix and match your quality and memory requirements. These DSPs support three low-power modes, two timer interrupts, one DAC interrupt, and five general-purpose interrupts to increase battery life and response speed to button and keyboard presses. The SC-6x devices have sufficient horsepower to support interactive interfaces and 14-channel polyphonic music while speaking.
Addressing modes: The addressing modes are immediate, direct, indirect-with-postmodification, and three relative modes. The program-counter unit provides addressing for program memory (onboard ROM). It includes a 16-bit arithmetic block for incrementing and loading addresses. It also consists of the program counter, the data pointer, a buffer register, a code-protection write-only register, and a hardware-loop counter (for strings and repeated-instruction loops). The program-counter unit generates a ROM address as output.
Special instructions or integral-peripheral functions: The SC-6x processors offer instructions to facilitate filtering algorithms, such as FIR, FIRK, COR, and CORK. FIR is useful for adaptive filtering or applications in which coefficients come from an external source. COR instructions perform 16×16-bit multiplies and 48-bit accumulation in three clock cycles. Instructions are also available to perform 16×16-bit multiplies and 32-bit accumulation in two clock cycles.
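The multiply-accumulate pattern behind these filtering instructions can be written out directly. The Python sketch below computes a FIR filter one output at a time, with one multiply and one accumulation per tap. It is a plain reference model with hypothetical function names; it does not reproduce the semantics, accumulator widths, or cycle counts of the SC-6x FIR and COR instructions.

```python
def fir_output(history, coefficients):
    """Compute one FIR output as the sum of coefficient * sample products."""
    if len(history) != len(coefficients):
        raise ValueError("history and coefficients must have equal length")
    accumulator = 0
    for sample, coefficient in zip(history, coefficients):
        accumulator += sample * coefficient  # one multiply-accumulate per tap
    return accumulator


def fir_filter(signal, coefficients):
    """Filter a whole signal, assuming zero initial history."""
    history = [0] * len(coefficients)  # most recent sample first
    output = []
    for sample in signal:
        history = [sample] + history[:-1]
        output.append(fir_output(history, coefficients))
    return output
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a convenient first check when coefficients arrive from an external source.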
Support: Sensory's tools include a development environment with a C compiler, demonstration units, and evaluation and prototyping tools, such as the Voice Extreme Toolkit. Each tool set includes required hardware and software, complete documentation, and numerous samples. Turnkey product-development and linguistics services are available directly through Sensory or through its worldwide network of third-party development houses. For generic DSPs and processors, Sensory offers text-to-speech software and the small-footprint Voice Activation and Fluent Speech voice-recognition engines.
Siroyan's OneDSP architecture and SRXXX family
at a glance:
- The VLIW DSP architecture can scale as many as 32 dual-issue clusters.
- OneDSP can perform as many as 25.6 billion MACs at 200 MHz.
Siroyan's OneDSP architecture uses VLIW (very-long-instruction-word)-clustering techniques to provide scalable, high-performance DSP power allowing as many as 32 execution-unit clusters in a single core. Prevalidated configuration options include setting the number of clusters and endianness, as well as the cache-memory size and configuration. Each cluster consists of general-purpose registers, accumulators, a number of execution units, cache memory, local memory, and an on-chip bus interface. The master cluster executes either scalar RISC instructions from its instruction cache or VLIW instructions from its V-cache. In multicluster designs, VLIW instructions are issued in parallel from the V-cache in each of the slave execution-unit clusters. You can configure the SRA328 core with as many as eight execution-unit clusters with 32-bit datapaths. It targets communication and consumer applications.
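Peak-throughput figures such as those in the at-a-glance lists become easier to compare once you reduce them to operations per clock cycle. The Python sketch below performs that division; the function names are hypothetical, and the per-cluster result assumes that the quoted peak applies to the largest configuration.

```python
def ops_per_cycle(ops_per_second, clock_hz):
    """Convert a peak rate into operations completed per clock cycle."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return ops_per_second / clock_hz


def ops_per_cycle_per_cluster(ops_per_second, clock_hz, clusters):
    """Spread the per-cycle figure evenly across identical clusters."""
    if clusters <= 0:
        raise ValueError("cluster count must be positive")
    return ops_per_cycle(ops_per_second, clock_hz) / clusters
```

Using the OneDSP numbers, 25.6 billion MACs per second at 200 MHz works out to 128 MACs per cycle, or 4 MACs per cycle in each of 32 clusters if the peak figure refers to the full 32-cluster configuration.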
Addressing modes: In addition to normal RISC addressing modes, OneDSP supports autoincrement, autodecrement, circular-buffer-addressing, and bit-reversed-addressing modes.
Special instructions or integral-peripheral functions: OneDSP supports Galois-field arithmetic for Reed-Solomon-coding applications and encryption algorithms. Siroyan will provide other application-specific instructions. The SRXXX cores have an integrated DMA engine capable of basic scatter/gather functions and bit-reversed addressing and ship with an example system that includes an AMBA AHB and APB bus system, an external memory interface, and an area of on-chip SRAM.
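Galois-field arithmetic differs from ordinary integer arithmetic in ways that make it slow on a general-purpose datapath, which is why dedicated instructions help. The Python sketch below multiplies two elements of GF(2^8) with the shift-and-XOR method. The field size and the reduction polynomial 0x11D (a common choice in Reed-Solomon codes) are assumptions for illustration; the source does not state which fields or polynomials OneDSP supports.

```python
def gf256_add(a, b):
    """Addition in GF(2^8) is a bitwise XOR, with no carries."""
    return a ^ b


def gf256_multiply(a, b, polynomial=0x11D):
    """Multiply two GF(2^8) elements using shift-and-XOR.

    Each set bit of b adds (XORs) a shifted copy of a into the result.
    Whenever the shifted copy overflows 8 bits, it is reduced by the
    field polynomial.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= polynomial
        b >>= 1
    return result
```

A Reed-Solomon encoder performs many such multiplications per symbol, so collapsing this loop into a single instruction removes most of the cost of the inner loop.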
Support: Siroyan's OneDSP development environment runs on Unix, Linux, or Windows OSs. A debug adapter is also available for connecting the debug board on the target development board to the host computer via Ethernet, allowing programmers to share target boards. Siroyan supplies a tool chain for application-software development, including a Gnu C compiler for scalar code, an optimizing C compiler for both scalar and VLIW code, an assembler, a debugger, and an OS kernel. Siroyan also works with third-party developers to deliver software and tools.
StarCore Technology Center's StarCore SC100
at a glance:
- The SC100 architecture features compact code density.
StarCore's SC100 16-bit, fixed-point architecture has an extensible 16-bit instruction word that includes the SC140 DSP core. The scalable SC100 architecture targets communications applications. Low power dissipation per function helps extend battery life and meet power-per-channel budgets. Compact code density requires less memory, thereby reducing system cost. Customers can use a single DSP architecture and reuse key kernels and code for entry-level as well as advanced applications.
Addressing modes: The SC100 architecture supports register-direct mode, address-register-indirect mode, program-counter-relative modes, and special-address modes.
Special instructions or integral-peripheral functions: The SC100 multipliers support all combinations of signed and unsigned operands and both fractional and integer formats. The SC100 architecture supports an SIMD (single-instruction-multiple-data) version of maximum and minimum additions and subtractions (MAX2, ADD2, SUB2). It can perform eight 16-bit additions or maximum and minimum operations per cycle and includes MAX2VIT, which works with Viterbi shift left to accelerate Viterbi decoding algorithms. A user-defined instruction-set-accelerator module enhances the SC100 basic instruction set.
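The idea behind the paired operations can be modeled in software. The Python sketch below treats a 32-bit word as two independent 16-bit lanes and applies an addition or a signed maximum to each lane, with no carry crossing the lane boundary. The function names borrow the ADD2 and MAX2 mnemonics only as labels; the wraparound behavior and lane layout are assumptions, not the documented SC100 semantics.

```python
def split_lanes(word):
    """Split a 32-bit word into its high and low 16-bit lanes."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def join_lanes(high, low):
    """Pack two 16-bit lanes into one 32-bit word, discarding overflow."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def add2(x, y):
    """Add lane by lane; a carry out of one lane never reaches the other."""
    x_high, x_low = split_lanes(x)
    y_high, y_low = split_lanes(y)
    return join_lanes(x_high + y_high, x_low + y_low)


def max2(x, y):
    """Take the signed maximum of each pair of lanes."""
    x_high, x_low = split_lanes(x)
    y_high, y_low = split_lanes(y)
    return join_lanes(max(x_high, y_high, key=to_signed16),
                      max(x_low, y_low, key=to_signed16))
```

A Viterbi inner loop maps naturally onto this pattern, because the path metrics for two states can share one register and be updated together.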
Support: StarCore creates low-level, baseline development tools, including a C compiler; an assembler; a linker; an instruction-set simulator; and optimized, hand-coded, C-callable DSP-core libraries to assist programmers in application development. StarCore has also partnered with third-party developers, such as Metrowerks, Wind River, Green Hills, Tasking, Lineo, Ose Systems, Trinity Convergence, Signals and Software, and Numerix, to provide a choice of tools, OSs, and applications software.
STMicroelectronics' ST100 family
at a glance:
- The ST122 can perform 1200 million MACs/sec at 600 MHz.
- Interfaces support customizable coprocessors.
The general-purpose, 16-bit, fixed-point ST100 family architecture targets wired- and wireless-communications, automotive, and multimedia applications. The instruction set features DSP and 16- and 32-bit microcontroller instructions. The DSP architecture supports a 4-Gbyte memory space, 40-bit registers and accumulators, four idle modes for power-consumption reduction, and three zero-overhead nestable loops. It is scalable for high performance and low power.
Addressing modes: The family supports 13 addressing modes, including circular, which is well-suited for FIR filtering, and bit reverse for FFT. Data-memory accesses handle bytes, half-words (16 bits), and words (32 bits).
Special instructions or integral-peripheral functions: The instruction set supports predication for most of its instructions, packed arithmetic, and a special instruction for Viterbi. The ST122 core supports 16×32-bit MAC operations for audio applications and multimedia-specific instructions. It can interface with as many as four tightly coupled coprocessors to improve system performance.
Support: STMicroelectronics and third-party partners, such as Green Hills and OSE Systems, provide a suite of evaluation boards and tools for hardware and software development.
Tensilica's Vectra DSP engine
at a glance:
- The Xtensa TIE (Tensilica Instruction Extension) compiler supports creation of new designer-defined instructions.
- The software-development tools are automatically re-created with each new processor configuration.
The Vectra DSP engine, a fixed-point coprocessor for the Xtensa 32-bit RISC synthesizable processor architecture, targets high-volume embedded-processor and DSP applications. Designers use the Xtensa Processor Generator to configure and extend a family of core processors with custom functions while doing software development and testing. The DSP configurations and options available range from simple 16-bit MAC operations to five variants of Xtensa's Vectra DSP engine that include an SIMD (single-instruction-multiple-data) architecture; a vector register file for holding data, coefficients, and intermediate results; and support for single- and double-width operand sizes for greater computational accuracy. The optional DSP functional blocks are tightly coupled into the core pipeline.
Addressing modes: The Vectra DSP engine's four addressing modes include immediate and indexed with or without updates to the base register.
Special instructions or integral-peripheral functions: Instruction encoding of 16 and 24 bits reduces data moves, making better use of the register files and their allocation. Compound instructions include special shifts, compare/branch, and zero-overhead loop instructions.
Support: The GNU-based GCC or Xtensa XCC software tool suites automatically tailor themselves to support access to the resources in each new processor configuration. The development environment includes an instruction-set simulator, a bus-functional model, an RTOS OS kit, DSP libraries for the five Vectra configurations, EDA tool scripts, and the Xtensa multiprocessor-system-modeling API. Designers can use the Tensilica instruction-extension-language compiler to create algorithm-specific DSP functions.
Texas Instruments' TMS320C2000
at a glance:
- Devices combine performance and peripheral integration for the embedded-control industry.
- These code-compatible DSPs target embedded-control applications.
The TMS320C2000 family of 19 code-compatible DSP controllers offers a combination of on-chip peripherals, such as flash memory, fast ADCs, and CAN modules, targeting embedded-control applications, such as optical-networking, tunable-laser, automotive, power-supply, and motor-control applications. The TMS320F2810 and TMS320F2812 DSPs are 32-bit control DSPs with onboard flash memory and performance to 150 MIPS. The C28x core offers 300 MIPS of computational bandwidth with a signal-processing core optimized for control. It is fully code compatible with current devices in the C2000 family.
Addressing modes: The C2000 DSP platform supports indirect and direct addressing.
Special instructions or integral-peripheral functions: The C2000 DSP platform integrates flash memory, an ADC, an event manager optimized for pulse-width-modulation generation, CAN modules, and serial interfaces.
Texas Instruments' TMS320C5000
at a glance:
- The C55x is one of the industry's most power-efficient programmable DSPs.
- Devices operate from as little as 0.9V and consume as little as 0.05 mW/MIPS, with a maximum performance of 800 MIPS.
The TMS320C5000 DSP platform uses a modified Harvard architecture and includes the TMS320C54x and TMS320C55x DSP generations. The C55x DSPs are source-code-compatible with the C54x DSPs. The C54x focuses on low power consumption, but the C55x takes power efficiency to a new level: A 300-MHz C55x delivers a maximum fivefold improvement in performance over a 120-MHz C54x and dissipates as little as one-sixth its core power. The C55x has 12 independent buses, and the C54x has eight. Both architectures include one program bus and an associated program-address bus. The C55x bus is 32 bits wide, and the C54x bus is 16 bits wide. The C55x has three data-read buses and two data-write buses; the C54x has two data-read buses and one data-write bus. Each data bus also has its own address bus. The corresponding address buses are 24 bits wide on the C55x and 16 bits wide on the C54x. The C5000 DSP platform has 17 code-compatible DSPs sampling or shipping in high volume.
Addressing modes: The C54x supports single-data-memory-operand addressing, which also handles 32-bit operands, and dual-data-memory-operand addressing, which parallel instructions use. It provides immediate, memory-mapped, circular, and bit-reversed addressing. In addition to the C54x modes, the C55x supports absolute addressing, register-indirect addressing, direct addressing, and displacement mode. The C55x includes dedicated registers to support circular addressing for instructions that use indirect addressing. Programs can simultaneously use as many as five independent circular-buffer locations with as many as three independent buffer lengths. These circular buffers have no address-alignment constraints. The C54x supports two circular buffers of arbitrary lengths and locations.
Special instructions or integral-peripheral functions: The C54x performs dedicated-function instructions, such as FIR filters, single and block repeat, eight parallel instructions, multiply, accumulate, and subtract (10 multiply instructions), and eight dual-operand memory moves. The C55x also has special instructions that take advantage of the additional functional units and increase parallelism capabilities. User-defined parallelism allows you to combine instructions to perform two operations. You can also combine a built-in parallel instruction with a user-defined parallel instruction.
Texas Instruments' TMS320C6000
at a glance:
- The C64x DSP architecture can scale to more than 1.1 GHz.
Texas Instruments' TMS320C6000 DSP platform, a general-purpose, VLIW (very-long-instruction-word) DSP architecture, targets advanced imaging, third-generation wireless and broadband communications-infrastructure applications. This architecture includes the floating-point TMS320C67x DSP generation and the fixed-point TMS320C62x and TMS320C64x DSP generations. The C62x DSP has eight independent, multipurpose functional units and can perform two 16×16-bit MAC operations per cycle. The C67x DSP is a superset of the C62x DSP instruction set that adds floating-point capabilities to six of the C62x DSP's eight functional units. The C64x DSP is object-code-compatible with the C62x DSPs but has significant architectural enhancements, such as four 16×16-bit MAC operations per cycle and operating frequencies of 400, 500, and 600 MHz. The C6000 DSP platform performs MAC operations by using separate multiply and add instructions. Thirteen code-compatible C6000 DSP products are available for sampling or are shipping in volume.
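As a worked example of the figures above, the short Python sketch below computes the peak MAC rate implied by four 16×16-bit MACs per cycle at the quoted C64x clock rates, and shows a dot product written as separate multiply and add steps. The function names are hypothetical, and the rates are theoretical peaks derived only from the numbers in this entry, not measured results.

```python
def peak_mmacs(macs_per_cycle, clock_mhz):
    """Peak rate in millions of MACs per second: MACs per cycle times clock in MHz."""
    return macs_per_cycle * clock_mhz


def dot_product(a, b):
    """Multiply-accumulate built from a separate multiply and a separate add."""
    acc = 0
    for x, y in zip(a, b):
        product = x * y   # multiply step
        acc += product    # add (accumulate) step
    return acc


# C64x: four 16x16-bit MACs per cycle at 400, 500, and 600 MHz
c64x_peaks = [peak_mmacs(4, mhz) for mhz in (400, 500, 600)]
```

Splitting the MAC into two operations lets a scheduler place the multiply and the add on different functional units, which is one way a VLIW machine can overlap the multiply for one sample with the accumulate for the previous one.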
Addressing modes: The C6000 DSP platform performs linear and circular addressing. However, unlike most other DSPs, which have dedicated address-generation units, C6000 DSPs calculate addresses using one or more of their functional units.
Special instructions or integral-peripheral functions: All C6000 DSP processors can conditionally execute all instructions, a method of reducing branching and thereby optimizing performance. On the C64x DSP, the MPYU4 instruction performs four 8×8-bit unsigned multiplies. The ADD4 instruction performs four 8-bit additions. Six of the C64x functional units can perform dual 16-bit addition/subtraction. Two of the functional units perform dual 16-bit compare, shift, minimum/maximum, and absolute-value operations. The M units also support dual 16-bit and quad 8-bit averaging operations as well as bit-expansion and bit-interleaving and -deinterleaving operations. Four of the six remaining functional units support quad 8-bit addition/subtraction operations. Two functional units support quad 8-bit compare and minimum/maximum instructions. Some instructions operate directly on packed 8- and 16-bit data.
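The packed-data operations described above can be modeled in a few lines of Python. This is a sketch, not a description of the actual instruction semantics: the helper names `pack4`, `unpack4`, `add4`, and `mpyu4` are hypothetical, and the model assumes that the four 8-bit additions wrap modulo 256 without carrying between lanes and that each 8×8-bit unsigned product is kept as a full 16-bit value.

```python
def pack4(lanes):
    """Pack four 8-bit values (lane 0 = least-significant byte) into one 32-bit word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word into its four 8-bit lanes, lane 0 first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add4(a, b):
    """Four independent 8-bit additions; assumed to wrap with no carry between lanes."""
    return pack4([(x + y) & 0xFF for x, y in zip(unpack4(a), unpack4(b))])


def mpyu4(a, b):
    """Four unsigned 8x8-bit multiplies; returns the four products (each fits in 16 bits)."""
    return [x * y for x, y in zip(unpack4(a), unpack4(b))]


sums = unpack4(add4(pack4([1, 2, 3, 255]), pack4([1, 1, 1, 1])))
products = mpyu4(pack4([2, 3, 255, 0]), pack4([4, 5, 255, 9]))
```

One packed operation replaces four scalar ones, which is why such instructions matter for pixel-oriented imaging and video code, where 8-bit data dominates.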
Texas Instruments' TMS320DA250
at a glance:
- This device includes built-in secure-digital, memory-stick, and multiple-digital-rights-management support.
- Software-development support includes third-party algorithms and compressed-audio algorithms.
TMS320DA250, a member of Texas Instruments' C55x generation of fixed-point DSPs, targets portable Internet-audio players, car stereos, home-audio jukeboxes, and other audio applications. The DA250 supports many digital-audio formats and digital-rights-management technologies. The general-purpose-I/O functions provide sufficient pins for status, interrupts, bit I/O for LCDs, keyboards, and media interfaces for a "microless" design. The parallel interface operates either as a slave to a microcontroller or as a media interface. The media interface includes an ATA flash card or a memory buffer for spinning media.
Addressing modes: Addressing modes include synchronous SRAM interfaces, SDRAM interfaces, or both, with general-purpose-I/O capabilities or enhanced 16-bit EHPI16 with general-purpose-I/O capabilities. The device also includes an enhanced 16-bit host-port interface (EHPI16) mixed with address bus.
Special instructions or integral-peripheral functions: Integrated peripherals include a real-time clock; a low- and full-speed USB 2.0 interface; memory-stick and MMC/SD (multimedia-card/secure-digital) interfaces; an I2C multimaster and slave interface; two 16-bit timers and one watchdog timer; and a 10-bit ADC for battery monitoring, buttons, and control signals.
Texas Instruments' TMS320DRE200
at a glance:
- The DRE200 is a turnkey reference design with comprehensive support.
- You can update and upgrade products via software.
The ETSI 300 401-compliant DRE200 baseband performs channel and source decoding on one chip. The digital baseband can decode all Eureka modes and perform user-interface functions. The DRE200 baseband is compatible with standard audio-DAC interfaces and can interface to an external microcontroller, DRAM, and SRAM. Devices achieve disturbance-free operation during multiplex subchannel reconfiguration or ensemble switching, and they can feed data to external TPEG (Transport Protocol Expert Group) or MOT (multimedia-object-transfer) decoders and external memory.
Addressing modes: The DRE200 supports immediate, absolute, accumulator, indirect, direct, stack, and memory-mapped-register addressing modes.
Special instructions or integral-peripheral functions: None.
Texas Instruments' TMS320DSCx family
at a glance:
- The cores are configurable per system requirements.
- The development environment supports system emulation before silicon tape-out.
The TMS320DSC21, TMS320DSC24, and TMS320DSC25 single-chip digital-imaging systems combine a TMS320C5000 DSP and an ARM7TDMI RISC processor targeting media-processing and system-control functions. The chips integrate a video encoder with an on-screen display, an SDRAM controller with a bandwidth-transfer rate of 320 Mbytes/sec, and a preview engine that performs 30-frame/sec NTSC and PAL previewing (DSC21/DSC25). The DSCx family of products can achieve real-time processing of a full-resolution 2 million-pixel image with a 1-sec shot-to-shot delay. DSCx DSPs can support the capture of high-resolution still photos, and they can record video clips with audio and music from the Internet. These systems support digital-audio and -video formats, including real-time MPEG-1, MPEG-4, JPEG, M-JPEG, H.263, and MP3, as well as data-communication standards, such as IrDA (DSC21), USB, and RS-232.
Addressing modes: Addressing modes include SDRAM, SRAM, flash-media, and removable-media interfaces. The SDRAM transfer rate is 80 Mbytes/sec, with both ×32 (DSC21/24/25) and ×16 (DSC24) interface capabilities. The DSC24 enables 2-D-to-2-D data transfer from SDRAM to an on-chip image buffer, as well as direct SDRAM access via an SDRAM controller. The ARM can access the DSP via the host-port interface, and its bus controller has on- or off-chip access to general-purpose I/O, flash, CompactFlash, and SmartMedia applications.
Special instructions or integral-peripheral functions: In addition to the TMS320C54x DSP-generation instruction set, the DSCx DSP subsystem incorporates imaging enhancements to provide fast-block-based processing for imaging or video-encoding and -decoding functions.
Support: The eXpressDSP Real-Time Software Technology encompasses development for all of these devices and includes the Code Composer Studio integrated development environment; DSP/BIOS, a scalable real-time kernel; the TMS320 DSP Algorithm Standard, a standard set of coding conventions and application-programming interfaces; and a third-party network. Also available are evaluation modules, technical training classes, and customer-application support.
3DSP's SP-3, SP-5, and SP-20/UNIPHY
at a glance:
- Devices enable multifunction digital-imaging devices on one chip.
- Code compatibility enables a single-platform, multiple-product strategy.
The soft-IP-core, fixed-point DSP family, bus controller, peripherals, and microprocessor interfaces from 3DSP use a scalable SuperSIMD (single-instruction-multiple-data) architecture. The core supports multiprocessor systems, program cache or direct-mapped program memory, 32 prioritized interrupts, 32 general-purpose I/O pins, and a JTAG-only debugging interface. The SP-3 is a programmable, five-stage pipelined DSP that targets MP3-player, home-audio (AAC, AC3), wireless-GSM-phone, GPS, and CPE (customer-premise-equipment) voice-over-packet processing applications. The SP-5 is a programmable, superscalar, dual-issue, five-stage, pipelined DSP that targets 3G wireless, voice-over-packet gateway, xDSL, MPEG-4, and wireless-LAN applications. SP-5flex is a fully synthesizable and configurable DSP core, based on the SP-5 architecture, that supports balancing power, cost, and performance. It targets voice-over-packet, digital-wireless, audio, video, imaging, and broadband-modem applications. The SP-5V is a programmable, superscalar, dual-issue, five-stage, pipelined DSP that targets voice-over-packet applications. Development support includes a voice-over-packet software suite, application demo, and reference design.
The programmable, dual-mode, nine-stage, pipelined SP-20/UniPHY DSP IP core includes custom instructions targeting physical-layer signal processing for 802.11a/b/g, HiLAN2, and xDSL. UniPHY combines accelerated versions of 3DSP's SuperSIMD architecture and SP-x instruction set with an expansion-instruction mode. UniPHY is capable of execution speeds of 400 MHz to 1 GHz because it supports a multiple-standard PHY implementation on the same processor. The "soft-datapath" technology and programmability enable a "softPHY" implementation that facilitates modification for changing physical-layer standards.
Addressing modes: The DSP cores provide circular-buffer, 2-D-matrix, bit-reverse, page, register-indirect, and stack-pointer addressing.
Special instructions or integral-peripheral functions: Each instruction can handle as many as 24 RISC-equivalent instructions. Each arithmetic instruction can process one 32-bit, one 24-bit, two 16-bit, or four 8-bit data values. Push and pop instructions allow efficient context switching. The SP-5V includes LMS24 and LMS16 instructions for echo-canceler algorithms.
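For readers unfamiliar with the LMS (least-mean-squares) algorithm behind echo cancelers, the Python sketch below shows the generic textbook update in floating point. It does not describe what the LMS24 or LMS16 instructions compute. The function names, the step size `mu`, the two-tap echo path, and the test signal are all assumptions chosen for illustration.

```python
def lms_step(weights, history, desired, mu):
    """One LMS update. Returns (error, new_weights)."""
    estimate = sum(w * x for w, x in zip(weights, history))   # predicted echo
    error = desired - estimate                                # residual after cancellation
    new_weights = [w + mu * error * x for w, x in zip(weights, history)]
    return error, new_weights


def cancel_echo(far_end, mic, n_taps, mu):
    """Adapt an n_taps filter so that its output tracks the echo present in `mic`."""
    weights = [0.0] * n_taps
    history = [0.0] * n_taps        # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        error, weights = lms_step(weights, history, d, mu)
        residual.append(error)
    return residual, weights


# Hypothetical two-tap echo path: echo[n] = 0.5*x[n] - 0.25*x[n-1]
far_end = [1.0, 1.0, -1.0, -1.0] * 50
mic = [0.5 * far_end[n] - 0.25 * (far_end[n - 1] if n > 0 else 0.0)
       for n in range(len(far_end))]
residual, weights = cancel_echo(far_end, mic, n_taps=2, mu=0.1)
```

Each update needs one multiply-accumulate per tap for the estimate and another per tap for the weight adjustment. That inner-loop cost is the kind of work a dedicated LMS instruction would be expected to speed up.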
Support: The software- and hardware-development environment includes complementary software and hardware IP and supports overall system emulation before silicon tape-out.
| For more information... | ||
| When you contact any of the following manufacturers directly, please let them know you read about their products in EDN. | ||
| Adelante Technologies www.adelantetech.com | Agere Systems www.agere.com | Analog Devices www.analog.com |
| ARC Cores www.arccores.com | BOPS www.bops.com | Cirrus Logic www.cirrus.com |
| DSP Architectures www.dsparchitectures.com | DSP Group www.dspg.com | Equator Technologies www.equator.com |
| Hitachi Semiconductor www.hitachisemiconductor.com | Improv Systems www.improvsys.com | Infineon Technologies www.infineon.com |
| LSI Logic www.lsilogic.com | Motorola www.motorola.com | Oak Technology www.oaktech.com |
| QuickLogic www.quicklogic.com | RC Module www.module.ru | Sensory www.sensoryinc.com |
| Siroyan www.siroyan.com | StarCore Technology Center www.starcore-dsp.com | STMicroelectronics www.st.com |
| Tensilica www.tensilica.com | Texas Instruments www.ti.com | 3DSP www.3dsp.com |