Stream Processors aims at parallel signal processing
By Robert Cravotta, Technical Editor -- EDN, 2/12/2007
Stream Processors Inc's (SPI's) Storm-1 processor architecture relies on two MIPS 4KEc processor cores working with a DPU (data-parallel unit) that contains a scalable number of processing lanes, currently eight or 16. The system processor, one of the 4KEc cores, runs the application operating system and software, and it manages the system I/O.
The other MIPS core and the DPU make up the DSP-coprocessor subsystem. This second core communicates with the DSP dispatcher, which manages the runtime synchronization of instructions and DMA data loads for the kernel functions that execute in the DPU. The DPU issues the same VLIW (very-long-instruction-word) instructions to all of its lanes. Each lane includes five 32-bit ALUs, including MAC (multiply/accumulate) units; four LRF (lane-register file) Ld/St (load/store) units; and a COM unit for interlane communication. Each ALU in a lane operates independently on local data.
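To make the lockstep model concrete, here is a minimal C sketch that emulates a DPU in software. It assumes a 16-lane configuration and models a VLIW word as one operation per ALU slot. The names `Lane`, `VliwWord`, `lane_execute`, and `dpu_issue`, the opcode set, and the register-file depth are all hypothetical and do not describe SPI's actual instruction set. The sequential loop over lanes stands in for lanes that execute simultaneously in hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LANES      16  /* article: currently 8 or 16 lanes */
#define ALUS_PER_LANE   5  /* article: five 32-bit ALUs per lane */
#define LRF_WORDS      16  /* hypothetical lane-register-file depth */

typedef enum { OP_NOP, OP_ADD, OP_MUL, OP_MAC } Opcode;

/* One ALU slot of a hypothetical VLIW word: dst = f(src_a, src_b). */
typedef struct { Opcode op; int dst, src_a, src_b; } AluOp;

/* One instruction word: one operation per ALU, broadcast to every lane. */
typedef struct { AluOp slot[ALUS_PER_LANE]; } VliwWord;

/* Lane-local registers; no lane touches another lane's registers here. */
typedef struct { int32_t lrf[LRF_WORDS]; } Lane;

/* Execute one VLIW word in a single lane. Every slot reads its operands
   before any slot writes a result, modeling ALUs that work in parallel.
   Overflow handling is omitted in this sketch. */
static void lane_execute(Lane *lane, const VliwWord *w) {
    int32_t result[ALUS_PER_LANE];
    for (int s = 0; s < ALUS_PER_LANE; s++) {
        const AluOp *o = &w->slot[s];
        assert(o->dst >= 0 && o->dst < LRF_WORDS);
        assert(o->src_a >= 0 && o->src_a < LRF_WORDS);
        assert(o->src_b >= 0 && o->src_b < LRF_WORDS);
        int32_t a = lane->lrf[o->src_a];
        int32_t b = lane->lrf[o->src_b];
        switch (o->op) {
        case OP_ADD: result[s] = a + b; break;
        case OP_MUL: result[s] = a * b; break;
        case OP_MAC: result[s] = lane->lrf[o->dst] + a * b; break;
        default:     result[s] = 0; break; /* NOP: result is discarded */
        }
    }
    for (int s = 0; s < ALUS_PER_LANE; s++)
        if (w->slot[s].op != OP_NOP)
            lane->lrf[w->slot[s].dst] = result[s];
}

/* Dispatcher stand-in: the same word goes to all lanes, and each lane
   applies it to its own local data. */
static void dpu_issue(Lane lanes[NUM_LANES], const VliwWord *w) {
    for (int l = 0; l < NUM_LANES; l++)
        lane_execute(&lanes[l], w);
}
```

Because unused slots are zero-initialized to `OP_NOP`, a word can fill only the ALUs it needs; every lane still receives the identical word.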
This architecture best suits computationally intensive applications that operate on streaming parallel data. One of SPI's processors can encode high-definition 1080p video (H.264 HD) in real time while still performing custom video enhancements, image tuning, and content analysis. Because the target applications process streaming data, the system has no conventional cache. Instead, the compiler allocates data to each lane through an operand-register-file hierarchy. The same kernel function executes across all of the lanes, with each lane operating on its own set of data. A high-speed interlane switch supports data exchange among all of the lanes.
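The following C sketch illustrates one kernel running across all lanes on separate data, with the partial results then combined through interlane exchanges. It assumes an eight-lane configuration and an interleaved layout (element i goes to lane i mod 8), and it uses a butterfly all-reduce as a stand-in for traffic through the interlane switch. The function names and the data layout are hypothetical, not SPI's documented scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LANES 8 /* eight-lane configuration; butterfly needs a power of two */
static_assert((NUM_LANES & (NUM_LANES - 1)) == 0, "NUM_LANES must be a power of two");

/* The same kernel runs in every lane: sum of squares over the elements
   this lane owns. Element i belongs to lane i % NUM_LANES (assumed layout). */
static int64_t lane_kernel(const int32_t *stream, size_t n, int lane) {
    int64_t acc = 0;
    for (size_t i = (size_t)lane; i < n; i += NUM_LANES)
        acc += (int64_t)stream[i] * stream[i];
    return acc;
}

/* Butterfly all-reduce: at each step every lane receives the partial sum
   from the lane whose index differs in one bit, modeling exchanges over
   an interlane switch. After log2(NUM_LANES) steps every lane holds the total. */
static void interlane_allreduce(int64_t part[NUM_LANES]) {
    for (int stride = 1; stride < NUM_LANES; stride <<= 1) {
        int64_t incoming[NUM_LANES];
        for (int l = 0; l < NUM_LANES; l++)
            incoming[l] = part[l ^ stride]; /* value arriving from partner lane */
        for (int l = 0; l < NUM_LANES; l++)
            part[l] += incoming[l];
    }
}

/* Run the kernel in every lane, then combine the lane results. */
static int64_t stream_sum_of_squares(const int32_t *stream, size_t n) {
    int64_t part[NUM_LANES];
    for (int l = 0; l < NUM_LANES; l++)
        part[l] = lane_kernel(stream, n, l);
    interlane_allreduce(part);
    return part[0];
}
```

The two read-then-add loops in `interlane_allreduce` mimic a synchronous exchange: all lanes send before any lane updates its value, so no lane sees a half-updated neighbor.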
The SPI compiler supports and exploits a C programming model that requires no special parallel constructs. A designer uses intrinsics to mark the beginning and end of the computationally intensive kernel functions and their input and output data streams; the compiler then performs static-flow analysis to unroll loops and to optimize on-chip-memory allocation for the local memories in each processing lane. Because the compiler extracts the parallelism from the C source, the same source code remains compatible with chips that have different numbers of lanes.
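A minimal sketch of the lane-count-independent idea: the kernel is ordinary sequential C, and a stand-in for compiler-generated code splits the output range across a configurable number of lanes. The FIR kernel, `fir_sample`, `fir_run`, and the contiguous block partitioning are illustrative assumptions. The article does not name SPI's intrinsics, so none are shown, and this is not how the SPC compiler actually partitions work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 4

/* Kernel body written as plain sequential C: one output sample of a
   4-tap FIR filter. Nothing here mentions lanes. */
static int32_t fir_sample(const int32_t *in, const int32_t coef[TAPS], size_t i) {
    int32_t acc = 0;
    for (int k = 0; k < TAPS; k++)
        acc += coef[k] * in[i + k];
    return acc;
}

/* Stand-in for parallelized code: outputs are split into contiguous
   blocks, one per lane. `in` must hold n_out + TAPS - 1 samples. The lane
   count is only a parameter, so the kernel source never changes. */
static void fir_run(const int32_t *in, size_t n_out, const int32_t coef[TAPS],
                    int32_t *out, int num_lanes) {
    assert(num_lanes > 0);
    size_t block = (n_out + (size_t)num_lanes - 1) / (size_t)num_lanes;
    for (int lane = 0; lane < num_lanes; lane++) { /* parallel in hardware */
        size_t begin = (size_t)lane * block;
        size_t end = begin + block < n_out ? begin + block : n_out;
        for (size_t i = begin; i < end; i++)
            out[i] = fir_sample(in, coef, i);
    }
}
```

Running `fir_run` with `num_lanes` set to 1, 8, or 16 produces identical output arrays, because each output sample depends only on the input and the kernel; that independence is what would let one source file target both eight- and 16-lane parts.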
Both devices, the 16-lane SP16-G160 and the eight-lane SP8-G80, are available for sampling now, and high-volume production support is scheduled for July of this year. The SP16-G160 sells for $99 and the SP8-G80 for $59, both in 10,000-unit quantities.
The Storm-1 development kit is available now with a BSP (board-support package), printed documentation, and sample applications. The Eclipse-based RapiDev development environment includes the SPC compiler for the Storm-1 family; a cycle-accurate TCS (target-code simulator), including MIPSsim; and FFD (fast-functional-debugger) host-simulation libraries. For operating-system support, the environment includes a Linux user-package distribution, Linux kernel 2.6.12, and drivers for the SP16, as well as cross compilers that run on Linux (Red Hat Enterprise Linux 3, Fedora Core 5) and Windows XP hosts.
The kit's development board includes a Storm-1 SP16 with 512 Mbytes of SDRAM and 32 Mbytes of flash memory, a PC-compatible PCI edge connector, an image-sensor connector, 10/100/1000-Mbps Ethernet, analog audio in/out, and a power supply for stand-alone operation.