April 23, 1998


EDN's 1998 DSP 16-BIT Architecture Directory


Billions of Operations Per Second ManArray

The BOPS ManArray is a general-purpose array-processor core with an encapsulated-very-long-instruction-word (eVLIW) architecture supporting 8- and 16-bit packed-data and 32-bit, single-precision, floating-point formats. By maintaining a simple 32-bit instruction set that supports both one×one-element and N×M-element arrays of processing elements, the architecture lets you use sequential and traditional single-instruction, multiple-data programming techniques.

The BOPS two×two-element array comprises a cluster of four processing elements (PEs) and one sequence processor (SP). This array is the basic building block for larger implementations, such as two×four- and four×four-element arrangements. The BOPS architecture supports a topology that allows the devices to interconnect a set of PEs and reconfigure the organization of the array into standard ring, mesh, torus, hypercube, and other organizations, depending on how you want data to flow. The topology matters because the performance of any parallel algorithm depends on the efficiency of data movement among the processors and the cost of the interconnection mechanism.
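To make the topology choice concrete, the following Python sketch computes which PEs are direct neighbors in a torus and in a hypercube. It illustrates the textbook definitions of those organizations only; the function names and the PE-numbering scheme are assumptions for illustration and do not describe the actual ManArray interconnect.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the (row, col) neighbors of a PE in a rows x cols torus."""
    neighbors = {
        ((row - 1) % rows, col),  # up, wrapping at the edge
        ((row + 1) % rows, col),  # down
        (row, (col - 1) % cols),  # left
        (row, (col + 1) % cols),  # right
    }
    neighbors.discard((row, col))  # a one-wide dimension wraps onto itself
    return neighbors


def hypercube_neighbors(pe_id, dimensions):
    """Return the PE numbers that differ from pe_id in exactly one bit."""
    return {pe_id ^ (1 << bit) for bit in range(dimensions)}


# In a two-by-two array, each PE has two torus neighbors.
print(sorted(torus_neighbors(0, 0, 2, 2)))
# In a four-by-four array (16 PEs), a 4-D hypercube gives each PE four.
print(sorted(hypercube_neighbors(5, 4)))
```

In a two×two array, the torus and the two-dimensional hypercube connect each PE to the same two partners, so the organizations begin to differ only in the larger arrangements.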

The SP handles program control and includes all of the PE's functionality, plus the instruction- and data-address-generation units. Each PE contains five execution units; a multiported, 32×32-bit register file; a VLIW-in-memory (VIM) unit; local data memory; and three interfaces. The five execution units comprise a multiply-accumulate unit (MAU), an ALU, a data-select unit (DSU), a 32-bit load unit, and a 32-bit store unit. The DSU supports data-manipulation instructions, such as shift, rotate, floating-point conversions, and PE-to-PE communications. The three interfaces comprise a 32-bit instruction bus, a 32-bit data bus, and a PE-interconnect bus for sending and receiving data among PEs.

The term "eVLIW" refers to the ManArray's ability to encapsulate an instruction sequence into a horizontal VLIW format whose instructions can execute simultaneously. You create eVLIWs with an eVLIW delimiter (DLM) instruction, which specifies how many programmer-defined, 32-bit instructions (as many as five) make up the VLIW, as well as the VIM address at which to store them. After executing the DLM instruction, you issue the sequence of simple instructions that form the eVLIW. Once the SP stores the eVLIW in VIM, your program can dispatch, or broadcast, an execute-eVLIW (XV) instruction to all PEs and SPs. The XV instruction contains the VIM address pointer for the VLIW to execute. This nontraditional use of VLIWs effectively creates instructions tailored for applications using 32-bit instruction paths. Using the eVLIW architecture, BOPS requires no large VLIW buses around the chip, as is common with VLIW machines. The VIMs allow BOPS to use a single 32-bit instruction bus in the array of PEs; this approach promotes scaling in both the number of PEs and in the width of the eVLIWs.
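The DLM-then-XV sequence can be sketched as a toy Python model. Everything here is an assumption for illustration: the class and function names, the tuple encoding of instructions, the two sample operations, and the choice to let each PE store the eVLIW itself. The sketch is not the ManArray instruction set; it shows only the idea that instructions following a DLM go into VIM rather than executing, and that a later XV runs the stored slots together.

```python
OPERATIONS = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
}


class ProcessingElement:
    """Toy PE: a register file plus a VLIW-in-memory (VIM) store."""

    MAX_SLOTS = 5  # the article allows as many as five instructions per eVLIW

    def __init__(self):
        self.registers = [0] * 32  # assumed: 32 registers, each 32 bits wide
        self.vim = {}              # VIM address -> list of stored instructions
        self._loading = None       # [vim_address, slots_left] after a DLM

    def issue(self, instruction):
        op = instruction[0]
        if op == "DLM":
            _, address, count = instruction
            if not 1 <= count <= self.MAX_SLOTS:
                raise ValueError("an eVLIW holds one to five instructions")
            self.vim[address] = []
            self._loading = [address, count]
        elif self._loading is not None:
            # Instructions that follow a DLM are stored, not executed.
            self.vim[self._loading[0]].append(instruction)
            self._loading[1] -= 1
            if self._loading[1] == 0:
                self._loading = None
        elif op == "XV":
            self._execute_together(self.vim[instruction[1]])
        else:
            self._execute_together([instruction])

    def _execute_together(self, slots):
        # Read every operand before writing any result, so all slots see
        # the same register state (an assumed model of simultaneity).
        results = []
        for op, dst, src_a, src_b in slots:
            value = OPERATIONS[op](self.registers[src_a], self.registers[src_b])
            results.append((dst, value & 0xFFFFFFFF))
        for dst, value in results:
            self.registers[dst] = value


def broadcast(pes, instruction):
    """Send one 32-bit-style instruction to every PE in the array."""
    for pe in pes:
        pe.issue(instruction)


pes = [ProcessingElement() for _ in range(4)]  # a two-by-two array
for i, pe in enumerate(pes):
    pe.registers[1], pe.registers[2] = i + 1, 10  # different data per PE

broadcast(pes, ("DLM", 0, 2))     # next two instructions form VIM entry 0
broadcast(pes, ("ADD", 3, 1, 2))  # r3 = r1 + r2
broadcast(pes, ("MUL", 4, 1, 2))  # r4 = r1 * r2
broadcast(pes, ("XV", 0))         # execute the stored eVLIW on every PE
print([(pe.registers[3], pe.registers[4]) for pe in pes])
```

Note that only single instructions ever travel over the broadcast path in this model, which is the property the article credits for avoiding wide VLIW buses.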

The eVLIW architecture allows you to overlap the communications operations with the compute operations, thereby providing zero-latency data transfers among PEs. The architecture accomplishes this task by placing the communications instructions in the DSU and using software pipelining to transfer a result calculated by an arithmetic-execution unit in the previous machine cycle to any of the directly connected ManArray PEs. The load and store units provide independent datapaths between the SP and PEs and within each PE in the array.
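The overlap described above can be pictured as a simple software-pipelined schedule: in each cycle, an arithmetic unit computes a new result while the DSU sends the result from the previous cycle. The Python sketch below builds such a schedule. The function name and the one-cycle-per-operation timing are assumptions for illustration, not measured ManArray behavior.

```python
def software_pipelined_schedule(n_items):
    """Return (cycle, item_computed, item_sent) tuples for n_items results.

    None marks an idle unit. Item k is computed in cycle k and sent in
    cycle k + 1, so the send overlaps the next computation.
    """
    schedule = []
    for cycle in range(n_items + 1):
        computed = cycle if cycle < n_items else None  # arithmetic unit
        sent = cycle - 1 if cycle >= 1 else None       # DSU transfer
        schedule.append((cycle, computed, sent))
    return schedule


for cycle, computed, sent in software_pipelined_schedule(4):
    print(cycle, computed, sent)
```

Under these assumptions, four results take five cycles rather than the eight that a strict compute-then-send sequence would need, which is the sense in which the transfers add no latency once the pipeline fills.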

Addressing modes

The BOPS two×two-element array supports array-parallel memory-addressing modes, including direct, base plus displacement, register indirect, and modulo indexed.
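The article names these modes without defining them, so the following Python sketch uses their conventional textbook meanings to show how each one forms an effective address. The function names, the parameters, and the update rule for the modulo index are assumptions, not the ManArray definitions.

```python
def direct(address):
    """The instruction carries the address itself."""
    return address


def base_plus_displacement(base, displacement):
    """A base register value plus a constant offset from the instruction."""
    return base + displacement


def register_indirect(registers, register_number):
    """The address is whatever the named register currently holds."""
    return registers[register_number]


def modulo_indexed(base, index, step, length):
    """Return (effective_address, next_index) for a circular buffer.

    The index advances by step and wraps at length, so repeated accesses
    stay inside the buffer without an explicit bounds check.
    """
    return base + index, (index + step) % length


# Walk a four-entry circular buffer that starts at address 100.
index = 2
addresses = []
for _ in range(4):
    address, index = modulo_indexed(100, index, 1, 4)
    addresses.append(address)
print(addresses)
```

Modulo indexing is the mode that signal-processing loops such as FIR filters typically rely on, because the delay line wraps automatically.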

Special instructions

The MAU and ALU support floating-point and packed-data operations with saturation, and the DSU provides a complement of bit-manipulation, shift, rotate, and PE-to-PE-communications operations. The first ManArray implementation supports only two levels of nested looping, although the architecture supports higher levels.
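As an illustration of a packed-data operation with saturation, the Python sketch below adds two 32-bit words that each hold two signed 16-bit values, clamping each lane instead of letting it wrap around. The lane layout and the function names are assumptions for illustration; the article does not specify how the ManArray encodes these operations.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_signed16(bits):
    """Interpret the low 16 bits of an integer as a signed value."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def unpack16(word):
    """Split a 32-bit word into (high, low) signed 16-bit lanes."""
    return to_signed16(word >> 16), to_signed16(word)


def pack16(high, low):
    """Combine two signed 16-bit values into one 32-bit word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def saturate16(value):
    """Clamp a value to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def packed_add_sat16(word_a, word_b):
    """Add two packed words lane by lane, saturating each lane."""
    a_high, a_low = unpack16(word_a)
    b_high, b_low = unpack16(word_b)
    return pack16(saturate16(a_high + b_high), saturate16(a_low + b_low))


result = packed_add_sat16(pack16(30000, -30000), pack16(10000, -10000))
print(unpack16(result))  # both lanes clamp: (32767, -32768)
```

Saturation matters in signal processing because a wrapped overflow flips the sign of a sample, whereas a clamped one merely clips it.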

Support

BOPS is developing an assembler, a linker-loader, a visual simulator, and debugging/analysis tools. The BOPS visual simulator is an instruction-set-cycle-accurate simulator of the programmer-visible resources. It provides configurable views of all core resources, including registers and local memory, the disassembly of eVLIWs in VIM, visible pipeline phases, and execution control. BOPS also offers the VLIW Packer, a tool that analyzes code for sequences of instructions that can be combined into VLIWs. The ManArray architecture lacks C-compiler support.




Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.