

Design Feature: July 18, 1996

OSs and development tools lighten the load

Richard A Quinnell,
Technical Editor

Tools and OSs can now overcome many multiprocessing difficulties. Still, designers must create the right software structure for multiprocessing to be effective.

Once the domain of supercomputers and research laboratories, multiprocessing is becoming a mainstream computer architecture. Off-the-shelf multiprocessor operating systems (OSs) and hardware have made software development easier, but not simple. Structuring software to take advantage of multiprocessing and debugging system code still present difficulties for designers.

The movement toward mainstream use of multiprocessing began with the introduction of multitasking OSs, such as Posix-compliant systems and Windows 95. Multitasking places many of the same constraints on software as does multiprocessing, including the need for processes to share system resources before completing their tasks. Multitasking OSs have structures in place to allow resource sharing.

The increased use of DSPs has also pushed multiprocessing toward the mainstream. DSPs often operate in specialized multiprocessing configurations, such as pipelines. Most DSPs have hardware structures that simplify interprocessor communications for data sharing. Even the PC is moving toward multiprocessing. The Pentium Pro processor has a dedicated bus that allows four processors to share the PC's resources and execute multiple tasks simultaneously.

Designers approaching multiprocessing for the first time have many software-design options available. The simplest form of multiprocessing software places independent programs on each of the multiple CPUs, with some communication between programs. Designers can create and debug such systems one processor at a time using conventional methods. These systems don't readily scale to adding processors, however, or adapt to changes in the interconnect structure.

Another form of multiprocessing software available to designers is distributed cooperating tasks. Distributed and microkernel OSs (References 1 and 2) allow designs to map to a variety of hardware structures. Further, the mapping is table-driven and easy to modify, allowing the software to adapt to new hardware configurations with a simple recompilation.
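A table-driven mapping of this kind can be pictured with a minimal Python sketch. The task names, processor numbers, and the `place_tasks` helper are hypothetical illustrations, not the configuration format of any OS named in this article.

```python
# Hypothetical task-to-processor table. Retargeting the software to a new
# hardware layout means editing this table, not the tasks themselves.
TASK_MAP = {
    "sensor_input": 0,
    "filtering": 1,
    "display": 0,
}

def place_tasks(task_map, processor_count):
    """Group task names by processor, rejecting IDs the hardware lacks."""
    placement = {cpu: [] for cpu in range(processor_count)}
    for task, cpu in task_map.items():
        if cpu not in placement:
            raise ValueError(f"task {task!r} mapped to missing processor {cpu}")
        placement[cpu].append(task)
    return placement
```

Calling `place_tasks(TASK_MAP, 2)` groups the three example tasks onto two processors; moving a task is a one-line change to the table.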

In the most general form of multiprocessing software, multiple processors share the same task. The software structure for this objective depends on the hardware configuration in use, and the system memory's structure is a critical consideration.

Three sisters of memory

System-memory structures for multiprocessing designs include uniform memory access (UMA), nonuniform memory access (NUMA), and no remote memory access (NORMA). UMA allows all processors equal access to system memory. Such designs are also called "symmetric-multiprocessing" or "tightly coupled" systems. Workstations and Pentium Pro systems use this architecture.

UMA structures allow the fastest possible interaction between processors. Those systems don't scale indefinitely, however, because ensuring memory access becomes more difficult as you add processors. The software overhead for controlling memory access begins to outweigh performance gains when you use more than eight to 10 processors.

NUMA allows processors private memory in addition to sharing a portion of system memory. Access times for the two memory types need not be identical. VME systems can use this architecture, in which each processor board has local memory, as well as access across the backplane to shared memory.

Fully distributed systems have NORMA structures. Processors in those systems have no access to common memory. Instead, all data sharing must occur over communications links.

Within each of these memory structures, a program's processing can take several possible structures. Activities that process large data arrays, for example, can use a single-instruction, multiple-data (SIMD) approach. Such programs partition large data arrays into subarrays; then, each processor performs the same processing steps on its own subarray. Image-processing activities, such as a discrete cosine transform, or search operations on a database are tasks amenable to the SIMD approach. Programming an SIMD system is relatively easy, because each processor runs essentially the same program.
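The partition-then-process idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: worker threads stand in for processors, and the `split`, `scale`, and `process_in_parallel` names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Partition data into `parts` contiguous subarrays of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def scale(chunk, gain=2):
    """The single routine that every worker runs on its own subarray."""
    return [gain * x for x in chunk]

def process_in_parallel(data, workers=4):
    """Apply the same routine to each subarray, then reassemble in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scale, split(data, workers)))
    return [x for chunk in results for x in chunk]
```

Because every worker runs the same `scale` routine, adding workers changes only the partitioning, not the processing code.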

The data-flow model is another straightforward multiprocessing-program structure. With this structure, processors perform an operation on streams of data coming from other processors and then pass the results on to other processors. Processors have a fixed function and interact with each other only if the data flow passes between them, so programming remains uncomplicated. The pipeline-processing algorithms common in DSP applications are one type of data flow. Data-flow systems can employ feedback loops and have multiple sources and destinations at each stage.
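A simple pipeline of this sort might look like the following Python sketch, in which threads and queues stand in for processors and interprocessor links. The `stage` and `run_pipeline` helpers and the `DONE` sentinel are hypothetical names chosen for illustration.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the data stream

def stage(func, inbox, outbox):
    """Fixed-function stage: transform each item and pass it downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))

def run_pipeline(samples, funcs):
    """Chain one thread per function, connected by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for s in samples:
        queues[0].put(s)
    queues[0].put(DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each stage knows only its input and output queues, which is why the programming stays uncomplicated: a stage never needs to know what the rest of the system is doing.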

When your tasks or data are not suited to one of these simpler multiprocessing forms, however, software design quickly becomes complex. Designers programming a multiprocessing system to accelerate general computing operations face challenges. Many of these challenges relate to accessing common resources, such as data.

If two or more processors run programs that manipulate the same block of data, the system needs a protection scheme that coordinates the use of that data. For example, the two processors may attempt concurrent read-modify-write operations on a block of data. Both read the same value but make different modifications. The processor that finishes last unknowingly overwrites the changes that the first processor made. If the processors have local cache memory, the possibility of overwriting becomes even greater. Unless the system recognizes that one processor's operations have invalidated the cache on another processor, corrupt data quickly propagates throughout the program.
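The overwrite scenario can be replayed deterministically in Python. The sketch below is illustrative only: `lost_update` steps through the harmful interleaving by hand, and `locked_update` shows one possible protection scheme, a lock around the whole read-modify-write. The function names and starting values are hypothetical.

```python
import threading

def lost_update():
    """Replay the interleaving in which both processors read before either writes."""
    shared = {"count": 10}
    seen_by_a = shared["count"]      # processor A reads 10
    seen_by_b = shared["count"]      # processor B reads 10
    shared["count"] = seen_by_a + 1  # A writes 11
    shared["count"] = seen_by_b + 5  # B writes 15, discarding A's change
    return shared["count"]

def locked_update():
    """Same two modifications, each read-modify-write guarded by one lock."""
    shared = {"count": 10}
    guard = threading.Lock()

    def modify(delta):
        with guard:  # read, modify, and write happen as one unit
            shared["count"] = shared["count"] + delta

    workers = [threading.Thread(target=modify, args=(d,)) for d in (1, 5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared["count"]
```

The unprotected version ends at 15 because the second write discards the first; the locked version ends at 16 because both changes are applied.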

Caches aside, having the first processor place a lock on the resource while making its modifications prevents some overwriting but introduces the possibility of deadlocks. In a deadlock, the first processor locks resource A but needs resource B to complete its task. The second processor locks B but needs A. The two processors become deadlocked, waiting for each other to release the held resources.
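The deadlock described above can be reproduced safely in Python by giving each lock request a timeout, so the demonstration reports the stall instead of hanging. This is a sketch under stated assumptions: threads stand in for processors, and the `demonstrate_deadlock` helper, the worker names, and the timeout value are hypothetical.

```python
import threading

def demonstrate_deadlock(timeout=0.2):
    """Return whether each worker obtained its second resource."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()                        # both workers now hold one lock
            ok = second.acquire(timeout=timeout)  # a plain acquire() would hang here
            got_second[name] = ok
            barrier.wait()                        # keep `first` until both attempts end
            if ok:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("p1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("p2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second
```

Neither worker obtains its second lock. One common remedy, which is not drawn from this article, is to have every task acquire shared resources in the same fixed order.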

Fortunately, many off-the-shelf OSs have built-in tools for resolving these and other multiple-access problems. Distributed OSs (Reference 1), for example, are designed to handle multiple-system-resource requests in a multiprocessing environment. The Ada programming language and associated OSs also incorporate multiprocessing as a baseline configuration. More traditional uniprocessor OSs, such as Integrated Systems' pSOS, Lynx Real Time Systems' Lynx OS, and Wind River Systems' VxWorks, have added extensions to handle multiprocessing needs. DSP OSs, such as Multiprocessor Toolsmiths' Unison and Spectron Microsystems' Spox-MP, solve many of the access problems associated with shared-memory multiprocessor configurations.

Many of these OSs allow designers to develop their code as though it operates on a single processor. The OS can then automatically distribute the code's elements across the various processors in the system. In some cases, the OS can also automatically select among parallel interprocessor-communications links, thus improving system reliability.

Careful code structure needed

The result of these OS capabilities is that software design for multiprocessing systems has become much easier. The OS handles most of the difficulties inherent in multiprocessing, freeing the designer to focus on the application's needs, not the implementation details. It would be a mistake, however, to believe that the OSs make multiprocessing automatic. Designers still need to structure their code to make multiprocessing effective.

One of the first things designers must consider is the distribution of software tasks among the processors. Many OSs can automatically make such distributions, but designers obtain better results by directing task distribution. The rule of thumb is to place tasks where they make sense.

Tasks that need specific hardware resources, for example, run best on the processor that has direct access to those resources. Although the OS may allow an output-data-formatting task to run in one processor while another processor handles the I/O task, the system may run faster if the same processor handles both tasks, eliminating interprocessor-communications overhead. Similarly, two tasks that need a high-bandwidth link between them may benefit from running on the same processor. Tasks that don't intercommunicate are good candidates for scattering among several processors.

In addition to task assignment, designers must consider the order of task execution. In single-processor systems, software tasks execute sequentially, except for interrupts. Task A begins before task B begins. Multiprocessor systems destroy such sequencing. Tasks may execute simultaneously or randomly. Designers must allow for this random ordering, ensuring either that tasks have no temporal dependencies or that tasks wait for their proper place in the sequence.
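One way to make a task wait for its proper place in the sequence is an explicit completion signal. The Python sketch below is an illustration only, with threads standing in for processors and hypothetical task names; it deliberately starts task B first and still records A before B.

```python
import threading

def run_in_sequence():
    """Start task B first, yet guarantee it runs only after task A finishes."""
    log = []
    a_done = threading.Event()

    def task_a():
        log.append("A")
        a_done.set()       # signal that A's results are ready

    def task_b():
        a_done.wait()      # block until A has completed
        log.append("B")

    b = threading.Thread(target=task_b)
    a = threading.Thread(target=task_a)
    b.start()              # deliberately launched out of order
    a.start()
    a.join()
    b.join()
    return log
```

The ordering comes from the signal, not from the launch order, so the result holds no matter which processor happens to run first.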

Forcing the code to follow a set task sequence, however, may result in software that gains nothing from a multiprocessing environment. A two-processor system in which each processor is 50% idle offers no performance advantage over a fully utilized single processor. A multiprocessor system could even slow fixed-sequence software. The overhead associated with controlling shared-resource access delays task execution, and the fixed sequence prevents the multiprocessing configuration from providing any performance gains.
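The arithmetic behind the 50%-idle example can be written out as a small Python sketch. The model is a deliberate simplification with hypothetical function names: it counts only busy time and ignores the shared-resource overhead that can make the real result worse.

```python
def effective_capacity(processors, busy_fraction):
    """Useful work per unit time, in single-processor equivalents."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must lie between 0 and 1")
    return processors * busy_fraction

def speedup(processors, busy_fraction):
    """Capacity relative to one fully utilized processor."""
    return effective_capacity(processors, busy_fraction) / effective_capacity(1, 1.0)
```

Here `speedup(2, 0.5)` evaluates to 1.0: two half-idle processors do the work of one busy processor.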

Debugging: a formidable task

By keeping task assignment and sequencing in mind during design, software developers can effectively use multiprocessing OSs. But design is only half the task in software development. The other half is debugging, and debugging in a multiprocessing environment can be a formidable task.

One of the first things designers should realize is that debugging multiprocessing software involves finding both logical errors in the code and errors in the interaction between parts of the code. A useful approach to multiprocessing debugging is to debug the code's logic in a uniprocessor environment. Standard real-time debugging tools (Reference 3) go a long way toward finding logical errors in multiprocessing software.

What a uniprocessor environment never reveals is time-related interactions between code sections. Even under a multitasking OS, a single processor can execute only one section of code at a time. Problems related to simultaneous or out-of-sequence code execution manifest only in the multiprocessor environment.

A new class of debugging tools has arisen that now operates in multiprocessor configurations. Table 1 gives a representative list of such tools; more are under development. One significant consideration in using multiprocessor debugging tools is the way in which they handle breakpoints. Some tools stop only the processor that executes the breakpoint. Others stop all processors in the system.

Table 1—Multiprocessing debugging tools
Company | Product name | Type | Processors supported | OSs supported | Breakpoints: stop one | Breakpoints: stop all | Cost | Comments
Ariel | AXOS/UXOS | Software | TMS320C30/40/80 | | X | X | $1495 |
BBN | TotalView | Software | SPARC, Alpha, Mips | AIX, OSF/1, SunOS, Solaris | | | |
CSPI | BBN Debugger | Software | i860 | VxWorks | | | $4000 | Port of BBN TotalView
DSP Research | Tiger TEMxxx | Emulator | TMS320C30/31/32, C2xx, C40/44, C50/51/52/53/56/57, C541/2/3/5/6/8 | Windows 3.1, 95, NT, DOS | X | | $2995 | Supports Code Composer
DSP Research | Tiger TEMxxS | Emulator | TMS320C30/31/32, C2xx, C5x, C54x | SunOS, Solaris | | X | |
DSP Research | Tiger TEM4xS | Emulator | TMS320C40/44 | SunOS, Solaris | | X | $9995 | Native Motif interface
Eagle Design Automation | Eaglei/EagleV | Software | i960, 68k, x86, MIPS, 29k | All | | X | $40,000 | Simulation and debug system
Green Hills Software | Multi | Software | PowerPC, 680x0, 683xx, x86, Alpha, RAD6000 | VxWorks, pSOS+, MicroItron, custom | X | X | $2900 | Starting price
Green Hills Software | Ada Multi | Software | PowerPC, 680x0, 683xx, x86, Alpha, i960, Mips, SPARC | VxWorks, custom | X | X | $5900 | Starting price
Hewlett-Packard | HP 16505A | Emulator system analyzer | Most (with preprocessor module) | All | | X | $4995 | Price is for analyzer. System uses HP16500B mainframe ($8980) and logic analysis modules ($13,000)
Hewlett-Packard | HP 16505A | Emulator | Pentium Pro | All | | X | $5100 | Preprocessor for 16505A, tracks multiprocessor bus activity
Huntsville Microsystems | HMI-200-68356 | Emulator | 68356 | pSOS | X | | $20,000 | Dual trace for the two internal processors
Microtec | XRay Debugger | Software | 68k, PowerPC | VRTXsa | X | | $11,545 | Price includes compiler and VRTXsa OS
Microtek International | PowerPak | Emulator | x86, 683xx, 68HC16 | Windows | | X | $10,000 to $30,000 |
Multiprocessor Toolsmiths | Unison-Remedy | Software | 68k, i860, C40 | Unison | X | | $12,000 | Sun host
Orion Instruments | ADViCE | Emulator | 680x0, 683xx, H8/3xxx, SH series, MCF52xx, SPARClite, FR-20 series, ARM7DT, MELPS series, TLCS | Windows, Unix | X | | $23,000 | Includes source-level debugger
Orion Instruments | Orion 8800 | Emulator | 68302/06, 68356, 68EC000, 80C196, Z80/180 | Windows, DOS | X | | $7100 | Includes source-level debugger
QNX Software | Watcom Development System | Software | x86 | QNX | X | | $845 | QNX development system with debugger
Rational Software | VADS MP | Software | SPARC, Mips, PowerPC, Alpha | Solaris, IIX, AIX, Digital Unix | | X | $10,500 | Ada development system with debugger
Rational Software | APBX | Software | SPARC, Mips, PowerPC, Alpha | Solaris, IIX, AIX, Digital Unix | | X | $23,000 | Ada development system
RTMX Software | RTMX O/S | Software | x86, Pentium, SPARC | RTMX | X | X | $3995 | Tools included with OS
Silicon Graphics | ProDev Workshop | Software | Mips | Irix | X | X | $2000 |
Sky Computers | Time Scan | Software | i860 | SunOS, Solaris | | | $1495 | System visualization
Softaid | VEM | Emulator | x86, Z80/180, 68HC16, 68000 | Most RTOSs | X | | $5000 to $10,000 |
Sonitech International | Brahama debugger | Emulator | C3x/C4x DSPs | DOS, Windows, SunOS, Solaris | X | | $4000 |
Spectron Microsystems | Spox-Debug | Software | TMS320C3x/4x | Spox-MP | X | X | $4000 |
Tektronix | TL4500 | Logic analyzer | Most | Unix, X-Windows | | X | $18,350 |
Tektronix | DAS NT/DAS XD | System analyzer | Most | Unix, X-Windows | | X | $11,800 | Cost covers the mainframe. Acquisition cards $15,000
Tektronix | LA-Offline | Software | Most | Windows 3.1, SunOS, Solaris | | X | $2950 to $4950 |
Tektronix | LA-Browser | Software | Most | Windows 3.1, SunOS, Solaris | | X | $1500 to $2950 |
Wind River Systems | CrossWind | Software | 68k, CPU32, i960, PowerPC, x86, Mips, SPARC | Windows 95/NT, HP-UX | X | | | Debugger, included in Tornado
Wind River Systems | Stethoscope | Software | PowerPC, x86, Mips, SPARC | Windows 95/NT, HP-UX | X | | $5000 | Graphical profiling
Wind River Systems | WindView | Software | PowerPC, x86, Mips, SPARC | Windows 95/NT, HP-UX | X | | $5000 | System visualization


Tools that stop all processors act like traditional debugging tools extended to multiple processors. Like their uniprocessor counterparts, they attempt to freeze the system at a given state and then allow the designer to examine that state. Unfortunately, the debugging tool usually cannot shut off all processors simultaneously. If the tool has a direct link to each processor, the shutdown can occur quickly. If the tool must work through the operating system or if it controls the processors with software commands issued over the interprocessor-communications links, the shutdown can take milliseconds.

Tools need time to halt system

Because each processor shuts down independently, the frozen state does not represent an instant in time but, rather, a smear. Designers must be careful in interpreting the data that the tool gathers, making sure to account for the disparity in event timing. Designers should also make sure that delay timers don't time out and generate false error messages as the system is slowly shutting down.

Tools that stop only the processor executing the breakpoint are useful in applications in which the bulk of the system must continue operating. Control systems, for example, may need to continue operating while the designer examines the code under test. Stopping only one processor may introduce errors in the rest of the system, however. A system that uses daisy-chained interprocessor communications, for example, may find its communications channel blocked by the halted processor.

After the multiprocessing code is up and running, designers may find that the system's performance does not show the gains expected from the additional processing power. Performance-analysis tools can help identify optimization steps. By making a baseline-performance measurement in a single-processor system and then remeasuring performance after moving a task to a second processor, designers can empirically determine which tasks to distribute among the processors and which to keep together. Profiling tools can also help optimize multiprocessing systems. Sections of code in which the processors spend an unusually long time may suffer from deadlock or from clogged communications channels.
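The measure-then-remeasure procedure can be sketched in Python as follows. The `measure` and `worth_distributing` helpers and the 10% minimum-gain threshold are hypothetical choices for illustration, not values taken from any tool in Table 1.

```python
import time

def measure(workload, repeats=5):
    """Return the best wall-clock time, in seconds, over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

def worth_distributing(baseline_s, distributed_s, min_gain=1.1):
    """True if moving the task cut run time by at least the required factor."""
    return baseline_s / distributed_s >= min_gain
```

A designer would call `measure` once on the single-processor build and again after moving a task, then pass both times to `worth_distributing` to decide whether the task stays where it is.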

The combination of extended OS features and specialized debugging tools now available is rapidly making multiprocessing software development easier. OSs alone, however, cannot turn code designed for a single processor into efficient multiprocessing code. The OSs may automate some of the parallelization and distribution of code, but designers still need to direct those efforts. In addition, designers need to examine the implications of task distribution and execution sequence. Multiprocessing software development is becoming easier, but it's still not simple.

Looking Ahead

Processor technology is beginning to peak in clock speed and data-bus width. User demands, however, continue to increase unabated. The result will be a move toward multiprocessor configurations as standard design practice. That movement is already well under way, even though fewer than 10% of current designs are multiprocessor systems. Indicators of that motion include the increasing introduction of multiprocessor extensions to standard OSs and the number of debugging tools that operate in multiprocessor environments. Further proof comes with the Pentium Pro, which practically shouts that the next generation of PCs will use multiprocessing to achieve performance gains.

Software designers, therefore, will need to become familiar with multiprocessing concepts and techniques if they work with high-performance computing systems. This need extends to embedded-system designers, as well as workstation and server developers. Less than five years ago, multiprocessing was rare. Within five years, it will be standard practice for all but low-end designs.



You can reach Technical Editor Richard A Quinnell at (719) 530-0560, fax (719) 530-0560, or email ednquinnell@mcimail.com.


References

  1. Quinnell, Richard A, "Microkernel and modular OSs," EDN, April 13, 1995, pg 42.
  2. Quinnell, Richard A, "Distributed operating systems combine multiple processors into a single machine," EDN, Sept 28, 1995, pg 38.
  3. Quinnell, Richard A, "Debugging real-time systems," EDN, Nov 23, 1994, pg 48.


For free information...
When you contact any of the following manufacturers directly, please let them know you read about their products at the EDN Magazine WWW site.
Ariel Corp, Cranbury, NJ, (609) 860-2900
BBN Corp, Cambridge, MA, (617) 873-3165
CSPI, Billerica, MA, (508) 663-7598
DSP Research Inc, Sunnyvale, CA, (408) 773-1042
Eagle Design Automation, Beaverton, OR, (503) 520-2300
Green Hills Software Inc, Santa Barbara, CA, (805) 965-6044
Hewlett-Packard, Santa Clara, CA, (800) 452-4844
Huntsville Microsystems, Huntsville, AL, (205) 881-6005
Integrated Systems Inc, Sunnyvale, CA, (408) 542-1500
Lynx Real Time Systems Inc, San Jose, CA, (408) 879-3900
Microtec, Santa Clara, CA, (408) 980-1300
Microtek International, Hillsboro, OR, (503) 645-7333
Multiprocessor Toolsmiths, Kanata, ON, Canada, (613) 599-6565
National Semiconductor, Santa Clara, CA, (408) 721-5000
Orion Instruments Inc, Sunnyvale, CA, (408) 747-0440
QNX Software Systems Ltd, Kanata, ON, Canada, (613) 591-0931
Rational Software Corp, Santa Clara, CA, (408) 496-3600
RTMX Inc, Durham, NC, (919) 493-1452
Silicon Graphics, Mountain View, CA, (415) 933-1676
Sky Computers Inc, Chelmsford, MA, (508) 250-1920
Softaid, Columbia, MD, (410) 290-7760
Sonitech International, Wellesley, MA, (617) 235-6824
Spectron Microsystems, Santa Barbara, CA, (805) 968-5100
Tektronix Inc, Pittsfield, MA, (800) 426-2700
Wind River Systems, Alameda, CA, (510) 748-4100





Copyright © 1996 EDN Magazine. EDN is a registered trademark of Reed Properties Inc, used under license.