EDN logo


Design Feature:September 28, 1995

Distributed operating systems combine multiple processors into a single machine

Richard A Quinnell,
Technical Editor

Distributed operating systems let you fuse a collection of processors into one virtual machine that is independent of the system architecture. System performance, however, will rely heavily on how those processors are connected.

The use of multiple processors in a system can offer performance and reliability benefits. Unfortunately, software that runs distributed across multiple processors has been notoriously difficult to develop. An emerging generation of operating systems (OSs) promises to ease distributed-software development.

System designers turn to distributed processing for many reasons. Distributed processing applies more processing power to an application than a single CPU does. Distributed processing also provides a system with scalability. By adding or removing processors as needed to match the application's requirements, a manufacturer can meet an application's performance needs and keep system costs to a minimum. Distribution improves a system's availability. A properly designed distributed system can maintain its function in the presence of hardware failures.


Picture One

A range of multiprocessor architectures exists. At one end of the spectrum lie the tightly coupled, or shared-memory, systems. These designs require the processors to pass data and messages to each other through a common memory area. Multitasking, multithreaded OSs, such as Wind River's VxWorks, Sun Microsystems' Solaris 2, and Lynx Real-Time Systems' Lynx/OS, offer extensions that allow the OS to handle a shared-memory, multiprocessor architecture. However, these systems are not distributed OSs.

A midrange approach uses fully independent processors connected to a common bus for data and message passing. Multiprocessor digital-signal-processing (DSP) systems often use this form of shared-bus design. It is at this point in the spectrum that distributed OSs begin to appear. Distributed DSP OSs that handle a shared-bus architecture include Multiprocessor Toolsmiths' Unison and Spectron Microsystems' SPOX-MP.

Fully distributed systems use independent processors that connect through a serial link to share data and pass messages. At first glance, the devices appear to be equivalent to a local-area network (LAN). However, the devices differ, because the OS for distributed systems is also distributed.


Networking differs from distributed OSs

Figs 1 and 2 illustrate the differences between a distributed and a networked system. A networked system (Fig 1) uses a network OS to link independent processors, each running an OS for its applications interface.

Each node in the network typically runs its own applications programs and asks for data and resources from other nodes in a client/server relationship (Ref 1). Nodes that need the resources or files located on another node must directly address that node to access the resource. However, this restriction is easing (see box, "When applications become objects"). Awareness of the network structure, therefore, is an integral part of a network OS.

The network structure appears functionally transparent to a distributed OS (Fig 2). Each node runs a fractional portion of the OS, called a microkernel (Ref 2), that handles basic I/O and memory management, limited scheduling and process management, and interprocess communications (IPC). Other elements of the OS run as though they were applications programs and interact with the user's programs by passing messages through the microkernel.

Whether the message passes through a mailbox in local memory or uses a serial link to another node, the OS and user programs recognize no functional difference. Therefore, the various elements of the OS and the user's program's tasks can reside on any node in the system. The collection of processors form a single virtual unit that is independent of the system's connection architecture. Any node in the system has full access to the entire system's resources.

One of the many system benefits a distributed OS provides is scalability. With scalability, adding or deleting processors from the design doesn't force a change to the applications programs. Because the software remains unaffected, you can start with a small system to ease costs, then add processors as performance needs increase. You can also develop your software on a single workstation, then migrate to a multiprocessor system without changing the code.

Because a task's operation is independent of its assignment, system designers can assign tasks to the processor. Tasks can reside on the most convenient node, such as having the user I/O in a command center with data logging occurring on a factory floor. Tasks can also reside on specialized nodes. Even though the distributed OS allows any task to run on any node, making a task local to a node with task-specific hardware can offer some performance advantages. Keeping task and hardware reduces data traffic over the interconnection network.


Distribution allows fault tolerance

Another benefit of a distributed OS is increased availability and fault tolerance. By moving tasks off a node, the system operator can remove the node from the system for maintenance or upgrade without interrupting system operation. If a node fails, a system operator can move tasks onto functioning nodes and resume operation before attempting to repair the failure. This feature decreases downtime.

How the OS reacts the moment a failure occurs, however, is one of the issues that designers must settle if using a distributed OS. Because each node and link in the system provides a potential point of failure, the likelihood of a failure occurring increases as the system grows larger. Designers should, therefore, resign themselves to the inevitable problem and plan accordingly.

Ideally, the OS provides both failure detection and failure propagation. Without failure detection, a broken link can bring the system to a complete halt. All tasks that depend on the broken link simply wait for the OS to complete its assignment. In a system with failure detection but without failure propagation, each task that depends on the link has to discover the failure on its own. This need for discovery slows the overall system's response. An OS with both features is equipped to respond quickly to system failures.

If the OS doesn't provide these two attributes, the application program must provide them. Watchdog timers on system calls, for instance, could trap link failures. Even if the OS does possess both attributes, however, the application program must still determine the system's response to a failure. Only the application can know the consequences of a given node's failure and determine an appropriate course of action.

Another issue designers must address when using a distributed OS is the effect of latency in the IPC. Even though the system's configuration has no effect on the application software's function under a distributed OS, configuration may affect the software's speed. The ideal distributed OS allows tasks to reside anywhere in the system. In practice, the speed and bandwidth limitations of the interconnect structure may impose restrictions. Unless accounted for in the application's design, the latency inherent in a distributed system can also restrict the system's scalability.

Shared-memory systems, however, typically don't need to worry about latency. IPCs through shared memory are nearly instantaneous. The exception is a shared-memory structure using Systran's ScramNet network, which maintains replicas of the shared memory at remote nodes by using a high-speed LAN. The device's latency, however, is only on the order of 4 µsec.

The two critical factors of failure detection and message latency depend on the IPC. The IPC, more than any other OS element, determines a distributed OS's suitability for a given application and multiprocessor architecture. Each distributed OS has its own approach to fault handling and process communications.

Spectron's SPOX-MP and Multiprocessor Toolsmiths' Unison, for example, offer IPCs tailored for the shared bus links common to DSP-multiprocessor designs. The OSs also work with serial links, but less readily. SPOX-MP assumes that all communications links are equivalent and takes no extra error-handling precautions with the more vulnerable serial links. Unison assumes that its serial links are error free. This assumption requires that the system designer provide error-correction coding and other communications reliability enhancements to use a serial link. Neither OS accounts for the difference in latency between the two communications channels.

When Applications Become Objects

Networking provides one way of connecting multiple computers together to cooperate on a task. However, the networks require the user of an application to know where the application resides and how to connect to it. The full potential of the network, thus, lies untapped, because applications cannot easily invoke each other's services.

The Common Object Request Broker Architecture (CORBA) standard (Rev 2.0) seeks to unlock that potential by providing networks with a mechanism for making applications transparently available across the network. The standard, defined by the Object Management Group, allows the application to act as a distributed object and provides an object request broker (ORB) to mediate exchanges between clients and the application server. The client doesn't need to know where a distributed object application resides, which language it uses, or under which OS it is running. All the client needs is the application's name and the interface specification the application publishes.

The CORBA ORB includes an interface definition language (IDL) that applications use to specify the messages and data they will handle. The CORBA IDL compiler then creates an intermediary client and server to connect the client with the application server. The client sees the CORBA server as a local operation and the application server sees the CORBA client as local. Thus, two applications can interact, even if they're running under different OSs.


Use dynamic invocation

Use of the IDL implies a static condition; the system operator constructs applications with previous knowledge of the objects with which they will interact. But CORBA also defines a dynamic invocation interface, which allows client applications to search an interface repository for useful service providers. The clients can then invoke a compiler that creates the necessary intermediaries.

The CORBA concept only works if the individual node's OSs support a CORBA-compliant ORB. Some OSs, such as Chorus, do. For other OSs, middleware can add the CORBA ORB. Iona Technologies' Orbix, for instance, brings the ORB to Windows, VxWorks, Solaris, and IRIX. Once the OS has the ORB, applications programmers need only use IDL to define the product's interface to make the product available across an entire network.

CORBA is not the only standard that attempts to weld the multitude of applications scattered across a network into an integrated whole. Digital Equipment Corp, which helped establish the Object Management Group, is also working on its own Common Object Model standard that achieves the same end. That offering should be available by year's end.

The two OSs differ significantly in their failure responses. SPOX-MP requires the application program to recognize that a task has failed to respond. The application program must trap the failure by querying task status or using a software time-out. The application program must then propagate the error report to other system tasks. Unison provides error detection and propagation. Once Unison detects a node (or link) failure, it informs all nodes of the failure. Each node then informs its tasks so that for operations involving the failed node, tasks return an error condition immediately. The nodes do not waste time waiting for a response that will never come. For both OSs, any system reaction to a node failure, such as reassigning tasks or rerouting communications, must come from the application program.

The Chorus OS from Chorus Systems provides an IPC that operates over LANs as well as within local memory. This IPC knows which link to use, thus ensuring the minimum latency possible for the task distribution. The Chorus OS provides hooks that allow the application program to detect node failures. The OS also provides system services for replicating tasks and dynamically reconfiguring the IPC, two essential services the application program can use to provide fault tolerance.

QNX Software Systems' QNX OS offers the most automated IPC. The OS works with a mix of serial and tightly coupled links and can accommodate a structure that provides multiple links between nodes. QNX automatically chooses among the available links for the most direct link appropriate to a message. QNX also automatically provides load balancing for equivalent links. However, the applications programmer can bias this balancing act to favor use of a specific link.

Looking Ahead

The success of distributed OSs lies in their most critical service--interprocess communications (IPC). A robust IPC provides system designers with the benefits of distribution and minimizes the latency. A poor IPC can reverse the situation. You can expect future generations of distributed OSs, therefore, to focus heavily on making the IPC more performance transparent as well as fault tolerant.

However, don't expect OS services that dynamically redistribute tasks throughout the OS to maximize CPU usage anytime soon. Automatically balancing loads is a tough assignment. With fully distributed systems, you have to factor in the communications link between the data source and the processor assigned the task. The link's bandwidth and latency may make the assignment unworkable, even if the CPU has the necessary processing capacity. The resulting balancing act is too complex and application-sensitive for the OS to handle on its own. Task allocation will remain the system designer's responsibility for a long time.

Similarly, adding fault recovery to a distributed system remains the system designer's task for now. With fault recovery, OSs can make some headway. Many fault-recovery actions can be configuration-independent and, thus, added to the OS services. QNX's automatic rerouting of IPCs around failed links is a good example.

So, even though distributed OSs allow you to develop distributed applications that run on a single virtual machine, you don't get a free ride. You still have to partition your application into independent, communicating tasks, then assign these tasks with the communications linkages in mind. The structure of distributed systems will only slowly become transparent to system performance.

These automatic behaviors allow a system designer to build in a degree of fault tolerance simply by providing multiple links among nodes. QNX then automatically routes message traffic around a failed link. The designer can minimize latency by providing a dedicated link along a critical data path, then biasing the OS to favor that path for specific tasks.

In the event of a link or node failure, QNX aborts the affected processes at their source. The OS then reports the failure to the application program and can either attempt to reboot the failed node or can dump debug/trace information into a buffer for later analysis by the programmer. QNX also periodically tests the link and re-establishes tasks on the node if the link becomes active.

Although these differences in IPC operation are critical to choosing a distributed OS, other factors may also be significant in the decision. One factor is heterogeneity, or whether all the nodes need to use the same type CPU. Systems comprising a variety of processor types can't use QNX: It runs only on x86 CPUs. Chorus, on the other hand, runs on x86, 680x0, SPARC, and Power PC processors. Other significant factors include the microkernel's memory needs, its compatibility with open standards, and the availability of debugging tools for multiprocessor applications (see box, "Debugging a distributed application").

The ideal distributed OS makes the interconnect hardware and architecture completely transparent to the application program, affecting neither function nor performance. The ideal OS also offers fault tolerance and runs on a heterogeneous mix of processor nodes. Although no current offerings match this ideal, available OSs do offer the attributes of scalability and functional transparency to systems incorporating multiple processors. And, with the multitude functioning as a single entity, software design for multiprocessing becomes more like conventional software design.

Debugging A Distributed Application

Having your application scattered across a handful of processors can greatly increase the difficulty of finding the inevitable bugs. With fully distributed systems, each processor is independent and can have its own debugger. However, you end up with several independent debuggers running and no way to coordinate their operation. Systems with shared resources prove even more of a challenge. Fortunately, tools are beginning to appear that ease the debugging of multiprocessing systems.

In SPOX-MP for multiprocessing, for instance, Spectron Microsystems has preserved links to its DBUG system-level debugger. The debugger can track tasks within the system independent of their location. SPOX-MP also provides a flood-fill boot loader that allows a single executable image to automatically distribute across multiple processors, simplifying system initialization.

Wind River Systems is offering its Tornado development environment for VxWorks. The Tornado interface to the target system can provide the full function of Tornado's development tools regardless of the connection available to the target. The device can also interact with the target software at the system-task level. These two attributes combined allow Tornado to handle distributed applications as readily as single-processor designs. Tornado can simply track the task at its source.

The array of tools Tornado brings to bear on distributed systems is impressive. In addition to including Wind River's WindView and StethoScope (Ref 3), Tornado provides an interface that allows third-party tools to "plug in" to the equivalent of a software backplane. Tools from ObjectTime, Math Works, and XLNT Designs are already available, with more to follow. Users can even add their own custom tools to Tornado.

Another set of tools comes from Multiprocessor Toolsmiths. Working under the Unison OS, the tools simultaneously examine all processors, allowing designers to monitor tasks regardless of their location. The tool kit includes a debugger, system-building resources for reassigning tasks, and a performance monitor. These last two features allow system designers to evaluate the distribution of tasks among processors and manually balance CPU utilization for maximum effectiveness. (Ref 4 provides a full listing of tools for multiprocessing systems.)

For those running Posix, Lynx Real-Time Systems offers a tool called TotalView. The tool provides a graphical-user-interface-based source-level debugger for C and C++ programs. The debugger can simultaneously track multiple processes and multiple threads. Because TotalView can function across a network, the processes being debugged may reside on different machines. TotalView handles both native and cross debugging for kernels, drivers, and applications programs.



You can reach Technical Editor Richard A Quinnell at (719) 530-0560 (phone/fax).


References

  1. Goscinski, Andrzej, Distributed Operating Systems--The Logical Design, Addison-Wesley Publishing, New York, NY, 1991.
  2. Quinnell, Richard A, "Microkernel and modular OSs," EDN, April 13, 1995, pg 43.
  3. Quinnell, Richard A, "Debugging real-time systems," EDN, November 23, 1994, pg 48.
  4. Strassberg, Dan, "To multiprocess or not to multiprocess?", EDN, June 23, 1994, pg 64.


Manufacturers Of Distributed Operating Systems

For free information on distributed operating systems such as those described in this article, circle the appropriate numbers on the postage-paid Information Retrieval Service card or use EDN's Express Request service. When you contact any of the following manufacturers directly, please let them know you read about their products in EDN.
Chorus Systems Inc
Campbell, CA
(408) 879-4100
Digital Equipment Corp
Marlborough, MA
(800) 332-2717
Enea Data
AB Taby, Sweden
+46 8 638 5000
Iona Technologies Ltd Dublin, Ireland
(353) 1-6686522 (800) 672-4948 US
Lynx Real-Time Systems
San Jose, CA
(408) 879-3900
Micro Digital Inc
Cypress, CA
(714) 373-6862
Multiprocessor Toolsmiths Inc
Kanata, ON, Canada
(613) 599-6565
Object Management Group Framingham, MA
(508) 820-4300
Perihelion Distributed Software
Shepton Mallet, Somerset, UK
(44) (0) 1749 344345
Precise Software Technologies Inc
Nepean, ON, Canada
(613) 596-2251
QNX Software Systems Ltd
Kanata, ON, Canada
(613) 591-0931
Spectron Microsystems
Goleta, CA
(805) 968-5100
Sun Microsystems Computer Co
Pleasanton, CA
(800) 786-0785, ext 205
Systran Corp
Dayton, OH
(513) 252-5601
Wind River Systems
Alameda, CA
(510) 748-4100



| EDN Access | feedback | subscribe to EDN! |
| design features | design ideas | columnist |


Copyright © 1995 EDN Magazine. EDN is a registered trademark of Reed Properties Inc, used under license.