Multiprocessing #5: Dataplane Processor Units

-July 14, 2009

In this article I would like to pick up on the processor and multiprocessor taxonomy themes that Robert Cravotta introduced in his article and his two follow-up blog posts. Robert divided the multiprocessing world into four categories: “channel-based, aggregate-based, multi-domain, and feedback architectures”. Interestingly, he also describes pipeline or streaming approaches as useful in several of these multiprocessor architectures. This is true, but it is often overlooked when people think about multiprocessor systems. Those who do think of pipelining or streaming partitioning approaches usually have some dataflow or signal processing experience, where this comes quite naturally.

I’d like to take a slightly different approach to partitioning the problem. Here, we divide a typical modern SoC or system into two types of processing: control processing and dataplane processing.

The control plane handles processing tasks such as the user interface, the higher levels of protocol stacks, system synchronization, applications such as the address book on a mobile device, and a host of other tasks that are neither real-time intensive nor data-intensive.

The data plane is where data-intensive and real-time intensive tasks and applications are executed. These include streaming applications such as audio and video processing; baseband and lower levels of protocol processing for wired and wireless communications; encryption and decryption tasks, and a host of others.

Control plane processing is often the domain of standard fixed instruction set architecture (ISA) processors and controllers. These general purpose processors (GPPs) offer generic computing capabilities at a reasonable cost. When the applications vary widely and are often unknown when the device is designed (witness the downloadable apps on smart phones), a GPP may be the best solution. To handle an increasing number of such applications on future embedded devices, turning to a homogeneous multicore architecture for the control plane may be a reasonable way of adding spare capacity that can be powered off when not needed.

In the dataplane, architects in the past would often choose either hardware solutions using custom or synthesized RTL, or fixed ISA digital signal processors (DSPs). Today there are new opportunities to be both more flexible and more optimized than those past choices allowed.

The last decade has seen considerable development of configurable and extensible processor technology. This technology is based on a combination of three key characteristics:

  1. the generality of a RISC CPU and its instruction set, along with a wide variety of possible external interfaces and other CPU features for interrupt handling and context switching;
  2. data-intensive computational elements similar to those found in DSPs, such as multipliers and multiply-accumulate (MAC) units, zero-overhead loops, and Harvard architectures;
  3. most importantly, a highly automated toolflow and underlying processor generation tools that let users select among hundreds of structural configuration options (including highly efficient direct FIFO interfaces) and customize the instruction set by adding anywhere from a few to hundreds of specialized instructions that accelerate processing in specific application domains.
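To make the idea of instruction customization a little more concrete, here is a minimal behavioral sketch in Python. It assumes a hypothetical custom instruction that performs two 16x16-bit multiply-accumulates per issue; the name mac16x2 and the helper functions are illustrative only, not any vendor’s real intrinsic or toolflow API. The model simply counts loop iterations to show how one added instruction can halve the work in an FIR-style inner loop. It does not model cycle timing, and the extra branch savings of a zero-overhead loop are not captured.

```python
from typing import Sequence, Tuple

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def check_int16(v: int) -> int:
    """Reject values that would not fit in a 16-bit operand on the target."""
    if not INT16_MIN <= v <= INT16_MAX:
        raise ValueError(f"{v} does not fit in 16 bits")
    return v


def mac16x2(acc: int, x0: int, x1: int, h0: int, h1: int) -> int:
    """Behavioral model of a HYPOTHETICAL custom instruction: two 16x16-bit
    multiply-accumulates into a wide accumulator in a single issue."""
    return acc + check_int16(x0) * check_int16(h0) + check_int16(x1) * check_int16(h1)


def fir_generic(x: Sequence[int], h: Sequence[int]) -> Tuple[int, int]:
    """Plain RISC style: one MAC per loop iteration.
    Returns (dot product, loop iterations)."""
    if len(x) != len(h):
        raise ValueError("expects equal-length inputs")
    acc, iters = 0, 0
    for xi, hi in zip(x, h):
        acc += check_int16(xi) * check_int16(hi)
        iters += 1
    return acc, iters


def fir_dpu(x: Sequence[int], h: Sequence[int]) -> Tuple[int, int]:
    """Same dot product using the hypothetical two-wide MAC.
    Assumes an even number of taps. Returns (dot product, loop iterations)."""
    if len(x) != len(h) or len(x) % 2:
        raise ValueError("expects equal, even-length inputs")
    acc, iters = 0, 0
    for i in range(0, len(x), 2):
        acc = mac16x2(acc, x[i], x[i + 1], h[i], h[i + 1])
        iters += 1
    return acc, iters


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    h = [8, 7, 6, 5, 4, 3, 2, 1]
    print(fir_generic(x, h))  # (120, 8)
    print(fir_dpu(x, h))      # (120, 4)
```

Both versions compute the same result. The customized one needs half as many loop iterations, which is the kind of gain that adding domain-specific instructions targets.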

The combination of these technologies into a new set of architectural choices gives rise to a class of application-specific instruction set processors (ASIPs) that one could call Dataplane Processor Units, or DPUs.

Computation and communication in the dataplane are no longer the sole province of dedicated hardware or fixed ISA DSPs. DPUs can be considered viable options for some or all of the processing in almost every data-intensive application task in the dataplane.

A good example of this possibility is audio encoding and decoding. What embedded appliances once did in dedicated hardware can now be done with energy efficiency and in-field programmable flexibility by an audio DPU, or by a processor with specialized audio instructions. Many people do not realize that a processor built in a 65 nm low-power process can decode MP3 with excellent results while running at just a few MHz. Because such a device is based on a DPU, it can execute new audio codecs as they become available for download.

Architects and design teams can use DPUs in several ways:

  1. A team with in-depth knowledge of an application domain can utilize the processor generation technology to customize their own DPU and embed their proprietary intellectual property into the result, in both hardware instructions and domain-specific software.
  2. A supplier might create a DPU for a highly specialized domain such as audio, or provide packages of instruction definitions and codecs that can be added to a customer-specified DPU. For a highly data-intensive domain such as baseband processing, an extensively customized DPU that can accommodate a variety of protocols and standards is quite possible.
  3. A design team might take a DPU definition from a supplier, and by adding some of their own customizations and instruction definitions, “make it theirs” where they can add proprietary advantages to a more generic starting point.

But this technology need not be limited to the dataplane. If control plane processing can benefit from instruction customization or specialized high-performance interfaces such as direct FIFO channels, then the same configuration and extension technology can be applied to design a new control plane processor or multi-processor.

Looking back at Robert’s taxonomy, a DPU as we have described it is fairly close to his DSC (digital signal controller). However, we believe that configurable processor generation technology makes DPUs more flexible, less of a hybrid approach, and more amenable to generating a set of heterogeneous DPUs that can be placed in many different parts of a multi-function embedded product, each specialized for particular application tasks. The overall product architecture might then combine several of Robert’s multiprocessor types: certainly aggregate and multi-domain, and possibly channel architectures as well, with pipelining and streaming used wherever it makes sense.
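A heterogeneous, pipelined arrangement of this kind can be sketched as a small behavioral model in Python. It assumes two stages, each standing in for a specialized DPU, connected by a bounded point-to-point FIFO channel with backpressure. The names BoundedFifo and run_pipeline, the stage functions, and the FIFO depth are all hypothetical illustrations, not a real product interface. Each loop iteration is one step in which both stages may do one unit of work, so the downstream stage processes one sample while the upstream stage produces the next.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional


class BoundedFifo:
    """Models a direct FIFO channel between two processors: fixed depth,
    and a producer must check for space before pushing (backpressure)."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self._q: Deque[int] = deque()

    def can_push(self) -> bool:
        return len(self._q) < self.depth

    def push(self, v: int) -> None:
        if not self.can_push():
            raise RuntimeError("FIFO full")
        self._q.append(v)

    def can_pop(self) -> bool:
        return bool(self._q)

    def pop(self) -> int:
        return self._q.popleft()


def run_pipeline(samples: Iterable[int],
                 stage_a: Callable[[int], int],
                 stage_b: Callable[[int], int],
                 depth: int = 2) -> List[int]:
    """Stream samples through stage_a -> FIFO -> stage_b, preserving order."""
    src = iter(samples)
    fifo = BoundedFifo(depth)
    out: List[int] = []
    pending: Optional[int] = None  # stage A result waiting for FIFO space
    exhausted = False
    while True:
        # Downstream stage runs first in each step, freeing channel space.
        if fifo.can_pop():
            out.append(stage_b(fifo.pop()))
        # Upstream stage takes a new sample only once its last result is delivered.
        if pending is None and not exhausted:
            try:
                pending = stage_a(next(src))
            except StopIteration:
                exhausted = True
        if pending is not None and fifo.can_push():
            fifo.push(pending)
            pending = None
        # Done when input is used up and nothing is in flight.
        if exhausted and pending is None and not fifo.can_pop():
            return out


if __name__ == "__main__":
    # Hypothetical stages: a gain stage feeding a clipping stage.
    print(run_pipeline([1, 3, 5, 7], lambda s: s * 2, lambda s: min(s, 10)))
    # [2, 6, 10, 10]
```

Keeping the channel bounded is a deliberate choice in this sketch. It mirrors a hardware FIFO with finite depth, so a slow downstream stage stalls the upstream one instead of letting buffered data grow without limit.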

It seems that just as the need has arisen to design embedded products with rich and heterogeneous multiprocessing architectures, technology has risen to offer design teams some very interesting new options.

- Grant Martin, Chief Scientist, Tensilica


If you missed the previous guest post on multiprocessing, check out what Mark Hermeling from Wind River Systems had to say when considering multicore configurations.

