Multicore DSP SoCs: just how flexible can they really be?
One of the favorite claims of multicore enthusiasts is that multicore processing gives you the flexibility of software with the performance and energy efficiency of hardware. Ideally, champions of the concept say, you could design a single SoC comprising an array of programmable CPU or DSP cores, local memories, and a sufficiently clever DRAM interface, and then adapt that chip to any of a huge range of applications, from image processing in digital cameras and camcorders to face recognition to home-gateway switching to cellular baseband, simply by changing the peripheral controllers.
The most frequent reply to this claim has been "hogwash." Detractors point out that no single processor core can approach the efficiency of dedicated hardware, and that, given the state of parallel algorithm design and coding, you lose rather than gain efficiency when you try to spread an algorithm across multiple cores. Skeptics also point out that efficient DDR channel management becomes much harder as the number of initiators increases, and harder still if the initiators are engaged in different tasks. Then the detractors deliver what they consider the knockout punch, observing (correctly) that in the entire history of computer programming, precious little progress has been made in parallelizing applications by any means other than exploiting data parallelism.
With both positions argued this forcefully, the question cries out for hard data. And with an announcement today, and more to come during the year, Montreal-based media signal processing vendor Octasic is providing just that: a case study in retargeting a DSP-array SoC across a range of different applications.
The occasion for this experiment is Octasic’s move into the wireless baseband processing market. The company has laid the groundwork with careful market analysis and some strategic hires, and now, according to Emmanuel Gresset, vice-president for software-defined radio (himself a recruit from STMicroelectronics), it has set its sights on providing base-station solutions for the entire 3GPP roadmap through LTE.
The surprise in this announcement lies in the hardware on which it is based: the OCT1010 SoC. According to Gresset, this chip is architecturally simple: it comprises 15 Octasic Opus DSP cores, fast local RAM for each core, and a very carefully designed DDR DRAM controller. The device was originally designed for the VoIP and audio-processing markets. By adding a few key instructions to the cores’ instruction set, Gresset says, Octasic was able to retarget the chip for video processing. Now, with a few more instruction-set changes, the OCT1010 is taking the company into the GSM/EDGE base-station business.
To some extent, the Octasic chip is a special case. The underlying Opus DSP core is a RISC-ish superscalar design that presents a conventional programming model to users, but it is implemented with asynchronous datapaths. Gresset described the core as a collection of self-timed ALUs that implement dataflow machines over a shared pool of synchronous register files and fast RAM. The chip is built in 90 nm CMOS at ST’s Crolles fab, yet because of the speed of the self-timed logic, Gresset said, the Opus core delivers performance equivalent to a 1.2 GHz clocked core.
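For readers unfamiliar with the term, "dataflow" means an operation fires as soon as all of its operands are available, rather than on a global clock edge. The toy interpreter below is a generic illustration of that firing rule only, under the assumption that a small expression graph is a fair stand-in; it does not model the Opus core, and the graph and the names `GRAPH` and `run_dataflow` are hypothetical.

```python
"""Toy dataflow interpreter: a node fires when all its operands exist.

Generic illustration of the dataflow firing rule; it is NOT a model of
Octasic's Opus core. The graph below is a hypothetical example.
"""
import operator
from collections import deque

# Each node: (operation, names of its input values). A node's result is
# stored under the node's own name. This graph computes (a*b) + (c*d).
GRAPH = {
    "p": (operator.mul, ("a", "b")),
    "q": (operator.mul, ("c", "d")),
    "r": (operator.add, ("p", "q")),
}


def run_dataflow(graph, inputs):
    """Fire nodes in whatever order their operands become available."""
    values = dict(inputs)      # tokens currently available
    pending = deque(graph)     # nodes that have not fired yet
    fire_order = []
    while pending:
        progressed = False
        for _ in range(len(pending)):
            node = pending.popleft()
            op, args = graph[node]
            if all(a in values for a in args):   # firing rule
                values[node] = op(*(values[a] for a in args))
                fire_order.append(node)
                progressed = True
            else:
                pending.append(node)             # wait for operands
        if not progressed:
            raise RuntimeError("deadlock: some operands never arrive")
    return values, fire_order


if __name__ == "__main__":
    vals, order = run_dataflow(GRAPH, {"a": 2, "b": 3, "c": 4, "d": 5})
    print(vals["r"], order)  # prints: 26 ['p', 'q', 'r']
```

This interpreter fires nodes one at a time, but `p` and `q` share no dependency, so in self-timed hardware they could proceed concurrently, each completing as fast as its own logic allows.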
This speed apparently is not just marketing-speak. Gresset says the design team had to create its own register-file and RAM designs because these shared registers and memories really do have to operate at 1.2 GHz, well beyond the nominal speed of 90 nm memories. Besides the very high realized performance, the asynchronous design approach also gave the Opus core low energy consumption: Gresset says the chip can perform GSM baseband processing, with all but a couple of its cores active, at under 1 W.
So the asynchronous design of the Opus cores gives Octasic two advantages, high peak performance and high energy efficiency, both of which help scalability. The next question is how well the memory architecture scales, and there, too, the answer appears to be positive. The company claims that the chip, running its software, can handle all baseband processing for multiple channels of GSM, EDGE, or EGPRS2. GSM requires one Opus core per channel, while 8-PSK EDGE requires two cores per channel.
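The per-channel figures translate into a simple capacity bound. The sketch below is back-of-envelope arithmetic, assuming each channel gets whole, dedicated cores and that every non-reserved core does baseband work; the `reserved_cores` parameter is a hypothetical knob, not a figure Octasic has given, and EGPRS2 is omitted because no per-channel core count is quoted for it. The mapping also shows the kind of data parallelism even the skeptics concede works: independent channels placed on separate cores.

```python
"""Back-of-envelope channel capacity for a 15-core DSP array.

Assumptions (illustrative, not Octasic figures): each channel owns whole
cores, and every core not reserved for other work runs baseband code.
"""

TOTAL_CORES = 15  # the OCT1010 has 15 Opus DSP cores

# Cores per channel, as quoted in the article.
CORES_PER_CHANNEL = {
    "GSM": 1,
    "EDGE (8-PSK)": 2,
}


def max_channels(cores_per_channel: int, total_cores: int = TOTAL_CORES,
                 reserved_cores: int = 0) -> int:
    """Upper bound on channels when each channel owns whole cores."""
    usable = total_cores - reserved_cores
    if usable < 0:
        raise ValueError("more reserved cores than the chip has")
    return usable // cores_per_channel


def assign_cores(n_channels: int, cores_per_channel: int) -> dict[int, list[int]]:
    """Data-parallel mapping: channel i gets a contiguous block of cores."""
    return {
        ch: list(range(ch * cores_per_channel, (ch + 1) * cores_per_channel))
        for ch in range(n_channels)
    }


if __name__ == "__main__":
    for standard, cpc in CORES_PER_CHANNEL.items():
        print(f"{standard}: up to {max_channels(cpc)} channels")
    # prints: GSM: up to 15 channels / EDGE (8-PSK): up to 7 channels
    # Hypothetical: hold back 2 cores for control or housekeeping.
    print(max_channels(2, reserved_cores=2))  # prints: 6
    print(assign_cores(3, 2))  # prints: {0: [0, 1], 1: [2, 3], 2: [4, 5]}
```

These are ceilings, not claims: the article says only "multiple channels," and the sub-1 W GSM figure was quoted with a couple of cores idle, so the shipping channel count may well be lower.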
Nor is this the end of the story, according to Gresset. Further extensions of the chip, which may or may not include some dedicated hardware accelerators, will target CDMA, WCDMA, and LTE macro base stations, again using a software-defined approach with essentially the same array of DSP cores. Even OFDM schemes, Gresset says, can be handled by the chip architecture, as can the MIMO requirements of LTE. The company is still working on software partitioning and proof of concept for advanced standards such as LTE, but Gresset reports growing confidence.
From a market point of view, this flexibility holds out the promise of software-configurable macro base stations that can reconfigure themselves on the fly for the most profitable mix of wireless standards. This capability is of growing interest to base-station operators, so much so, Gresset says, that architects are already building such systems today from arrays of conventional DSP chips and FPGAs.
But from the viewpoint of an architectural study, Octasic’s plans strongly support the idea that, if the underlying processor cores are sufficiently elegant, the array-of-processors architecture can indeed be retargeted across a wide range of applications. From VoIP to multichannel EDGE baseband processing is quite a range, and at 90 nm, Octasic is not even pushing process technology. How applicable this data is beyond the unusual Octasic Opus core, or beyond this particular range of applications, remains to be seen, but it makes one good, solid data point.