Soft ARM CPUs, FPGAs and Structured ASICs: a useful benchmark?
I can’t think of anything that seems less elegant than implementing a microprocessor core in the fabric of an FPGA—rather like building a pendulum clock out of Legos. The result is bound to be huge, power-hungry and slow compared to the same core implemented even in a cell-based methodology. And yet soft CPU offerings from both Altera and Xilinx have been widely downloaded, and presumably widely used. Given that Structured ASICs are supposed to be serious alternatives to FPGAs at moderate to large volumes, would the same pattern hold for them? It’s a question that raises more questions than answers.
To begin with, why would anyone want to do this to an FPGA? I think there are several answers, none of which has anything to do with classical circuit optimization. One is simply that it’s an easy way for an experimenter to lay hands on a microprocessor. If you already have mid-sized FPGAs lying around, it’s free. There’s no need to requisition a CPU chip or a microcontroller from the supply room or try to talk your distributor out of some samples. You can have it this afternoon, and no one will ever know, so you’ll never have to explain what you wanted it for. All these are the classic FPGA benefits.
But there may be more practical reasons as well. If you need a particular configuration of MCU, especially if you want it to have some unusual peripheral controller or accelerator, you can turn RTL into a chip in your hands in hours. No need to shop the MCU directories, get a part qualified, reverse-engineer a data sheet to figure out if a commercial device will really work, or any such fun activities. And if you are doing a low-volume design that has an FPGA in it anyway, the BoM cost of a soft CPU core can be close to zero, while saving a package and a nest of fine routing on your board.
There are disadvantages, of course. A practical soft CPU core is likely to be a non-standard architecture, with its own instruction set and programming tools. The only exception that comes to mind is the Actel implementation of an ARM7. And true to stereotype (and physics), the soft implementation is going to be slow and inefficient relative to an MPU chip. In many designs this may not be an issue. But the speed limits are real: the Actel ARM7, for example, even though it is virtually hand-coded for the Actel architecture, maxes out at less than 30 MHz according to the company’s literature.
So what about a soft CPU core in a Structured ASIC? Supposedly, Structured ASICs provide a middle ground between slow, hungry, expensive but field-programmable FPGAs and fast, efficient, unit-cheap but inconvenient cell-based ASICs. For a number of reasons, a CPU core might be an excellent test case for these claims.
For one thing, a CPU core is always a challenging design, with lots of different structures, lots of potential critical paths, and blocks that will pick out the least bit of location sensitivity in a logic fabric. It is also very demanding on memory structures and on design flows. For another thing, it’s hard to fudge on definitions: a core passes the ARM compliance suite or it doesn’t, for instance. On paper, a soft CPU core design would not only be useful in a Structured ASIC, for the same reasons that it is useful in moderate-volume FPGA designs, but would also make a great benchmark.
The problem is, until today there apparently hasn’t been a soft industry-standard CPU core available for a Structured ASIC. But eASIC (in which the author holds a small financial interest, by the way) is announcing today a soft ARM926EJ-S core for its recently unveiled product family. The design, according to eASIC, is the straight ARM RTL, modified only to use eASIC’s PLLs, I/O cells, and memory structures, and not optimized for the architecture in any other way.
As an embedded processor, the core looks OK. It has been certified ARM-compliant, and runs at about 150 MHz at 300 mW under typical conditions, according to the company. The core with 32 KByte caches requires about 23K logic cells in the eASIC fabric, the company says, meaning it takes up less than half of their smallest device, or a small portion of the larger dice. The core uses a standard AHB bus structure, allowing multiple cores on the larger eASIC chips.
This one data point appears to validate a lot that’s been claimed about Structured ASICs. The ARM926EJ-S is not a trivial RISC engine, but a very substantial embedded processing core, and the claimed speed is plausible, though far from directly competitive with cell-based silicon, in which a 926 can clock in at 250 MHz or so.
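For readers who want the arithmetic spelled out, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (vendor claims and the "250 MHz or so" estimate); the variable names are mine, and the comparison with the Actel part is not apples-to-apples, since that is an ARM7 rather than a 926.

```python
# Back-of-the-envelope comparison using only figures quoted in this article.
# All inputs are vendor claims or rough estimates, not measurements.

EASIC_MHZ = 150.0       # eASIC soft ARM926EJ-S, typical conditions (company claim)
EASIC_MW = 300.0        # power at that clock, typical conditions (company claim)
CELL_BASED_MHZ = 250.0  # "250 MHz or so" for a 926 in cell-based silicon
ACTEL_ARM7_MHZ = 30.0   # upper bound quoted for the Actel soft ARM7 (different core)

# Power per unit of clock rate for the eASIC implementation.
mw_per_mhz = EASIC_MW / EASIC_MHZ

# Fraction of cell-based clock speed reached in the Structured ASIC.
speed_vs_cell = EASIC_MHZ / CELL_BASED_MHZ

# Clock-rate multiple over the FPGA-hosted ARM7; because the Actel figure is
# an upper bound, this is a lower bound on the ratio.
speed_vs_actel = EASIC_MHZ / ACTEL_ARM7_MHZ

print(f"eASIC 926: {mw_per_mhz:.1f} mW/MHz")
print(f"eASIC 926 vs. cell-based 926: {speed_vs_cell:.0%} of the clock rate")
print(f"eASIC 926 vs. Actel soft ARM7: at least {speed_vs_actel:.0f}x the clock rate")
```

On those numbers the Structured ASIC lands at roughly 2 mW/MHz, about 60 percent of the cell-based clock rate, and at least five times the clock rate of the FPGA-hosted ARM7.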
This raises all sorts of interesting questions. Is this power-performance point typical of what can be achieved in a Structured ASIC, or is it specific, one way or another, to the single-via-layer-programmed eASIC architecture? Given that it’s not that hard to get a license from ARM for the 926 RTL, the core would make a very interesting basis of comparison among Structured ASIC architectures. ARM might even be interested in hosting such a party. Any takers out there in Structured ASIC land?