The Last Few MHz – SOC Design and the IEEE Electronic Design Processes Workshop 2008
Ian Rickards, ARM’s CPU Product Manager, spoke about the “Myths of Multicore” at last week’s IEEE DATC Electronic Design Processes Workshop held in Monterey, California. He spoke in favor of multicore SOC design, which is not surprising given ARM’s multicore processor offering. What was surprising was the quality of the technical numbers in the talk. These numbers are useful to anyone contemplating the architecture of their next SOC. I won’t recount all of the myths in Rickards’ presentation (at least not in this blog entry). Instead, I want to focus on myth #2: “Aren’t two cores twice the size?” That’s a very important question for any SOC architect and the answer is far from obvious.
Rickards’ first point in this section is that the last few MHz cost a lot of silicon. If you push a synthesis tool to get the fastest possible processor, you get a bigger processor, along the lines of the following graph:

Now it’s obvious (or should be by now) that pushing a processor’s clock rate ups the power dissipation. What might not be so obvious (though it is with a bit of thought) is that pushing the synthesis tool for speed also increases the gate count. This happens because the synthesis tool must add gates on certain paths and increase driver size on certain nodes to achieve the targeted speed. In fact, as the target clock rate approaches the limit of what the design can reach, power and area climb far faster than the clock rate does.
So how does that increased area compare to using two processors? Rickards provides the data. The table below compares the same processor synthesized for optimum area and power.

Note that the processor area nearly doubles when the target clock rate doubles. That’s not much different than using two processors at half the clock rate. Rickards then gave numbers comparing the power dissipation of one processor running at the doubled clock rate with that of two processors running at half the clock rate (thus executing the same number of instructions per second). These numbers are also interesting. One processor core running at 304 MHz dissipates 87 mW. Two such processors therefore dissipate 174 mW. However, one processor core running at 608 MHz dissipates 444 mW, which is more than 2.5 times as much power for the same nominal instruction rate. You’re way, way ahead if you can find a way to use two processors running at the slower clock rate.
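The arithmetic behind that comparison can be sketched in a few lines of Python. The clock rates and power figures are the ones quoted above; the function names are my own, and the sketch assumes that dual-core power is simply twice the single-core figure (no interconnect or coherency overhead) and that the workload splits evenly across both cores.

```python
# Power figures quoted from Rickards' talk (same core, two synthesis targets).
SLOW_CLOCK_MHZ = 304
SLOW_POWER_MW = 87
FAST_CLOCK_MHZ = 608
FAST_POWER_MW = 444


def multicore_power_mw(per_core_mw, cores=2):
    """Total power for several identical cores.

    Assumption: power adds linearly, with no extra cost for
    interconnect, shared caches, or coherency logic.
    """
    return per_core_mw * cores


def mw_per_mhz(power_mw, aggregate_mhz):
    """Power cost per MHz of aggregate clock rate."""
    return power_mw / aggregate_mhz


two_slow_mw = multicore_power_mw(SLOW_POWER_MW)      # two cores at 304 MHz
power_ratio = FAST_POWER_MW / two_slow_mw            # one fast core vs. two slow

slow_efficiency = mw_per_mhz(two_slow_mw, 2 * SLOW_CLOCK_MHZ)
fast_efficiency = mw_per_mhz(FAST_POWER_MW, FAST_CLOCK_MHZ)

if __name__ == "__main__":
    print(f"Two cores at {SLOW_CLOCK_MHZ} MHz: {two_slow_mw} mW")
    print(f"One core at {FAST_CLOCK_MHZ} MHz:  {FAST_POWER_MW} mW")
    print(f"Fast core uses {power_ratio:.2f}x the power")
    print(f"mW/MHz: {slow_efficiency:.3f} (dual slow) vs {fast_efficiency:.3f} (single fast)")
```

Both configurations deliver the same 608 MHz of aggregate clock, so the mW/MHz figures are directly comparable. Whether the software can actually use both cores is a separate question that this sketch does not address.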
These numbers put several nails in the multithreading coffin. The reason for adding multithreading capability to a processor is to reclaim cycles lost to memory stalls. What causes memory stalls? Slow memory. However, memory is only slow with respect to the processor. If you push the processor clock rate, you get more stall cycles because your memory has become slower in relative terms. Multithreading attempts to fix this problem by instantly switching, when a stall occurs, to another task that’s not waiting on memory or I/O. Yet multithreading amplifies the desire for a higher clock rate in a sort of vicious cycle. If you’re trying to run multiple tasks on the processor to alleviate memory stalls, you will be sorely tempted to run the processor at even higher clock rates to execute all of those multithreaded tasks in the time available. You’ll end up with even more stall cycles as a result. (For more on multithreading, see my next post.)
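To make the “memory gets relatively slower” point concrete, here is a toy model in Python. It is only a sketch: the 100 ns memory latency, the miss rate of one miss per 100 instructions, and the base rate of one instruction per cycle are all hypothetical values I picked for illustration, not numbers from the talk. Only the two clock rates come from the figures above.

```python
# Hypothetical parameters chosen for illustration only.
MEMORY_LATENCY_NS = 100.0      # fixed memory access time, independent of CPU clock
MISSES_PER_INSTRUCTION = 0.01  # one stalling memory access per 100 instructions
BASE_CPI = 1.0                 # cycles per instruction when nothing stalls


def stall_cycles_per_miss(clock_mhz, latency_ns=MEMORY_LATENCY_NS):
    """Cycles the core waits on one miss: latency (ns) times cycles per ns."""
    return latency_ns * clock_mhz / 1000.0


def effective_mips(clock_mhz):
    """Millions of instructions per second once stall cycles are included."""
    cpi = BASE_CPI + MISSES_PER_INSTRUCTION * stall_cycles_per_miss(clock_mhz)
    return clock_mhz / cpi


slow_stall = stall_cycles_per_miss(304)
fast_stall = stall_cycles_per_miss(608)
speedup = effective_mips(608) / effective_mips(304)

if __name__ == "__main__":
    print(f"Stall cycles per miss at 304 MHz: {slow_stall:.1f}")
    print(f"Stall cycles per miss at 608 MHz: {fast_stall:.1f}")
    print(f"Speedup from doubling the clock:  {speedup:.2f}x")
```

In this model the same memory costs twice as many cycles per miss at the doubled clock rate, so doubling the clock delivers noticeably less than double the throughput. A real memory hierarchy with caches and overlapping accesses will behave differently in detail, but the direction of the effect is the one described above.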
You’re better off if you use two processors, which allows you to slow the processors down and get a better match between processor and memory speeds. This is one of the many big benefits of multicore SOC design.