Steve Leibson

Leibson's Law: It takes 10 years for any disruptive technology to become pervasive in the design community. This blog is about the disruptive technologies that either have or will win over electronic engineers, some that won't, and why. Written by Steve Leibson, Tensilica's Technology Evangelist. See my history site at www.hp9825.com. You can email me by taking the first letter of my first name, appending that to my last name, then the magic email symbol, followed by the name of the company I work for, and then a dot followed by com.

Thursday, November 13, 2008

IEEE Spectrum Online Article Warns Savvy System Designers of the Multicore Memory Wall

Nov 13 2008 12:38PM


As a long-term, Senior member of the IEEE, I did feel a pang or two when I wrote up the minor technical gaffe that appeared in an IEEE Spectrum article for my previous blog entry. I’m delighted to salve my conscience by referring you to a very good, short article about multicore design and the impending memory wall on the IEEE Spectrum Online site (see Multicore Is Bad News For Supercomputers by Samuel K. Moore). This article documents simulations showing that a unilateral increase in processing power with no increase in aggregate memory bandwidth eventually leads to data starvation and a loss of processing performance. Here’s the graph:

[Graph: simulated system performance versus number of processors]

The graph shows that for the simulations in question, system performance peaked at four processors, was flat to eight processors, and then fell as more processors were added to the system simulation. The culprit, as any good system engineer should know almost intuitively, was memory starvation. As you add processors to a chip but keep the memory interface pipe at a constant size, you reach a point where the processor-to-memory connection saturates. From that point onward, the additional processors starve for data. In the embedded world, where we’re headed towards tens and hundreds of processors per multicore and many-core chip, there are only three ways to solve memory starvation and avoid hitting the wall:

  • Make the processor-to-memory pipe bigger (more bandwidth)
  • Add more processor-to-memory pipes (more memory ports = more memory bandwidth)
  • Add more local memory to each processor (more aggregate memory bandwidth)
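To make the saturation argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not the model used in the IEEE Spectrum simulations. It simply assumes that every core can execute a fixed number of operations per second and that each operation needs a fixed number of bytes from memory. The function name and all of the numbers are hypothetical, chosen only for illustration.

```python
def system_throughput(cores, ops_per_core, bytes_per_op,
                      port_bw, ports=1, local_hit=0.0):
    """Sustained ops/sec under a simple 'compute-bound or memory-bound' model.

    All parameters are illustrative assumptions:
      ops_per_core -- peak ops/sec of one core when it is never starved
      bytes_per_op -- bytes each operation must fetch from memory
      port_bw      -- bytes/sec delivered by one shared memory port
      ports        -- number of shared memory ports
      local_hit    -- fraction of traffic served by per-core local memory
    """
    compute_bound = cores * ops_per_core
    shared_bytes_per_op = bytes_per_op * (1.0 - local_hit)
    if shared_bytes_per_op == 0:
        return compute_bound          # nothing crosses the shared pipe
    memory_bound = ports * port_bw / shared_bytes_per_op
    return min(compute_bound, memory_bound)


# Hypothetical chip: 1 Gops/sec per core, 4 bytes per op, one 16 Gbyte/sec port.
for n in (1, 2, 4, 8, 16):
    print(n, system_throughput(n, 1e9, 4, 16e9))
```

With these made-up numbers the pipe saturates at four cores and extra cores add nothing. Doubling `port_bw`, setting `ports=2`, or setting `local_hit=0.5` each moves the ceiling from 4 Gops/sec to 8 Gops/sec, which matches the three options in the list above. Note that this simple model only flattens out. It does not reproduce the actual decline beyond eight processors shown in the article's graph, which would require modeling contention overhead as well.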

Each of these alternatives costs money. Adding more processors also costs more money. More performance costs money. So which approach is the best? It really depends on the application. Many multicore/many-core SOC designs employ a DDR memory controller (Virage Logic and Northwest Logic have good ones) with one DDR memory port on the chip. Bulk DRAM chips are very economical ways to store data on a per-bit basis and DDR ports are a good way to connect them to an SOC.

Boosting memory bandwidth is a very popular approach and has produced several generations of DDR ports with data-transfer rates ranging from 1600 Mbytes/sec to 12,800 Mbytes/sec. This approach increases memory-port speed without adding pins to the SOC but the memory-chip cost increases with each boost in memory-port clock rate.
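For readers who want to see where numbers like these come from, here is a small arithmetic sketch. It assumes that the two endpoints correspond to a 64-bit-wide DDR-200 interface (100 MHz I/O clock) and a 64-bit-wide DDR3-1600 interface (800 MHz I/O clock). That mapping is my assumption for illustration, and the function name is hypothetical.

```python
def ddr_peak_mbytes_per_sec(io_clock_mhz, bus_width_bits=64):
    """Peak transfer rate of a DDR port in Mbytes/sec.

    DDR moves data on both clock edges, so the transfer rate is twice
    the I/O clock. Each transfer carries bus_width_bits / 8 bytes.
    """
    mega_transfers_per_sec = io_clock_mhz * 2
    return mega_transfers_per_sec * bus_width_bits // 8


print(ddr_peak_mbytes_per_sec(100))   # assumed DDR-200 port
print(ddr_peak_mbytes_per_sec(800))   # assumed DDR3-1600 port
```

The bus width stays the same in both cases, so the pin count does not grow. All of the extra bandwidth comes from the faster clock, which is also where the extra memory-chip cost comes from.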

It’s a relatively simple matter to add a second memory port to the SOC and double the bandwidth. This approach has the advantage of not adding a lot of pins to the SOC but has the disadvantage of splitting data across two separate memory chips, which together cost twice as much as one memory chip, and this approach certainly complicates the software.
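One way the software gets more complicated: somebody has to decide which data lives behind which port so that neither port becomes the new bottleneck. The sketch below shows one simple greedy way to do that. It is only an illustration, not a recommendation, and the buffer names and bandwidth figures are hypothetical.

```python
def assign_buffers(buffers, ports=2):
    """Greedily spread buffers across memory ports by bandwidth demand.

    buffers -- dict mapping a buffer name to its demand in Mbytes/sec
    Returns (loads, placement): total demand per port and the list of
    buffer names placed on each port.
    """
    loads = [0] * ports
    placement = [[] for _ in range(ports)]
    # Place the hungriest buffers first, each on the least-loaded port.
    for name, demand in sorted(buffers.items(),
                               key=lambda item: item[1], reverse=True):
        target = loads.index(min(loads))
        loads[target] += demand
        placement[target].append(name)
    return loads, placement


# Hypothetical traffic figures for illustration only.
demands = {"video_in": 900, "video_out": 900, "network": 500, "audio": 100}
loads, placement = assign_buffers(demands)
print(loads, placement)
```

Even this toy version shows the catch. The placement depends on knowing each buffer's bandwidth demand ahead of time, and the two ports rarely end up perfectly balanced.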

It’s also possible to design the SOC so that each of the on-chip processors (or most of the on-chip processors) has all of the local, on-chip memory it needs. As the amount of silicon devoted to on-chip memory increases to more than 90%, that’s clearly a viable design approach but you’ll need to know—at the time you’re designing the system—how much memory each processor needs for all of its assigned tasks.
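Here is a minimal sketch of the design-time bookkeeping this implies. It assumes that all of a processor's assigned tasks must be resident in local memory at once and that memory comes in power-of-two sizes. Both are simplifying assumptions, and the task names, sizes, and 25% headroom figure are hypothetical.

```python
def local_memory_kbytes(task_kbytes, headroom=0.25):
    """Size one processor's local memory from its assigned tasks.

    task_kbytes -- dict mapping a task name to its working set in Kbytes
    headroom    -- extra fraction reserved for growth (an assumption)
    Returns the smallest power-of-two size in Kbytes that fits.
    """
    needed = sum(task_kbytes.values()) * (1 + headroom)
    size = 1
    while size < needed:
        size *= 2
    return size


# Hypothetical task list for one processor on the chip.
dsp_tasks = {"fir_filter": 12, "fft": 40, "control": 6}
print(local_memory_kbytes(dsp_tasks))
```

If a task is added after the silicon is taped out and it does not fit, there is no cheap fix. That is why this approach demands that you know the task assignments up front.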

Note: Here’s a chart I used for an SOC memory panel I chaired last week at the International SOC Conference in Newport Beach. The chart predicts that 90% of an SOC will be used for memory by the year 2011 and that the percentage devoted to memory will continue to climb after that. I found this chart in a paper published at the DATE ’05 conference.

[Chart: predicted percentage of SOC silicon devoted to memory, by year]

My purpose in writing all of this is not to advocate one SOC design approach over another to hurdle the memory wall. I truly don’t have a favorite. The “right” choice is application dependent. Sorry, I know you’ve heard that mantra before, but that’s because it’s so often true. My purpose here is to alert you to yet another knee-jerk approach to system design that’s about to stop working: more processors without more memory bandwidth spells failure. And so this is yet another plea to become more cautious about using such design approaches. Complex system designs require careful, intelligent design approaches.

