Steve Leibson

Leibson's Law: It takes 10 years for any disruptive technology to become pervasive in the design community. This blog is about the disruptive technologies that either have won or will win over electronic engineers, some that won't, and why. Written by Steve Leibson, Tensilica's Technology Evangelist. See my history site at www.hp9825.com. You can email me by taking the first letter of my first name, appending that to my last name, then the magic email symbol, followed by the name of the company I work for, and then a dot followed by com.


Tuesday, August 26, 2008

Hot Chips 2008: Memory Coherency Over Networks—Can that Possibly Work?

Aug 26 2008 10:31AM


Here’s a crazy idea: extend memory coherency for an SMP system across a LAN. Now why in the world would you do something so crazy? Well, it turns out that servers have a lot of trouble distributing loads and allocating resources. The result is low server utilization. SMP systems partially solve this problem by creating a pool of processors linked by coherent memory. Because the memory is coherent, any processor can restart any stalled task. However, the number of processors in a cluster is limited, and processors in other SMP clusters elsewhere in the server farm, at the other end of the LAN piping, have no coherent-memory link. They cannot easily be used for load distribution without moving a lot of data across the LAN. That’s super slow.

So, the answer is simple. Just extend memory coherency across the LAN. That’s the solution that 3Leaf’s Krishnan Subramani and Shahe Krakirian proposed in their Hot Chips paper, “Network Based Coherency: Extending a Processor's Coherency Domain over a Standard Network.” Say what? To a dyed-in-the-wool microprocessor busmeister like me, this is heresy. LANs are too slow! Way, way, way too slow!

Alas, my aged mental faculties and decrepit system-design skills are showing. I’m not looking at the whole picture. Here’s the slide that did it for me:

[Slide: latency through a 40Gbit/sec network switch compared with DRAM read latency]

This slide shows that the latency through a 40Gbit/sec network switch is now on par with the read latency for a DRAM. Hmm. This idea doesn’t look so crazy after all. In other words—it’s so crazy it just might work.
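One piece of that comparison can be checked with simple arithmetic: how long it takes just to clock a single cache line onto a 40Gbit/sec link. The sketch below assumes a 64-byte cache line, which is not a figure from the talk, and it ignores packet headers, switch latency, and cable propagation, so treat it as a lower bound on wire time rather than a measurement of 3Leaf's hardware.

```python
# Back-of-envelope: time to serialize one cache line onto a 40 Gbit/s link.
# The 64-byte line size is an assumption, not a number from the Hot Chips talk.
# Packet headers, switch latency, and propagation delay are deliberately ignored.

LINK_RATE_BITS_PER_SEC = 40e9   # 40 Gbit/s, the link speed cited in the post
CACHE_LINE_BYTES = 64           # assumed cache-line size


def serialization_time_ns(payload_bytes, link_rate_bps=LINK_RATE_BITS_PER_SEC):
    """Return the time in nanoseconds needed to clock payload_bytes onto the link."""
    return payload_bytes * 8 / link_rate_bps * 1e9


if __name__ == "__main__":
    t = serialization_time_ns(CACHE_LINE_BYTES)
    print(f"{t:.1f} ns to serialize a {CACHE_LINE_BYTES}-byte line")
```

Under these assumptions the wire time for one line comes to roughly 13 ns, which is small next to a DRAM read measured in tens of nanoseconds. That is at least consistent with the slide's point that the network itself is no longer the obvious bottleneck.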

This talk focused on 3Leaf’s TL1550 Coherent Network Controller, which exploits AMD’s HyperTransport protocol to create an environment where memory can be located locally in big blocks of HyperTransport-connected DRAM or remotely in other DRAM blocks located in similar LAN-linked multicore servers. Here’s a block diagram of a local node in a 3Leaf server system.

[Block diagram: local node in a 3Leaf server system]

There’s magic here, but it’s not a panacea. The TL1550 maintains cache coherency over a 16-node LAN, but going over the network is still expensive. The TL1550 eases some of the memory latency by maintaining 144 Mbytes of local line and page caches for the local processors, but there’s still a network to cross sometimes.
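How much those local caches help depends on how often a request can be satisfied without crossing the network. A minimal sketch of that trade-off is the usual weighted-average latency model below. The function name and all three inputs are hypothetical placeholders for illustration; none of the latency figures come from 3Leaf or from the talk.

```python
# Weighted-average memory latency for a node that sometimes goes over the LAN.
# All numbers passed in are illustrative assumptions, not TL1550 measurements.


def effective_latency_ns(local_hit_rate, local_ns, remote_ns):
    """Average access latency given the fraction of accesses served locally.

    local_hit_rate: fraction (0.0 to 1.0) of accesses satisfied by local DRAM
                    or the controller's local line/page caches.
    local_ns:       assumed latency of a locally satisfied access.
    remote_ns:      assumed latency of an access that crosses the network.
    """
    if not 0.0 <= local_hit_rate <= 1.0:
        raise ValueError("local_hit_rate must be between 0.0 and 1.0")
    return local_hit_rate * local_ns + (1.0 - local_hit_rate) * remote_ns


if __name__ == "__main__":
    # Hypothetical figures: 100 ns local, 1000 ns for a remote access.
    for hit_rate in (0.50, 0.90, 0.99):
        avg = effective_latency_ns(hit_rate, local_ns=100.0, remote_ns=1000.0)
        print(f"hit rate {hit_rate:.2f}: about {avg:.0f} ns average")
```

With those made-up inputs, moving the local hit rate from 50% to 99% pulls the average most of the way back toward local latency, which is the argument for large local caches. The network crossing never disappears from the average, though; it only gets rarer.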

Nevertheless, interesting stuff.




Reader Comments


at 8/26/2008 1:58:35 PM, JustKidding said:
You mean the network really is the computer?! Who knew?

at 8/26/2008 3:52:34 PM, Steve Leibson said:
Actually, 3Leaf's take on it is "Memory is the Network." Some hardware engineers seem to have a lot of trouble figuring out what's a processor, what's memory, and what's the network. Or maybe that's just marketing people in hardware companies that are confused. Or maybe it's both. Confusing, no?

at 8/26/2008 9:07:05 PM, DM said:
You aren't going to hook, say, 10,000 processors together with 40GbE anytime soon, so more realistically you hook together bus-based clusters at 40GbE. Besides, suitable coherent caching can hide a lot of the latency. I used a distributed system with a coherent shared-memory model in the late 1980s. Back then the overhead was high, but programming was vastly simpler than dealing with message passing. Plenty of research shows it is easier for programmers to write correct parallel programs using shared memory than using messages.

at 8/27/2008 9:33:02 AM, Steve Leibson said:
DM...If you would, please point us to the research you mentioned. It would be very interesting to me and to several of my regular readers. Thanks, Steve.

at 8/27/2008 6:23:51 PM, Dave J said:
Oh, I wasn't aware that anyone had ever written a correct parallel program. Seriously, shared memory may be easier to work with, definitely more intuitive. But to me, it probably leads to more incorrect programs than message passing. As for coherency over an outrageously slow bus (e.g., a network), my opinion is "eh, why not?" For some applications, shared memory is necessary, but various threads seldom *actually* touch each other. That, combined with a clever hierarchical breakdown of memory and "ownership", so that relatively few status messages need to be sent, tells me that this could indeed be useful for some problems.


©1997-2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.