Leibson's Law: It takes 10 years for any disruptive technology to become pervasive in the design community. This blog is about the disruptive technologies that either have or will win over electronic engineers, some that won't, and why. Written by Steve Leibson, Tensilica's Technology Evangelist. See my history site at www.hp9825.com. You can email me by taking the first letter of my first name, appending that to my last name, then the magic email symbol, followed by the name of the company I work for, and then a dot followed by com.
Aug 26 2008 10:31AM
Here’s a crazy idea: extend memory coherency for an SMP system across a LAN. Now why in the world would you do something so crazy? Well, it turns out that servers have a lot of trouble distributing loads and allocating resources. The result is low server utilization. SMP systems partially solve this problem by creating a pool of processors linked by coherent memory. Because the memory is coherent, any processor can restart any stalled task. However, the number of processors in a cluster is limited. Processors in other SMP clusters, located in other parts of a server farm at the other end of LAN piping, have no coherent-memory link, so they cannot easily be used for load distribution without a lot of data movement across the LAN. That’s super slow.
So, the answer is simple. Just extend memory coherency across the LAN. That’s the solution that 3Leaf’s Krishnan Subramani and Shahe Krakirian proposed in their Hot Chips paper: Network Based Coherency: Extending a Processor's Coherency Domain over a Standard Network. Say what? To a dyed-in-the-wool microprocessor busmeister like me, this is heresy. LANs are too slow! Way, way, way, too slow!!!
Alas, my aged mental faculties and decrepit system-design skills are showing. I’m not looking at the whole picture. Here’s the slide that did it for me:

This slide shows that the latency through a 40Gbit/sec network switch is now on par with the read latency for a DRAM. Hmm. This idea doesn’t look so crazy after all. In other words—it’s so crazy it just might work.
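To put the 40Gbit/sec figure in perspective, here is a back-of-the-envelope sketch of how long it takes just to clock one cache line onto the wire at a few link speeds. The 64-byte line size and the 1 and 10 Gbit/sec comparison rates are my assumptions, not numbers from the 3Leaf talk, and the calculation ignores switch latency, protocol overhead, and encoding.

```python
def serialization_time_ns(payload_bytes: int, link_gbit_per_s: float) -> float:
    """Time to clock payload_bytes onto a link, in nanoseconds.

    1 Gbit/sec is one bit per nanosecond, so bits divided by the
    rate in Gbit/sec gives nanoseconds directly.
    """
    if payload_bytes < 0 or link_gbit_per_s <= 0:
        raise ValueError("payload must be >= 0 and link rate must be > 0")
    return (payload_bytes * 8) / link_gbit_per_s


# Assumption: a 64-byte cache line (not stated in the talk).
CACHE_LINE_BYTES = 64

if __name__ == "__main__":
    # Assumption: 1 and 10 Gbit/sec are included only for comparison.
    for rate in (1, 10, 40):
        t = serialization_time_ns(CACHE_LINE_BYTES, rate)
        print(f"{rate:>2} Gbit/sec: {t:.1f} ns per {CACHE_LINE_BYTES}-byte line")
```

At 40Gbit/sec the wire time for a single line drops to a little under 13 ns, which is why the remaining cost is dominated by the switch and the controllers rather than by the raw link rate.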
This talk focused on 3Leaf’s TL1550 Coherent Network Controller, which exploits AMD’s HyperTransport protocol to create an environment where memory can be located locally in big blocks of HyperTransport-connected DRAM or remotely in other DRAM blocks located in similar LAN-linked multicore servers. Here’s a block diagram of a local node in a 3Leaf server system.
There’s magic here, but it’s not a panacea. The TL1550 maintains cache coherency over a 16-node LAN, but going over the network is still expensive. The TL1550 hides some of that latency by maintaining 144 Mbytes of local line and page caches for the local processors, but sometimes there’s still a network to cross.
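How much those local caches help depends on how often a request can be satisfied without crossing the LAN. A minimal sketch of the usual weighted-average model follows. The latency numbers in it are placeholders I made up for illustration; they are not TL1550 measurements, and the function name is hypothetical.

```python
def average_access_ns(local_hit_rate: float, local_ns: float, remote_ns: float) -> float:
    """Weighted-average access latency for a two-level local/remote split.

    local_hit_rate is the fraction of accesses served without
    crossing the network (0.0 to 1.0).
    """
    if not 0.0 <= local_hit_rate <= 1.0:
        raise ValueError("local_hit_rate must be between 0 and 1")
    return local_hit_rate * local_ns + (1.0 - local_hit_rate) * remote_ns


if __name__ == "__main__":
    # Placeholder latencies, NOT measured values for any real system.
    LOCAL_NS = 100.0    # assumed cost of a locally satisfied access
    REMOTE_NS = 1000.0  # assumed cost of an access that crosses the LAN

    for hit_rate in (0.50, 0.90, 0.99):
        avg = average_access_ns(hit_rate, LOCAL_NS, REMOTE_NS)
        print(f"local hit rate {hit_rate:.0%}: average {avg:.0f} ns")
```

The shape of the result matters more than the made-up numbers: the average stays close to the local figure only while the local hit rate is very high, which is the "still a network to cross sometimes" caveat in arithmetic form.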
Nevertheless, interesting stuff.