Brian Dipert
EDN Senior Technical Editor Brian Dipert exposes, analyzes and opines on diverse topics in technology. Follow the Brian's Brain Twitter feed at www.twitter.com/BrianzBrain.




Thursday, August 21, 2008

The 2008 Intel Developer Forum: Nehalem's Impending Advent

Aug 21 2008 10:27AM


One in a series of posts

Greetings from San Francisco; I've been here since Monday evening. Innumerable 1:1 meetings, group technical sessions and keynotes stretching from early morning to late night the past few days, coupled with the inevitable need for at least a few hours' sleep per night, have delayed my coverage of this august yearly industry forum. But don't worry; you can expect plenty of catch-up commentary in the days to come!

I'll begin by providing some thoughts on Nehalem (i.e. Core i7), Intel's next-generation micro-architecture (coming on the heels of today's Core micro-architecture-based 45 nm Penryn CPUs and their 65 nm Woodcrest-for-servers, Conroe-for-desktops and Merom-for-mobile predecessors) and one of the 'trophy' products at this year's conference. Nehalem was predictably showcased in several executive keynotes; I also attended a number of large-group technical sessions on the architecture and products based on it, along with a private briefing meeting in advance of receiving review hardware (which I should have in my hands 'soon'…sorry, I can't be more specific right now).

Begin, please, by reading over Ann Mutschler's excellent overview writeup published yesterday on EDN's website. I've already touched on many of Nehalem's key features in past coverage:

  • Per-core 32 KByte L1 instruction and data caches
  • A per-core 256 KByte L2 cache
  • Core-shared L3 cache whose size varies depending on the number of on-die cores and the targeted price-vs-performance point for each product. This scheme is an expansion of the shared-L3 cache approach premiered in the Penryn-based Dunnington six-core processor for servers, which is now ramping into production.
  • The resurrection of HyperThreading (i.e. the ability to, in some circumstances, simultaneously process two instruction threads per core)
  • A dedicated (and AMD HyperTransport-reminiscent) core-to-core and CPU-to-CPU interconnect scheme formerly known as CSI (Common System Interface) and now as the QuickPath Interconnect
  • On-chip DRAM controllers (again reminiscent of a longstanding AMD feature), interestingly supporting three memory channels. Intel Fellow Glenn Hinton confirmed to me on Tuesday that the three-channel choice was the result of a performance-vs-pincount tradeoff analysis, and that it led to some head-scratching chip design challenges because the channel count is not a power of two (i.e. not 2, 4, 8…).
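To illustrate why a channel count of three is more awkward than two or four, here is a minimal sketch of address-to-channel interleaving. It is not Intel's actual mapping, which I haven't seen; the 64-byte granularity and all function names are assumptions made up for this example. The point is only that a power-of-two count lets the channel be picked with a cheap bit mask, whereas three channels need a true modulo operation.

```python
CACHE_LINE_BYTES = 64  # assumed interleave granularity (hypothetical)


def channel_power_of_two(addr, channels):
    """Pick a channel with a bit mask; only valid for power-of-two counts."""
    if channels <= 0 or channels & (channels - 1):
        raise ValueError("mask-based selection needs a power-of-two channel count")
    return (addr // CACHE_LINE_BYTES) & (channels - 1)


def channel_generic(addr, channels):
    """Pick a channel with a true modulo; works for any count, including three."""
    return (addr // CACHE_LINE_BYTES) % channels


def spread(channels, lines=12):
    """Count how many of the first `lines` cache lines land on each channel."""
    counts = [0] * channels
    for line in range(lines):
        counts[channel_generic(line * CACHE_LINE_BYTES, channels)] += 1
    return counts
```

Calling `spread(3)` shows the modulo version distributing consecutive cache lines evenly across three channels, while `channel_power_of_two(addr, 3)` refuses outright. In hardware, the modulo-by-three step is the part that costs extra logic compared to simply wiring out a couple of address bits.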

Interestingly, the per-core L1 and L2 cache allocation is decreased with Nehalem as compared to Penryn, counterbalanced by Nehalem's shared L3 cache pool and much faster access to main memory by virtue of the on-chip DRAM controllers. Keep in mind that Nehalem and Penryn both derive from the same 45 nm process technology; it's interesting to see how the company chose to allocate a similar transistor budget in each case.

In 2001, Intel's then-COO (now CEO) Paul Otellini championed a corporate strategy shift which became known as the 'right hand turn': a defocus on clock speed improvements in favor of per-die CPU core increases, coupled with healthy paranoia concerning power consumption. That latter concern is clearly evident in Nehalem. For one thing, the company chose to revisit the 8T (eight-transistor) SRAM cell for Nehalem's L1 cache, a design technique that had largely fallen out of favor in the industry in recent years. As I wrote way back in 1999, for example, 8T cells have lower standby power requirements than 6T2R (six transistor, two resistor) alternatives, but this current consumption stinginess comes at die size and speed impacts, which Intel counteracted via an intensive focus on circuit design and layout. Equally notable, Nehalem is a fully static design, absent the dynamic 'domino logic' AND gate circuits used in recent years for transition detection, for example. Again, there's a performance tradeoff to this static CMOS revisit, but based on the data I saw this week, it was surmountable.

Speaking of power consumption, I'd also like to say a few words about Nehalem's 'Turbo Mode', which the company unveiled Tuesday during Pat Gelsinger's keynote. To some degree, it's nothing new. The company and its competitors have all been implementing clock gating circuits for years to squelch transistor-switching power consumption at times when certain areas of the chip aren't being used. And the Mobile Penryn processor was the production test-bed for the technique of redirecting a CPU's total available power budget towards one core, via dynamic over-clocking, at times when the presence of single-threaded code meant that the other core was not being used.
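The budget-redirection idea can be sketched as a toy model. Everything here is an assumption for illustration only: the function name, the wattages, the clock step size, and especially the simplification that per-core power scales linearly with clock frequency (real silicon also involves voltage changes and thermal limits). It is not how Intel's hardware decides anything.

```python
def turbo_bins(budget_w, active_cores, base_core_w, base_mhz,
               step_mhz=100, max_bins=2):
    """Return how many clock steps above base fit inside the package budget.

    Toy model: each active core draws base_core_w at base_mhz, and its
    power is assumed to scale linearly with frequency (a simplification).
    Idle cores are assumed to draw nothing.
    """
    bins = 0
    while bins < max_bins:
        next_mhz = base_mhz + (bins + 1) * step_mhz
        next_power = active_cores * base_core_w * next_mhz / base_mhz
        if next_power > budget_w:
            break  # the next step would blow the budget
        bins += 1
    return bins
```

With a hypothetical 40 W budget and cores that each draw 20 W at 2000 MHz, two busy cores leave no headroom, so `turbo_bins(40, 2, 20, 2000)` grants no extra steps. With only one core busy, the idle core's share is up for grabs and the same call with `active_cores=1` climbs to the `max_bins` ceiling.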

What's new with Nehalem reflects tight synergy between the company's product and process development teams, a partnership which both Gelsinger and Intel Fellow Rajesh Kumar emphasized several times was only possible for companies with 'in-house design and process technology' (a not-so-subtle jab at competitors who use foundries such as TSMC and UMC, as well as a slam on AMD's pending 'asset-lite' strategy). The new ingredient is "Power Gate", a transistor design technique which more completely shuts off current flow to various areas of the chip on a dynamic basis modulated by usage need, thereby eliminating not only switching power consumption (as clock gating already did) but also leakage power draw.
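The difference between the two gating techniques comes down to which power component each one removes. Here is a minimal sketch under idealized assumptions: the `Block` type, the state names and the wattages are all hypothetical, and a real power gate would still leak a little rather than dropping to exactly zero.

```python
from dataclasses import dataclass


@dataclass
class Block:
    dynamic_w: float  # switching power while clocked and active (hypothetical)
    leakage_w: float  # leakage power whenever supply is connected (hypothetical)


def power(block, state):
    """Return the power drawn by one chip block in a given gating state."""
    if state == "active":
        return block.dynamic_w + block.leakage_w
    if state == "clock_gated":
        # Clock stopped: no switching, but the transistors still leak.
        return block.leakage_w
    if state == "power_gated":
        # Supply cut off: idealized here as no switching and no leakage.
        return 0.0
    raise ValueError(f"unknown state: {state}")
```

For a made-up block with `Block(dynamic_w=10.0, leakage_w=3.0)`, clock gating alone brings the draw from 13 W down to the 3 W of leakage, and power gating removes that remainder too. As leakage grows as a share of total power at smaller process nodes, that last step matters more.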

"Power Gate" accompanies other on-chip technologies with similar miserly aspirations, such as varying Vt (threshold voltage) for various circuits to optimize speed-vs-current draw, and a diversity of thermal sensors scattered across the die. Orchestrating the entire power consumption symphony is a 1 million transistor dedicated controller core…to put its complexity in perspective, its transistor count alone is in the same range as that of an entire Intel486 microprocessor (roughly 1.2 million transistors), introduced about two decades ago. If that's not a poster child for Moore's Law, I don't know what is…

In my NDA briefing session on Tuesday, I saw (and briefly played with) a diverse suite of Nehalem-based systems stretching from single- to multi-CPU configurations, and running an equally diverse mix of applications. Intel also confidentially shared with me some initial Penryn-vs-Nehalem benchmarking results that were quite mind-blowing. The real proof (or not) will come, of course, when my own reference hardware arrives (and when the NDA lifts, so I can share results with you all). But based on what I've seen this week, Intel's got another winner on its hands. Does AMD have any chance of credibly responding at this point, save perhaps via the courts?


Reader Comments



at 8/22/2008 11:40:22 AM, Buddy said:
Historically Intel was first with integrated memory controller for x86 processors. The 386EX for embedded and later the scrapped Timna in the 90s. AMD's HyperTransport + DDR IMC was derived from EV6/EV7 bus IPs which Intel owns, but licensed free to AMD (resulting from tech-monopoly suit/complaint by AMD).



at 8/22/2008 11:53:43 AM, Brian Dipert said:
Dear Buddy, along with the 386SL and 486SL. I remember them all well!

©1997-2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.