
IDF debriefing: Pendulum swings from AMD to Intel

AMD has done incredibly well. But its much larger competitor has completed its turn and is bearing down.

By Brian Dipert, Senior Technical Editor -- EDN, 3/12/2006

Moore's Law, as originally elucidated by its creator, dealt exclusively with the number of transistors that one could integrate on a given-sized piece of silicon, and the rate at which that number increased over time. Other, lesser pundits have subsequently extrapolated Moore's Law to statements concerning the switching speed and power consumption of transistors over time. Intel and other silicon suppliers' struggles with leakage current below the 130-nm process node have highlighted the folly of such extrapolations. Yet, in its purest form, Moore's Law continues to hold true. And the Spring 2006 Intel Developer Forum, from which I've just returned after three days' worth of information-filled, brain-crushing keynotes, classes, and meetings, provided clear evidence of that fact.

Judging both from some of the feedback I've received to my past writings in EDN and my blog, and from the comments I've seen on other blogs, folks tend to prefer to put semiconductor suppliers in neat-and-tidy "right" and "wrong" boxes. IMHO, this is a naive and overly simplistic view of the world.

The chip industry has plenty of smart people. They all get the same sort of feedback from their customers and potential customers. And, based on the resources they have at their disposal, combined with their views of how their customers' market might evolve in the future, they make appropriate decisions on next-generation chip (and chip family) attributes. (My past experience at my prior gig made it clear to me that the vision of most system designers doesn't extend beyond the next project or few, mind you for perfectly understandable reasons. And a chip architect also has the advantage of being able to consolidate and weigh feedback from multiple sources.)

Let's now narrow the previous paragraph's big-picture view a bit and focus in on AMD and Intel's respective CPU architectures (past, present, and near future). I'm not going to attempt to give you a news report on what happened at IDF. Plenty of other folks have already done so (as a jumping-off point, I'd suggest you visit AnandTech, Ars Technica, Electronic News and ExtremeTech). And I'm also not going to give you an in-depth discourse on Intel's new Core microarchitecture; for that, you'll need to stay tuned for my April 27 feature article. However, I want to share with you the one big "a-ha" that I got from the show, stimulated by a single comment from an Intel Fellow during one of the lectures. Then I'll extrapolate from that point to a broader view of the AMD-versus-Intel struggle.

A microarchitecture (that is, a structural implementation of a processor instruction set) is a big deal. It requires many chip architects and many design engineers (as well as process engineers, test engineers, reliability engineers, manufacturing engineers, etc) and many years to implement. As such, for better or worse, a company is stuck with a chosen microarchitecture for quite a long while, even a company with Intel's abundant resources. Key up-front microarchitecture decisions also have to be made before all the details of, for example, the semiconductor process they'll be implemented on are known. Many years ago, at a meeting deep in the bowels of Intel (a meeting for which I so wish I could have been a fly on the wall), the company made the critical bet that it would be able to continue its transistors' historical switching-speed ramp at, and beyond, the 90-nm process node.

In retrospect, that was perhaps one of the worst predictions Intel ever made. And evidence of the decision is obvious in many aspects of the NetBurst microarchitecture (which the company initially stated would scale up to 10 GHz speeds) that formed the foundation of the company's Pentium 4 (and derivative Celeron and Xeon) CPUs. Look, for example, at NetBurst's extremely long 20- (and later 31-) stage pipeline. Intel likely assumed that high clock speeds would mask the effect of branch misprediction. Or look at the double-clocked ALU. Or look at the pseudo-multicore HyperThreading feature, which in some code-execution scenarios enabled simultaneous processing of integer and floating-point threads by tapping into otherwise-unused CPU execution units. The downside of HyperThreading is the contention overhead when multiple threads attempt to access the same chip resources. High clock speeds would have masked this contention overhead, but Intel couldn't deliver them, which is why systems often run faster (specifically on traditional, common, single-threaded and integer-heavy code) when HyperThreading is disabled.
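To put rough numbers on the pipeline-depth trade-off, here's a back-of-the-envelope sketch. It is a textbook-style model, not anything Intel presented: it assumes a mispredicted branch costs a full pipeline flush (one cycle per stage), and the base CPI, branch fraction, and misprediction rate are hypothetical values I picked for illustration. Only the 20- and 31-stage depths come from the discussion above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, pipeline_stages):
    """Average cycles per instruction, including misprediction flushes.

    Simplifying assumption: every mispredicted branch throws away one
    cycle per pipeline stage.
    """
    flush_cycles = branch_fraction * mispredict_rate * pipeline_stages
    return base_cpi + flush_cycles


# Hypothetical workload: ideal CPI of 1.0, 20% branches, 5% of them mispredicted.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

cpi_20_stage = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
cpi_31_stage = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)

# Clock-speed ratio the deeper pipeline needs just to match the shallower one.
break_even_clock_ratio = cpi_31_stage / cpi_20_stage

print(f"20-stage CPI: {cpi_20_stage:.2f}")
print(f"31-stage CPI: {cpi_31_stage:.2f}")
print(f"Break-even clock ratio: {break_even_clock_ratio:.3f}")
```

Under these made-up workload numbers, the 31-stage design has to clock roughly 9% faster merely to break even. A deeper pipeline only pays off if the clock headroom actually materializes.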

Once a company has headed down a particular microarchitecture path, it's got a limited set of options available to improve its fortunes until the next microarchitecture is ready. Intel rode the Pentium 4 through four process iterations: 180, 130, 90, and, beginning late last fall, 65 nm. Arguably this was a few too many. In the late '90s, Intel was busy broadening its semiconductor reach via a flurry of dot-com-fueled acquisitions, and I'd posit that Intel inappropriately took its eye off the CPU architecture ball in the process.

At 3 GHz, Intel began to clearly discern the rapidly approaching, leakage-current-induced power "wall" that it was about to hit. Chief Technical Officer Justin Rattner commented during his Tuesday IDF keynote, for example, that beyond a certain point, a 20% increase in clock speed translated to a 70% increase in power consumption.
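Rattner's figures are worth running through a quick calculation. The sketch below assumes, generously, that performance scales one-for-one with clock speed; that assumption is mine, not Rattner's, and real workloads usually scale worse.

```python
def perf_per_watt_ratio(clock_gain, power_gain):
    """Relative performance-per-watt after a clock bump.

    Assumes performance scales linearly with clock speed (an optimistic
    simplification). Gains are fractions, e.g. 0.20 for +20%.
    """
    return (1.0 + clock_gain) / (1.0 + power_gain)


# Figures quoted in the keynote: +20% clock for +70% power.
ratio = perf_per_watt_ratio(0.20, 0.70)
print(f"Performance per watt vs. baseline: {ratio:.2f}x")
```

Even in this best case, each watt buys only about 0.71 times the work it did before the clock bump, which is why chasing frequency past that point stopped making sense.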

Although more advanced semiconductor lithographies failed to enable Intel to continue scaling up clock speeds, they did enable higher transistor count integration on a given-sized die. Intel harnessed this capability to bolt ever-increasing amounts of L1 and L2 cache onto NetBurst. Whereas the initial Willamette version of the Pentium 4 incorporated 8 kbytes of L1 and 256 kbytes of L2, the latest-generation Presler core includes 16 kbytes of L1 and 2 Mbytes of L2. And although intrachip clock speeds tailed off over time, interchip clock speeds, specifically the front-side bus speed, have marched upward relatively smoothly. Specifically, Willamette's front-side bus ran at 400 MHz, whereas Presler runs at up to 1066 MHz.

While Intel was struggling with NetBurst, the company was simultaneously crafting the Pentium III-derived Pentium M (aka Banias) "Mobile" microarchitecture, which has formed the foundation of the follow-on Dothan and Yonah processors and even the blade-server-targeted Sossaman. Moreover, as soon as the Banias architects in Israel wrapped up their evaluations, Intel immediately set them to work on what we now know as the Core microarchitecture, which will, in its initial incarnation, appear in the server-targeted Woodcrest, desktop-targeted Conroe, and mobile-targeted Merom spins. Conroe and Woodcrest are scheduled to ramp into production in the third quarter, with Merom following by year's end. Quad-core, dual-die CPUs (Clovertown for servers, and Kentsfield for desktops) also received demos at IDF. I think it's highly notable, by the way, that the last two Intel microarchitectures have come out of Intel's Israel operation, versus the company's historical brain trusts in Santa Clara and Oregon.

A-ha moment

So what's the basis of my "a-ha" concerning Intel's relative position versus AMD in the years to come? It arrived mid-week during one of Intel Senior Fellow Steve Pawlowski's Core microarchitecture presentations (Intel's red-shirted Fellows were very evident at IDF, in formal presentations, informal chalk-talks, and impromptu hallway conversations), when he hit head-on the fact that Core still doesn't integrate a DRAM controller as AMD's Athlon 64 CPUs have done for several years now, nor does it have a dedicated intercore communication bus. Pawlowski pointed out that Core would have a 1066 or 1333 MHz front-side bus, along with abundant on-chip cache (2 or 4 Mbytes of core-shared L2 in the initial implementations). Coupled with intelligent use of that cache, and intelligent access prioritization of external DRAM when necessary, Pawlowski's analysis suggested, in the vast majority of usage scenarios, there was plenty of available front-side bus bandwidth to service intercore communication requests in a speedy manner.

Is Pawlowski right? Initial indications are that Intel's back in full stride after several years of NetBurst-induced stumbles in the desktop and server markets. Take a look at this AnandTech writeup from last week (a follow-on report addressed readers' issues with the initial analysis). Realize that, in this study, a pre-production Conroe, running at the low end of the CPU- and DRAM-speed spectrum that the full production family will offer, was benchmarked against an overclocked, state-of-the-art AMD competitor.

Unless…

a) The specific workloads tested by AnandTech don't hold up when extrapolated to other common usage scenarios (doubtful, but possible) or

b) Intel pulled the wool over AnandTech's eyes (again doubtful, both because Anand Lal Shimpi is no fool, and because Intel's already tarnished reputation would be KO'd by any uncovered irregularities)

…Intel's got a winner on its hands.

As should be the case. Process shrinks, front-side-bus speed-ups, and cache enhancements are baby steps compared with the significant performance gains possible with a successful ground-up microarchitecture redesign. AMD's Athlon, in all of its proliferations over the years (including Opteron and Turion), has obvious roots in the K6 launched in 1997. Everything subsequent to that leap, including both 64-bit support and multicore, was a comparative baby step. AMD is undoubtedly working on a next-generation microarchitecture. And when the company finishes it, it will have the chance to retake the lead. But until then, the CPU race has finally gotten interesting again.

Bigger picture

Which leads to my bigger-picture hypothesis: that Intel and AMD differ in at least three key attributes. The first is manufacturing-process maturity. About a decade ago, due in no small part to the influence of Craig Barrett, Intel expanded its expertise beyond chip design to include high-volume-manufacturing leadership (the so-called "copy exactly" fab network being one example). Some might claim that the company's 90-nm stumble runs counter to that observation, but I'd beg to differ. I suspect that the process came up just fine from a defect-induced yield standpoint, but that unexpected transistor-leakage-current issues delayed the production ramp (Am I splitting hairs? Maybe).

Regardless, the company's had multiple chips (Presler and Yonah's various proliferations) in high-volume production on its internally developed 65-nm process since last fall, and the Core derivatives will ramp in the coming months. In contrast, my read of the tea leaves suggests that AMD won't begin ramping 65-nm technology into production (via a process developed in partnership with IBM and Infineon) until the end of this year, best-case.

All those extra transistors available to Intel enable it to boost cache memory sizes, but they also lead to plenty of logic innovations, such as the following Core features (referred to by their marketing names):

a) Wide Dynamic Execution: The ability for the processor to not only fuse multiple micro-operations into one, but also multiple instructions into one (the commonly-touted "if-then" C structure, for example, would normally split into separate, dependent compare and conditional-jump instructions). Combined with a four-wide instruction queue, macro-fusion enables the processor to tackle up to five instructions within a single clock tick.

b) Intelligent Power Capability: An expansion of the transistor budget partitioning and other power management innovations first seen in the Pentium M, which focus on performance-per-watt, not simply performance. For example, Rattner's keynote pointed out that Intel designers have two fundamental logic transistors available to them at 65 nm, one with 5x lower leakage than its 90-nm predecessor, and one with 20% higher switching speed. I heard similar data during Intel's presentation at ISSCC last month.

c) Advanced Smart Cache: Enables a single core, if the situation warrants, to allocate the entire L2 cache to itself, as well as enabling efficient sharing of L2-housed data between cores.

d) Smart Memory Access: Out-of-order-cognizant (speculative load-before-store, etc) external memory-access controllers and pre-fetchers that optimize front-side-bus traffic, mask external memory latency, and are tailored to DRAM idiosyncrasies such as the memory's paged structure and sequential burst buffers.

e) Advanced Digital Media Boost: A fancy label which essentially means that Intel's replaced the double-clocked, power-hungry Pentium 4 ALU with the ability to single-clock-execute 128-bit SIMD integer and double-precision floating-point operations.
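The "five instructions in a four-wide queue" arithmetic behind macro-fusion is easier to see in a toy model. The sketch below is purely illustrative and is not a description of Intel's actual decoder: the mnemonic list, the rule that at most one compare-and-jump pair fuses per cycle, and the `decode_cycles` helper are all assumptions of mine.

```python
COMPARES = {"cmp", "test"}


def is_conditional_jump(op):
    """Treat any j* mnemonic other than the unconditional jmp as conditional."""
    return op.startswith("j") and op != "jmp"


def decode_cycles(ops, width=4, fuse=True):
    """Group an instruction stream into per-cycle batches.

    Toy model: each cycle has `width` slots. With fusion enabled, one
    adjacent compare + conditional-jump pair per cycle shares a slot.
    """
    cycles = []
    i = 0
    while i < len(ops):
        group = []
        slots_used = 0
        fused_this_cycle = False
        while i < len(ops) and slots_used < width:
            can_fuse = (
                fuse
                and not fused_this_cycle
                and ops[i] in COMPARES
                and i + 1 < len(ops)
                and is_conditional_jump(ops[i + 1])
            )
            if can_fuse:
                group.extend(ops[i:i + 2])  # pair occupies a single slot
                i += 2
                fused_this_cycle = True
            else:
                group.append(ops[i])
                i += 1
            slots_used += 1
        cycles.append(group)
    return cycles


# Hypothetical stream resembling a compiled "if-then" surrounded by other work.
stream = ["mov", "add", "cmp", "jne", "mov"]
print(len(decode_cycles(stream, fuse=True)))   # fused: all five fit in one cycle
print(len(decode_cycles(stream, fuse=False)))  # unfused: spills into a second cycle
```

In this model the fused pair frees up a slot, so the four-wide queue swallows all five instructions at once, whereas the unfused version needs a second cycle for the trailing instruction.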

In contrast, AMD didn't have Intel's transistor (specifically cache) budgets to work with when it was developing the Athlon 64. So the decision to integrate a DRAM controller, thereby getting the slow DRAM subsystem as close as possible to the CPU core that was mostly accessing it, was an obvious one. From a raw CPU-performance standpoint, such an approach has advantages. But it also has downsides. DRAM architectures evolve much faster than companies can crank out new DRAM controller-inclusive CPU designs, a lesson that Intel learned with its ill-fated, Rambus-DRAM-cognizant Timna CPU (and a lesson that the company probably still remembers and is loath to repeat). And in low-end AMD-based PC designs that employ graphics-inclusive core-logic chipsets, the graphics processor has to go through the CPU to access the main memory (which in this configuration doubles as the frame buffer). Core logic containing the DRAM controller doesn't incur this overhead.

The second key difference between AMD and Intel is their relative sizes. AMD's fourth-quarter 2005 sales were $1.84 billion. Intel's fourth-quarter 2005 sales were $10.2 billion. One could reasonably assume, I think, that there's some relationship between revenue and headcount. Intel's riding three fundamental x86 CPU architectures right now: the Pentium M, Pentium 4, and Itanium (the latter accomplished via a combination of hardware and software emulation). AMD's a one-trick pony, in comparison: Opteron and Turion are, fundamentally, Athlon 64, respectively with multiprocessor and hardware-based power-management tweaks. Intel's gambling that Core will enable it to slim its architecture portfolio down to two (still including Itanium) within a year or so, but unless Core is a resounding failure and recent market-share trends continue, AMD will still need to leverage a single architecture foundation.

The final key difference between the two companies is historical and, I'd argue, dissipating. When AMD launched Athlon in 1999, its overall CPU market share was barely over 10%. Now, its market share is roughly twice that size. Back in the late '90s, the company was able to "roll the dice" and take chances, because any resultant stumbles weren't so evident; some of you may remember the flurry of CPU and chipset bug reports that marked the first few years of the Athlon era, but I bet many of you never heard about them. Comparatively, Intel's 80% or better market share has meant that it has had to move slowly and cautiously, as any miscues are widely felt and reported. This is why, for example, although Core-based CPUs were demonstrated at the last IDF, in August 2005, they won't be in production until roughly a year beyond that first public unveiling. However, AMD is now big enough that it'll also need to move in a more moderated manner, lest it run afoul of both angry customers and investors.

Some of you will, after reading this, inevitably label me an Intel shill, perhaps biased by my employment history. That's an unavoidable consequence of writing for people who like to put semiconductor suppliers in neat-and-tidy "right" and "wrong" boxes. I clearly don't see myself as a shill. But I do view myself as pragmatic, and I've seen a few architectural shifts in my 21 years in this business. We're in the midst of one, and barring any showstoppers, the pendulum will swing Intel's way, at least for a while. AMD's not "wrong." The company has done incredibly well, in fact, given its comparative resource constraints. But its much larger competitor has completed its turn and is bearing down. I can't wait to see AMD's next maneuver.


