Mike Santarini

EDN Senior Editor Mike Santarini covers digital design and the EDA, ASIC, and FPGA industries. [Editor's note: As of Feb. 2008, this blog is no longer active and is presented here for archival purposes.]



Wednesday, December 5, 2007

Karma for MPUs: is chip binning burning up?

Dec 5 2007 11:31AM | Comments (10)


A few weeks ago I attended a keynote at ICCAD in which Jeff Welser, director of the SRC Nanoelectronics Research Initiative (NRI), outlined industry efforts to find a replacement for CMOS. Our coverage of that keynote is “CMOS running out of gas, new effort looks for scalable replacement, ICCAD keynoter says.”

In his presentation, Welser whipped through a plethora of fascinating, eye-catching foils describing the inevitable fate of CMOS and the need for a scalable successor. But the foil that really caught my eye discussed the long-standing practice of “chip binning” and how it is in jeopardy, mainly because of transistor leakage.

Chip binning has always been fascinating to me on many levels. What is it? It’s essentially a practice in which chip manufacturers design a chip to hit a targeted speed grade, say 2 GHz. After the chips are manufactured and tested, manufacturers find that some of them perform at the targeted 2 GHz, some perform above 2 GHz, and even more perform below the target (some of those lower-performing chips may run at 1.8 GHz, others at 1.5 GHz, and some at 1 GHz…and lower).

But instead of throwing out the chips that didn’t hit the targeted performance specification, some semiconductor vendors, especially microprocessor vendors, sell most of them to us, the consumers. They simply put them in bins according to speed grade and price them accordingly. In processors, for example, the highest-speed parts, essentially overclocked processors, traditionally sell for a premium and go into gaming machines. The ones that hit targeted performance go into high-end home computing and business PCs. The ones that didn’t hit their targeted performance go into lower-cost PCs, and only the very lowest ones get thrown out. Very little is wasted. That’s one of the reasons processor companies do so well: they get to sell most of their inventory. Other types of chips, like ASICs, have to hit performance grades and meet system specifications or customers don’t buy them. But Ma and Pa consumer for the most part don’t even know about binning.
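To make the sorting step concrete, here is a minimal Python sketch of speed binning. It is only an illustration: the grade list, the function names, and the sample measurements are all made up for this example (the post only mentions 2 GHz as a sample target), and real test flows are certainly more involved.

```python
from collections import Counter

# Hypothetical speed grades in GHz, highest first. Only the 2 GHz target
# comes from the post; the other grades are assumptions for illustration.
BIN_GRADES_GHZ = [2.2, 2.0, 1.8, 1.5, 1.0]


def assign_bin(max_stable_ghz):
    """Return the highest grade the chip can sustain, or None for scrap."""
    for grade in BIN_GRADES_GHZ:
        if max_stable_ghz >= grade:
            return grade
    return None  # slower than the lowest grade: thrown out


def bin_lot(measured_ghz):
    """Count how many chips in a tested lot land in each bin."""
    return Counter(assign_bin(f) for f in measured_ghz)


if __name__ == "__main__":
    # Made-up test results for five chips from one lot.
    lot = [2.3, 2.0, 1.9, 1.6, 0.8]
    for grade, count in bin_lot(lot).items():
        label = "scrap" if grade is None else f"{grade} GHz"
        print(label, count)
```

Each chip is simply sold at the highest grade it can sustain, which is why so little of a lot ends up in the trash.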

As a consumer, I’ve always wondered: when you buy a new PC, how do you know you are getting a processor that hit its target spec? And how do you know you aren’t getting a processor that badly missed its performance target? For example, if you are buying a processor that runs at 2 GHz, how do you know it wasn’t targeted at 4 GHz, so that you are buying something that was essentially 1 MHz from going into the trash bin? As a consumer, one has to wonder: am I essentially buying a defective product? As a tech-savvy consumer, one has to further wonder why the processor didn’t hit its target. As far as I know, MPU vendors don’t disclose any of this info to consumers.

But binning may be undergoing a bit of karma. In his presentation, Welser briefly showed a foil describing how transistor leakage in bleeding-edge processes is putting the binning process in jeopardy. Essentially, the foil showed that because of leakage, and more so the heat that leakage creates, manufacturers are increasingly being forced to throw out the highest-performing chips (those running above their specifications) from their wafer lots. Welser said that manufacturers fear the chips running at these highest clock rates will emit too much heat and will essentially burn themselves out after running at top speed. Replacing a defective product is extremely expensive and potentially embarrassing.
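One way to picture the problem Welser described is as an extra screen layered on top of speed testing. The Python sketch below is purely hypothetical: the function name, the 2 GHz target, and the 30 W leakage limit are invented for illustration and do not come from the keynote or from any vendor's actual test criteria.

```python
def screen_chip(max_stable_ghz, leakage_watts,
                target_ghz=2.0, leakage_limit_watts=30.0):
    """Classify one tested chip by speed and leakage.

    All thresholds are hypothetical; they only illustrate the idea that a
    chip can be fast enough for the premium bin yet too leaky to ship.
    """
    if max_stable_ghz > target_ghz:
        if leakage_watts > leakage_limit_watts:
            # Fast but too leaky: the heat risk outweighs the premium price.
            return "discard"
        return "premium"
    # At or below target: handled by the ordinary speed bins.
    return "at_or_below_target"


if __name__ == "__main__":
    # Made-up (speed, leakage) measurements for three chips.
    for ghz, watts in [(2.4, 45.0), (2.4, 20.0), (1.8, 25.0)]:
        print(ghz, watts, screen_chip(ghz, watts))
```

In this toy model the chips that would have sold for the most are the ones at risk of being discarded, which is the bottom-line worry.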

MPU vendors have traditionally made the most profit off of these highest-performing chips, so leakage in CMOS is a very big deal and is hitting their bottom lines. That’s why finding a way to squelch leakage, or better yet finding a scalable alternative, is a top priority for the industry.

As you probably know, the main reason MPU vendors went “multi-core” a couple of years back was that traditional single-core architectures were running into leakage and heat problems. Up until that time, THE race in MPUs was, and to a certain extent still is, in performance. MPU vendors pushed single-core architectures up to around 4.5 GHz before they realized that leakage and the associated thermal problems were too great and would cause failures and dreaded recalls. So now MPU vendors, and quite a few other chip disciplines, are going multi-core: essentially, putting multiple lower-performance processor cores on a single chip and creating architectures that let the cores distribute computation workloads evenly, so that no core runs too fast and the cores together do not create enough heat to burn up the chip.

The practice of multi-core seems to be working to sidestep the leakage and thermal problem. But as CMOS continues to scale and seemingly becomes more leaky, even with new materials like high-k in the mix, the questions are how long it will work and what the limitations will be. In short, is it a bandaid on a hemorrhaging wound?

“Irregardless” (as my friends in Boston say), you can expect the practice of binning to continue. Yet in the era of multi-core it may be harder than ever to determine whether you are buying the best processor. Indeed, underlying Welser’s talk is the point that the most bleeding-edge processor may, in the long run, not be the best processor. Increasingly, consumers are probably going to need to consider what package and cooling are offered with a system, as well as how many cores it has, what its performance is, and how much on-chip memory it contains. Certainly, if you fork out the cash to buy a screaming Alienware gaming machine running on THE latest and greatest processor, be sure not to skimp on the cooling…you’ll likely need it. Just for kicks, ask the salesman which bin the processor came from.




Reader Comments


at 12/5/2007 12:56:29 PM, CYI said:
I wouldn't call high-k a "bandaid on a hemorrhaging wound". It's more like a blood transfusion for a critically ill patient. There are multiple types of leakage, the two most important being subthreshold leakage and gate leakage. At 90nm, 30% of a chip's power consumption is due to leakage. Of that 30%, almost all is subthreshold leakage. At 65nm, over 50% of a chip's total power is due to leakage. Of that, 60-70% is subthreshold and the remainder is gate leakage. At 45nm, gate leakage would overtake subthreshold and become the dominant component of leakage. High-k dielectric materials virtually eliminate gate leakage. However, subthreshold leakage will continue to be a critical parametric yield-limiting factor at 45nm and beyond.

at 12/5/2007 1:31:34 PM, Mike Santarini said:
Good points, CYI. I skimmed over the types of leakage, but you make some great points. Still, 60 to 70%, and seemingly continuing to get worse at smaller process nodes, is a lot of leakage (AND HEAT). How about we mix our two metaphors and go with “giving a blood transfusion to a patient who is hemorrhaging”? High-k isn’t curing the problem entirely…the patient is still, well, leaking.

at 12/5/2007 3:00:52 PM, CYI said:
That's a good metaphor, and yes, the patient is still leaking. There are some very effective ways to control leakage. The most popular is multi-Vt libraries which are used on all low-power devices today. In addition, there are techniques such as header/footer sleep switches, voltage islands, transistor body biasing, and state assignment in stand-by mode. However, some of these techniques can only be implemented during the architectural and logic design of a chip. So, they can only be used on brand new designs that are being started from scratch. They can't be used on re-spins or on chips that are already in process. I agree with you that binning will continue, and for many designs, subthreshold leakage is an un-healed wound and will continue to bleed.

at 12/6/2007 3:39:09 AM, TBD said:
Could Transmeta's LongRun2 IP Blocks offer enough of a solution to extend the life of CMOS into the lower geometries? They seem to have a viable solution for subthreshold leakage.

at 12/6/2007 9:28:57 AM, Zeno said:
TBD, We have seen a reduction in Vth leakage using LongRun on our 65nm test chips. I can't give specifics yet but we should be capable of delivering CMOS chips for several years to come at 22nm and beyond. We primarily used body biasing in our approach.

at 12/6/2007 5:21:56 PM, Hank Walker said:
The other obvious factor that will lead to the death of binning is intra-die process variation in many-core MPUs. One could adjust supply or bias to speed up slow ones and slow down fast ones, but why waste that effort, when throughput will be more important, and the OS may migrate tasks among cores to even out the power anyway.

at 12/7/2007 2:49:12 AM, Lonnie Burk said:
I like the metaphors - bleeding victims with bandages and transfusions to keep them alive whilst they still bleed. As in medicine, the search for a cure continues. Maybe it's time to move on to photonics (using light) instead of electronics to obtain faster, cooler, no-bleeding alternatives! It seems to me that keeping it simple is the solution - you know - glue the leak shut, then weld it closed permanently!

at 12/10/2007 2:55:46 AM, YVdp said:
Lonnie, your wish is (at least partly, i.e. regarding intra-chip communication) on the way: "New IBM Research Technology Could Enable Today's Massive Supercomputers to be Tomorrow's Tiny Computer Chips" www-03.ibm.com/press/us/en/pressrelease/22769.wss

at 12/14/2007 6:25:36 AM, markd said:
"For example, if you are buying a processor that runs at 2GHz, how do you know it wasn’t targeted at 4GHz and so you are buying something that was essentially 1 MHz from going into the trash bin." The flip side is where the overclockers live- how do you know that the 2GHz CPU you just paid cheap money for won't run at 3.999GHz, with good cooling?

at 12/14/2007 11:51:20 AM, Mike Santarini said:
Thanks for reading folks and for some very great points. I can’t help but think that relying on OS fixes to manage the power issues will be a bit difficult because there are many processors and configurations, and, as most of us who braved buying Vista early know, some OS providers have problems simply creating applications that work—relying on them to do power management sounds scary. Also, I think if you are savvy enough to overclock a CPU, you really have to ask yourself why didn’t the CPU make it to the targeted speed in manufacturing (if you have some means of knowing it in the first place)? If the CPU had a defect for example that didn’t allow it to make its targeted speed grade, will overclocking it (in turn cranking up the heat) turn a small problem into a catastrophic one?


©1997-2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.