
If ever EDA needed a ($700M) proof point on their value...

February 2, 2011

As I reported yesterday, Intel announced that a “design error” in a SATA I/O support chip for the Sandy Bridge processor would force it to respin the design… at a cost of $700M! From the information Intel provided, it was apparent to me that the problem was most likely a voltage-domain error, i.e., a low-voltage device that was accidentally hooked up to a higher supply voltage than it was spec’ed for.

A report posted online today, if credible, confirms my speculation:

Quoting Intel’s Steve Smith (VP and Director of Intel Client PC Operations and Enabling): “The problem in the chipset was traced back to a transistor in the 3Gbps PLL clocking tree. The aforementioned transistor has a very thin gate oxide, which allows you to turn it on with a very low voltage. Unfortunately in this case Intel biased the transistor with too high of a voltage, resulting in higher than expected leakage current. Depending on the physical characteristics of the transistor the leakage current here can increase over time which can ultimately result in this failure on the 3Gbps ports.”

Bingo! Exactly as I suspected, and consistent with Intel’s comments yesterday:

  1. The problem was “statistical”.
  2. Performance degrades over time.
  3. The “error” can be fixed by an upper-layer metal mask patch.

In a former life, I was a product marketing manager for two tools that were designed specifically to find problems like this. Mismatched voltage domains are, unfortunately, one of the most common causes of respins in the books. And, sadly, so easily prevented!

There are simple static ERC (electrical rule checking) tools that can test every transistor instance and find any device with a low-voltage model that is hooked up to a supply rail exceeding its rating. List price of these tools? About 0.01% of what this will cost Intel, or roughly $70,000 against $700M. (Add a couple more decimal places with Intel’s discount.)
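The core of such a check is a per-instance comparison, sketched below in Python. This is a minimal illustration under stated assumptions, not any vendor’s tool: the `Transistor` record, the model names (`pch_thin`, `nch_thick`, ...), the rail names and voltages, and the per-model voltage limits are all hypothetical values invented for the example. Each pin tied directly to a supply rail is compared against its model’s rating, and violations are reported.

```python
from dataclasses import dataclass

# Hypothetical maximum rated terminal voltage (volts) per device model.
# Real limits come from the foundry's process documentation.
MODEL_VMAX = {
    "nch_thin": 1.1,   # thin-oxide (core) NMOS
    "pch_thin": 1.1,   # thin-oxide (core) PMOS
    "nch_thick": 3.6,  # thick-oxide (I/O) NMOS
    "pch_thick": 3.6,  # thick-oxide (I/O) PMOS
}


@dataclass
class Transistor:
    name: str
    model: str
    pins: dict  # terminal ("d", "g", "s", "b") -> net name


def check_voltage_domains(devices, rail_voltage):
    """Flag every pin tied directly to a supply rail whose voltage exceeds
    the rating of the device's model.

    rail_voltage maps rail net names to their nominal voltage. Internal
    (non-rail) nets are skipped; a real ERC tool would also propagate
    voltages through the circuit.
    """
    violations = []
    for dev in devices:
        limit = MODEL_VMAX[dev.model]
        for pin, net in dev.pins.items():
            if net not in rail_voltage:
                continue  # internal net: not checked in this sketch
            volts = rail_voltage[net]
            if volts > limit:
                violations.append((dev.name, pin, net, volts, limit))
    return violations


# Example: M2 is a thin-oxide PMOS accidentally tied to the 3.3 V I/O rail.
rails = {"VDD_CORE": 1.05, "VDD_IO": 3.3, "VSS": 0.0}
devices = [
    Transistor("M1", "pch_thin", {"d": "n1", "g": "n2", "s": "VDD_CORE", "b": "VDD_CORE"}),
    Transistor("M2", "pch_thin", {"d": "n3", "g": "n4", "s": "VDD_IO", "b": "VDD_IO"}),
    Transistor("M3", "pch_thick", {"d": "n5", "g": "n6", "s": "VDD_IO", "b": "VDD_IO"}),
]
result = check_voltage_domains(devices, rails)
for v in result:
    print(v)  # M2's source and bulk pins are reported
```

The comparison itself is trivial; in practice the hard part is building correct inputs (which nets are rails, which models are thin-oxide, what voltages internal nodes can reach).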

The problem manifested itself as degradation over time, and it was “statistical”, i.e., not deterministic. This sounds like NBTI (negative bias temperature instability), in which PMOS devices degrade randomly. It could also be HCI (hot carrier injection), in which NMOS devices experience charge trapping that alters their threshold voltage over time.

To test for these effects, EDA vendors have added features to circuit simulators that can reproduce “aging”. Work on these techniques began more than twenty years ago.
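The kind of aging analysis described above can be illustrated with a toy Monte Carlo model. This sketch does not describe how any commercial simulator works. It assumes the commonly used empirical power-law form ΔVth = A·t^n for NBTI-style threshold drift, and every number in it (the prefactor A, the exponent n, the lognormal device-to-device spread of A, and the 50 mV failure threshold) is a hypothetical value chosen only to reproduce the two symptoms Intel described: degradation that grows with time, and failures that are statistical rather than deterministic.

```python
import math
import random


def delta_vth(t_hours, a, n=0.2):
    """Threshold-voltage shift (volts) after t_hours of stress.

    Assumption: power-law form delta_Vth = a * t**n; a and n are
    illustrative, not fitted to any real process.
    """
    return a * t_hours ** n


def fraction_failed(t_hours, n_devices=10_000, vth_limit=0.05,
                    a_median=0.005, a_sigma=0.5, seed=1):
    """Monte Carlo estimate of the fraction of devices whose threshold
    shift exceeds vth_limit after t_hours.

    Device-to-device variation is modeled (hypothetically) as a
    lognormal spread in the prefactor a.
    """
    rng = random.Random(seed)
    failed = 0
    for _ in range(n_devices):
        a = rng.lognormvariate(math.log(a_median), a_sigma)
        if delta_vth(t_hours, a) > vth_limit:
            failed += 1
    return failed / n_devices


for hours in (1, 1_000, 10_000, 50_000):
    print(f"{hours:>6} h: {fraction_failed(hours):.4f} of devices failed")
```

Because the same random seed is reused at every stress time, each simulated device keeps the same prefactor A, so the failed fraction can only grow as stress time increases.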

This design error may not have been human error at all. It could have been introduced by an auto-router that hooked up the bad transistor. In any case, it’s not a “design error”, it’s a verification methodology fault.

I bet that gets fixed real quick too!

Posted by Michael Demler on February 2, 2011 | Comments (18)

August 27, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Genevieve commented:

A million thanks for posting this information.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Nitin Deo commented:

I just read this article and Mentor’s blog post about the new ERC tool. Is this too much of a coincidence, or is my imagination running wild?

I wonder whether Intel uses an internal or a commercial ERC tool.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Matt Hogan commented:

I would agree with PD Dude that traditional ERC methods don’t work well for the type of problem described in this article. To catch this type of error, you need to be able to identify classes of circuits (thin-oxide PMOS, for example), correctly identify the voltages on that device’s pins, and compare them against the specific rules for that device and voltage domain. We have been working with customers on multi-power-domain ERC requirements such as these, using circuit and layout information to develop more sophisticated reliability checks. I’ve posted more information about ERC checking on my blog at blogs.mentor.com/matthew_hogan/


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
charly commented:

I’m fine sitting in a plane knowing the human pilots are reading the newspaper while the autopilot brings us close to the destination airport. I’m not fine with a fully automated landing.

What about a design review checklist at the end of the project? For catching block-interface issues, it helps a lot!


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Esko Mikkola commented:

Ridgetop Group’s Sentinel Silicon test structures are currently used to generate accurate aging simulation models for 32 nm to 65 nm CMOS.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Jeremy commented:

When you rely too much on technology, you lose the need to fully understand what you do: the machine will tell me if I’m wrong… That’s true in a multimillion-dollar company, and it’s also true in everyday life: with GPS and cell phones, people don’t plan their trips anymore, so they end up lost more often than before…


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
EDA Engineer commented:

Appropriate use of EDA could have found and corrected this bug, sure. But Intel, along with most of the other big electronics companies, spends more time and effort *not buying* EDA tools. I’d almost bet that Intel’s stationery expenses dwarf its EDA spend.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
chipwiz commented:

Garbage in, garbage out! No verification flow will catch everything in every circumstance. Contrary to your ideology of a need for better verification, I would suggest designers do a better job of weeding out bugs first and not rely too heavily on some downstream tool to catch everything for them. Individual designers must take responsibility for not introducing bugs. Too many designers rely on tools and are lost when it comes to determining whether their design will work without them. As described, this sounds like the wrong choice of transistor (thin oxide as opposed to thick oxide). If any of the designers had some experience with ECL design, they would never have made such an error.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Bob Colwell commented:

“In any case, it’s not a “design error”, it’s a verification methodology fault.” I don’t agree. Something has to be designed first before it can be validated or verified. There was a human responsible for the circuit in question; whether they designed this circuit by hand or used automation, they are still responsible. That was where the first mistake was made. Validation/verification was then performed, which obviously failed to catch this error; that was the second mistake. That’s about all one can say without insider knowledge, which none of us commenting on this issue have. Beyond all that, mistakes happen, always have, always will. The important thing is to learn from them and not make the same one twice.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Don L. commented:

All you hear in the halls here is “…NBTI… this”, “…STI… that”, “…QT issues… there”,

“…what will metal fill do to us?”, “…placing dummy devices to eliminate shadows…”

EDA does not make those conversations happen; engineers do. Unfortunately, Intel did not have the right engineers doing that job.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Larry M commented:

The respin doesn’t cost the entire $700M. A good chunk of that is the recall. Of course that doesn’t change the outcome.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
PD Dude commented:

My assumption is that it’s a badly connected bulk node on a PMOS in a PLL, right?

I’d be curious to hear how people believe this should be caught, with some specifics.

I have talked with numerous engineers, and while there are ways to catch it, most of them require some data prep, and if you have bad input data, you will not catch it.

For a simple, single-voltage-domain digital circuit, this stuff works great. But most ERC decks I’ve seen don’t work well on analog and either require a lot of help or produce tons of false positives. We’ve been telling verification companies this for 10 years. But I guess DFM and fast DRC times sound more exciting. Maybe now someone will spend some time on a better ERC solution for real mixed-signal SoCs.

It would make for a great DAC demo. Get Synopsys, Mentor, Cadence and Magma up there and have them tell me specifically how their ERC tool would have caught this problem and why Intel didn’t catch it.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
EDA Exec commented:

Make that a $1B proof point … they also lost $300M in revenue.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
DM commented:

I agree this is a verification flow problem, since a simple static ERC would have flagged the mismatch, mixed signal or not. I’d say the EDA group owns this mistake, not the designers (assuming the designers ran the tool flow).


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Kev (simguru) commented:

Modern chips are no longer digital or analog; they are mostly mixed signal. There is no good mixed-signal design methodology, i.e., something that allows you to do digital verification with the analog parts modeled properly.

So I’m not surprised stuff fails, I’m more surprised that it ever works.

This is a fixable problem, but it would require the analog and digital guys in the EDA companies to actually integrate their tools, and the analog and digital guys in EDA don’t communicate any better than those in the design community.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Linda Capcara commented:

Thanks for the detailed explanation, Mike. It makes me wonder whether the Intel Storage Group GM is getting the heat.


April 14, 2011
In response to: If ever EDA needed a ($700M) proof point on their value...
Engineer commented:

Interesting!!

© 2012 UBM Electronics. All rights reserved.