EDN

Process variation: you can't ignore statistics any more

March 31, 2009

I like to say that “you can’t ignore the physics any more” to point out that we now have to worry about lots of physical effects that we never needed to consider before. But “you can’t ignore the statistics any more” would be another good rallying cry.

In the design world we like to pretend that the world is pass/fail. If you don’t break the design rules, your chip will yield. If your chip timing works at the worst-case corner, then your chip will yield (yes, you need to look at other corners too).

But manufacturing is actually a statistical process and isn’t pass/fail at all. One area that is getting worse with each process generation is process variability, especially in power and timing. If we look at a particular number, such as the delay through a NAND gate, the difference between worst-case and typical is getting larger: the standard deviation about the mean is increasing. This means that when we move from one process node to the next, the typical delay improves by a certain amount but the worst-case delay improves by much less. If we design to worst-case timing, we don’t see much of the payback from the investment in the new process.
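To make the arithmetic concrete, here is a small sketch comparing how much the typical and worst-case delay of one gate improve across a node shrink. It assumes delay can be summarized as a mean and a standard deviation, takes "worst case" to mean the mean plus three sigma, and uses made-up numbers (a hypothetical 20 ps gate with 1 ps sigma shrinking to 14 ps with 1.5 ps sigma); none of these figures describe a real process.

```python
from dataclasses import dataclass

@dataclass
class GateDelay:
    """Delay of one gate at a process node, summarized as mean and sigma (ps)."""
    mean_ps: float
    sigma_ps: float

    def worst_case(self, n_sigma: float = 3.0) -> float:
        # Assumed definition: worst-case corner = mean + n_sigma * sigma (slow side).
        return self.mean_ps + n_sigma * self.sigma_ps

def improvement(old: float, new: float) -> float:
    """Fractional speed-up going from the old delay to the new delay."""
    return (old - new) / old

# Hypothetical numbers for illustration only, not data for any real process.
old_node = GateDelay(mean_ps=20.0, sigma_ps=1.0)
new_node = GateDelay(mean_ps=14.0, sigma_ps=1.5)

typical_gain = improvement(old_node.mean_ps, new_node.mean_ps)
worst_gain = improvement(old_node.worst_case(), new_node.worst_case())

print(f"typical delay improves by {typical_gain:.1%}")     # 30.0%
print(f"worst-case delay improves by {worst_gain:.1%}")    # 19.6%
```

Because sigma grows while the mean shrinks, the worst-case corner (23 ps to 18.5 ps) captures only about two thirds of the typical improvement in this example.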

An additional problem is that we now have to worry about variation across the die in a way that we could previously get away with ignoring. In the days before optical proximity correction (OPC), the variation on a die was almost entirely due to things that affected the whole die: the oxide was slightly too thick, the reticle was slightly out of focus, the metal was slightly over-etched. But with OPC, identical transistors may be patterned differently on the reticle depending on what else is in the neighborhood. This means that when the stepper is slightly out of focus, it will affect transistors that are identical from the designer’s point of view differently.
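One way to see why within-die variation matters is to look at how strongly two "identical" transistors on the same die track each other. The sketch below assumes a common simplification: each transistor's parameter is a die-wide component (oxide, focus, etch) shared by everything on the die, plus a per-transistor component. In reality the OPC-induced part is systematic and layout-dependent; here it is modeled as independent random noise purely for illustration, and all sigma values are hypothetical.

```python
import math
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def two_identical_transistors(n_dies, sigma_global, sigma_local, seed=1):
    """Sample one parameter for two nominally identical transistors on each die."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n_dies):
        g = rng.gauss(0.0, sigma_global)               # die-wide: affects both
        t1.append(g + rng.gauss(0.0, sigma_local))     # within-die part (assumed random)
        t2.append(g + rng.gauss(0.0, sigma_local))
    return t1, t2

def predicted_correlation(sigma_global, sigma_local):
    """Correlation implied by the global + local model."""
    return sigma_global ** 2 / (sigma_global ** 2 + sigma_local ** 2)

for label, s_g, s_l in [("mostly die-wide", 1.0, 0.1), ("half within-die", 1.0, 1.0)]:
    r = pearson(*two_identical_transistors(20_000, s_g, s_l))
    print(f"{label}: simulated r = {r:.3f}, predicted {predicted_correlation(s_g, s_l):.3f}")
```

When the die-wide term dominates, the two transistors move almost in lockstep, so a single corner describes both; as the within-die share grows, their correlation drops and a die-level corner no longer captures how they differ.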

Treating worst-case timing as an absolutely solid and accurate barrier was always a bit weird. I used to share an office with a guy called Steve Bush who had a memorable image for this. He said that treating worst-case timing as accurate to fractions of a picosecond is like the way the NFL measures a first down. There is a huge pile of players. Somewhere in there is the ball. Eventually people get up and the referee places the ball somewhere roughly reasonable. And then they bring out the chains and measure to fractions of an inch whether it has advanced ten yards or not.

Statistical static timing analysis (SSTA) allows some of this variability to be examined. One difficulty in static timing is handling reconvergent paths correctly, so that you don’t simultaneously assume that the same gate is both fast and slow. It has to be one or the other, even though you need to worry about both cases.
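The reconvergence problem can be illustrated with a small Monte Carlo sketch. It assumes two paths that share one gate and then split into independent branches, with all delays Gaussian and the sigma values (3 ps shared, 1 ps per branch) chosen arbitrarily. It compares sampling the shared gate once per trial (it is fast or slow for both paths) against the incorrect treatment that samples it separately for each path. This is a plain Monte Carlo illustration, not the algorithm any particular SSTA tool uses.

```python
import math
import random
import statistics

def path_skews(n, sigma_shared, sigma_branch, share_gate, seed=7):
    """Monte Carlo samples of (delay A - delay B) for two reconvergent paths.

    Path A = shared gate + branch X; path B = shared gate + branch Y.
    With share_gate=False the shared gate is (wrongly) sampled separately for
    each path, i.e. allowed to be fast on one path and slow on the other.
    """
    rng = random.Random(seed)
    skews = []
    for _ in range(n):
        g_a = rng.gauss(0.0, sigma_shared)
        g_b = g_a if share_gate else rng.gauss(0.0, sigma_shared)
        delay_a = g_a + rng.gauss(0.0, sigma_branch)
        delay_b = g_b + rng.gauss(0.0, sigma_branch)
        skews.append(delay_a - delay_b)
    return skews

correct = statistics.stdev(path_skews(50_000, 3.0, 1.0, share_gate=True))
naive = statistics.stdev(path_skews(50_000, 3.0, 1.0, share_gate=False))
print(f"shared gate sampled once:  sigma(A-B) = {correct:.2f}, theory {math.sqrt(2):.2f}")
print(f"shared gate sampled twice: sigma(A-B) = {naive:.2f}, theory {math.sqrt(20):.2f}")
```

When the shared gate is sampled once, its contribution cancels in the difference between the two paths; sampling it twice inflates the apparent spread by more than a factor of three with these numbers, which is exactly the pessimism of assuming one gate is both fast and slow.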

But there is a more basic issue. The typical die is going to be at a typical process corner, so if we design everything to worst case, most of our chips will actually have much higher performance than necessary. Now that we care a lot about power, this is a big problem: those chips consume more power than necessary to deliver performance we cannot use. There has always been an issue that the typical chip has higher performance than we guarantee, and when it is important we bin the chips for performance during manufacturing test. But with increased variability the range is getting wider, and when power rather than timing is what matters, too fast is a big problem.
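A quick calculation shows how a wider spread leaves more chips far from the worst-case design point. This sketch assumes die delay is normally distributed, that the clock period is set at the mean plus three sigma, and uses a hypothetical 100 ps path; it reports the fraction of dies that beat that period by at least 3 ps, performance that is paid for in power but never used.

```python
from statistics import NormalDist

def fraction_with_slack(mean_ps, sigma_ps, min_slack_ps, n_sigma=3.0):
    """Fraction of dies whose path delay beats the worst-case design point
    by at least min_slack_ps.

    The clock period is set at mean + n_sigma * sigma (the worst-case corner),
    and die delay is assumed to be normally distributed.
    """
    dist = NormalDist(mu=mean_ps, sigma=sigma_ps)
    period_ps = mean_ps + n_sigma * sigma_ps
    return dist.cdf(period_ps - min_slack_ps)

# Same hypothetical 100 ps mean path, tighter vs wider process spread.
for sigma in (1.0, 2.0):
    frac = fraction_with_slack(100.0, sigma, min_slack_ps=3.0)
    print(f"sigma = {sigma} ps: {frac:.1%} of dies have >= 3 ps of unused slack")
```

With these numbers, half the dies have at least 3 ps of unused slack when sigma is 1 ps, versus about 93% when sigma is 2 ps.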

One way to address this is to adjust the power supply voltage to slow the chip down to just the performance that is required, with a commensurate reduction in power. This is called adaptive voltage scaling (AVS). Usually the voltage is adjusted to take into account the actual process corner, and perhaps even the operating temperature as it changes. Once this is done, it is possible to bin for power as well as performance. Counterintuitively, the chips at the fastest process corner may well be the most power-thrifty, since we can reduce their supply voltage the most.
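The AVS idea can be sketched as a search for the lowest supply voltage at which each die still meets a fixed timing target. Several assumptions are made here and are not from the article: gate delay follows the alpha-power law with a hypothetical Vt of 0.35 V and alpha of 1.3, the process corner is a simple multiplicative delay factor (0.8 fast, 1.0 typical, 1.2 slow), the voltage is stepped on a 10 mV grid, and power is dynamic power only (proportional to Vdd squared at fixed frequency). Leakage, which is typically higher on fast dies, is ignored.

```python
def gate_delay(vdd, vt=0.35, alpha=1.3):
    """Relative gate delay from the alpha-power law: Vdd / (Vdd - Vt)^alpha."""
    return vdd / (vdd - vt) ** alpha

def lowest_passing_vdd(corner_factor, target_delay,
                       v_min_mv=600, v_max_mv=1200, step_mv=10):
    """Return the lowest supply (volts) on a step_mv grid that meets timing.

    corner_factor scales the delay: < 1.0 is a fast die, > 1.0 a slow die.
    Returns None if the die cannot meet timing even at v_max_mv.
    """
    for mv in range(v_min_mv, v_max_mv + 1, step_mv):
        vdd = mv / 1000.0
        if corner_factor * gate_delay(vdd) <= target_delay:
            return vdd
    return None

# Hypothetical timing target: a slow (1.2x) die just meets it at the nominal 1.0 V.
TARGET = 1.2 * gate_delay(1.0)

for name, k in [("fast", 0.8), ("typical", 1.0), ("slow", 1.2)]:
    vdd = lowest_passing_vdd(k, TARGET)
    # Dynamic power ~ C * Vdd^2 * f; at fixed frequency it scales as Vdd^2.
    rel_power = (vdd / 1.0) ** 2
    print(f"{name:>7} die: Vdd = {vdd:.2f} V, dynamic power = {rel_power:.2f} x nominal")
```

With these assumed parameters the slow die stays at 1.0 V while the fast die settles at the lowest supply and so burns the least dynamic power, which is the counterintuitive ordering described above. Including leakage would narrow that advantage.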

Posted by Paul McLellan on March 31, 2009 | Comments (6)

April 10, 2009
In response to: Process variation: you can't ignore statistics any more
edasemiguy commented:

Don't forget analog. Your article applies equally to analog/mixed-signal design. Statistical variation causes spec failures or overdesign because the process corners provided by the fab alone don't accurately bound the process variability. Systematic effects pop up in sub-65nm analog too.


April 1, 2009
In response to: Process variation: you can't ignore statistics any more
SteveM commented:

Notice that all of the discussion is about SSTA, that is, statistical timing analysis. Yet all of the upstream optimization, from DC-T through ICC, is done with MCMM (multi-mode, multi-corner). What use is analysis if you cannot take advantage of it in optimization to make the circuit smaller or lower power? The possible advantage of SSTO is additional leakage optimization, but that needs to be compared with MCMM in terms of leakage power effectiveness. All initial studies show low single-digit percentage leakage power improvement. So the conclusion: don't bother. Put the effort into system-level and architectural power, as the recent Intel Xeon (Nehalem) announcement demonstrates. Also, as everyone knows, Intel uses binning, so they are not designing at the worst corner as most of the fabless guys are forced to by their foundries. The other side of the story is the reluctance of foundries to deliver SSTA models and the lack of library collateral. Another good argument in favor of vertically integrated companies, of which there are only 4 left (IBM, Intel, Samsung, ST).


March 31, 2009
In response to: Process variation: you can't ignore statistics any more
Chipwiz commented:

If you are prepared to do Principal Component Analysis and incorporate it into the SSTA, just as all analog designers have been doing to enable statistical models, then it works. Here again, the confidence in the sigma relies on the volume of data. If you are a design house, have your operations group set up a business model where you buy die; then the yield and performance dilemma will very quickly become a non-issue for you: you focus on performance and the foundry will be forced to focus on yield!


March 31, 2009
In response to: Process variation: you can't ignore statistics any more
StuckInTheMiddle commented:

While Chipwiz makes some good points, I'd also like to point out that there are sometimes conflicting goals between the design team and the fab (especially when it's a foundry). Yes, they both want high yield, but if the design team is at the bleeding edge of performance, the goals can conflict since, above all, the fab wants to maximize yield. This means the fab will always drive to the widest min/max (+/- 6 sigma) they can get away with and will "play with the numbers" to avoid appearing uncompetitive. This hurts the design team's goals of meeting performance AND yield AND schedule AND area. It's in this area that SSTA can add unique value, since it allows the design team to analyze the risk/reward tradeoff between the various constraints by analyzing the combined probability instead of assuming everything will be worst case (which is the fab's viewpoint).


March 31, 2009
In response to: Process variation: you can't ignore statistics any more
Chipwiz commented:

A good broad-brush overview of the issues and pitfalls to come. A good part of the variability in the 32nm node is systematic, while the random component has also increased in the 32nm/28nm nodes that still use SiON gates. The HiKMG process, by contrast, actually improves the random component of the variability in these nodes, as reported by Intel. As for SSTA, it is a lot of work with very little return, since SSTA would be most useful during initial process bring-up, where the sigma is large. But during this phase the statistical data behind the SSTA is also immature and minimal (it does not make good statistics). Sigma improves once the statistics are good, with volumes of production data, at which point SSTA again becomes of minimal help. Concentrating on the systematic variability would bear much fruit at these nodes. There is a new book from Wiley, "Nano-CMOS DFM", that goes into detail on how to harness this systematic variability at these nodes for better power/performance, area and cost metrics, which your readers might be interested in. It goes into the physics behind the systematics, then takes the reader through the steps that turn the variability into better PPAC while traversing design-process pitfalls.

© 2012 UBM Electronics. All rights reserved.