A trip through Quality - Part 3

-August 10, 2012

The power of rules of thumb....

Everybody likes “rules of thumb.” They are simple, easy to remember, and can help during the design or analysis process. For example: “The gate current of a JFET doubles about every 10 degrees C.” Now that's useful to know if you have a high-impedance application that will need to run at high temperatures - it's nice to have a rough idea of what the high-temperature gate current will be.
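As a rough illustration of how such a rule gets used, here is a minimal Python sketch of the JFET gate-current estimate. The function name and the 10 pA reference value are made up for the example and do not come from any particular data sheet; the result is only as good as the rule itself.

```python
def estimate_gate_current(i_ref_amps, t_ref_c, t_c, doubling_step_c=10.0):
    """Rule-of-thumb estimate: gate current doubles every `doubling_step_c` degrees C.

    i_ref_amps: gate current at the reference temperature (hypothetical value)
    t_ref_c:    reference temperature in degrees C
    t_c:        temperature of interest in degrees C
    """
    return i_ref_amps * 2.0 ** ((t_c - t_ref_c) / doubling_step_c)


# Assumed example: 10 pA at 25 C, estimated at 125 C.
# 100 degrees is ten doublings, so the estimate is 1024 times the reference.
i_hot = estimate_gate_current(10e-12, 25.0, 125.0)
print(f"Estimated gate current at 125 C: {i_hot * 1e9:.2f} nA")
```

Ten doublings turn picoamps into nanoamps, which is exactly the kind of rough idea the rule is good for in a high-impedance design.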

Some rules of thumb are prone to scaling errors, meaning that they don't scale well with size (I-beam supports in bridges, for example [1]). Others are overly pessimistic and cause things to be bigger than necessary, or are just plain dubious. As an example of the latter, here is the biggest one of all: “Electronics failure rate doubles for every 10 degree rise in temperature.”

Wait - don't flame me yet - I used to think this was true too, but I have since come across some information that we may want to consider.

In the mid 1990s I went to a Powercon convention in Long Beach where a talk was given by a Leonard King of Boeing on electronics reliability.

Mr. King started his talk by showing some projections on the cooling power needed to keep all his Boeing avionics running reliably in future years. His point was that it was out of control and he would need tons and tons of cooling capacity on future jets.

Then he described how this rule of thumb came into being and how it was applied. Engineers just assumed that it was true, so they would test semiconductors, perhaps 10 or 100 units, at 125 degrees C for some number of hours, then apply the doubling-every-ten-degrees rule to get an extrapolated room-temperature reliability. Everyone nodded in agreement: “This is how it was done.” [2]
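To make the arithmetic of that extrapolation concrete, here is a minimal Python sketch. The function names, the unit count, and the test hours are hypothetical, chosen only for illustration. The code reproduces the rule-of-thumb calculation and says nothing about whether the rule is valid.

```python
def acceleration_factor(t_test_c, t_use_c, doubling_step_c=10.0):
    """Rule-of-thumb acceleration factor: failure rate doubles every
    `doubling_step_c` degrees C between use and test temperature."""
    return 2.0 ** ((t_test_c - t_use_c) / doubling_step_c)


def equivalent_device_hours(units, hours, t_test_c, t_use_c):
    """Device-hours at the use temperature implied by one hot test,
    assuming the doubling rule holds over the whole temperature range."""
    return units * hours * acceleration_factor(t_test_c, t_use_c)


# Assumed example: 100 units tested for 1000 hours at 125 C,
# extrapolated to 25 C (ten doublings, a factor of 1024).
af = acceleration_factor(125.0, 25.0)
hours_at_25c = equivalent_device_hours(100, 1000.0, 125.0, 25.0)
print(f"Acceleration factor: {af:.0f}")
print(f"Claimed equivalent device-hours at 25 C: {hours_at_25c:,.0f}")
```

Note that the only measured quantity here is the single hot test. The factor of 1024 comes entirely from the assumed rule.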

Now this conference was full of end users and also semiconductor folks from every manufacturer - big and small. So when Mr. King asked the question: “Who has any actual measured proof that this reliability extrapolation is actually true?” - no one raised their hand. Not one.

Mr. King then went on to discuss how this idea came to be, and he provided many real-world examples showing that it could not be true. For instance, he presented a number of aircraft case studies in which the avionics cooling systems were broken for significant periods of time, yet no increase in failures had been noted. He had real-world results.

Now we do know that if we heat our electronics up to the surface temperature of the sun, that would be bad and we can expect immediate failures. But the point Mr. King made was: how do we know that the scaling factor being applied is really true?

He was right of course and that was really the end of that “Rule of thumb” for a very long time.

In the intervening years I had access to some very sharp academic types who studied Operations Research at U.C. Berkeley. They showed me how the “Bathtub curve” [3] and most modern reliability concepts came out of the age of steel and were developed for steel production. This is not to say that they are wrong in concept, but they may not scale the same way for electronics systems as they did for steel production, and we may fall into scaling-error or misapplication traps.

In the 1980s and 1990s almost every semiconductor data sheet had some form of life projection curve showing the number of units tested versus hours at some defined temperature. The plot would then have a straight line drawn (on log-log paper) so that the reliability could be easily read off for any other temperature (or at least that was what it purported to show).

The problem was that there was only one data point on the curve - the accelerated high-temperature point. Everything else was an approximation based on a rule of thumb. Right after Mr. King made the rounds, most semiconductor data sheets stopped showing these plots.

Interestingly enough, some 20 years later these sorts of plots are showing up again, and this rule of thumb has been revived in the trade literature. Yet Mr. King's question from so many years ago is still unanswered: “Who can show with data that this is actually true?”

Mr. King knew the answer and proved the rule wrong with real-world data that he analyzed from aircraft service records. After that talk I too said goodbye to a useful, but wrong, rule of thumb.

Interestingly, I and others have found that actual HALT testing [4] does continue to provide useful information on every product that I design. The problems uncovered are oftentimes something that could not have been extrapolated from previous product experience; they are more a function of unique mechanical construction, exact component placement, and so on than of a “Rule of thumb.”

So while we all agree with the “Quality Gurus” who patiently explain to us that we can't expect to “Inspect quality in,” we can and do continue to design quality into our products with judicious use of time-proven HALT/HASS/STRIFE techniques.

[1] There were a number of catastrophic bridge and dam failures that were caused by making an existing design bigger and bigger and bigger until some I-Beam or earth density calculation would fail due to the limit of linear scaling.

[2] This method is based on the application of the Arrhenius equation: http://en.wikipedia.org/wiki/Accelerated_aging

[3] http://en.wikipedia.org/wiki/Bathtub_curve

[4] http://www.edn.com/electronics-blogs/the-practicing-instrumentation-engineer/4390097/A-trip-through-Quality---Part-2
