A trip through Quality - Part 1

June 19, 2012

When I first started out in electronics, I worked for a small company making modular instrumentation products. A job came in that called for a "MIL-STD 217 Calculated MTBF" (Mean Time Between Failures). I was handed a copy of the standard and told to write a report for my design. MIL-STD 217 had a shortcut called the "Parts Count" method. As the name suggests, it was based on counting the parts in a design and adding up their failure rates according to the type of part (transistor, IC, resistor, etc.) and the derating applied to each part.
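To make the arithmetic concrete, here is a minimal Python sketch of a parts-count style calculation. The part categories, quantities and per-part failure rates are hypothetical placeholders, not values from MIL-STD 217. The single `quality_factor` multiplier is a simplification standing in for whatever per-part adjustments (quality level, derating) the standard actually applies. The sum runs over part types as count times failure rate; assuming a constant failure rate, MTBF is then the reciprocal of the total.

```python
# Minimal parts-count MTBF sketch.
# ASSUMPTION: the failure rates below are illustrative placeholders,
# not values from MIL-STD 217 or any other handbook.

# Hypothetical generic failure rates, in failures per million hours (FPMH).
GENERIC_FPMH = {
    "transistor": 0.05,
    "ic": 0.10,
    "resistor": 0.002,
    "capacitor": 0.01,
}


def parts_count_fpmh(bom: dict[str, int], quality_factor: float = 1.0) -> float:
    """Sum count * generic rate * quality factor over every part type in the BOM."""
    return sum(count * GENERIC_FPMH[part] * quality_factor
               for part, count in bom.items())


def mtbf_hours(failure_rate_fpmh: float) -> float:
    """With a constant failure rate, MTBF is the reciprocal of the total rate."""
    return 1e6 / failure_rate_fpmh


# Hypothetical bill of materials: part type -> quantity.
bom = {"transistor": 10, "ic": 5, "resistor": 50, "capacitor": 20}
total = parts_count_fpmh(bom)
print(f"total failure rate: {total:.3f} FPMH")
print(f"calculated MTBF:    {mtbf_hours(total):,.0f} hours")
```

Note that every part only ever adds to the total: in this kind of sum, adding a part can never raise the calculated MTBF.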

That's what the customer wanted, that's what we gave them, and I typed up a nice three-page report. The customer came back and said that the MTBF was too low for their system; it had to be some higher number. I asked the more experienced engineers and they said: "We have these in-house tested parts that are tested to MIL-STD 883; they have a higher reliability for the calculations." These parts were subjected to burn-in, thermal cycling and humidity soak, with testing afterwards. This allowed us to assign a higher reliability to the parts, and when those numbers were cranked back into the MIL-STD 217 calculation the product had a much higher calculated MTBF. We were happy and the customer was happy, but it left me wondering: "How real is all this?"
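A short sketch of why swapping in screened parts moved the number so much. It assumes, hypothetically, that every part's failure rate is multiplied by a quality factor, and that the screened parts were assigned a factor one quarter of the commercial one; the baseline rate and both factors are illustrative, not taken from any standard. Nothing physical about the board changes in this calculation, only the factor assigned to its parts.

```python
def mtbf_hours(failure_rate_fpmh: float) -> float:
    """Constant-failure-rate MTBF in hours from a rate in failures per 1e6 hours."""
    return 1e6 / failure_rate_fpmh


def rescale_for_quality(baseline_fpmh: float, pi_q_old: float, pi_q_new: float) -> float:
    """Rescale a parts-count total when every part moves to a new quality factor.

    Because each part's rate is multiplied by its quality factor, changing the
    factor for all parts scales the total rate by pi_q_new / pi_q_old.
    """
    return baseline_fpmh * pi_q_new / pi_q_old


# Hypothetical numbers: same design, same parts list, different assumed grade.
baseline = 1.3   # FPMH with a commercial-grade factor (assumed)
screened = rescale_for_quality(baseline, pi_q_old=1.0, pi_q_new=0.25)

print(f"commercial MTBF: {mtbf_hours(baseline):,.0f} h")
print(f"screened MTBF:   {mtbf_hours(screened):,.0f} h")
```

Under these assumptions the calculated MTBF goes up fourfold while the circuit, layout and operating stresses stay exactly the same.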

After the breakup of the Bell System, a slew of new telephone equipment companies started up to take a share of the captive business that Western Electric had held. These companies all took the Bell Core Reliability Standard with them. This was an updated version of the old MIL-STD 217, based on Bell Labs' experience with reliability, and it too had a "Parts Count Method".

By that time I was working for a merchant power supply company and had more than a few years of real experience in what worked and what didn't when manufacturing electronic products. I also had the opportunity to see every single failed part returned to that company. We had really excellent quality and there weren't many returns, but being able to do a failure analysis on every single one was a big eye-opener for me and a great learning experience.

So with these new telephone companies came new projects that specified a "Bell Core MTBF based on the Parts Count Method."

With the experience I now had, I realized that these parts count methods had a place in the reliability "Tool Kit," but they also had their limitations.

Let me explain. In the high-quality power supplies we were building, probably 30% of the parts were directly related to protection. We put transient suppressors on the inputs and outputs, we had thermal shutoffs and short-circuit protection on almost all of our power supplies, and we sometimes doubled up on capacitors to ensure enough derating of the ripple current.

Yet if I had left these parts out of the design, the "Parts Count" reliability would have been much higher. But would this really be a more reliable power supply in a system?
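A rough sketch of that effect, under the simplifying and hypothetical assumption that every part contributes the same failure rate. With about 30% of a 100-part design devoted to protection, leaving those parts out raises the calculated MTBF by a factor of 100/70, roughly 1.43, even though the supply without them would be less robust in a real system.

```python
def mtbf_hours(total_fpmh: float) -> float:
    """Constant-failure-rate MTBF in hours from a rate in failures per 1e6 hours."""
    return 1e6 / total_fpmh


FPMH_PER_PART = 0.01    # hypothetical uniform rate per part
FUNCTIONAL_PARTS = 70
PROTECTION_PARTS = 30   # roughly 30% of the parts, as in the text

with_protection = mtbf_hours((FUNCTIONAL_PARTS + PROTECTION_PARTS) * FPMH_PER_PART)
without_protection = mtbf_hours(FUNCTIONAL_PARTS * FPMH_PER_PART)

print(f"with protection:    {with_protection:,.0f} h")
print(f"without protection: {without_protection:,.0f} h")
print(f"apparent 'improvement': {without_protection / with_protection:.2f}x")
```

The uniform per-part rate is only there to keep the arithmetic visible; with real per-type rates the ratio changes, but the direction does not.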

No, it wouldn't, and so I came to realize a big problem with these methods. Some customers realized this too and suggested that the MTBF be calculated without the protection parts that lowered it. This wasn't a great solution, because the whole calculation would then become very subjective and open to much "Tinkering and Specsmanship."

These parts count reliability methods did indeed lead to all sorts of interesting marketing specsmanship on data sheets. Various companies would publish MTBFs of 100 years or more. This was obviously silly, given that the optocouplers used in these power supplies show 5% to 15% or more current transfer ratio (CTR) degradation per 4000 hours[1]. So what would the CTR be at 100 years? Probably well below anything the designer allowed for in the loop gain variation, and that's just one part. How about the capacitors? The solder joints? The plastics used in the IC packages? The bonding wires? The PCB? The transformers? What will the permeability of the ferrite used in the inductors and transformers be in 100 years?[2] All sorts of unanswered, but real, questions.
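To put the 100-year figure in perspective, the sketch below converts it to hours and compares it with the 4000-hour interval the CTR data in [1] is quoted over. It also applies a deliberately naive extrapolation, assuming hypothetically that the same fractional CTR loss repeats every 4000 hours. Real optocoupler aging need not follow this model, so the output gives a sense of scale, not a prediction.

```python
HOURS_PER_YEAR = 24 * 365   # ignores leap years


def years_to_hours(years: float) -> float:
    return years * HOURS_PER_YEAR


def naive_ctr_remaining(loss_per_interval: float, interval_hours: float,
                        total_hours: float) -> float:
    """Fraction of initial CTR left if the same fractional loss compounds each interval.

    ASSUMPTION: constant compounding loss. This is a scale check, not a model
    of real LED or optocoupler aging.
    """
    intervals = total_hours / interval_hours
    return (1.0 - loss_per_interval) ** intervals


TEST_INTERVAL_H = 4000
for years in (10, 100):
    hours = years_to_hours(years)
    ratio = hours / TEST_INTERVAL_H
    ctr_5 = naive_ctr_remaining(0.05, TEST_INTERVAL_H, hours)
    print(f"{years:>3} years = {hours:,.0f} h = {ratio:.1f}x the data interval, "
          f"naive CTR fraction left at 5%/interval: {ctr_5:.3g}")
```

Even at the low end of the quoted range, this naive model leaves only about a third of the initial CTR after 10 years, and essentially none after 100. A gentler aging curve would change the numbers, but not the underlying point: a 100-year MTBF claim reaches more than 200 times beyond the 4000-hour interval the degradation data covers.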

For example, I had some very practical knowledge of ferrite transformers and what could go wrong with them, because when you make 1,000 to 10,000 of something a month you get to see lots of variation and process problems. The Bell Core standard assigned a very low failure rate to transformers, and at first glance that might seem correct; after all, what can go wrong with a transformer?

Yet I have seen firsthand that many ferrite transformers used in high-frequency power supplies are glued together. The glue may hold for a while, but thermal cycling can make it fail. We caught many of these potential failures just by thermal cycling and testing our new manufacturing methods and power supply designs. Thank goodness we did this testing before shipping our products, so we could catch the units that looked OK but would have proved otherwise in the long term. This gluing proved to be a critical process step that could make or break the final product's overall reliability, yet it was not reflected in the Bell Core parts count methods popular at the time.

So I found that these parts count methods are useful tools when the same assumptions are applied to all parts of a system calculation and the results are used for relative comparisons between different possible system configurations. But basing a desired or goal MTBF solely on these methods is probably not such a good idea, as the examples above show: sometimes the "Assumed" reliability metric doesn't match reality, or doesn't keep up with changes in designs and processing methods.

We will continue next time (see Part 2) with some methods that have actually proven useful in anticipating, understanding and preventing failures, and some that have not.

[1] Vishay Intertechnology, "Aging of Infrared Emitter Components." See also: Hewlett-Packard Company, "Optoelectronics/Fiber-Optics Applications Manual," 1981.

[2] The degradation of material properties over time is well known; in ferrites, the resulting decrease in permeability is called disaccommodation. Even high-quality ferrite magnetics can lose 3% or more of their permeability in just 20 years. See: Snelling, "Ferrites for Inductors and Transformers," Research Studies Press, 1983. Ceramic capacitors also suffer from the same sort of aging effects.
