IC Troubleshooting and failure analysis: Find the facts and avoid the guesswork
Bill Laumeister, Strategic Applications Engineer, Maxim Integrated - October 31, 2012
When troubleshooting a complex device, knowledge is “king.” We want, and need, to know everything relevant to the issue, including the proper IC revision number, where to find relevant reference materials, and who really knows what happened at the customer’s site. Failure analysis of ICs requires a quick and proper response because, of course, helping a customer is our main concern.
But should we expect the quality assurance (QA) department to test every parameter over all conditions during a failure analysis (FA)? No, not at all. Too much of that is guesswork. It may surprise some people, but QA people do not have crystal balls, nor do they read minds. Timely and effective IC troubleshooting is only possible when precise technical information about an IC failure is available from the customer.
Failure Analysis of ICs—It Can Waste Time
“Perception is reality.” We have heard this often. When an IC fails, or the customer thinks that it failed, we must respond with an FA. Yet, to do that effectively, we must have accurate, pertinent information about the incident. That is the only way to avoid guesswork.
Let me relate an incident that happened not so long ago. A part was returned as a failure and we knew nothing else. We ran it on the automatic test equipment (ATE), bench tested it, x-rayed it, and decapped it. We flooded it with soft electrons in an electron microscope to look for emission sites indicating damage. We measured its temperature using a liquid crystal coating. The part was perfect. We found no reason for failure, so the QA department said exactly that in the FA report. Why, we wondered, was the part returned as failed?
About two months later we learned, almost by accident, that the customer experienced this failure only when the part was heated above +60°C. We started the FA again. We tested the part at room temperature (+25°C), and we found… nothing. The part no longer functioned because it had been destroyed in the process of testing it.
Ultimately, this was a one-time return event; it did not happen again. But we learned something more important from this episode: without crucial performance (i.e., failure) data we were blind and guessing. We wasted considerable time and money for nothing. (See the Sidebar for another more personal story of antique cars, grounding issues, and another failed IC.)
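The lesson of this episode can be sketched as a simple intake check that flags which facts are still missing before an FA begins. The record below is a hypothetical illustration, not a Maxim form: the class name, the field names, and the `missing_facts` helper are all assumptions, chosen only to mirror the items this article mentions (IC revision, failure symptom, temperature at failure, schematic).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FailureReport:
    """Hypothetical intake record for a returned part (field names are illustrative)."""
    part_number: Optional[str] = None
    ic_revision: Optional[str] = None
    symptom: Optional[str] = None
    temperature_c: Optional[float] = None  # temperature at which the failure appeared
    schematic_provided: bool = False


def missing_facts(report: FailureReport) -> List[str]:
    """Return the names of the fields the customer has not yet supplied."""
    missing = []
    for f in fields(report):
        value = getattr(report, f.name)
        # A field counts as missing if it was never filled in (None)
        # or, for the schematic flag, is still False.
        if value is None or value is False:
            missing.append(f.name)
    return missing


# The return described above: a part labeled "failed" and nothing else.
bare_return = FailureReport(part_number="(hypothetical part)", symptom="failed")
print(missing_facts(bare_return))
```

For the bare return in this story, the check would have listed the temperature at failure among the missing facts, which is exactly the detail that surfaced two months too late.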
An Exhaustive Exercise in QA Futility
Many times a failed IC is so damaged that the origin of the damage cannot be determined. One customer took a board from the assembly contractor back to their lab facility. There they removed the IC from the board and claimed that the IC failed. Very likely. The customer came to a conclusion: a “root cause” in the IC itself. They wanted an FA, but where was the failure data? Were the circumstances recorded carefully? What would prevent future failures? We were back to guessing, not fact checking—hardly a prescription for a meaningful FA.
In this case the customer had concentrated on three pins of a multioutput device. Here is what we did know: the part left the fab operating with a certainty of a few parts in billions; it operated in a circuit for hours before it failed. Was it an infant failure, or was it damaged by external handling? In the customer’s circuit? In the application environment? Did electrostatic discharge (ESD) at the factory weaken the circuit so it failed later? Perhaps there was damage by a shipping clerk who ignored an ESD protocol? The list of possible factors seemed endless.
The first partial schematic received from the customer was not very helpful. It showed neither what drove the failed part nor what the part needed to drive. The local FAE was asked to check the ground. Were the grounds separated correctly? You could not tell from the schematic.
We received a few more pieces of the schematic, but now had more questions than answers. Why did the customer check only three of the many outputs? Were any input or output pins of the device connected with low impedances to board pins? Did the power and ground count as low-impedance connections? Could ESD on the board pins be the issue? We were still guessing.