James W Sylivant, Professional Engineer - May 24, 2012
When I was working for a modem supplier in the 1980s, customers in the southeastern United States, particularly Florida, were returning units that had failed catastrophically. Florida gets a lot of lightning storms, which can cause big problems for electronic devices. With modems, lightning normally damages the telephone-line-interface circuitry, but most of our returns came back with burned circuitry in the onboard SMPS (switch-mode power supply). The failures usually involved a complete meltdown of the power supply and its components, leaving little to analyze and making it almost impossible to find the root problem. We performed lots of testing using simulated lightning strikes to the ac-power line, but the testing exposed no apparent problem.
It was time to change the testing strategy and look at lessons learned from previous experience. When issues occur, users are often instructed to turn a device off for 10 seconds or more before powering back on because powering back on too quickly sometimes causes circuits to misbehave. Taking enough time to fully reset to a known state is necessary to ensure a predictable power-on sequence. Lightning can induce surges on the ac-power line. If these surges are large enough, they can destroy electronic devices. However, it is more common for short interruptions of ac power to occur when protective devices in substations of the power company disconnect momentarily, resulting in a corresponding interruption of ac power to customers.
I set up a test that interrupted ac power for short durations to find out whether our power supply had such a problem. Everything was fine for interruptions lasting more than a second. As I continued to test with interruptions of less than a second, I found that, at approximately 750 msec of power-line interruption, our power supply would fail; heat was destroying many components around the main switch-mode MOSFET. It was difficult to analyze what had happened after the PCBs and their components had burned or melted. When I set the interruption time to less than 300 msec, however, the problem disappeared. It seemed that our product was susceptible to failures with ac-power-line interruptions of 300 to 750 msec.
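As an illustration only, the bookkeeping for a sweep like the one described above might look like the following sketch. The function name, the 50-msec step size, and the simulated unit are all assumptions made for the example; the original test was done on a bench with real hardware, and the stand-in model simply encodes the 300- to 750-msec range reported above.

```python
from typing import Callable, List, Optional, Tuple


def find_failure_window(
    survives: Callable[[int], bool],
    durations_ms: List[int],
) -> Optional[Tuple[int, int]]:
    """Return the (shortest, longest) tested interruption that caused a
    failure, or None if every tested duration was survived."""
    failing = [d for d in durations_ms if not survives(d)]
    if not failing:
        return None
    return min(failing), max(failing)


# Hypothetical stand-in for a real bench test: a made-up unit that fails
# for interruptions of 300 to 750 ms, mirroring the range in the article.
def simulated_unit_survives(duration_ms: int) -> bool:
    return not (300 <= duration_ms <= 750)


durations = list(range(50, 1501, 50))  # assumed 50 ms steps up to 1.5 s
window = find_failure_window(simulated_unit_survives, durations)
print(window)
```

On real hardware each failing point costs a unit, so a coarse sweep followed by a few targeted durations near the edges would probably be more practical than a fine-grained scan.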
We uncovered the root problem with the help of a digital oscilloscope that could record data continuously until a trigger event stopped the capture and that also supported complex, user-defined trigger conditions. Using multiple voltage and current probes on the SMPS MOSFET, we defined a trigger event for the time during which power dissipation in the transistor exceeded its rating. This task required creating a trigger based on multiplying the instantaneous drain-to-source voltage by the drain current and calculating average power. We started with ac-power interruptions shorter than 300 msec and gradually lengthened them into the 300- to 750-msec range until a failure occurred. At that point, the oscilloscope triggered and stopped data capture, which allowed us to analyze the conditions that led up to the failure.
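The trigger condition described above can be sketched in software as a moving average of instantaneous power compared against a rating. This is only a rough illustration: the function name, the window length, and the sample values are assumptions, and the oscilloscope's actual trigger implementation is not described in this account.

```python
from typing import Optional, Sequence


def first_overpower_index(
    v_ds: Sequence[float],
    i_d: Sequence[float],
    rating_w: float,
    window: int,
) -> Optional[int]:
    """Return the index of the first sample at which the average of
    v_ds * i_d over the last `window` samples exceeds `rating_w`,
    or None if the rating is never exceeded."""
    if len(v_ds) != len(i_d):
        raise ValueError("voltage and current traces must be the same length")
    if window < 1:
        raise ValueError("window must be at least 1 sample")

    running_sum = 0.0
    for n, (v, i) in enumerate(zip(v_ds, i_d)):
        running_sum += v * i  # instantaneous power of the newest sample
        if n >= window:
            # Drop the sample that just left the averaging window.
            running_sum -= v_ds[n - window] * i_d[n - window]
        if n >= window - 1 and running_sum / window > rating_w:
            return n
    return None


# Made-up traces: the transistor starts fully on (low voltage across it),
# then drifts into a high-voltage, high-current condition.
volts = [0.5, 0.5, 0.5, 20.0, 20.0, 20.0]
amps = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(first_overpower_index(volts, amps, rating_w=30.0, window=2))
```

Averaging over a window rather than triggering on a single sample keeps brief switching spikes from firing the trigger, which is presumably why average power was used.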
The oscilloscope showed that a circuit intended to ensure an orderly shutdown after loss of ac power was operating in unexpected ways. Interruptions of the critical duration allowed the main switch-mode transistor to enter its linear operating region, where it dissipated enough power to destroy itself. The ensuing rush of current caused a general meltdown of surrounding circuitry, including the PCB.
Luckily, the fix was simple. We soldered a discrete silicon diode to the back of the PCB at the factory, eliminating any future problems.
Jim Sylivant is a professional engineer in Apex, NC.