
The $1 million recall

An engineer’s reputation almost goes up in smoke after a power module comes back from the field heavily burned.

Samuel Kerem, engineer -- EDN, November 11, 2011

This story began in 2003 when a 1000W PM (power module), which just a day before had been part of a huge optical telecom network, came back from the field heavily burned.

The network, which carried many Gbit/sec, was interrupted, and the issue immediately hit the radar of top management. I delved into the world of telecom power modules, which “must” work forever. To achieve forever, the power distribution is made redundant: though one unit can do the job, two PMs power the same rack in case one fails and needs replacement, which is performed by so-called “hot-plugging” while the rack is under power.

Hot-plugging a PM into the 40-60V telecom bus is a tricky business. The current flow to the module is controlled by an onboard MOSFET; to deal with 1 kW, it must be either fully on or fully off. Upon module insertion, the transition from off to on must be fast but not too fast; otherwise, the inrush current that charges the onboard capacitors may brown out the telecom bus. The same MOSFET doubles as a circuit breaker. If an onboard short is suspected, the reaction must be fast, but “nuisance” spikes, common in power environments, must be ignored.
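The circuit-breaker trade-off described above (react quickly to a real short, ignore brief spikes) can be illustrated with a minimal sketch. This is not the PM's actual circuit, which was set by resistors and capacitors; it is a sampled-data analogy in which the function name `breaker_trip_index`, the trip threshold, and the qualification window are all hypothetical values chosen for illustration.

```python
from typing import Optional, Sequence


def breaker_trip_index(
    samples_a: Sequence[float],
    trip_a: float,
    qualify_samples: int,
) -> Optional[int]:
    """Return the sample index at which the breaker opens, or None.

    The breaker opens only after the current has stayed above trip_a
    for qualify_samples consecutive samples. A shorter excursion is
    treated as a nuisance spike and ignored.
    """
    consecutive_over = 0
    for index, current in enumerate(samples_a):
        if current > trip_a:
            consecutive_over += 1
            if consecutive_over >= qualify_samples:
                return index
        else:
            # Current fell back below the threshold: reset the timer.
            consecutive_over = 0
    return None


# Illustrative (assumed) values: 30 A threshold, 2-sample window.
nuisance_spike = [10.0, 45.0, 10.0, 10.0]
sustained_short = [10.0, 45.0, 60.0, 80.0]

spike_result = breaker_trip_index(nuisance_spike, trip_a=30.0, qualify_samples=2)
short_result = breaker_trip_index(sustained_short, trip_a=30.0, qualify_samples=2)
```

A longer qualification window rejects more spikes but leaves the MOSFET exposed to fault current for longer, which is the tension the timing components have to resolve.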

Congratulations to Samuel Kerem, author of this entry and winner of EDN’s Tales From The Cube: Tell Us Your Tale contest, sponsored by Tektronix. Kerem will receive a Tektronix scope valued at approximately $5000.

Prior to deployment, the PM passed the insertion test, but the dreaded short test had not been comprehensively tried. It turned out that a delayed reaction blows the MOSFET, which then fails short. In the field, one of the PM capacitors failed short and the MOSFET followed, allowing hundreds of amperes to enter the board. After a few seconds, the hub was filled with smoke and the main circuit breaker tripped, interrupting a substantial part of the network.

The fix was to redesign the hot-swap timing, which was set by a few resistors and capacitors. As mundane as the job of calculating their values may sound, it was a vital task. A demonstration of the PM shutting down and restarting in a controlled manner during various induced shorts vindicated the effort.

[Image: The returned PM.]

At this moment, the highest management entered the scene. More than 1000 PMs had been deployed. The fix would cost $10 for parts and more than $1 million to recall, modify, and redeploy all the PMs. The verdict was to proceed with the fix. I appreciated the trust. The company was approaching the break-even point and each dollar mattered.

Three years later, an innocent e-mail hit me in the stomach: A PM had come back from the field with a possible telemetry failure. The telemetry had stopped weeks earlier, but the hub was operating flawlessly, so the service visit had been delayed. The replaced unit looked innocent when it arrived, but smoked immediately in a test rack. Though the module revision wasn’t immediately known, my gut knew this unit had been modified. Backed by the $1 million wager, my reputation was “prohibited” from going up in smoke.

I realized that many people would soon learn the same, and unless a miracle happened, unpleasant calculations would follow. As I could not recall my last encounter with any miracle, I ran to the lab to see the PM. During the dash, I wondered whether it would be appropriate to compensate my company for the wasted $1 million. I did hope for leniency, but even 90% forgiveness didn’t feel lovely. The thought of monetary loss sharpened my senses. Upon arrival, I focused my attention on the laboratory power supply connected to the test rack with the visibly smoked module. Now, guided by brain rather than gut, I checked the supply’s settings. Eureka! The current limit on this supply was set to 18A; reaching this level would turn the supply into a current source.

After I stopped worrying about my savings, I started to think coherently. All of the deployed PMs had MOSFET circuit breakers set to 30A (a 20% margin for 1 kW at 40V). Under comfortable lab conditions (50V, 20°C), the test rack consistently demonstrated consumption of around 800W, so the lab supply’s limit had been set to 18A. I guessed there was a short on the returned PM, but the external 18A limit didn’t allow the module to recognize the supplied current as a short. Upon power-up, the MOSFET was forcibly kept in linear mode until its demise.
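The arithmetic behind this diagnosis can be checked with a short sketch. The figures (1 kW, 40V, 20% margin, 800W, 50V, 18A) come from the story; the function names and the simple "a breaker can trip only if the source can deliver at least the trip current" rule are illustrative assumptions, not a model of the actual hot-swap circuit.

```python
def breaker_threshold_a(rated_power_w: float, min_bus_v: float, margin: float) -> float:
    """Trip current: worst-case load current (lowest bus voltage) plus margin."""
    return rated_power_w / min_bus_v * (1.0 + margin)


def breaker_can_trip(source_limit_a: float, trip_a: float) -> bool:
    """Assumed rule: the breaker sees a fault only if the source can reach trip_a."""
    return source_limit_a >= trip_a


# Field design point from the story: 1 kW at 40 V with a 20% margin.
trip_a = breaker_threshold_a(rated_power_w=1000.0, min_bus_v=40.0, margin=0.20)

# Lab conditions from the story: about 800 W drawn at 50 V.
lab_load_a = 800.0 / 50.0

# The bench supply was limited to 18 A, just above the normal lab load.
lab_limit_a = 18.0
lab_breaker_can_trip = breaker_can_trip(lab_limit_a, trip_a)

print(f"breaker trip threshold: {trip_a:.1f} A")
print(f"normal lab load:        {lab_load_a:.1f} A")
print(f"lab supply limit:       {lab_limit_a:.1f} A")
print(f"breaker can trip in lab: {lab_breaker_can_trip}")
```

With the supply capped at 18A, the current can never reach the 30A threshold, so the module has no way to classify the fault as a short and shut the MOSFET off.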

In the field, thanks to redundancy, one PM kept the rack operational for weeks, while the shorted PM, with access to unlimited power, reacted happily to the 30A inrush, keeping its MOSFET alive by kicking on and immediately off every few seconds. The overprotective laboratory settings killed the MOSFET in 20ms. When I returned to my office, I had proof the fix had prevented disaster. I still wonder whether a penny capacitor was the culprit.


Samuel Kerem is an experienced designer of medical, scientific, and telecommunication equipment.