Long shots, short shots, and hip shots
Vance Harwood, consultant - May 10, 2012
One morning at an employer where I once worked, the manufacturing technicians and engineers sitting around the cafeteria table had long faces. Several weeks earlier, production of our new automatic-test systems had screeched to a halt when a simple self-test of the driver output's path resistance started intermittently failing its high limit by a few ohms. Since then, the involved driver cards, connectors, cables, and instrumentation had all tested OK separately, and we had verified that the associated measurement code hadn't changed. The pressure to resume shipments was intense, and we had no clue what the problem was.
Initially, we suspected the custom zero-insertion-force connectors that tied the driver modules to the outside world. However, a series of experiments ruled out the connectors as the culprits. We needed some new ideas; shooting from the hip wasn't working. We reviewed the status of the investigation and listed what we knew, which wasn't much. Yields on the new system had been fine for a couple of weeks, and then, suddenly, nothing passed. Obviously, something had changed, but what?
A second search of engineering-change orders produced a clue: The manufacturing rollout of a software/firmware update matched the date when yield had gone to hell. Although there were no self-test-related changes in the code, the coincidence was too much to ignore. We returned a failing system to the previous revision of code, and the test failures vanished. The new systems still required the new software revision, but with luck a simple code change would get things rolling again.
The initial look at the release was discouraging. There were hundreds of edits in the 300,000-line code base, and we reverified that none were even remotely associated with the failing self-test routines. Rolling back these changes to find the one that had caused the failures could tie up software resources for weeks. Instead, we looked for something substantive that we could easily swap out. An FPGA-firmware download to the system's digital pattern sequencer fit the bill. It was a long shot because the sequencer wasn't running during the failing tests, but we were out of short shots.
To our surprise, swapping in the previous FPGA firmware eliminated the problem. We now had a much shorter list of code deltas to examine. Most of the FPGA updates were sequencer enhancements, but one modification enabled a 50-MHz clock that connected to another subsystem. Perhaps this clock was disrupting the measurement. Sure enough, the failures stopped when we disabled the clock again. One of the engineers noticed that holding the flat cable carrying the clock closer to the metal chassis reduced the incidence of failures, suggesting that RF interference was disrupting our analog measurements.
The low-offset op amps buffering the ground-sense voltage had unity-gain bandwidths far below 50 MHz. When the 50-MHz clock signal coupled onto their inputs, they rectified it instead of amplifying it, generating an offset voltage that the system interpreted as additional resistance. Adding a lowpass RC filter at the ground-sense buffer input to attenuate the offending signal fixed the problem.
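The RC-filter fix works because a first-order lowpass passes the slowly varying ground-sense voltage almost untouched while strongly attenuating a 50-MHz signal. The short Python sketch below illustrates that trade-off. The 1-kΩ and 10-nF values are hypothetical, chosen only for illustration; the article does not say which components were actually used.

```python
import math


def rc_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner frequency of a first-order RC lowpass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def rc_gain(f_hz: float, r_ohms: float, c_farads: float) -> float:
    """Magnitude of the RC lowpass response: |H(f)| = 1 / sqrt(1 + (f / f_c)^2)."""
    fc = rc_cutoff_hz(r_ohms, c_farads)
    return 1.0 / math.sqrt(1.0 + (f_hz / fc) ** 2)


def to_db(gain: float) -> float:
    """Convert a voltage gain ratio to decibels."""
    return 20.0 * math.log10(gain)


# Hypothetical filter values, for illustration only (not from the article).
R_FILTER = 1e3     # series resistor, ohms
C_FILTER = 10e-9   # shunt capacitor to ground, farads

if __name__ == "__main__":
    fc = rc_cutoff_hz(R_FILTER, C_FILTER)
    print(f"corner frequency: {fc / 1e3:.1f} kHz")
    # DC and 1 kHz stand in for the near-DC measurement; 50 MHz is the clock.
    for f in (0.0, 1e3, 50e6):
        g = rc_gain(f, R_FILTER, C_FILTER)
        print(f"{f:>12.0f} Hz: gain {g:.6f} ({to_db(g):.1f} dB)")
```

With these assumed values, the corner sits near 15.9 kHz, so the near-DC ground-sense voltage passes essentially unchanged while the 50-MHz clock is attenuated by roughly 70 dB before it reaches the buffer input.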
Vance Harwood is a consultant in Loveland, CO.
