Long shots, short shots, and hip shots
Vance Harwood, consultant - May 10, 2012
One morning, at an employer where I once worked, there were long faces on the manufacturing technicians and engineers sitting around the cafeteria table. Several weeks earlier, production of our new automatic-test systems had screeched to a halt when a simple self-test of the driver-output path's resistance started intermittently failing its high limit by a few ohms. Since then, all of the involved driver cards, connectors, cables, and instrumentation had tested OK separately, and we had verified that the associated measurement code hadn't changed. The pressure to resume shipments was intense, and we had no clue what the problem was.
Initially, we suspected that it was the fault of the custom zero-insertion-force connectors that tied the driver modules to the outside world. However, a series of experiments ruled out the connectors as the culprits. We needed some new ideas; shooting from the hip wasn’t working. We reviewed the status of the investigation and listed what we knew—which wasn’t much. Yields on the new system had been fine for a couple of weeks, and, suddenly, nothing passed. Obviously, something had changed, but what?
A second search of engineering-change orders produced a clue: The manufacturing rollout of a software/firmware update matched the date when yield had gone to hell. Although there were no self-test-related changes in the code, the coincidence was too much to ignore. We returned a failing system to the previous revision of code, and the test failures vanished. The new systems still required the new software revision, but a simple code change should get things rolling again.
The initial look at the release was discouraging. There were hundreds of edits in the 300,000-line code base. We reverified that none were even remotely associated with the failing self-test routines. Rolling back these changes to see which had caused the failures could tie up software resources for weeks. We looked for something substantive that we could easily swap out. An FPGA-firmware download to the system’s digital pattern sequencer fit the bill. It was a long shot because the sequencer wasn’t running during the failing tests, but we were out of short shots.
To our surprise, this change eliminated the problem. We now had a much shorter list of code deltas to examine. Most of the FPGA updates were sequencer enhancements, but one modification enabled a 50-MHz clock that connected to another subsystem. Perhaps this clock was disrupting the measurement. Sure enough, the failures stopped when we again disabled the clock. One of the engineers noticed that holding the flat cable carrying the clock closer to the metal chassis reduced the incidence of failures—suggesting that RF interference was disrupting our analog measurements.
The failing test forced a known current through the driver-output path and a series-connected test resistor. The high side of the floating measurement ADC connected to one end of the driver path, and the low side connected to a sense circuit that provided a buffered version of the ground voltage near the grounded end of the test resistor. Referencing the ADC’s low side to the ground near the test resistor decoupled most of the ground-return path from the measurement. The test determined the resistance of the unknown driver path plus the known test resistor.
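The arithmetic behind this test is simple Ohm's-law division and subtraction, which is what makes a small voltage offset masquerade as resistance. The sketch below illustrates the idea; the forced current, test-resistor value, and voltage readings are assumed for illustration, since the article doesn't give the actual test parameters.

```python
# Hypothetical values -- the actual test parameters aren't given in the article.
I_FORCE = 0.010   # forced current through the path, amperes (10 mA, assumed)
R_TEST = 100.0    # known series test resistor, ohms (assumed)

def driver_path_resistance(v_adc, i_force=I_FORCE, r_test=R_TEST):
    """The ADC sees the voltage across the driver path plus the test
    resistor; dividing by the forced current gives total resistance,
    and subtracting the known test resistor leaves the unknown path."""
    r_total = v_adc / i_force
    return r_total - r_test

# Any DC offset in the ground-sense buffer adds directly to the measured
# voltage, so it shows up as extra "resistance":
v_clean = 1.05     # volts, assumed clean reading -> 5.0-ohm path
v_offset = 0.030   # 30 mV rectification-induced offset, assumed
print(driver_path_resistance(v_clean))             # 5.0 ohms
print(driver_path_resistance(v_clean + v_offset))  # 8.0 ohms -- over limit
```

At an assumed 10 mA of forced current, a few tens of millivolts of spurious offset is enough to push a healthy path a few ohms over its limit, which matches the intermittent failure mode described above.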
The low-offset op amps buffering the ground-sense voltage had unity-gain bandwidths much lower than 50 MHz. When receiving the new 50-MHz clock signals on their inputs, they were rectifying instead of amplifying—generating an offset voltage that the system interpreted as additional resistance. Adding a lowpass RC filter on the ground-sense buffer input to attenuate the offending signals fixed the problem.
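For a single-pole RC filter like the one described, the corner frequency just has to sit well below 50 MHz while leaving the essentially-DC ground-sense signal untouched. A minimal sketch of the sizing math, with assumed component values (the article doesn't specify them):

```python
import math

# Hypothetical component values -- assumed for illustration.
R = 1_000   # ohms
C = 1e-9    # farads (1 nF)

# -3 dB corner of a single-pole RC lowpass: f_c = 1 / (2 * pi * R * C)
f_c = 1 / (2 * math.pi * R * C)
print(f"corner frequency: {f_c / 1e3:.0f} kHz")   # ~159 kHz

# Magnitude response rolls off at -20 dB/decade above the corner,
# so the 50 MHz clock is attenuated by roughly 50 dB:
f_clk = 50e6
atten_db = -10 * math.log10(1 + (f_clk / f_c) ** 2)
print(f"attenuation at 50 MHz: {atten_db:.0f} dB")
```

With these assumed values the corner lands near 159 kHz, far below the 50-MHz interferer (about 50 dB of attenuation) yet far above the bandwidth needed for a DC resistance measurement.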
Vance Harwood is a consultant in Loveland, CO.