
In conversations with engineers, I've discovered a troubling pattern: More and more often, troubleshooting seems to be relegated to a handful of old-timers. Are we seeing the beginning of the end of this critical skill? How many of us, having mastered troubleshooting or any other art, write about it? Do we even make rough notes about a cool trick or clever solution to a problem?
The fact is: virtually no one does, so each generation of designers must reinvent the same skillset. Seems pretty silly, doesn't it? Ideally, we as an industry will someday develop a handbook of troubleshooting wisdom. In the meantime, the best we can do is pass along our own experiences and collect other sources of knowledge.
For instance, one of the best troubleshooting references is Troubleshooting Analog Circuits by Bob Pease (Butterworth-Heinemann, Boston, 1991). Though aimed squarely at the analog designer, it's still a must-read for us digital folks. I, too, have collected a few tips, mostly through the school of hard knocks.
Next to a skeptical attitude, the most important tool we use is the oscilloscope. Emulators, logic analyzers, and all of those other nifty pieces of capital equipment have their own important roles, but nothing measures up to a scope for 90% of normal troubleshooting.
Scopes glitter from the pages of catalogs, each with its own special features luring us into a frenzy of high-tech lust. If you can afford the best available, by all means scarf up that puppy and enjoy the thrill of 2-GHz full digital acquisition at the touch of a button.
The rest of us will often have to make do with something less extravagant. Though there is no substitute for the correct test equipment, clever use of what you've got may often be all that's required. One consultant I know still uses a vacuum-tube-based 545 with only about 20-MHz bandwidth. I think he's working too hard (spending a few grand on a modern instrument seems like a minimal price of entry to the field), but his troubleshooting skills and deep knowledge of the scope make him quite successful at finding tough problems.
One of the worst mistakes we make is neglecting probes. Crummy probes will turn that wonderful 1-GHz instrument into junk. Managers hate to spend a lot on probes, especially when they see them drooling onto the floor mixed with all of the other debris. Worse, we always immediately lose the tips and other accessories acquired at great expense, so we connect to a node using a 12-in. clip lead hastily purchased at Radio Shack.
Then, after destroying a couple of chips by accidentally shorting things to ground with that nice alligator ground clip mounted on the probe, we tear it off in frustration, losing it as well. Tip: if you really don't intend to use the ground connection, clip that alligator lead to itself, keeping it out of harm's way, but instantly available for use.
Take care of your probes. Keep them off the floor; don't let your chair roll over the leads. Buy decent probes before every one in the shop falls apart. After trying all of the cheap varieties, I now swallow hard and spend the $150 needed to get high-quality probes from Tektronix or HP.
Here's another tip: when using a scope, if a signal looks weird, maybe there's something wrong! Avoid the temptation to rationalize the problem. Instead of blaming the signal on a lousy ground, quickly connect that ground clip and test your assumption. Never accept something that looks awful. Convince yourself that either it's actually OK or find the source of the problem.
Walk through your lab. You'll find most of the digital folks have their vertical amplifiers set to 2V/division, which eases displaying two traces simultaneously. Unfortunately, too many of us seem to think the vertical gain knob is welded into position. It's hard to distinguish a valid zero from one drooling just a little too high with so little resolution! Flip to 1V/division occasionally to make sure that zero is legitimate.
Every instrument is a lying beast, a source of both information and disinformation. The scope is no exception. A 100-MHz scope will show even a perfect 50-MHz clock as a sine wave, rather than in its true square form. Digital scopes that exhibit aliasing, sweeping so slowly that the sample rate falls below the Nyquist limit for a given signal, may make that 50-MHz clock look like a perfect 1-kHz signal. This situation will cause the inexperienced engineer to go crazy searching for a problem that just does not exist. You have to know your tools to use them effectively.
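To see how a 50-MHz clock can masquerade as a 1-kHz signal, here is a minimal Python sketch of the usual frequency-folding arithmetic for a sampled tone. The function name and the two sample rates are hypothetical, picked only for illustration; a real scope's display also depends on its interpolation and on the clock's harmonics, which this ignores.

```python
def apparent_frequency(signal_hz: int, sample_rate_hz: int) -> int:
    """Frequency a sampled display appears to show for a pure tone.

    A tone above half the sample rate folds back into the band from
    0 to sample_rate_hz / 2. Integer hertz keep the arithmetic exact.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    remainder = signal_hz % sample_rate_hz
    # Fold the remainder into the lower half of the sampling band.
    return min(remainder, sample_rate_hz - remainder)


clock_hz = 50_000_000

# Hypothetical slow sweep: the scope samples at only 999,980 samples/s.
print(apparent_frequency(clock_hz, 999_980))

# Hypothetical fast sweep: 250 Msamples/s, comfortably above twice the clock.
print(apparent_frequency(clock_hz, 250_000_000))
```

With these assumed rates, the slow sweep reports 1,000 Hz and the fast sweep reports the true 50,000,000 Hz. If speeding up the sweep changes the apparent frequency, suspect the instrument before the circuit.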
We digital folks deal in ones and zeroes... and tristates. Each condition means something. When troubleshooting, you've got to know which of these three (not two!) states a node is in. Our best tool is the scope, yet it is inherently incapable of distinguishing the tristate condition.
In the good old days of low-power Schottky technology, you could be pretty sure a tristated signal would show up at around 1.5V, somewhere between a zero and a one. With CMOS, this assurance is gone, yet most engineers blithely continue to assume that 0V means zero. It just ain't so!
My solution is a little tool I made: a 1k resistor with a clip lead on each end. It's nicely soldered together and covered with insulation to avoid shorts. To tell the difference between a legal state and high impedance, clip the tool to the node and alternately touch the other end to VCC and then ground. If the node moves more than a trifle, something is wrong. The scope, plus my tool, lets me identify all three possible states. Without the tool I'm guessing, and guessing while troubleshooting always sends you down time-consuming blind alleys.
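The reasoning behind the resistor trick is just a voltage divider, and a rough Python sketch makes the numbers concrete. Everything specific here is an assumption for illustration: the 50-ohm driver output impedance, the 5V supply, and the 0.5V threshold standing in for "more than a trifle". Leakage and scope probe loading are ignored.

```python
def probed_voltage(v_pull, driver=None, r_pull=1000.0):
    """Voltage seen at a node while the resistor tool is attached.

    v_pull : voltage at the far end of the tool (VCC or ground)
    driver : None for a tristated node, otherwise a tuple of
             (drive voltage, output impedance in ohms), both hypothetical
    r_pull : the tool's resistance, 1k as in the text
    """
    if driver is None:
        # Nothing fights the resistor, so the node follows the pull.
        return v_pull
    v_drive, r_out = driver
    # Divider formed by the driver's output impedance and the tool.
    return v_drive + (v_pull - v_drive) * r_out / (r_out + r_pull)


def looks_floating(v_when_pulled_high, v_when_pulled_low, threshold=0.5):
    """True if the node moved more than the (assumed) threshold."""
    return abs(v_when_pulled_high - v_when_pulled_low) > threshold


VCC = 5.0
driven_low = (0.0, 50.0)  # assumed: output driving 0V through 50 ohms

# A driven node barely budges; a floating node swings rail to rail.
print(probed_voltage(VCC, driven_low), probed_voltage(0.0, driven_low))
print(probed_voltage(VCC), probed_voltage(0.0))
```

Under these assumptions the driven node moves by roughly a quarter of a volt, while the floating node follows the resistor all the way to each rail.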
You can use a variation of this approach when troubleshooting an intermittent problem. If the silly thing refuses to fail when you're working on it (a sure bet, given the perversity of nature), run your fingers over the board's pins. A purely digital board should continue to run despite the slight impedance changes brought about by your fingers, yet these may be enough to drive a floating pin to the other state, hopefully creating the failure you are looking for.
On SMT boards, it's tough to get at a device's pins. If you're suspicious of a pin, touch it with an X-Acto knife. The blade will precisely align with any tiny pin, and its metal handle will conduct your body impedance to the node. Sometimes I'll connect my pullup/pulldown clip lead to the knife itself to exercise the node more deterministically.
The most effective troubleshooting tool is a keen eye. When you're looking at a finished product, don't disregard poor manufacturing. How many of us have spent hours troubleshooting a board, only to find a missing chip? Perhaps the wrong part is installed or the correct one is upside down.
In smaller companies, engineering is often production's backup for troubleshooting. Don't accept boards unless a technician has performed a careful visual inspection first. Then, inspect them yourself. It's much quicker to find manufacturing defects by eye than by performing component-level diagnosis. Look for those missing and backwards chips. Check the soldering and look for solder splashes.
Inspect soldering on through-hole boards using a not-too-sharp pointer, like an awl. Move it along every pin, using it as a guide for your eye (which will otherwise quickly tire looking at a sea of pins). Scan the board one chip at a time, working in a logical progression from one side of the board to the other. Look for unsoldered and poorly soldered pins, as well as solder splashes. If it looks bad, it is.
Despite modern quality-control processes, pc-board defects are the most frustrating problem, and far too common. Keep the pc-board artwork around as a reference, so you can see where the tracks run when it's time to fix a short or a design problem.
Often a new design suffers from a problem you just know you can cure by grounding a signal. Be wary of using a clip lead as a grounder: high-speed signals will see the lead's inductance as a high impedance. The ground end will be at ground, for sure. The signal end may not look much different than without the clip lead attached. Edges are so fast now, even in slow systems, that wires no longer act like wires. Solder a short (very short) run to ground, perhaps using a discarded resistor lead. I have found that grounding via a clip lead now only works on dc signals. Sigh... in the good old days of slow systems, a mountain of clip leads was a troubleshooter's best weapon. Now look warily at that mound, realizing that a wire is not a wire.
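A back-of-the-envelope Python sketch shows why a wire stops being a wire. It rests on two common rules of thumb that are assumptions here, not measurements: roughly 20 nH of inductance per inch of wire, and most of an edge's energy lying below a "knee" frequency of about 0.5 divided by the rise time. The 1-ns rise time and the lead lengths are hypothetical.

```python
import math

NH_PER_INCH = 20.0  # assumed rule-of-thumb inductance of a loose wire


def knee_frequency_hz(rise_time_s):
    """Approximate upper frequency of interest for a digital edge."""
    return 0.5 / rise_time_s


def lead_impedance_ohms(length_in, freq_hz, nh_per_inch=NH_PER_INCH):
    """Magnitude of the inductive reactance, 2 * pi * f * L, of a lead."""
    inductance_h = length_in * nh_per_inch * 1e-9
    return 2.0 * math.pi * freq_hz * inductance_h


f_knee = knee_frequency_hz(1e-9)  # hypothetical 1-ns edge

for length_in in (12.0, 0.5):     # clip lead versus a short soldered stub
    z = lead_impedance_ohms(length_in, f_knee)
    print(f"{length_in:5.1f} in: about {z:.0f} ohms at {f_knee / 1e6:.0f} MHz")

# At dc the reactance vanishes, which is why the clip lead still works there.
print(lead_impedance_ohms(12.0, 0.0))
```

Under these assumptions a 12-in. clip lead presents several hundred ohms to a 1-ns edge, while a half-inch stub presents a few tens of ohms. Neither is a true short, but the difference explains why the short soldered run is the one that actually grounds the signal.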
Use all of your tools. One of our scopes has a neat digital counter. We use it for tough hardware/software troubleshooting problems. Unsure if an interrupt comes as often as it should? The counter will tell you without a doubt how many come along. Wondering if all interrupts get serviced? Put one counter on the interrupt line and another on the acknowledge, and see that the values are identical.
Computer systems will crash and burn from a single event. Though digital scopes are wonderful at capturing single-shot signals, it's usually much easier to work with a problem that repeats itself, often, so you can run tests at will. A logic analyzer excels at finding these one-time problems, but most won't help much with electrical (say, marginal signal levels) issues. Always be on the lookout for ways to cause these events to repeat. For example, the easiest way to troubleshoot reset problems is to use a pulse generator to reset a dead CPU repeatedly, so you can scope the reset sequence.
Years ago we used a shortwave radio to listen to the operation of our systems' code. With a little experience, we knew what sort of noise to expect in each of the instrument's important operating modes. With the volume turned to a quiet murmur, any change in its buzz instantly signaled trouble. Troubleshooting is a multisensory experience. Wait! What's that? It smells like a resistor burning... a wire-wound, by its odor... The game's afoot!
Visit Jack Ganssle's Web page at http://www.softaid.net/emulators.html.