

Columnist: November 23, 1995

Assume nothing.
Test everything.

Jack Ganssle, Embedded-Systems Contributing Editor


I’ve worked with a lot of engineers over the years. Most have a single area of expertise: some design complex high-speed systems, some are firmware wizards, and others are troubleshooting geniuses. A few—the very best—are adept at every area of embedded design. Surely you’ve met that solitary genius who quietly and competently creates a paper design, guides it through prototyping, develops a test or application code for it, and somehow, without fuss, just makes it work.

We need to elevate efficient troubleshooting from its current status as an art to that of a science. Too many engineers, particularly young ones just out of school, are left adrift with no idea where to turn when the damn thing doesn’t work.

The Jedi master engages an opponent by clearing his mind and calling on the Force. Hey—troubleshooting is hard, so call on anything you can! At the very least, follow the Jedi’s example by starting with a clear mind, a clean bench, and an organized set of tools.

Too many designers jump into a problem without getting ready to do battle. You see them with their empty junk food bags piled atop poorly maintained test equipment and with scattered debris from a dozen other troubleshooting contests buried under the latest set of schematics.

Clean up. Get rid of all of those short-producing solder splashes and old resistor leads. Consider mounting stand-offs on pc boards so they don’t lie in the bench debris. Sort out your tools. Make sure you have enough outlets at hand to avoid power-plug mania. Get a pile of clip leads.

Is your lab notebook open and ready for action? What? You don’t use one? Where do you record the things you learn, like modifications needed to the board? Get a bound notebook for your lab work and always keep it at hand. Use it daily. You can record poetry, love notes, or ideas for science fiction stories in it, but make sure you log those engineering details, experimental setups, and your latest neat idea that you plan to try first thing in the morning.

Never just do something—automate it. Build batch files to download your code and initialize the tools. Program the logic analyzer setup and save it on disk. Your employer is paying you to think—automate routine tasks to free up time for those that can’t be automated.
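As one possible shape for such automation, here is a minimal Python sketch of a bring-up script that runs a fixed list of steps in order and stops at the first failure. The step names and commands are hypothetical placeholders (Python one-liners standing in for real download and setup tools); substitute whatever your own toolchain provides.

```python
import subprocess
import sys

# Hypothetical bring-up steps. Each command is a harmless stand-in;
# replace it with the real downloader or instrument-setup command you use.
STEPS = [
    ("download code", [sys.executable, "-c", "print('downloading firmware')"]),
    ("init tools", [sys.executable, "-c", "print('initializing tools')"]),
]


def run_steps(steps):
    """Run each (name, command) pair in order.

    Returns the names of the steps that completed, stopping at the first
    command that exits with a nonzero status.
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step failed: {name}")
            break
        completed.append(name)
    return completed
```

Calling run_steps(STEPS) at the start of every session makes the setup repeatable, and the returned list shows how far the sequence got.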

I have a love-hate relationship with the logic analyzer. It’s a fantastic tool that yields information obtainable in no other way. It’s just such a pain to connect 50 or 100 leads to run an experiment. In digital systems, most of the analyzer’s leads will go to the address and data buses. Build a “standard” connector you can attach these to. At my company, we buy extra analyzer pod-ends we can permanently connect to a “standard” internal connector, speeding up the process of connecting the instrument.

Avoid wire-wrapped prototypes. Digital designs are simply too fast now. Rapid-turn pc-board vendors (look at the ads in this magazine) will produce a 10-layer board in a week for a reasonable fee. The pc board will eliminate all of the noise inherent in a wire-wrapped design. As an engineering manager, I’m always terrified by that oh-so-common statement, “Well, this doesn’t really work, but the pc-board layout probably will.” Prove it. Go with pc boards from the outset.


Assumptions

A misspent youth of blaring rock ’n’ roll has left my hearing somewhat impaired, but it did help to formulate (of all things) my philosophy of troubleshooting digital systems. The title of the Firesign Theatre’s “Everything You Know is Wrong” album should be our modern anthem for making progress in the lab.

I hate getting called into a troubleshooting session and finding that the engineer “knows” that x, y, and z are not part of the problem at hand. Everything you know is wrong! Is that 5V supply really 5V at the pc board? What makes you think ground goes to the chips? When a single part has five or 10 ground connections, make sure all of them are connected. Could the system be dead because there’s no clock signal? Are you sure the design isn’t really working? Could your experiment be flawed?

Assume nothing. Test everything. The pc board may have manufacturing errors on internal layers. Power and ground may not be on the pins you expect—particularly on newer high-density SMT parts. Signals labeled without an inversion bar may actually be active low. You might have ROMs mixed up. Perhaps someone loaded the wrong parts on the board.

Never blindly trust your test equipment—know how each instrument works and what its limitations are. If two signals seem impossibly skewed by 15 nsec on the logic analyzer, make sure this is not an artifact of setting it to sample too slowly. When your 100-MHz scope shows a perfectly clean logic level, remember that undetected but virulent strains of 1-nsec glitches can still be running merrily around your circuit.
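Both limits can be estimated with a line or two of arithmetic. The Python sketch below is only a rough aid: it assumes each edge on a logic analyzer is located to within one sample period, and it uses the common rule of thumb that a scope's rise time is about 0.35 divided by its bandwidth (reasonable for scopes with a Gaussian-like response). The function names and the 50-MHz and 500-MHz sample rates in the usage note are illustrative choices, not figures from any particular instrument.

```python
def sample_period_ns(sample_rate_hz):
    """Time between logic-analyzer samples, in nanoseconds."""
    return 1e9 / sample_rate_hz


def skew_is_resolvable(apparent_skew_ns, sample_rate_hz):
    """True if the apparent skew exceeds one sample period.

    Each edge is only located to within one sample period, so an
    apparent skew no larger than that may be a sampling artifact.
    """
    return apparent_skew_ns > sample_period_ns(sample_rate_hz)


def scope_rise_time_ns(bandwidth_hz):
    """Approximate rise time from the 0.35 / bandwidth rule of thumb."""
    return 0.35 / bandwidth_hz * 1e9
```

Sampling at 50 MHz gives a 20-nsec sample period, so skew_is_resolvable(15, 50e6) is False, while at 500 MHz it is True. Likewise, scope_rise_time_ns(100e6) comes to about 3.5 nsec, well over the width of a 1-nsec glitch, which is why the scope can show a clean level while the glitch is present.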

When you do see a glitch, one that seems impossible given the circuit design, remember that manufacturing shorts can do strange things to signals. Is the part hot? A simple finger test may be a good short indicator.


Learn to estimate

At the peril of sounding like one of the ancients, I do miss the culture of the slide rule. Though accurate answers might have been elusive when using slide rules, we did learn to estimate the answer for every problem before attempting a solution. Alas, it’s a skill that is disappearing.

Calculator abuse—computing without thinking—is now too ingrained in our society to fight it. Bummer. Other instruments also tempt us to coast mentally—to do things without thinking. Take the scope, for example. I can’t count the times an engineer has mentioned that he sees the signal, yet has no idea about the width of the pulse when I ask. Is it 1 nsec? 1 msec? Perhaps a second wide?

Timing is critical in computers, yet too many of us use the scope as a sort of logic probe. “Hey, the signal is there,” we say. Which signal? If you expect a 10-µsec pulse every millisecond, then any deviation from that norm is simply wrong. Know what to expect, and then ensure the waveforms are approximately correct. A misused scope will generate a morass of misinformation.
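Writing the expectation down before probing makes the comparison explicit. The Python sketch below is one hedged way to do it, using illustrative numbers of a 10-µsec pulse repeating every 1 msec. The function names and the 10% tolerance are assumptions chosen for the example.

```python
def within_tolerance(measured, expected, tolerance=0.1):
    """True if measured is within a fractional tolerance of expected."""
    return abs(measured - expected) <= tolerance * expected


def check_pulse(width_s, period_s, expected_width_s, expected_period_s):
    """Return a list of what looks wrong; an empty list means it looks right."""
    problems = []
    if not within_tolerance(width_s, expected_width_s):
        problems.append("pulse width")
    if not within_tolerance(period_s, expected_period_s):
        problems.append("period")
    return problems
```

A measured 10.5-µsec pulse every 1 msec passes against that expectation, while a 10-msec pulse is flagged on width even though "the signal is there."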

Estimate the performance of firmware before writing it. Sure, it’s tough to know how many microseconds an as-yet-unwritten function will chew up, but you can use your general knowledge of systems to make some ballpark estimates about where problems can occur.

For example, a fast serial link might overrun a busy CPU. Estimate! At roughly 10 bits per character (counting start and stop bits), 38,400 baud is about 4000 characters/second, or one character per 250 µsec. 250 µsec is not a lot of time for any CPU, particularly the typical embedded 8-bit processor. Your processor will be pretty busy servicing the data. If polled, then only heroic efforts will keep you within the 250-µsec timing margin.
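The serial-link estimate is simple enough to capture in a few lines. The Python sketch below assumes 10 bits per character (1 start, 8 data, 1 stop); that framing and the function names are assumptions for illustration, so adjust them for parity or extra stop bits.

```python
def chars_per_second(baud, bits_per_char=10):
    """Character rate, assuming 10 bits on the wire per character."""
    return baud / bits_per_char


def char_time_us(baud, bits_per_char=10):
    """Microseconds available to handle each incoming character."""
    return 1e6 / chars_per_second(baud, bits_per_char)
```

At 38,400 baud this gives 3840 characters/second and roughly 260 µsec per character, close to the round numbers a mental estimate produces.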

Suppose you chose to implement the serial receive routine as an interrupt service routine—what is the overhead? An assembly routine to queue incoming data will need a dozen or two instructions, each of which will no doubt burn up two or three machine cycles. Surely you know roughly how long a machine cycle takes (including wait states) for your system…don’t you? Given this information, you can get a reasonable timing estimate before writing a line of code.
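The same kind of back-of-the-envelope arithmetic works for the interrupt overhead. The Python sketch below multiplies instruction count, cycles per instruction, and machine-cycle time. The 0.5-µsec machine cycle and the 260-µsec character period in the usage note are hypothetical values chosen for illustration; use your own system's numbers.

```python
def isr_time_us(instructions, cycles_per_instruction, cycle_time_us):
    """Rough ISR execution time in microseconds."""
    return instructions * cycles_per_instruction * cycle_time_us


def cpu_load(isr_us, char_period_us):
    """Fraction of each character period spent inside the ISR."""
    return isr_us / char_period_us
```

With two dozen instructions at three cycles each and a hypothetical 0.5-µsec machine cycle, isr_time_us(24, 3, 0.5) is 36 µsec. Against a 260-µsec character period, that is about 14% of the CPU spent just queuing serial data, before the rest of the firmware does any work.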


Common sense

Think before you do. Recently I saw a technician troubleshooting a board that exhibited multiple problems. One chip was hot enough to fry eggs, yet he chose to work on another, “unrelated” symptom. Dumb move—surely the part was ready to self-destruct, which certainly would create even more grief for the poor tech.

When starting to debug a very fast system, crank the clock rate down to absurdly low levels. Fix the easy stuff—logic errors and the like—before tackling high-speed timing. Why deal with a vast ocean of troubles simultaneously?

When you do find the problem and make a change, sometimes the modification won’t help. Before doing anything else, double-check the change. Did you solder the wire to the right pin? The right IC? We tend to program ourselves to look for hard problems instead of the all-too-common simple mistakes.

Plan ahead. Don’t try something without knowing what the possible outcomes are…or without having some idea what you’ll do for any of those outcomes. You may find that the next step will be the same regardless of the results of the experiment. In this case, save time and do something else. The best troubleshooters are like grandmasters at the game of chess. They always think many steps ahead of their next move.

Jack Ganssle is president of Softaid, a vendor of emulators and other embedded-systems tools. He can be contacted via Compuserve at "76366,3333," or via Internet at jack@softaid.net. Send mail c/o Softaid, 8310 Guilford Road, Columbia, MD 21046.



Copyright © 1995 EDN Magazine. EDN is a registered trademark of Reed Properties Inc, used under license.