EDN Access



June 18, 1998


Debugging embedded systems: 10 common hardware problems and how to solve them

Stuart R Ball

Some of the most common hardware problems can derail your debugging effort if you don't watch out. Here are some tricks to keep your debugging on track.

Some of the most devilish debugging problems are the result of seemingly little things--floating pins, wrong values for pullup resistors, and so forth. These are the kinds of things that are easiest to overlook, both at design time and during debugging, so they're often some of the last things you consider when trying to get to the bottom of a problem. Don't overlook them. Make a list. Check for each problem systematically, and you'll save yourself a lot of grief. Here's a list you can start with.

Floating pins

Many years ago, an engineer colleague came looking for suggestions about a problem he was having. He had a µP circuit with a UV-erasable EPROM, and the circuit would work only when he opened the cover of the box it was installed in, or if he put a flashlight in the box with the cover closed. It turned out that the Vpp pin (which receives the programming voltage during programming) was floating. Apparently, the chip needed just a little light (through the erasure window, which wasn't covered) to bias everything enough to make it work.

In the days when everything was TTL, a floating input would show up on a scope as about 1 to 2V. Now that nearly everything is CMOS, floating inputs usually look like ground. Often, if you run your fingers over a board, circuit operation will change. Also, if an IC, such as an 8-bit register, fails only with certain data patterns, look for a missing ground. Many CMOS parts will work without a ground connection as long as one of the inputs is low, but as soon as they all go high, everything stops.

It's often safe to leave unused µP pins floating, but it's better to pull them to an inactive state. A µC with internal pullups, of course, usually needs no other termination.

Risetime problems

Pullups can be problems in other ways, too. I worked on a problem once with a circuit (Figure 1) that failed on power-up reset intermittently, and then only on some production boards. Apparently, to save power, the designer had used large-value (>100-kohm) pullups. Tristate buffers prevented inadvertent writes to the battery-backed RAM chips during the unstable interval while power was coming up. The problem was that the 74AC08 inputs on some of the boards saw the reset go inactive before the tristate buffer did. The result was that the processor would come out of reset before the RAM circuit was ready, so it couldn't access the RAM. Instant crash.

In another case, a designer had used too large a pullup on a 68000-family part, making a signal's risetime longer than the µP's specification allowed. The circuit worked in production for several months, and then the purchasing department bought a different brand of processor that was less forgiving of the input-transition time.
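
As a rough check on pullup sizing, the rise of a pulled-up node can be modeled as a simple RC charge. The sketch below is only an estimate under stated assumptions: a 5V supply, a hypothetical 15 pF of combined input and trace capacitance, and illustrative threshold voltages. None of these numbers come from the circuits described above.

```python
import math

def time_to_threshold(r_ohms, c_farads, v_supply, v_threshold):
    """Time for an RC-pulled-up node, starting at 0 V, to reach v_threshold."""
    if not 0 < v_threshold < v_supply:
        raise ValueError("threshold must lie between 0 V and the supply")
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_supply)

def rise_time_10_90(r_ohms, c_farads):
    """10%-to-90% rise time of a single-pole RC node (about 2.2 * R * C)."""
    return r_ohms * c_farads * math.log(9.0)

# Assumed values, for illustration only.
R_PULLUP = 100e3   # large-value pullup of the kind used to save power
C_NODE = 15e-12    # hypothetical input plus trace capacitance
V_CC = 5.0         # assumed supply voltage

t_rise = rise_time_10_90(R_PULLUP, C_NODE)

# Two inputs on the same slow node, with slightly different switching
# thresholds, see the signal change at different times.
t_low = time_to_threshold(R_PULLUP, C_NODE, V_CC, 2.0)
t_high = time_to_threshold(R_PULLUP, C_NODE, V_CC, 3.0)
skew = t_high - t_low

print(f"10-90% rise time: {t_rise * 1e6:.2f} us")
print(f"skew between 2.0V and 3.0V thresholds: {skew * 1e9:.0f} ns")
```

With these assumed values, the rise time comes to a few microseconds, and the two thresholds are crossed roughly 600 ns apart. Dropping the pullup to 10 kohm cuts both figures by a factor of 10, at the cost of more current when the line is low.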

Peripheral timing problems

The timing cycle for a µP accessing a peripheral IC can always be problematic. In the example of Figure 2, which shows the typical timing for an Intel-style processor communicating with some generic peripheral IC, any of six timing parameters (T1 through T6) can cause problems.

In the write cycle, the processor asserts the address, then asserts the WR signal, then presents data to the peripheral. Time T1 on the diagram is the address-setup time before WR goes low. If the design doesn't satisfy this parameter, the write data could go to the wrong register (or memory location) inside the peripheral. A similar situation exists for time T2, the data hold time after WR's rising edge. If the design doesn't satisfy T2, the peripheral might store the wrong data. The last parameter, T3, is the minimum length of the WR pulse itself. Some peripherals also have a maximum-length parameter.

In the read cycle of Figure 2's example, time T4 is the address-setup time before RD's falling edge. Satisfying this parameter is usually less critical than satisfying the write cycle's equivalent parameter unless the peripheral latches the address on RD's falling edge. Time T5, the time the data must be stable before RD's rising edge, is effectively the peripheral IC's access time. If you don't satisfy T5, the processor might read the wrong data. Time T6 is the data hold time after read cycle completion. This parameter is most likely to be a problem on a processor with a multiplexed address/data bus, where a peripheral that doesn't release the bus quickly enough can cause bus contention on the next cycle.

These parameters are typical of processor and peripheral data sheets, but there are others, of course. Some peripherals have a parameter for the minimum time between successive accesses, or they require synchronization of input signals to a clock. Sometimes they want write data to be stable before WR's leading edge, which requires additional logic with Intel-type processors. Other processors, such as the Motorola 68000 family, have different cycle and signal structures, but the same types of timing requirements apply.

Many designers just connect peripherals together, assuming that if the clock rates or the access times are right, everything else will work, too. This approach can be dangerous, especially if production will run for many months or years, giving plenty of opportunity for installation of parts from different manufacturing lots. It's best to find a timing problem when you start a design, because fixing one can add a significant amount of logic to a board. Verify at the outset that your design meets all timing parameters.
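
One way to make that verification routine is to tabulate each parameter with its worst-case value and compute the margin. The sketch below is a minimal illustration, not a substitute for a full timing analysis. The TimingCheck class, the failing helper, and every number in the table are hypothetical and do not come from any real data sheet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimingCheck:
    name: str
    actual_ns: float   # worst-case value the design delivers
    limit_ns: float    # data-sheet limit it is checked against
    kind: str          # "min": actual must be >= limit
                       # "max": actual must be <= limit

    def margin_ns(self):
        """Positive margin means the parameter is met; negative is a violation."""
        if self.kind == "min":
            return self.actual_ns - self.limit_ns
        if self.kind == "max":
            return self.limit_ns - self.actual_ns
        raise ValueError(f"unknown kind: {self.kind!r}")

# Hypothetical worst-case numbers for the six parameters of Figure 2.
CHECKS = [
    TimingCheck("T1 address setup before WR", 30, 20, "min"),
    TimingCheck("T2 data hold after WR", 10, 15, "min"),
    TimingCheck("T3 WR pulse width", 120, 100, "min"),
    TimingCheck("T4 address setup before RD", 30, 20, "min"),
    TimingCheck("T5 peripheral access time", 90, 100, "max"),
    TimingCheck("T6 bus release after RD", 40, 25, "max"),
]

def failing(checks):
    """Names of the checks whose margin is negative."""
    return [c.name for c in checks if c.margin_ns() < 0]

for check in CHECKS:
    status = "ok" if check.margin_ns() >= 0 else "VIOLATION"
    print(f"{check.name:28s} margin {check.margin_ns():+6.0f} ns  {status}")
```

In this made-up table, T2 and T6 fail. The table should hold worst-case figures over temperature, voltage, and manufacturer, because typical values are what let a marginal design pass on the bench and fail in production.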

Risetime problems, timing problems, and floating-pin problems are often temperature sensitive, because parts' thresholds and speeds shift slightly with temperature. If you have an intermittent problem that you suspect is caused by one of these conditions, you can often make it show up by using circuit-cooling spray to cool a board or a hair dryer to heat it. Be careful not to get IC packages so cold that they crack or so hot that they melt.

EMI problems

Embedded systems often must control stepper motors, dc motors, or relays, all of which can cause electromagnetic interference (EMI) problems. Any inductive device will cause EMI when it switches on or off. Whether the EMI causes problems is another matter.

One easy-to-prevent problem appears in Figure 3a, in which a µC drives a relay through a port pin and a MOSFET. When the port pin goes low, the MOSFET will turn off, opening the relay. Note, though, that there's no protection diode between the transistor drain and the supply. When the relay opens, the energy stored in the relay has to go somewhere; the result is a massive voltage spike on the transistor's drain. Depending on the characteristics of the relay coil and the transistor, this flyback voltage can approach 100V--enough to destroy the transistor.

A solution to this problem, shown in Figure 3b, is the addition of a snubber diode across the relay coil. The transistor drain is now clamped to a 1-diode voltage drop above the positive supply. For faster opening of the relay, you can use a transient-suppressor diode instead, allowing the drain voltage to rise to a voltage somewhere between the supply voltage and "total destruction." If you use a transient suppressor, remember that the drain voltage will rise to the sum of the supply voltage and the transient-suppressor clamp voltage.
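
The tradeoff between clamp voltage and release speed can be estimated with a first-order model. The sketch below treats the coil as an inductance in series with its winding resistance and the clamp as a constant voltage drop. The coil values (12V, 240 ohms, 0.5H) and the 24V clamp are hypothetical, and the model computes the decay all the way to zero current, whereas a real relay drops out somewhat earlier.

```python
import math

def release_time(l_henry, r_ohms, i0_amps, v_clamp):
    """Time for coil current to decay from i0 to zero through a clamp.

    Solves L*di/dt = -(i*R + v_clamp) for the moment i reaches zero.
    """
    return (l_henry / r_ohms) * math.log(1.0 + i0_amps * r_ohms / v_clamp)

def peak_drain_voltage(v_supply, v_clamp):
    """The drain sits one clamp voltage above the supply during the decay."""
    return v_supply + v_clamp

def stored_energy(l_henry, i0_amps):
    """Energy held in the coil at turn-off, in joules."""
    return 0.5 * l_henry * i0_amps ** 2

# Hypothetical relay coil and clamp values.
V_SUPPLY = 12.0
R_COIL = 240.0
L_COIL = 0.5
I_COIL = V_SUPPLY / R_COIL   # steady-state coil current, 50 mA here

t_diode = release_time(L_COIL, R_COIL, I_COIL, 0.7)   # plain diode clamp
t_tvs = release_time(L_COIL, R_COIL, I_COIL, 24.0)    # 24V suppressor

print(f"stored energy: {stored_energy(L_COIL, I_COIL) * 1e3:.3f} mJ")
print(f"diode:      {t_diode * 1e3:.2f} ms, drain peaks at "
      f"{peak_drain_voltage(V_SUPPLY, 0.7):.1f} V")
print(f"suppressor: {t_tvs * 1e3:.2f} ms, drain peaks at "
      f"{peak_drain_voltage(V_SUPPLY, 24.0):.1f} V")
```

For these assumed values, the plain diode takes about 6 ms to bring the current to zero, and the 24V suppressor takes under 1 ms, but the transistor must then withstand 36V instead of 12.7V.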

The catch to this solution, which designers often overlook, is that this fix isn't really free. Adding a diode protects the transistor, but the coil energy still has to go somewhere, and it does. It takes the form of a current spike into the positive supply. If the supply lacks proper bypassing, the result can be a voltage spike on the supply itself. So, when driving relays (or dc motors or solenoids), take a little extra care to be sure that the supply has adequate bypassing and that the path between the relay and the supply has a low impedance.

Figure 4a, which shows a µP-based board driving a motor, illustrates another current-related problem. When the motor turns on, current increases, and the increased current passes through the ground wires back to the power supply and to chassis. The current causes a voltage drop in the wiring, because the wiring impedance (inductance plus resistance) is never zero. If the wiring inductance is high enough, the voltage drop can upset the processor board's ground enough to affect the board's operation or to corrupt communication with other boards in the system. Stepper motors or dc motors that are PWM (pulse-width-modulation) controlled present a particular problem, because a high-frequency surge usually occurs when the current turns on.
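
A quick estimate shows why the inductive part of the drop usually dominates on fast edges. The sketch below uses V = I*R + L*(dI/dt) with the current edge treated as a linear ramp. The wire resistance, wire inductance, current step, and edge time are all hypothetical values chosen for illustration.

```python
def ground_shift(resistance_ohms, inductance_henry, delta_i_amps, edge_time_s):
    """Estimate the voltage across a return wire during a current step.

    Returns (resistive_volts, inductive_volts), treating the current
    edge as a linear ramp of duration edge_time_s.
    """
    resistive = delta_i_amps * resistance_ohms
    inductive = inductance_henry * delta_i_amps / edge_time_s
    return resistive, inductive

# Hypothetical ground return: 20 milliohms and 500 nH of wiring,
# carrying a 2A motor-current step.
R_WIRE = 20e-3
L_WIRE = 500e-9
DELTA_I = 2.0

fast_r, fast_l = ground_shift(R_WIRE, L_WIRE, DELTA_I, 100e-9)  # PWM edge
slow_r, slow_l = ground_shift(R_WIRE, L_WIRE, DELTA_I, 10e-6)   # slowed edge

print(f"100-ns edge: {fast_r * 1e3:.0f} mV resistive, {fast_l:.2f} V inductive")
print(f"10-us edge:  {slow_r * 1e3:.0f} mV resistive, {slow_l:.2f} V inductive")
```

With these numbers the resistive drop is only 40 mV, but the 100-ns edge develops about 10V across the wire inductance. Slowing the edge to 10 µs brings the inductive term down to about 0.1V.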

The circuit of Figure 4b minimizes this problem by adding a third ground wire, which isn't connected to logic ground, for returning motor current to the power supply. The motor still causes a current surge, but it doesn't affect the logic ground. However, although this addition solves the EMI problem, it can cause other problems. An H-bridge usually drives the motor, and if the motor return and logic ground get too far apart, the voltage differential and resulting current can damage the H-bridge.

Ground loops

In the classic case of a ground loop, two circuits connect to different grounds and to each other, and the grounds have slightly different ac or dc potentials. Because the impedance between the two grounds is very low, significant current can flow in the grounds themselves.

Figure 5, which shows an embedded-µP system communicating with a host PC, illustrates a ground-loop problem. Both systems get power from a 115V ac line. If the two systems connect to different branches of an ac circuit (for example, if they're in different rooms or different buildings), then significant current can flow in the ground. The current flows through the ground wires in the interface connections.

A ground-loop problem can be particularly bad if two connected systems operate on different ac voltages. A typical situation involves a µP system that's part of a large machine requiring 208V 3-phase power. To make matters worse, other heavy equipment, such as air conditioning, might share the 3-phase power. I've seen RS-422 drivers literally destroyed when an embedded system's ground got yanked around by air conditioning compressors turning on and off.

If the interface between a host PC and an embedded system is RS-232C or serial RS-422, you can sometimes solve a ground-loop problem by running the interface through an optical isolator pair. If the interface is parallel, a LAN, or some other high-speed interface, it might be necessary to ensure that the two systems are on the same branch of an ac line. If the voltages are different, you might have to ensure that both systems have clean, independent returns to your building's ground, with no heavy-duty equipment sharing the ground return.

Ground-loop problems can also occur within an embedded system that has many boards and modules, each with a separate power supply. Sometimes you can fix these problems with a ferrite bead on the right cable, but that tends not to be a very permanent or repeatable fix.

Low-level signals

Ground loops can be a problem even without directly affecting your processor; they can affect the devices the processor connects to. Figure 6a, which shows a processor board using a thermistor to read temperature, provides an example. The thermistor has a fairly low output level--say, 1 mV per degree. The logic in the thermistor's vicinity on a board draws current, which causes voltage drops across the power supply wiring and all the connectors. The normal voltage drop typical of such systems isn't enough to upset the logic, but it can be enough to cause an offset in the thermistor reading. Worse, the value may change as the dc current changes with the state of the logic. The solution to this problem is to give the thermistor a separate return (Figure 6b), so that it's not affected by the offset voltage. Of course, the same principle applies to strain gauges, pressure transducers, or any other low-level analog input device.
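
To put numbers on the offset, the sketch below converts a shared-return voltage drop into an apparent temperature error. It uses the 1-mV-per-degree sensitivity mentioned above. The 10 milliohms of shared return resistance and the two logic-current levels are hypothetical.

```python
def reading_error_degrees(return_current_a, shared_resistance_ohms,
                          sensitivity_v_per_degree):
    """Apparent temperature error caused by current in a shared return."""
    offset_v = return_current_a * shared_resistance_ohms
    return offset_v / sensitivity_v_per_degree

SENSITIVITY = 1e-3   # 1 mV per degree, as in the example above
R_SHARED = 10e-3     # hypothetical wiring plus connector resistance

# Hypothetical logic current in two operating states.
error_idle = reading_error_degrees(0.2, R_SHARED, SENSITIVITY)
error_busy = reading_error_degrees(0.5, R_SHARED, SENSITIVITY)

print(f"idle: {error_idle:.1f} degrees, busy: {error_busy:.1f} degrees")
print(f"reading moves by {error_busy - error_idle:.1f} degrees with logic state")
```

A drop of a few millivolts means nothing to the logic, yet here it shifts the reading by several degrees, and the shift follows the logic activity. A separate return removes the shared resistance from the measurement path.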

Shorted outputs

Another source of EMI problems is shorted outputs. In my experience, two CMOS or TTL outputs shorted together can make an entire circuit susceptible to noise. And, of course, the shorted outputs themselves dump a lot of noise into the grounds.

Self-generated ESD

ESD (electrostatic discharge) will often upset a µP-based circuit, because an ESD pulse contains very high frequencies that couple very readily into logic. Even if your equipment is resistant to ESD from external sources, electromechanical equipment can generate ESD internally. If your system uses rotating motors and has bizarre failures, look for ESD, especially if a motor couples to a pulley with a belt. A motor driving a pulley with a belt made of insulating material can be a good generator of ESD, as can any two insulators rubbing against each other--for example, a plastic brake that prevents coasting on a plastic drum.

The usual solution to such ESD problems is to use belts and pulleys that are slightly conductive. If this isn't possible, you may have to use a conductive brush to carry the charge to ground, or you may need to look at alternative drive mechanisms.

The problem goes away

It happens far too often: A subtle bug makes you tear your hair out, but when you hook up a logic analyzer or a scope to look at it, it goes away. When this happens, look for timing errors or race conditions. Usually, the test equipment is adding a few picofarads of capacitance, enough to slow down the risetime of some signal.

Race conditions

Race conditions frequently occur in embedded systems, as Figure 7 illustrates. In the figure, a µC drives a 74AC139 to generate pulses for some external system. Use of the 139 allows the controller to generate four separate outputs using only two port lines. The outputs could have various purposes--for example, generating interrupts to other boards or clocking data into registers.

As the timing diagram of Figure 7 shows, problems arise when the µC steps through the select lines. As each input line changes state, a momentary glitch appears at one or more outputs. (The diagram shows glitches on the Y1 and Y3 outputs, but the glitch locations can vary in devices from one manufacturer to another according to the devices' internal structures.) If the outputs drive registers or latches that are fast enough to respond to the glitches, invalid data can result. The solution to this particular problem is to use a third port pin, connected to the AC139's enable input, to gate the outputs off when the select inputs are changing.
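
The effect of the enable gating can be illustrated with a small behavioral model. The sketch below is a simplification, not a model of any particular part. It represents one half of a 2-to-4 decoder with active-low outputs and an active-low enable, and it assumes that the A select line settles before the B line. The function names are hypothetical, and real glitch locations depend on the device's internal structure.

```python
def decode_2to4(a, b, enable_n=0):
    """One half of a 2-to-4 decoder with active-low outputs and enable.

    Returns (Y0, Y1, Y2, Y3); the selected output is 0, the rest are 1.
    With enable_n high, every output stays inactive (1).
    """
    selected = (b << 1) | a
    return tuple(0 if (enable_n == 0 and i == selected) else 1
                 for i in range(4))

def step_select(old, new, gate_with_enable):
    """List the output states seen while the select code moves old -> new.

    Assumes A changes before B, so the decoder briefly sees a mixed code.
    """
    old_b = old >> 1
    new_a, new_b = new & 1, new >> 1
    if gate_with_enable:
        return [
            decode_2to4(old & 1, old_b, enable_n=1),  # outputs gated off
            decode_2to4(new_a, old_b, enable_n=1),    # A changes
            decode_2to4(new_a, new_b, enable_n=1),    # B changes
            decode_2to4(new_a, new_b, enable_n=0),    # outputs back on
        ]
    return [
        decode_2to4(new_a, old_b),   # A has changed, B has not: mixed code
        decode_2to4(new_a, new_b),   # B catches up
    ]

def glitched_outputs(old, new, gate_with_enable):
    """Outputs that went active although neither old nor new selects them."""
    hit = set()
    for state in step_select(old, new, gate_with_enable):
        for i, level in enumerate(state):
            if level == 0 and i not in (old, new):
                hit.add(i)
    return sorted(hit)

print(glitched_outputs(1, 2, gate_with_enable=False))
print(glitched_outputs(1, 2, gate_with_enable=True))
```

In this model, stepping from select code 1 to code 2 without gating briefly activates Y0, because the decoder sees code 0 in between. With the enable held inactive during the change, no unintended output goes active. The cost is the third port pin and two extra port writes per step.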



This article, one of an occasional series on basic debugging techniques, is an adaptation from the book, Debugging Embedded Microprocessor Systems by Stuart R Ball. Material reproduced courtesy of Newnes, an imprint of Butterworth-Heinemann, 225 Wildwood Ave, Woburn, MA 01801-2041. For more information, check www.bh.com. To order, call 1-800-366-2665.


Author's biography

Stuart Ball, PE, is an electrical engineer who has spent the last 16 years designing digital, analog, and embedded-µP systems. He is the author of two books, Embedded Microprocessor Systems, Real World Design and Debugging Embedded Microprocessor Systems, both published by Newnes (Woburn, MA). He is currently employed at Organon-Teknika (Oklahoma City, OK), a manufacturer of medical instruments.




Copyright © 1998 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Business Information, a unit of Reed Elsevier Inc.