Feature
Windowed-watchdog timers enhance system security
A supervisory function enables system recovery to prevent execution errors.
By Donald W Corson, EM Microelectronic -- EDN, 7/7/2005
As microprocessor-controlled systems take on more and more functions involving human safety, close performance monitoring becomes increasingly important. The low cost and wide range of features of many of today's microprocessors allow their use in many applications that were previously the domain of dedicated hardware. Although microprocessors are highly flexible tools, the probability of code errors in their programs lowers their functional reliability. Defensive programming techniques, such as filling unused ROM with halt or illegal instructions to trap illegal jumps in code space, aid in program debugging. In deployed systems, they can also provide a small but useful mechanism for graceful recovery. But even with the most careful and complete testing, you won't find every error; no method can ensure 100% coverage.
Systems that could cause bodily injury if they malfunction require high reliability. Examples of such systems include automotive antilock-braking or steering systems; medical instruments, such as insulin pumps; robots; industrial-control systems; automatic doors; nuclear-power-plant controls; and avionics. These systems must be able to recover from a crash without human assistance, such as someone pressing a reset button, because such intervention would probably occur too late to prevent injury.
A watchdog timer is a subsystem that can cause a program reset or NMI (nonmaskable interrupt) if a microprocessor does not react within a certain amount of time. In many cases, the timer can catch a misbehaving microprocessor system. For highly sensitive applications, designers should use windowed-watchdog timers, which activate when system code clears them either too slowly or too quickly. Windowed watchdogs thereby detect an additional class of program errors and faulty hardware behavior. Ideally, a watchdog-monitored system can restart itself into a working state without the user even knowing that an error occurred. To achieve this level of robustness, the system and software design must be able to accept a reset at any time and resume normal operation without operator intervention.
Many microcontrollers offer an internal programmable watchdog with similar functions. Software can disable these internal watchdog timers, so they do not provide the same protection for safety-critical applications as do independent external watchdog circuits. Critical applications should employ an external watchdog-reset circuit.
Basic operation
Standard watchdogs are incrementing counters that set their output when the counters reach their maximum value. The microcontroller must reset the counter by creating a falling edge on the timer's clear input. If the program execution is faulty because of a program error, or if an external disturbance slows the program execution, the counter will reach its maximum value, and the watchdog-timer output will activate. This approach catches problems such as code executing ("hanging") in endless loops. It does not, however, trigger for errors such as routines that return before their normal-completion cycle, causing the program execution to be faster than expected.
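To make the counter behavior concrete, here is a minimal software model of a standard watchdog. It assumes a tick-driven counter and a clear input that acts on falling edges; the type and function names (`std_wdt`, `std_wdt_tick`) are hypothetical and not any vendor's API. The model also shows the weakness described above: clearing the counter too often never trips it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a standard (non-windowed) watchdog: an incrementing
 * counter that sets its output at max_count and is cleared by a falling
 * edge on its clear input. */
typedef struct {
    uint32_t count;      /* ticks since the last clear */
    uint32_t max_count;  /* count at which the output is set */
    bool     prev_clr;   /* previous level of the clear input */
    bool     output;     /* true = reset/NMI requested (latched) */
} std_wdt;

void std_wdt_init(std_wdt *w, uint32_t max_count) {
    assert(max_count > 0);
    w->count = 0;
    w->max_count = max_count;
    w->prev_clr = true;   /* clear input idles high */
    w->output = false;
}

/* Advance the model by one clock tick, given the current clear-input level. */
void std_wdt_tick(std_wdt *w, bool clr_level) {
    bool falling_edge = w->prev_clr && !clr_level;
    w->prev_clr = clr_level;
    if (falling_edge) {
        w->count = 0;     /* any falling edge restarts the count, however early */
        return;
    }
    if (w->count < w->max_count) {
        w->count++;
    }
    if (w->count >= w->max_count) {
        w->output = true; /* counter reached its maximum: fire */
    }
}
```

Calling `std_wdt_tick()` with a clear level that toggles on every tick keeps `output` false indefinitely, which is exactly the too-fast case a standard watchdog cannot see.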
For highest security, a windowed watchdog demands that the timer's clear-input edge be within a certain window. If the signal arrives before or after this timing window, it triggers the output signal to either reset the processor or activate other error handling. This type of watchdog effectively covers programs that execute both too slowly and too quickly.
Not all errors are due to software bugs or conventional processor or circuitry problems. Another source of error is a crystal in a clock or resonator circuit jumping to a spurious mode because of an external shock. Although the crystal will probably return to its proper frequency after a short time, the processor may execute the program improperly in the meantime. The windowed watchdog can catch this behavior.
In Figure 1a, the timing of a windowed-watchdog timer is divided into two periods. The period during which a falling edge on the watchdog input (WDI) signals an error is called the forbidden window. The allowed window is the period during which a falling edge on WDI is accepted and resets the timer. (Some documentation refers to the allowed window as the open window and the forbidden window as the closed window.) If no edge has arrived by the end of the allowed window, a timeout occurs. In general, windowed-watchdog products let you program the watchdog time, TWD, with an external resistor or capacitor.
As an example, for the EM6150/51, the allowed window extends from 80% to 120% of TWD (TWD ±20%). The forbidden window is the time up to 80% of TWD, and the watchdog timeout occurs at TWD + 20%. If no WDI falling edge has been received by the end of the allowed window, the watchdog immediately produces a reset pulse. A falling edge on WDI during the forbidden window and a timeout after TWD + 20% both assert reset and remove the enable. Note that the timing for the next period starts immediately after the falling edge of WDI.
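The window arithmetic can be sketched in C as follows, assuming elapsed time is measured in microseconds from the last accepted WDI falling edge. The limits are the EM6150/51 percentages quoted above; the names `classify_wdi` and `wdi_causes_reset`, and the choice to treat the exact 80% and 120% points as allowed, are assumptions for illustration rather than data-sheet behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { WDI_FORBIDDEN, WDI_ALLOWED, WDI_TIMEOUT } wdi_window;

/* Classify the time since the previous accepted WDI falling edge:
 * forbidden below 0.8*TWD, allowed from 0.8*TWD to 1.2*TWD (boundaries
 * assumed inclusive), timeout beyond 1.2*TWD. When polling without an
 * edge, WDI_TIMEOUT simply means the deadline has passed. */
wdi_window classify_wdi(uint32_t elapsed_us, uint32_t twd_us) {
    assert(twd_us > 0);
    uint64_t e10 = (uint64_t)elapsed_us * 10u;  /* scale by 10 to stay in integers */
    uint64_t t = (uint64_t)twd_us;
    if (e10 < t * 8u) {
        return WDI_FORBIDDEN;                   /* edge came too early */
    }
    if (e10 <= t * 12u) {
        return WDI_ALLOWED;                     /* edge accepted, timer restarts */
    }
    return WDI_TIMEOUT;                         /* edge too late or missing */
}

/* Both an early edge and a missed deadline assert reset and drop the enable. */
bool wdi_causes_reset(wdi_window w) {
    return w != WDI_ALLOWED;
}
```

With TWD = 100 ms, an edge at 50 ms is forbidden, an edge anywhere from 80 ms to 120 ms is allowed, and no edge by 120 ms is a timeout. Scaling by 10 keeps the comparisons in integer arithmetic, which suits small microcontrollers without floating-point hardware.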
To understand the benefits of using a windowed-watchdog timer rather than a standard watchdog timer for high-reliability operation, consider the following five events (Figure 1b):
- Event 1: reset after a watchdog timeout;
- Event 2: reset caused by WDI arriving too soon, during the forbidden window;
- Event 3: timing OK;
- Event 4: reset caused by WDI arriving too soon, during the forbidden window; and
- Event 5: enable asserted after three good WDI inputs.
The events represent the following activities: At Event 1, both the standard-watchdog timer and the windowed-watchdog timer generate a reset, because the TWD period passes without a signal on the WDI input. Next, three correct watchdog cycles cause assertion of the increased-confidence enable output. At Event 2, the WDI signal arrives too early, during the forbidden window of the windowed-watchdog timer. The watchdog therefore immediately asserts the reset output and revokes the enable output. At Event 3, the timing is OK, but at Event 4, the WDI signal again falls within the forbidden window of the windowed-watchdog timer, causing another reset. In each case, the watchdog timing begins at the falling edge of the last WDI input. Note that in events 2 and 4, a standard-watchdog timer does not detect that the processor is malfunctioning by running too fast, whereas a windowed-watchdog timer does. Note also that the enable output has not yet been reasserted at Event 4, because only one good cycle has occurred since Event 2. At Event 5, the system has stabilized again, and the watchdog asserts the increased-confidence enable output after three good watchdog cycles.
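The reset and enable behavior in this sequence can be modeled as a small state machine. This sketch assumes that each watchdog cycle has already been classified as good, too early, or timed out, and that the enable output requires three consecutive good cycles since the last fault; the names are hypothetical, and a real device's power-up behavior may differ.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CYCLE_OK, CYCLE_TOO_EARLY, CYCLE_TIMEOUT } cycle_result;

#define GOOD_CYCLES_FOR_ENABLE 3u  /* from the text: three good WDI inputs */

typedef struct {
    unsigned good_cycles;  /* consecutive good cycles since the last fault */
    bool     enable;       /* increased-confidence enable output */
    unsigned resets;       /* number of reset pulses issued so far */
} wwdt_outputs;

void wwdt_outputs_init(wwdt_outputs *s) {
    s->good_cycles = 0;
    s->enable = false;
    s->resets = 0;
}

/* Update the outputs after one watchdog cycle. */
void wwdt_on_cycle(wwdt_outputs *s, cycle_result r) {
    assert(r == CYCLE_OK || r == CYCLE_TOO_EARLY || r == CYCLE_TIMEOUT);
    if (r == CYCLE_OK) {
        if (s->good_cycles < GOOD_CYCLES_FOR_ENABLE) {
            s->good_cycles++;
        }
        if (s->good_cycles >= GOOD_CYCLES_FOR_ENABLE) {
            s->enable = true;      /* confidence regained */
        }
    } else {
        s->resets++;               /* early edge or timeout: pulse reset */
        s->enable = false;         /* revoke enable together with the reset */
        s->good_cycles = 0;        /* confidence must be rebuilt from zero */
    }
}
```

Feeding it a timeout, three OK cycles, a too-early cycle, one OK cycle, another too-early cycle, and then three OK cycles reproduces Events 1 through 5: the enable output is on after the first three good cycles, off from Event 2 through Event 4, and on again at Event 5.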
Windowed-watchdog circuits generally also include all of the features of a standard voltage-reset circuit, such as a reset timeout period and a threshold voltage. These parameters are either preprogrammed at the factory or set with external components, allowing increased flexibility.
Distributed systems are other applications in which windowed watchdogs help maintain total system confidence. In systems in which a master provides timing or synchronization messages to the slave processors, a standard watchdog can detect a missing master unit, or a master unit that is failing in a slow direction ("failing slow"). A windowed watchdog increases the error coverage to include multiple conflicting masters on the bus, or masters that are failing in the fast direction ("failing fast").
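A slave can apply the same window idea to the interval between master sync messages, either by driving a hardware windowed watchdog from each message or by checking the timing in software. The sketch below shows only the software check, assuming a nominal message period and a symmetric tolerance; `check_sync_gap` and `tol_pct` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SYNC_OK, SYNC_TOO_FAST, SYNC_TOO_SLOW } sync_status;

/* Hypothetical slave-side check of the gap between two consecutive master
 * sync messages. tol_pct is an assumed symmetric tolerance, e.g. 20 for +/-20%. */
sync_status check_sync_gap(uint32_t gap_us, uint32_t period_us, uint32_t tol_pct) {
    assert(period_us > 0 && tol_pct < 100);
    uint64_t g  = (uint64_t)gap_us * 100u;                  /* compare in percent units */
    uint64_t lo = (uint64_t)period_us * (100u - tol_pct);
    uint64_t hi = (uint64_t)period_us * (100u + tol_pct);
    if (g < lo) {
        return SYNC_TOO_FAST;  /* e.g. conflicting masters interleaving messages */
    }
    if (g > hi) {
        return SYNC_TOO_SLOW;  /* missing master or one failing slow */
    }
    return SYNC_OK;
}
```

For example, two masters that each send every 10 ms but are offset by 4 ms produce gaps of 4 ms and 6 ms at the slave; both fall below the 8 ms lower limit at ±20% and are flagged as too fast.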
For applications that could cause human injury, such as automatic car windows or doors, it is a good idea to use a windowed watchdog, which represents today's state of the art in design. To increase security in applications driving motors or actuators, designers can use an increased-confidence enable output to gate the motor signals. For instance, this output can immediately stop motor movement when the processor's behavior cannot be trusted and allow it again only once the watchdog is confident that the processor is running properly. The watchdog timer reasserts this signal only after it sees three good WDI edges, and it removes the signal at the same moment that it asserts the reset output upon detecting a processor malfunction.
Adding a windowed watchdog to a system is an important step in increasing system confidence, but if the watchdog timer's service routine is a timer-triggered interrupt routine just for this watchdog, it is useless. It is very possible that the entire system could crash, yet the timer-triggered interrupt continues to service the watchdog at the appropriate intervals, indicating that all is well.
Always keep in mind the basic rules of embedded programming. Fill unused program memory with a defined pattern, and make sure that the pattern is defined for every possible address where a misguided jump could land. The strategy depends on the processor: with multibyte or word-length instructions, a wayward jump could land in the middle of an instruction, so choose a pattern that traps regardless of where within it execution begins.
In general, use halt instructions, or known illegal instructions if the processor core traps them; either one catches illegal jumps regardless of their cause. A halt causes the watchdog to trigger, whereas the trapping and processing action that follows an illegal instruction depends on the system architecture. Both techniques are useful in a debugging environment to help trace the cause of the illegal jump. In production units, you can use them to force a reset or to trigger a routine that puts the equipment in a known or safe mode.
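The fill can be done as a host-side post-build step on the ROM image before programming. This sketch assumes that the linker map yields a list of used address ranges and that the core has a single-byte halt or illegal opcode; the value passed as `trap_byte` is a placeholder to replace with an opcode your core actually halts or traps on, and `fill_unused` is a hypothetical helper. A single-byte pattern has the advantage that a jump to any address inside a gap lands on a trap, whatever the instruction alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* [start, start + len) holds real code or data that must not be overwritten. */
typedef struct { size_t start; size_t len; } mem_range;

static int in_used(size_t addr, const mem_range *used, size_t n_used) {
    for (size_t i = 0; i < n_used; i++) {
        /* addr - start is safe because addr >= start is checked first */
        if (addr >= used[i].start && addr - used[i].start < used[i].len) {
            return 1;
        }
    }
    return 0;
}

/* Overwrite every byte not covered by a used range with trap_byte.
 * O(image_len * n_used): fine for a build tool working on small ROM images. */
void fill_unused(uint8_t *image, size_t image_len,
                 const mem_range *used, size_t n_used, uint8_t trap_byte) {
    assert(image != NULL || image_len == 0);
    for (size_t a = 0; a < image_len; a++) {
        if (!in_used(a, used, n_used)) {
            image[a] = trap_byte;
        }
    }
}
```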
Never service a watchdog timer using a routine solely for that purpose. The only exception to this rule could be in multitasking systems. Because such systems are often nondeterministic, one option for periodically servicing the watchdog timer is to have a monitor task that services the watchdog based on clues that other tasks leave. By incrementing counters when they have finished certain processing functions, for example, the system tasks can leave enough information for a monitoring task to decide whether the system is healthy. Because this approach hands a hardware-safety function over to software, designers should make sure the system is sufficiently deterministic that the watchdog is serviced only when the monitored routines are actually working.
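A check-in scheme for such a monitor task could look like the following sketch. It assumes that each task calls a hypothetical `task_report_progress()` after finishing a unit of work and that the monitor runs once per watchdog period; the monitor services the watchdog only when every task has advanced since its previous run. The names and the task count are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS 3  /* hypothetical number of monitored tasks */

/* Incremented by each task when it finishes a unit of work. On a real RTOS
 * these accesses would need to be made atomic (e.g. a critical section). */
static uint32_t task_checkin[NUM_TASKS];

/* Snapshot of the counters taken at the monitor's previous run. */
static uint32_t last_seen[NUM_TASKS];

void task_report_progress(size_t task_id) {
    assert(task_id < NUM_TASKS);
    task_checkin[task_id]++;
}

/* Called periodically by the monitor task. Returns true only if every task
 * has checked in since the last call; the caller then services the watchdog
 * (e.g. drives a falling edge on WDI). Otherwise it leaves the watchdog alone
 * so that it expires and resets the system. */
bool monitor_all_tasks_alive(void) {
    bool all_alive = true;
    for (size_t i = 0; i < NUM_TASKS; i++) {
        if (task_checkin[i] == last_seen[i]) {
            all_alive = false;  /* task i made no progress since the last run */
        }
        last_seen[i] = task_checkin[i];
    }
    return all_alive;
}
```

With a windowed watchdog, the monitor task's own period must also fall inside the allowed window, so a monitor that runs too often trips the watchdog just as a stalled one does.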
Also include reset-time processor validation in the embedded-system design. Although processor failures are rare and most often catastrophic, partial failures do occur. Processor validation, which must be written in assembly language, should begin with a simple unconditional jump instruction and then proceed through all of the instructions that the application uses, so that instructions already verified can be used in the tests of later ones. Although programmers may not like creating such test code, it can provide considerable system security and even cost savings, because it allows the system to demonstrate that the processor is thoroughly tested, both in production test (possibly eliminating the need for a dedicated testing station) and in application use.
Don't forget thermal considerations
Windowed-watchdog-timer circuits are also available with one or more on-chip LDO (low-dropout) regulators. Such circuits are especially useful in decentralized systems, such as automotive and industrial-automation applications, because a single component can both monitor system operation and regulate the power supply (Figure 2).
As with any voltage regulator, the pc-board layout is important to the success of the design. The routing of the decoupling capacitors to the supply and ground traces or planes must be clean and short. Circuitous paths increase the circuit inductance and possibly the cross-coupling between inputs and outputs. Clean separation between the logic supply and the power portion of the circuitry is especially important in circuits controlling electrical motors, due to the large spikes that they produce on the power-supply lines.
Designers should also take thermal issues into account when planning the layout. Many IC packages containing LDOs have a heat-sink contact called a "thermal slug" that you must solder to the pc board. The pc board should provide enough surface area around the chip to act as a radiator. Ideally, circuit planes on both sides of the board connect to the slug through thermal vias, to carry heat away from the chip as efficiently as possible. The actual thermal resistance in any application depends greatly on the physical configuration of the complete module: pc-board cooling surfaces, thickness, airflow, convection, horizontal or vertical orientation, and other factors.
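A first-order estimate shows why the copper area matters. Neglecting the regulator's quiescent current, an LDO dissipates P = (VIN - VOUT) x IOUT, and its junction temperature is roughly TJ = TA + P x θJA, where θJA is the junction-to-ambient thermal resistance that the board layout largely determines. The function names and the numbers in the usage note below are illustrative assumptions, not figures for any particular device.

```c
#include <assert.h>

/* Power dissipated in a linear LDO, neglecting quiescent (ground) current:
 * P = (Vin - Vout) * Iout, in watts for volts and amperes. */
double ldo_dissipation_w(double vin_v, double vout_v, double iout_a) {
    assert(vin_v >= vout_v && iout_a >= 0.0);
    return (vin_v - vout_v) * iout_a;
}

/* Steady-state junction temperature estimate: Tj = Ta + P * theta_ja. */
double junction_temp_c(double ambient_c, double power_w, double theta_ja_c_per_w) {
    assert(theta_ja_c_per_w > 0.0);
    return ambient_c + power_w * theta_ja_c_per_w;
}

/* Largest dissipation that keeps Tj at or below tj_max for a given theta_ja. */
double max_dissipation_w(double tj_max_c, double ambient_c, double theta_ja_c_per_w) {
    assert(theta_ja_c_per_w > 0.0 && tj_max_c >= ambient_c);
    return (tj_max_c - ambient_c) / theta_ja_c_per_w;
}
```

For example, regulating 14 V down to 5 V at 100 mA dissipates about 0.9 W; with an assumed θJA of 60 °C/W at 85 °C ambient, the junction would reach roughly 139 °C, close to a common 150 °C maximum. Lowering θJA with more copper area and thermal vias is what restores the margin.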
Watchdog components that can recognize that they are being placed in sleep mode, and so adapt their behavior to reduce system power consumption without decreasing security, are also available. These components suit ultralow-power applications using sleep mode, such as those for CAN-bus communication, in which you can disable functional units under software control.
Author information
Don Corson began his career in computer-peripheral development at Philips in Germany. For the last five years, he has been with Swatch Group and EM Microelectronic. Corson was responsible for the battery system for Swatch's hybrid-car project and works at Swatch Group's semiconductor fab and design company, EM Microelectronic, on battery-related low-power, low-voltage projects. You can reach him at dcorson@emmicroelectronic.com.