
High-speed RISC (reduced-instruction-set computing) technology makes it practical to create embedded systems that previously would have been far too expensive and complex. Special-purpose chips (custom ASICs) enable these systems to deliver even greater performance at competitive prices. Although EDA (electronic-design automation) and CASE (computer-aided software engineering) tools make such complex designs possible, an inefficient debug environment can squander all of the time saved at the front end of the design process--particularly if the system under test contains a whole new class of complex debugging problems.
This article will help you to better understand the behavior of RISC-based systems in real-time applications and will show you how to get the most out of the instruments you use for analyzing and debugging those systems. The intended results are shorter and less painful product-development programs.
Most embedded systems handle some type of real-time input/output. The input may come from a disk drive, a robotics positioning sensor, an analog telecommunications link, or any device that is external to the embedded-system core. To function, the system's embedded software must not only be logically correct, it must also perform correctly in the time domain. That is, it must process the data in a specified amount of time. Most real-time embedded systems are also event driven--they are controlled by data they receive from external sources. Embedded systems' time-dependent nature makes debugging them especially difficult because many debugging tools ignore timing issues and provide information only on the software's logic.
Debugging in the time domain
In most cases, software designers initially debug their code on a workstation or a PC. Although this approach eliminates many logic errors, it is ineffective in debugging those features of a program that depend on hardware and time.
Because of the event-driven nature of real-time systems, debugging in the time domain requires monitoring of code as it executes. This usually requires instrumentation such as a logic analyzer or an in-circuit emulator. These tools are especially effective for analysis of time-critical functions. Many of these instruments offer additional tools that allow timing analysis, performance analysis, or time-domain triggers. Time-oriented triggering may include triggering on minimum pulse widths or triggering when multiple counter/timers overflow, revealing that a routine has not finished executing in a specified time.
A real-time trace buffer that captures aspects of the program's execution history is an effective tool for finding elusive problems in event-driven systems. The deeper the trace buffer, the better. Deep trace lets you look back in time to find the cause of problems. Often, problems don't show up until some time after the event that precipitates them. Because of RISC processors' limited instruction sets, high-level code compiles into more RISC machine-language instructions than CISC (complex-instruction-set computer) instructions. Although the RISC version of a program may execute faster, it usually takes more bus cycles, necessitating a deeper trace buffer.
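As a rough software analogy of the trace buffer described above, the following C sketch keeps the most recent bus cycles in a circular buffer and lets you look back from the newest one. The names (trace_buffer_t, trace_record, trace_look_back) and the depth of eight entries are assumptions made for illustration; a real instrument captures cycles in hardware and stores far more of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 8u  /* hypothetical depth; real trace memories are far deeper */

typedef struct {
    uint32_t address;
    uint32_t data;
} bus_cycle_t;

typedef struct {
    bus_cycle_t entries[TRACE_DEPTH];
    size_t next;   /* slot that the next cycle will overwrite */
    size_t count;  /* number of valid entries, at most TRACE_DEPTH */
} trace_buffer_t;

void trace_init(trace_buffer_t *tb)
{
    assert(tb != NULL);
    tb->next = 0;
    tb->count = 0;
}

/* Store one bus cycle, overwriting the oldest entry once the buffer is full. */
void trace_record(trace_buffer_t *tb, uint32_t address, uint32_t data)
{
    assert(tb != NULL);
    tb->entries[tb->next].address = address;
    tb->entries[tb->next].data = data;
    tb->next = (tb->next + 1u) % TRACE_DEPTH;
    if (tb->count < TRACE_DEPTH) {
        tb->count++;
    }
}

/* Return the cycle recorded `age` cycles before the newest one (age 0 is the
 * newest), or NULL when that cycle has already been overwritten. */
const bus_cycle_t *trace_look_back(const trace_buffer_t *tb, size_t age)
{
    assert(tb != NULL);
    if (age >= tb->count) {
        return NULL;
    }
    size_t index = (tb->next + TRACE_DEPTH - 1u - age) % TRACE_DEPTH;
    return &tb->entries[index];
}
```

The NULL return is the point of the sketch: once the cause of a problem lies more than TRACE_DEPTH cycles in the past, it is no longer in the buffer. Because a RISC program generates more bus cycles for the same work, the same depth covers a shorter stretch of program history.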
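Whereas many of the techniques discussed here apply to both RISC and CISC designs, developers who use RISC chips generally face more architectural issues. These issues include primary and secondary caches and pipelines, faster clock rates, designs with multiple processors, high-level languages and code optimization, and large numbers of registers.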
To understand their impact on debugging, you should examine each of these aspects separately.
Primary and secondary caches and pipelines
Analyzing system behavior with the caches turned on can obscure a debugging tool's view of actual performance. Turning caching off allows logic analyzers and emulators to disassemble correctly and provides a better view of the code being executed--but changes the system characteristics, possibly masking real problems. Debug monitors can run with caches on, but when they hit a breakpoint, the cache is flushed, and its contents are lost--again changing the real operation of the system. You should debug a routine in two passes. In the first pass, turn the cache off and debug the routine's logic. On the second pass, turn the cache back on and ensure that the timing is correct.
One way to verify the timing of a routine that executes from cache is to instrument the code with an instruction that lets you see what is happening. If you insert memory-write instructions into a critical routine that executes from cache, you can set up a logic analyzer to time the event and to monitor the memory location being written to or simply to monitor the progress of the routine's execution. You can enhance this technique by writing various values, such as the address of the routine and the value of a key register, to the specified memory location. This approach allows more flexible triggering, qualification, and display (Fig 1).
Another factor relates to context switching: The cache usually must be flushed and refilled before a new routine can start executing. To speed such switching, some RISC chips allow disabling the cache so that the program runs directly out of main memory. This is a dangerous practice. Although it may speed the context switch, it could reduce the overall speed of the interrupt handler, which now must run directly out of main memory.
Secondary caches add another level of complexity. You must carefully control the two caches to make sure that no data are lost and that each cache remains filled with the correct data (that is, the caches must maintain coherency). A logic analyzer with enough channels can monitor the input to both caches and verify that the correct data are being loaded into each cache at the proper time (Fig 2).
Another difficult debugging chore is verifying burst-mode operations. Burst mode is a fast way to move data from main memory to cache. Rather than issuing an address for each memory location, burst mode issues a starting address and fetches data from sequential memory locations. This reduces the execution overhead because additional addresses need not be sent over the bus. Debugging burst-mode hardware requires an instrument fast enough to acquire the burst data and to reconstruct the entire sequence in either hardware or software. If the instrument is not fast enough, the missing data keeps it from displaying an accurate picture of the program execution. Instruments with inadequate speed accurately display only the first instruction of a burst.
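The reconstruction step can be pictured with a short C sketch. It assumes a linear burst of 32-bit words, in which each word's address is simply the starting address plus an offset; processors that use a wrap-around burst order would need a different address rule. The names bus_cycle_t and expand_burst are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t address;
    uint32_t data;
} bus_cycle_t;

/* Expand one captured burst (a starting address plus the words that
 * followed it) into one address/data pair per word.
 * Assumption: linear burst order and 32-bit words. */
size_t expand_burst(uint32_t start_address, const uint32_t *words,
                    size_t word_count, bus_cycle_t *out)
{
    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < word_count; i++) {
        out[i].address = start_address + (uint32_t)(i * sizeof(uint32_t));
        out[i].data = words[i];
    }
    return word_count;
}
```

The sketch works only if every word of the burst was captured. An instrument that is too slow to sample the later words has nothing to pair with the reconstructed addresses.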
Faster clock rates
Although fast clocks result in fast software execution, they can cause tremendous problems in hardware. Clock rates approaching 50 MHz require designers to consider each signal and every physical path with care. RISC clocks, which have rise times as short as 1 nsec, cause problems that are very difficult to isolate.
High clock rates can result in crosstalk--causing noise on physically close signal lines. When a location is suspect, one approach is to cut the signal lines in question and reroute them outside the area of concern. This work-around may cure the crosstalk, but it introduces other problems, such as uncontrolled impedance and changes in the signal characteristics. Further, improper termination of fast signals can cause reflections that, when large enough, can change the input characteristics of devices along that signal line. In all of these cases, attaching a scope or logic-analyzer probe to the suspect line can change the characteristics of the signal and cause the problem to vanish, resulting in extreme frustration (Fig 3).
To accurately debug problems of these types, you need advanced probing systems. When working with RISC technology, you need oscilloscope and logic-analyzer probes that introduce low capacitive loading. Such probes have minimal impact on signal characteristics and let you track down real problems instead of phantoms caused by your instruments.
Besides clocking at high rates, some RISC processors fetch data at both edges of the clock (two instructions per cycle). This effectively doubles the clock rate at which instruments must acquire data and trigger (Fig 4).
Slower clocks allow enough time for signals to propagate through interconnecting cables and sockets. As signals speed up, a design's margin for propagation delays (the time for a signal to get from one area of a board to another) decreases. In synchronous systems, each device has a setup-and-hold-time specification. As signal speeds increase, setup-and-hold time shrinks, and signal skew (the difference among the arrival times of the same signal--or supposedly simultaneous signals--at different points) requires closer scrutiny.
When a device's setup-and-hold times are "on the edge," metastability is a common problem. Monitoring signals to detect skew-related problems requires a fast oscilloscope or an accurate logic analyzer with resolution of 1 to 2 nsec. Be sure that the channel-to-channel skew specifications of the instrument do not mask the actual skew in the system. Also, look for a logic analyzer that can measure setup-and-hold violations. It should offer low capacitive loading and fast triggering (Fig 5).
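The check that such an instrument performs can be expressed in a few lines of C. This is only a sketch under simple assumptions: all times are in picoseconds, and you already know when the data line last changed before the clock edge and when it first changed after it. The function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    TIMING_OK,
    SETUP_VIOLATION,
    HOLD_VIOLATION
} timing_result_t;

/* Compare the data transitions around one clock edge with a device's
 * setup and hold requirements. All times are in picoseconds. */
timing_result_t check_setup_hold(int64_t clock_edge_ps,
                                 int64_t last_change_before_ps,
                                 int64_t first_change_after_ps,
                                 int64_t setup_ps,
                                 int64_t hold_ps)
{
    assert(setup_ps >= 0 && hold_ps >= 0);
    /* Data must be stable for at least setup_ps before the edge. */
    if (clock_edge_ps - last_change_before_ps < setup_ps) {
        return SETUP_VIOLATION;
    }
    /* Data must stay stable for at least hold_ps after the edge. */
    if (first_change_after_ps - clock_edge_ps < hold_ps) {
        return HOLD_VIOLATION;
    }
    return TIMING_OK;
}
```

For example, with a 2-nsec setup requirement, data that changes 1 nsec before the clock edge is reported as a setup violation. Any skew between the instrument's own clock and data channels adds directly to the error in these differences, which is why the channel-to-channel skew specification matters.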
Designs with multiple processors
Many embedded systems now incorporate multiple processors in a redundant architecture, a master-slave configuration, or a parallel-processing scheme. Special-purpose µPs like DSPs are usually paired with a fast CPU to perform complex operations such as real-time data acquisition and processing. Although modern RISC and CISC architectures enable multiple processors to work together with relative ease, few debugging tools have kept pace with multiple-processor designs. A modern logic analyzer can be extremely valuable in multiprocessor systems.
Many recently introduced logic analyzers can monitor and time-correlate the activity of multiple processors. The analyzers perform time-correlation of multiple µPs by time-stamping data as the acquisition system gathers it. The more accurate the time stamp, the better the correlation. Look for time stamps with resolutions of 10 nsec or finer (100 MHz or higher).
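Time-correlation by time stamp amounts to merging two traces, each already in time order, into a single sequence. The C sketch below illustrates the idea; the sample_t layout and the merge_traces function are assumptions made for illustration and do not describe any particular analyzer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t timestamp_ns;  /* time stamp applied at acquisition */
    uint8_t  cpu;           /* which processor the sample came from */
    uint32_t address;
} sample_t;

/* Merge two traces, each sorted by time stamp, into one time-ordered
 * trace. `out` must have room for na + nb samples. */
size_t merge_traces(const sample_t *a, size_t na,
                    const sample_t *b, size_t nb,
                    sample_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i].timestamp_ns <= b[j].timestamp_ns) {
            out[k++] = a[i++];
        } else {
            out[k++] = b[j++];
        }
    }
    while (i < na) {
        out[k++] = a[i++];
    }
    while (j < nb) {
        out[k++] = b[j++];
    }
    return k;
}
```

Two events that are closer together than the time-stamp resolution receive the same stamp, so their true order cannot be recovered from the merged trace. That is the practical reason to look for fine time-stamp resolution.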
Logic analyzers with split-screen displays can show the behavior of any two processors at one time. Analyzers with a "link-cursor" feature allow you to scroll one display while the second scrolls in perfect time alignment. Using this feature lets you monitor how one processor passes data to another and verify that redundant systems are actually executing in synchronism. In parallel systems, monitoring processing time helps you to balance the computing load among the processors. A common problem in multiprocessor systems is stale data in caches. Logic analyzers can help pinpoint this problem (Fig 6).
High-level languages and code optimization
Many designers of embedded systems write time-critical routines in assembly language. This approach provides absolute control over what is happening in a system and allows better control over hardware. But RISC µPs are nearly impossible to program in assembly language because of their large number of registers, their limited number of instructions, and their need for code optimization. Of these problems, code optimization is the most difficult to handle.
As a result, development-tool suppliers have put a great deal of effort into the design of optimizing compilers for RISC µPs. Traditional code-optimization techniques include peephole optimization and the elimination of dead and redundant code. Some optimizing compilers speed loop-routine execution by analyzing loop code and moving nonessential code outside the loop. Compiler designers understand such techniques fairly well, but RISC presents unique problems.
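Moving nonessential code outside a loop is easiest to see in source form. In the hypothetical C routines below, the first version recomputes a factor on every pass even though the factor does not depend on the loop index; the second is an equivalent form of the kind an optimizing compiler may produce on its own. Whether a given compiler performs this transformation depends on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>

/* As the programmer wrote it: the factor is recomputed on every pass. */
void scale_in_loop(int *samples, size_t count, int gain, int offset)
{
    assert(samples != NULL);
    for (size_t i = 0; i < count; i++) {
        int factor = gain * 4 + offset;   /* does not depend on i */
        samples[i] *= factor;
    }
}

/* Equivalent form with the invariant computation moved out of the loop. */
void scale_hoisted(int *samples, size_t count, int gain, int offset)
{
    assert(samples != NULL);
    const int factor = gain * 4 + offset; /* computed once */
    for (size_t i = 0; i < count; i++) {
        samples[i] *= factor;
    }
}
```

Both routines produce the same results. When an execution trace shows fewer instructions inside a loop than the source code suggests, a transformation of this kind is a likely explanation and not necessarily a fault.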
To achieve the optimum RISC performance, the compiler rearranges the machine-language instructions' execution order. The idea is to take advantage of the µP's architecture to help achieve the goal of executing one or more instructions per cycle. Without this optimization, performance suffers.
Optimization takes place in two stages. The compiler is run once and the output is evaluated. During the second stage, instructions are reordered to keep the execution unit working full time. This analysis is complex and, in some cases, may be incorrect. Often, a compiler bug occurs only with a specific sequence of instructions; changing the execution order can make the error disappear. Because of the sheer number of possible combinations, it is practically impossible for a compiler vendor to test them all.
When things should work but don't, you can use a logic analyzer to monitor the actual assembly instructions in the vicinity of the problem code. Careful analysis of the execution trace helps to determine if there is indeed a compiler bug. Compiler vendors, once informed of the specifics of a problem, can quickly correct the compiler or formulate a way to work around the bug.
Superscalar processors do not present too many unusual debugging problems, but they can greatly impact the compiler. Just as a compiler must reorder instructions for a deep pipeline, it must do the same for a superscalar processor. The goal for code optimization in both types of architectures is to keep the processor executing one instruction or more per clock cycle while avoiding unnecessary branches or breaks in program flow. These breaks cause pipeline stalls, which slow throughput.
Cost-effective debug with enterprise instrumentation
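Enterprise instrumentation requires two fundamental capabilities: remote control of the instruments and the prototype, and a link between the compiler and the logic analyzer.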
With remote control, a developer at a workstation can set up and run a logic analyzer, display the results, monitor the prototype that is being debugged, and apply stimuli to it. The ability to control the prototype is critical during debugging, as you frequently need to reset the prototype or generate an interrupt. With Tektronix's 92PORT tool, you can stimulate eight lines and monitor eight lines on your prototype via a LAN.
It is often difficult to correlate the data acquired from a logic analyzer with what is happening in the software. You have to refer to a link map to translate an address to a location in the high-level program. With a link between the compiler and the logic analyzer, you can view the actual compiler function names in the logic-analyzer display. Tektronix's LA-Connect program provides links between most compilers and Tektronix logic analyzers.
Currently two LA-Connect vendors, Concurrent Sciences Inc and Microtec Research, provide suites of high-level debuggers, debug monitors, compilers, and links to other vendors' compilers. These tools, coupled with Tektronix logic analyzers, enable you to put together a comprehensive and cost-effective development environment for today's latest and most powerful µPs. These environments can reside on a variety of PCs and Unix workstations.
How the link works
The keys to an effective hardware/software debug environment are protocols that link design information to the logic analyzer. Converter programs can extract symbolic information from the compiler's output (object module) and download the data to the logic analyzer. The advantage of linking to the object module rather than to the compiler or debugger itself is that object-module formats are essentially language-independent. IEEE-695, COFF, OMF86, OMF386, and Extended TekHEX are examples of popular object-module formats and load modules used with various µPs and languages such as C, C++, Ada, and Pascal.
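Once symbolic information has been extracted from the object module, translating a captured address into a function name is a table lookup. The C sketch below shows one way to do it, assuming a table of function start addresses sorted in ascending order. The symbol_t layout and the function name are hypothetical and do not describe the format that any particular converter program produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    start;  /* first address of the function */
    const char *name;   /* name taken from the object module */
} symbol_t;

/* Return the name of the function that contains `address`, or NULL when
 * the address lies below the first symbol.
 * Assumption: `table` is sorted by ascending start address. */
const char *symbol_for_address(const symbol_t *table, size_t count,
                               uint32_t address)
{
    assert(table != NULL);
    const char *name = NULL;
    size_t lo = 0, hi = count;

    /* Binary search for the last symbol that starts at or below address. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start <= address) {
            name = table[mid].name;
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return name;
}
```

With a table that places main at 0x1040 and the next function at 0x1200, a captured fetch from 0x1044 is displayed as belonging to main. This is the manual link-map lookup, automated.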
Software and hardware breakpoints
Debugging real-time embedded systems requires a high-level-language debugger with both software and hardware breakpoints as well as real-time trace. Software breakpoints stop program execution just before the execution of an instruction. These breakpoints are usually implemented by replacing the target instruction with a trap that transfers control to a debugger--usually a debug monitor or an emulator. When execution resumes, the actual instruction is inserted back where it was and is executed. Caches or deep pipelines do not affect software breakpoints, because the breakpoints stop execution when the instruction arrives at the CPU.
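The replace-and-restore mechanism can be sketched in C as follows. An ordinary array of words stands in for program memory, and TRAP_OPCODE is a placeholder value; the real trap encoding is specific to each processor. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value; the actual trap instruction is processor specific. */
#define TRAP_OPCODE 0xFFFFFFFFu

typedef struct {
    uint32_t *location;      /* address of the patched instruction */
    uint32_t  saved_opcode;  /* original instruction, kept for restoring */
    int       armed;
} sw_breakpoint_t;

/* Save the target instruction and replace it with the trap. */
void breakpoint_set(sw_breakpoint_t *bp, uint32_t *location)
{
    assert(bp != NULL && location != NULL);
    bp->location = location;
    bp->saved_opcode = *location;
    *location = TRAP_OPCODE;
    bp->armed = 1;
}

/* Put the original instruction back so that execution can resume. */
void breakpoint_clear(sw_breakpoint_t *bp)
{
    assert(bp != NULL);
    if (bp->armed) {
        *bp->location = bp->saved_opcode;
        bp->armed = 0;
    }
}
```

Because the mechanism writes into program memory, it cannot be used on code that executes from ROM. On a processor with an instruction cache, a debug monitor would also have to make sure that the cache does not still hold the old instruction after each patch; the sketch omits that step.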
Because deep pipelines and caches are common in the RISC world, using software breakpoints is especially effective when debugging code execution. Although a logic analyzer cannot generate a software breakpoint by itself, teaming a logic analyzer with a debug monitor results in a powerful combination. (See box, "Cost-effective debug with enterprise instrumentation.")
Hardware breakpoints, on the other hand, are essential for debugging ROM-based programs or when it is necessary to stop on data reads or writes. Hardware breakpoints require some type of instrumentation--either a logic analyzer or an emulator--to monitor the address and data buses. When the instrument detects the desired event, it generates a trigger, which causes control to pass to the debug monitor. (An emulator stops the processor directly.) Hardware breakpoints usually work well with caches because most caches are set up to be bypassed on a data read or write; thus, data reads and writes usually are seen on the same bus cycle as the actual instruction.
With RISC chips, the deep caches and pipelined architectures cause long delays between instruction fetches and execution. These delays make hardware breakpoints ineffective. Often the cache is flushed before the instruction is even executed. Most logic analyzers and emulators work best with the cache turned off. But this changes the operating characteristics of the system and may mask some time-dependent bugs. Efficient debugging requires a combination of both hardware and software breakpoints.
Problems with large numbers of registers
Large numbers of internal registers increase the time needed to do a context switch. This problem occurs because more registers have to be saved on the stack. Even in architectures, such as SPARC, that have register windows, the state of the processor must eventually be saved. When the original routine resumes, register data are retrieved from the stack, and execution continues. Since context-switching time is critical in many embedded systems, you can be tempted not to save all of the registers, but this approach adds another level of complexity to the verification effort. You must make sure that all of the needed registers are saved and that no routine uses a register that is not saved. In an event-driven system, you cannot always control the flow of events, so careful management of the stack is critical.
Errors often occur as a result of context switching. Although you can easily debug errors in saving and restoring registers, more elusive problems occur with nested routines. Each context switch pushes data onto the stack. When a return is executed, the data are fetched from the stack and the stack pointer is moved. Nested routines cause the stack to grow, and if you have not allocated sufficient space to the stack, the stack can grow into the code or data space and corrupt the program.
Another type of stack problem is improper handling of the stack pointer. An off-by-one error is a common problem that causes the stack to grow gradually until it overflows (a memory leak). This type of problem appears randomly because it depends upon how the program executes. Stack-pointer errors that occur when data are pushed onto the stack do not show up until the data are popped back off.
A logic analyzer is an efficient tool for debugging a stack-overflow problem. Setting a range recognizer to monitor the valid stack space causes the logic analyzer to trigger if the stack grows outside this address range. In addition to detecting the event, a logic analyzer can look at a trace of the context switch and determine what code caused the stack to overflow. Debug monitors that show a trace of the subroutine calls leading up to a breakpoint can show only the calls to subroutines that haven't returned (that is, ones that haven't completed execution). In contrast, a logic analyzer shows a complete history of execution (Fig 7).
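In software terms, a range recognizer is a comparison of each captured address against two limits. The C sketch below models it, assuming you have a list of the addresses touched by stack accesses in the order they occurred. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first stack access that falls outside the
 * valid stack space [stack_low, stack_high], or `count` when every
 * access is inside it. */
size_t find_stack_violation(const uint32_t *accesses, size_t count,
                            uint32_t stack_low, uint32_t stack_high)
{
    assert(accesses != NULL && stack_low <= stack_high);
    for (size_t i = 0; i < count; i++) {
        if (accesses[i] < stack_low || accesses[i] > stack_high) {
            return i;   /* this is where the analyzer would trigger */
        }
    }
    return count;
}
```

The returned index marks the trigger point. The entries before it correspond to the execution history that the analyzer's trace buffer holds, which is where you look for the code that drove the stack out of range.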
Hardware and software debugging go hand in hand with the design of RISC-based systems. Understanding RISC technology's inherent problems and the tools available to fix them helps you to get your product to market faster. Effective debugging of RISC-based systems starts with selecting the right instrumentation and the most effective debug strategy.