
Design Features


PowerPC's on-chip-debug features
alter embedded-development landscape

Jim Challenger, Software Development Systems


Although best known for high performance, the PowerPC family adds a new dimension to embedded-system development. Unlike most processor families for desktop applications, the PowerPC family includes features designed around embedded-system developers' needs.

  The PowerPC is one of the most talked about microprocessor families in the embedded-system market. The PowerPC's RISC architecture offers massive processing power at attractive prices. The latest member of the family—the 604e—offers a SPECINT92 of 225 at 166 MHz. More important, the companies behind the PowerPC—IBM (Armonk, NY) and Motorola (Phoenix)—appear to know what it takes to firmly position the PowerPC in the embedded market, and they have the resources to succeed.

  The PowerPC partners have taken care to address embeddability issues on several levels. The PowerPC family offers very competitive performance for its price. The processors are smaller and draw less power than many of their competitors. But, PowerPC chips are more than just faster, smaller, cheaper processors competing for embedded-system sockets. The PowerPC architecture embodies a design philosophy that has a profound impact on development-tool chains and application-engineering processes.

  The more powerful, yet more complex, PowerPC architectures create both new opportunities and problems for embedded-system developers. Caches are significant features of RISC processors, and the PowerPC is no exception. Indeed, caches are vital for achieving published performance figures. Most RISC chips simply don't run well unless the application software recognizes and effectively manages cache memory.

  Cache management can cause major headaches for developers of embedded real-time applications, however. Cache operations make it more difficult to predict code-execution time. Programmers must, therefore, make execution times more predictable by controlling cache usage. For example, you can lock particular time-critical routines, such as interrupt-service routines, into cache to ensure fast and deterministic execution of critical code. Locking certain routines in cache reduces the cache space available for other routines, however, and, thus, can degrade performance, particularly if a routine that formerly could execute from cache is too large to fit into the available space.
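  A host-side toy model can make this trade-off concrete. The Python sketch below is illustrative only: `ToyICache`, `lock_range`, and `count_misses` are hypothetical names, and the direct-mapped organization with one lock bit per line is an assumption, not a description of any particular PowerPC cache. It shows a locked interrupt-service routine that always hits while other code that maps to the same lines always misses.

```python
class ToyICache:
    """Direct-mapped instruction cache with per-line lock bits (toy model)."""

    def __init__(self, num_lines=8, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines      # memory line held by each slot
        self.locked = [False] * num_lines   # lock bit for each slot

    def _slot_and_tag(self, addr):
        line = addr // self.line_bytes
        return line % self.num_lines, line

    def fetch(self, addr):
        """Return True on a hit, False on a miss."""
        slot, tag = self._slot_and_tag(addr)
        if self.tags[slot] == tag:
            return True
        if not self.locked[slot]:
            self.tags[slot] = tag           # normal replacement on a miss
        return False                        # a locked slot is never replaced

    def lock_range(self, start, size):
        """Preload and lock every line that covers [start, start + size)."""
        first = start - start % self.line_bytes
        for addr in range(first, start + size, self.line_bytes):
            slot, tag = self._slot_and_tag(addr)
            self.tags[slot] = tag
            self.locked[slot] = True


def count_misses(cache, addresses):
    """Fetch each address in turn and count the misses."""
    return sum(0 if cache.fetch(addr) else 1 for addr in addresses)
```

  In this model, locking a two-line routine at 0x1000 makes every later fetch of that routine a hit, but code at 0x2000, which maps to the same two lines, then misses on every pass. That is the performance cost the paragraph above describes.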

  Conversely, if speed of execution is not critical, you can designate code as noncacheable. This approach improves cache availability for critical code, particularly if the application frequently makes calls to the noncritical code—for example, issuing a call at each clock tick.

  The data cache raises development issues, such as coherency, that do not generally concern users of instruction cache. If you can update data outside cache control—in interrupt registers or dual-ported memory, for example—you must ensure that you maintain coherency when you use the data cache. To do this, you must explicitly flush the data cache before you access potentially incoherent data. Alternatively, you can mark the data memory as noncacheable and, thus, ensure that data that can become incoherent never enters the data cache.
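  The coherency problem can be reproduced on a host with a few lines of Python. This is a minimal sketch under stated assumptions: `ToyDataCache` and `flush_line` are hypothetical names, memory is a plain dictionary, and the model caches reads only, so flushing a line reduces to discarding the cached copy.

```python
class ToyDataCache:
    """Read cache in front of a dict-based memory (toy model, reads only)."""

    def __init__(self, memory, noncacheable=()):
        self.memory = memory
        self.lines = {}                          # address -> cached value
        self.noncacheable = set(noncacheable)    # addresses that bypass the cache

    def read(self, addr):
        if addr in self.noncacheable:
            return self.memory[addr]             # always read memory directly
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr] # fill the line on a miss
        return self.lines[addr]

    def flush_line(self, addr):
        """Discard the cached copy so that the next read refetches from memory."""
        self.lines.pop(addr, None)
```

  If a device changes `memory[addr]` after the first `read`, later reads keep returning the stale value until `flush_line(addr)` runs. Listing the address in `noncacheable` avoids the stale read altogether, at the price of going to memory on every access.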

Rigid rules can slow system

  If you follow this procedure, however, you may unnecessarily slow application performance. During development, you must verify the operation of your data-cache control algorithm. But logic analyzers and emulators cannot see inside the cache. Without informative feedback, maximizing data-cache performance and coherency becomes a hit-or-miss activity.

  Reflecting their history as paged-memory managers for Unix systems, most RISC processors include memory-management units (MMUs). Embedded-software developers now want to use these facilities, but not for page swapping. After all, most embedded systems have no disks to swap to.

  Often, embedded-system developers want to dynamically upgrade software on site via a modem or other download mechanism. In safety-critical or maximum-uptime systems, this activity may occur while the system is fully active. That is, the system shuts down part of its operation to allow replacing a section of code with a version that implements more functions or fixes a bug. But how do you write to memory to update software and ensure you do not accidentally overwrite running code? The answer in today's embedded systems is to use the MMU to protect the running code as you unprotect the RAM you are updating. Of course, embedded-system developers must develop more complex, processor-specific control code and must develop MMU exception handlers that react appropriately to particular MMU fault conditions.
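  The protect-and-unprotect sequence can be sketched as a small host-side model. Everything here is hypothetical and for illustration: `ToyMMU`, `ProtectionFault`, and `update_region` are invented names, the 4-kbyte page size is an assumption, and a real implementation would program processor-specific page tables and exception handlers instead.

```python
class ProtectionFault(Exception):
    """Raised when a write touches a page that is marked read-only."""


class ToyMMU:
    PAGE_SIZE = 4096  # assumed page size for this sketch

    def __init__(self, num_pages):
        self.memory = bytearray(num_pages * self.PAGE_SIZE)
        self.writable = [False] * num_pages   # every page starts protected

    def set_writable(self, page, flag):
        self.writable[page] = flag

    def write(self, addr, data):
        """Write data at addr, faulting before any byte changes if a page is protected."""
        first = addr // self.PAGE_SIZE
        last = (addr + len(data) - 1) // self.PAGE_SIZE
        for page in range(first, last + 1):
            if not self.writable[page]:
                raise ProtectionFault(f"write to protected page {page}")
        self.memory[addr:addr + len(data)] = data


def update_region(mmu, page, image):
    """Unprotect one page, write the new image, and always protect the page again."""
    mmu.set_writable(page, True)
    try:
        mmu.write(page * mmu.PAGE_SIZE, image)
    finally:
        mmu.set_writable(page, False)
```

  Because only the page being updated is writable, an image that is too large and spills into the next page raises `ProtectionFault` before any byte is written, which stands in for the MMU exception handler the text mentions.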

  Motorola was particularly concerned about how many of the features that give the PowerPC architecture such a performance advantage would affect the development of embedded applications. As a result, Motorola invited Software Development Systems (SDS) (Oak Brook, IL) to work on the PowerPC 505, 821, and 860 RISC controllers at the presilicon stage to simulate cache and MMU operations. Through this collaboration, SDS has gained a detailed understanding of the tricks and traps of the various members of the PowerPC family.

  Embedded-application developers must be concerned with more than just testing algorithms, however. Execution speed is extremely important. And, in a RISC-based architecture such as the PowerPC, memory and cache performance has a major impact.

  When it comes to memory and cache management, host-based instruction simulation is not just a better way to debug and fine-tune an application; it is, in fact, the only feasible way. Caches and MMUs simply lie beyond the purview of hardware-based debugging tools, making cache simulation critical for development. But, tool buyers should know that not all simulators support cache simulation. Instruction simulation becomes a required part of the development process for complex processors. Host-based simulation lets engineers test and debug software on the PC or workstation without running the software on target hardware. Thus, simulation enables software development before target hardware even exists.

  Many advanced development-support features differentiate PowerPC processors from other processors. These features result from architecturewide design standards, many of which apply to embedded applications. Among other things, these standards define what a PowerPC is.

  The PowerPC embedded-application binary interface (EABI) is one of the most important standards pertaining to PowerPC-based embedded-application development. The EABI establishes object-code standards for embedded PowerPC variants. The intention of the EABI standard is to enable compliant code written for one embedded PowerPC to run on all other embedded versions. This capability allows developers to easily reuse code they transport from different PowerPC processors.

  Of course, subtle differences among processors at the supervisory level, mostly in the area of cache management, require fine-tuning to extract optimum performance. The code that implements these variations runs at a supervisory level and varies among family members. In the case of the 600 series, this supervisory code varies from chip to chip.

  The EABI standard offers many other benefits, as well. Compiled program code does not become obsolete when new or upgraded versions of EABI-compliant PowerPC chips come onto the market. Because the EABI standard covers linkable object files, code from different compilers can work together on the same target processor. Vendors can, therefore, supply precompiled libraries without knowing which compiler the developer is using. Users can also debug applications containing heterogeneously developed code. Therefore, EABI-compliant tools, such as SDS's SingleStep debugger, can debug applications that contain EABI-code modules generated by Diab Data (Foster City, CA) and Metaware (Santa Cruz, CA) C and C++ compilers. The reference to C++ is significant.

EABI keeps embedded developers in mind

  The EABI standard is the first embedded-processor-family object-code standard to be launched with designed-in architectural-level support for C++. Even though only a small proportion of developers use C++, there is a steady move toward the use of the language at various levels of embedded-system code. Motorola and IBM recognized this trend and ensured support for C++. Moreover, the PowerPC architecture makes extensive provisions for on-chip debugging. By combining the advantages of hardware-based emulation's low intrusion and minimal memory demands with network-based cross development, on-chip debugging features let developers see what is actually happening in a system.

  Embedded-development-tool vendors are excited about what PowerPC does for development-tool technologies and the tool market. EABI, cache management, and on-chip debugging may appear to be subtle changes, but they will have major effects. Such next-generation approaches make the PowerPC more than just another embedded-RISC socket-stuffer. Users should seek out tool vendors that understand what differentiates the PowerPC embedded architecture. With the PowerPC architecture's new capabilities, tool users should not settle for 1980s-style target-monitor cross development or 1970s-style hardware emulation. Tool technologies that take full advantage of EABI and on-chip debugging can deliver lower development costs, avoid tool-chain obsolescence, and reduce time to market. EABI also provides the foundation for replacing single-vendor tools with open tool chains.

  PowerPC RISC embedded µPs can compete in the price-vs-performance battle with top-of-the-line processors from Intel (Santa Clara, CA), Mips Technologies (Mountain View, CA), SPARC (San Jose, CA), and ARM (Los Gatos, CA). But PowerPC's binary-code standards, caching, and on-chip debugging features give the architecture powerful hidden advantages. You can unlock these advantages only by using the right development tools, however.

On-chip debugging assistance

  The PowerPC architecture offers three flavors of on-chip debugging support, depending on the device family. The PowerPC 600-series on-chip debug interface supports the IEEE-1149.1 boundary-scan standard, which software tools access via a mechanism inside the 600 that interrogates, reads from, and writes to the boundary-scan chain.

  The PowerPC 400 series features an IBM-developed extended IEEE-1149.1 test-access port (TAP). Compared with a generic interface, this TAP gives improved visibility into cache operations. Users can also use this port to set breakpoints and tracepoints in ROM. There are two instruction and two data-access breakpoints.

  Motorola on-chip emulation (OnCE) capabilities are common to both the PowerPC 500- and 800-series chips. These facilities resemble those of extended IEEE-1149.1 but have a very different interface. The 800-series background-debug-mode (BDM) capability offers four watchpoints for instructions, two for data addresses, and two for data values. The device can condition the watchpoints (equal, not equal, greater than, less than) and conditionally combine them (and, or, and so on). The watchpoints can simply watch, which provides a limited trace facility, or they can act as breakpoints. In addition, two 16-bit counter/timers are available for use as hardware-break counters. All features work unobtrusively in real time through both the cache and the memory-management unit.
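  The way conditioned watchpoints combine can be modeled in a few lines. The sketch below is a logical illustration only: the function names and the `(address, value)` calling convention are hypothetical, and nothing here reflects the actual debug-register layout or programming sequence of the 800 series.

```python
import operator

# The four comparisons named in the text.
CONDITIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def address_watch(condition, reference):
    """Watchpoint that compares the data address of an access."""
    compare = CONDITIONS[condition]
    return lambda address, value: compare(address, reference)


def value_watch(condition, reference):
    """Watchpoint that compares the data value of an access."""
    compare = CONDITIONS[condition]
    return lambda address, value: compare(value, reference)


def both(first, second):
    """Combine two watchpoints with a logical AND."""
    return lambda address, value: first(address, value) and second(address, value)


def either(first, second):
    """Combine two watchpoints with a logical OR."""
    return lambda address, value: first(address, value) or second(address, value)
```

  For example, `both(both(address_watch("gt", 0x0FFF), address_watch("lt", 0x2000)), value_watch("eq", 0xDEAD))` fires only when the value 0xDEAD is accessed inside the address window from 0x1000 through 0x1FFF.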

  Besides low intrusion and accurate capture of application operations, on-chip debugging technology reduces the need for expensive emulation tools. You connect the target directly to a host computer on which an on-chip debugger acting as a soft emulator runs as a development tool. PCs and workstations are more graphical, more intelligent, and cheaper than stand-alone emulators.

  Because on-chip debugging capabilities in the PowerPC architecture are designed in accordance with stable standards, users do not have to buy a new emulator every time they migrate to a new chip. Even though emulators struggle to keep up with the rapid pace of chip development, this philosophy keeps chip upgrades from making tools obsolete.

  IBM and Motorola will introduce new versions of their vertical-market-oriented 400- and 800-series chips at a frantic pace. Even such developments as reductions in die sizes and high-density packages will challenge emulator manufacturers and sometimes frustrate their customers. For example, an emulator manufacturer that develops a technique for reliably probing ball-grid-array packages that have several hundred pins must apply the technique across a broad and fast-changing product line.

A PowerPC family portrait

  The family of PowerPC processors for desktops and embedded designs is growing fast. The latest processors to arrive from Motorola are high-integration designs for communications control and personal digital assistants (PDAs). You can identify the various families of PowerPC products through the first digit of the device part number. The 400 series are PowerPC microcontrollers manufactured by IBM. The 500 and 800 series come from Motorola. A 600-series chip could come from either IBM or Motorola and may carry the label of one partner even though manufactured by the other.

600 series: for speed

  The 600 series of chips was originally developed for the desktop and workstations market. You can find these processors in Apple Power Macintoshes, IBM RS/6000 workstations, and IBM PowerPC Windows NT platforms. The 600-series chips still find their way into embedded applications in which performance is more important than price and power consumption. Because two manufacturers make these devices, users can take full advantage of price competition and uninterrupted product deliveries should one supplier become unable to fulfill orders. Because product cost is a crucial differentiator in processor selection, competition among suppliers can result in lower costs.

  Clock speeds for the 600 series range from 50 to 133 MHz with specmark performance rankings starting at 41.7 on integer and 51.0 on floating-point benchmarks. The upcoming 166-MHz 604e processor will reach an estimated 225 on integer and 300 on floating-point benchmarks. In comparison, a Motorola 68040 running at 33 MHz delivers approximately 18 integer and 13 floating-point specmarks.

  Devices from the 600 series have been embedded in high-performance video games and multimedia applications. You can also find 600-series chips in applications formerly dominated by specialized DSP processors. The floating-point prowess of the PowerPC 600 family surprises many DSP users. A 603 offers 85 SPECFP92 at 80 MHz. On a standard 604, this number increases to 165 SPECFP92 at 100 MHz. With its larger cache, the 603e should provide more floating-point performance than simple clock scaling indicates.

  This fast DSP performance stems from the fact that PowerPC chips execute multiply-accumulate operations in one processor cycle, whereas conventional DSP devices may require two cycles. Although DSPs have specialized address-generation circuitry, the PowerPC core runs integer instructions simultaneously to perform the equivalent address calculations and data fetches. This feature makes the PowerPC a real winner in multifunction systems that previously used a DSP and a 68K µP.
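  A finite-impulse-response (FIR) filter is a typical example of the workload in question. The Python sketch below is only a host-side illustration of the arithmetic; the function name `fir` is hypothetical. The inner loop performs one multiply-accumulate per tap, and the index expression `n - k` is the kind of address calculation that a DSP's address generator, or the PowerPC's integer unit, would handle alongside the floating-point work.

```python
def fir(samples, taps):
    """Apply an FIR filter and return one output per fully overlapped window."""
    outputs = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            acc += tap * samples[n - k]   # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs
```

  With a two-tap averaging filter, `fir([1, 2, 3, 4], [0.5, 0.5])` yields the three running averages of adjacent samples.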

400 series: embeddable

  The main members of the 400 series include IBM's 403GA, 403GB, and 403GC RISC microcontrollers. IBM quotes performance for these chips in Dhrystones. IBM has published estimates of approximately 61k Dhrystones for the 403GA processor running at 33 MHz and 39 Dhrystone MIPS for the 403GB running at 25 MHz.

  IBM promotes the 400 series as RISC-based microcontrollers because they lack both on-chip floating-point processors and memory-management units (MMUs), although the 403GC has an MMU for memory protection. All members of the 403 family, however, have instruction and data caches. IBM has scored design wins for the 403GA and GB in portable systems, consumer products, set-top boxes, printers, and other deeply embedded 32-bit applications.

  Significantly, the 403 chips consume around 200 mW. This is considerably less power than the PowerPC 601's 9 W or the 603's 1.6 W.

500 series: cars and control

  Currently, the only announced member of the PowerPC 500 series is the Motorola MPC505. The company offers the chip in 33- and 40-MHz versions, with performance rated at an estimated 46 Dhrystone MIPS. The chip draws an estimated 700 mW of power.

  The MPC505 includes a floating-point unit (FPU) but not an MMU. In addition, the 505 has an instruction cache but no data cache. This spec is significant because data caches can introduce performance uncertainties in the control applications the chip targets. The chip does, however, have 4 kbytes of on-chip RAM, which you can use for static, local variables, or, perhaps, for the stack. In line with its vertical specialization in automotive and industrial computing, Motorola has paid considerable attention to preparing the 505 for high-temperature and -vibration regimes.

800 series: special uses

  Recently, Motorola introduced the 800 series, a family of highly integrated PowerPC chips. The Motorola 821 and 860 are the first vertical-market installments in this product line. Conceptually, the 800 series represents a RISC version of the popular Motorola CPU32-based microcontrollers.

  Thus far, both members of the 800 family have 4 kbytes of instruction cache and 4 kbytes of data cache, an MMU, and a system-integration unit that includes a glueless memory interface. There is no FPU, although there is a multiply-accumulate controller.

  The 860 PowerQUICC (Quad Integrated Communications Controller) is reminiscent of Motorola's 68360. Another communication-oriented device, the 821, targets handheld and mobile applications, such as cellular phones, personal digital assistants (PDAs), and Global Positioning System (GPS) applications. The 821 integrates an LCD controller with considerable communication-peripheral capability. Future versions of the 800 series may meet the needs of other vertical markets and, perhaps, even individual customers.

Interrupt performance

  Because they were originally designed to run the Unix operating system, RISC processors have acquired a not entirely unjustified bad reputation for interrupt performance. The leading CISC processors have exactly the same problem areas: caches, memory management units (MMUs), and floating-point exceptions. The reason for this situation is that designers of processors intended to run a desktop operating system relegate hardware interrupts to the bottom of the priority heap. That situation exists not just in RISC families, such as the PowerPC, but also in the 68K and x86 series.

  The 600 series' use of superscalar design slightly complicates the PowerPC's situation. On 600-series processors, the CPU may be processing as many as five or six instructions when an interrupt comes in. To avoid high interrupt latencies, however, the PowerPC applies strict rules to instruction processing. Although the 604 can issue instructions out of order, it completes them in program order to ensure precise interrupt timing.

  To operate this way, the PowerPC has a completion queue that it uses to control when an instruction can store processed data in a register or memory. When an external, asynchronous interrupt arrives, the processor ignores all but the next instruction in the program order. Because the program counter goes to the next instruction in the program order, the processor reissues the subsequent instructions when interrupt service is complete.
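  The behavior described above can be mimicked with a small queue model. This is a simplified sketch, not the 604's actual mechanism: `CompletionQueue` and its methods are hypothetical names, instructions are identified only by their position in straight-line code, and branches are ignored, so the resume point is simply the oldest instruction's index plus one.

```python
from collections import deque


class CompletionQueue:
    """Toy model of in-order completion with out-of-order execution."""

    def __init__(self):
        self.queue = deque()   # [index, finished] entries in program order
        self.retired = []      # indices whose results have been committed
        self.next_index = 0    # next instruction to issue

    def issue(self):
        """Issue the next instruction in program order and return its index."""
        index = self.next_index
        self.queue.append([index, False])
        self.next_index += 1
        return index

    def finish(self, index):
        """An execution unit reports a result; this may happen out of order."""
        for entry in self.queue:
            if entry[0] == index:
                entry[1] = True

    def retire(self):
        """Commit results in program order, stopping at the first unfinished entry."""
        while self.queue and self.queue[0][1]:
            self.retired.append(self.queue.popleft()[0])

    def interrupt(self):
        """Complete the oldest instruction, discard the rest, return the resume point."""
        discarded = []
        if self.queue:
            self.retired.append(self.queue.popleft()[0])
            discarded = [index for index, _ in self.queue]
            self.queue.clear()
            self.next_index = self.retired[-1] + 1   # reissue from here later
        return self.next_index, discarded
```

  In this model, results that finish early wait in the queue until every older instruction has committed, and an interrupt throws away younger work, even finished work, so that the saved program counter points at exactly one well-defined instruction.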

  Although this arrangement may seem unusual, it is practically the same process used by the 68K architecture, which simply samples interrupts on instruction boundaries. In effect, the processor allows the instruction in process at the time of the interrupt to complete. If the outstanding instruction is a store, the time required to write the data to memory does not control latency. The data goes only to the cache or to a write buffer. Only a bus error can prevent writing the data. On the PowerPC, a bus error is unrecoverable and is treated as a nonmaskable interrupt (NMI).

Priority to MMU exceptions

  Designers who are accustomed to the 68K's approach to virtual-memory management may balk at making a bus error unrecoverable. However, PowerPC chips handle MMU exceptions separately in hardware. As with later 68Ks, such as the 68040, MMU exceptions take priority over illegal and floating-point instructions, which take priority over external interrupts but not over NMI.

  With a little planning, however, you can avoid MMU exceptions. If you test the software, illegal instruction exceptions should never happen in the field, although they may happen frequently during debugging if you issue a supervisory instruction in user mode. You can avoid floating-point exceptions by accepting IEEE-754 defaults and turning off exceptions in a control register.

  You can avoid most MMU exceptions by mapping all physical memory unless you are actually paging to disk. In that case, you have to accept highly nondeterministic interrupt responses. You can generally map all physical memory by using both the translation-lookaside buffers (TLBs) and the address-mapping registers as you would use the equivalent features of the 68040.

  The biggest problem is dealing with systems that use paging for task protection: You may need to perform a table walk to update the translation-lookaside buffers (TLBs) because these buffers can hold only so many entries. However, the situation is no different for a 68040 or a 486. If you obey these rules, interrupt response isn't a problem if you use a RISC processor in place of a CISC device.
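  A toy model shows why a translation buffer with a limited number of entries forces table walks once the working set outgrows it. The sketch is illustrative only: `ToyTLB` and `count_table_walks` are hypothetical names, and both the 4-kbyte page size and the least-recently-used replacement policy are assumptions rather than properties of any particular PowerPC part.

```python
from collections import OrderedDict


class ToyTLB:
    """Small translation buffer with LRU replacement (toy model)."""

    def __init__(self, entries, page_size=4096):
        self.entries = entries
        self.page_size = page_size
        self.cache = OrderedDict()   # virtual page -> physical page, oldest first
        self.table_walks = 0

    def translate(self, vaddr, page_table):
        """Translate a virtual address, walking the page table on a miss."""
        vpage, offset = divmod(vaddr, self.page_size)
        if vpage in self.cache:
            self.cache.move_to_end(vpage)        # mark as most recently used
            ppage = self.cache[vpage]
        else:
            self.table_walks += 1                # miss: consult the page table
            ppage = page_table[vpage]
            self.cache[vpage] = ppage
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)   # evict the least recently used
        return ppage * self.page_size + offset


def count_table_walks(tlb_entries, pages_touched, passes=2):
    """Touch pages 0..pages_touched-1 in order, several times, and count walks."""
    page_table = {page: page for page in range(pages_touched)}
    tlb = ToyTLB(tlb_entries)
    for _ in range(passes):
        for page in range(pages_touched):
            tlb.translate(page * tlb.page_size, page_table)
    return tlb.table_walks
```

  With four entries, cycling twice through four pages costs four walks, all on the first pass. Cycling through eight pages costs a walk on every one of the 16 accesses, because each page is evicted before it is touched again.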

For best performance, don't…

  Although the PowerPC is a RISC processor, completing some instructions can take several cycles. If you want to guarantee low-latency interrupts, you must make sure not to use these instructions. You also need to be careful with instruction and data alignment. In big-endian mode, only misaligned floating-point loads and stores generate exceptions. Because these values tend to be 32- or 64-bit quantities, misaligning them is difficult, especially if you are using a compiler. Even so, you should avoid misalignment, even at the cost of some wasted memory, because misalignment simply slows the system. In little-endian mode, a 600-series processor complains about anything that is misaligned.
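  The memory cost of keeping data aligned is easy to quantify. The sketch below is a generic illustration, not a description of any compiler's layout rules: `align_up` and `natural_layout` are hypothetical helpers, and they assume that each field must sit on a multiple of its own size and that sizes are powers of two.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def natural_layout(fields):
    """Place (name, size) fields in order, each on a multiple of its own size.

    Returns (offsets, end_offset, padding_bytes).
    """
    offsets = {}
    cursor = 0
    payload = 0
    for name, size in fields:
        cursor = align_up(cursor, size)   # skip padding to reach alignment
        offsets[name] = cursor
        cursor += size
        payload += size
    return offsets, cursor, cursor - payload
```

  Placing a 1-byte flag ahead of an 8-byte double and a 4-byte counter costs 7 bytes of padding in this model. Declaring the largest field first removes the padding entirely, so careful ordering often recovers most of the wasted memory.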

  An obvious instruction to avoid is divide. This instruction is a nightmare on any processor and is best synthesized using software libraries. The PowerPC has some multicycle instructions that appear useful only for communications code: load and store multiple, for example. However, the manual suggests that you use simple instructions because some family members do not guarantee multicycle-instruction performance.
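  One common way to synthesize division from simple instructions is shift-and-subtract (restoring) division. The Python sketch below shows the technique for unsigned 32-bit operands; it is a generic textbook method, not the routine from any particular PowerPC library, and the name `udiv32` is hypothetical. Each loop iteration uses only shifts, a compare, and a subtract, so on a target an interrupt could be taken between any two of those short operations.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= dividend < 2**32 and 0 < divisor < 2**32):
        raise ValueError("operands must fit in 32 unsigned bits")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor   # the divisor fits: record a 1 bit
            quotient |= 1
    return quotient, remainder
```

  The loop always runs 32 iterations, so its execution time does not depend on the operand values.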

  Finally, the cache-zeroing instruction takes 10 cycles to complete. That is not many, but you can easily synthesize the instruction. Motorola's tests on the 603 indicate that, when you follow the advice presented here, the average latency on a set of common benchmark applications is approximately eight bus cycles from receipt of an interrupt to the start of interrupt service. The maximum latency is 25 bus cycles. The high number probably results from an instruction's causing a cache miss.


 Author's biography
Jim Challenger is president of Software Development Systems Inc (Oak Brook, IL). He has been with SDS for 13 years and helped to develop the company's SingleStep debugging environment. Challenger holds BS and MS degrees in computer science from Northern Illinois University (DeKalb, IL) and is a member of the Young Entrepreneurs Organization.


Copyright © 1996 EDN Magazine. EDN is a registered trademark of Reed Properties Inc, used under license.