March 14, 1997

Switching RISC architectures the easy way

John Canosa, Questra Consulting, SES Technology R&D Group

Performance demands and discontinued product families can force you to switch RISC processors for your next-generation product. Choosing the right replacement can ease the pain of the resulting hardware and software redesign.

The never-ending demand for more features and better performance is forcing designers to re-evaluate the microprocessors that their products incorporate. Outside forces can also affect processor selection for a next-generation product. AMD's (Sunnyvale, CA) decision to end development of the 29K family, for example, could leave many users without a migration path to the next performance level. To ease the inevitable shift to a new architecture, you need to examine and understand the RISC alternatives.

In most designs today, software is a large component of the overall system-design effort and cost, so the main issue in changing processors is software-related. If, for instance, your product uses a real-time operating system (RTOS) and you are considering a new architecture, you must first make sure that your RTOS vendor supports the processor you have in mind. RTOS vendors typically support many, but not all, architectures. An architecture switch may also mean changing tools and using a different compiler or debugger vendor. RTOS and tool support could, therefore, narrow your choice of processor. Processor vendors can supply you with a list of the RTOSs and tools that support each of their processors.

Changing processor architectures can also lower software performance. Most designers understand how factors such as cache size and clock speed affect processor performance; however, the underlying architecture has a more significant effect on performance.
Understanding the RISC instruction pipeline, what causes stalls, how the processor fills delay slots, and whether the processor uses speculative branching helps you write high-performance code, even if the development team writes all its code in high-level languages. C and Ada are wonderful (although C zealots may not think that Ada is wonderful and vice versa) because they abstract away the complexities of the underlying architecture. They are especially useful for RISC processors because compilers can automatically apply optimizations that keep the instruction pipeline filled. Unfortunately, a high-level language gives some programmers the mistaken idea that they don't need to fully understand the underlying processor architecture.

Most designs still have some assembly-language components, although these may be isolated in a board-support package for an RTOS or in an interrupt-service routine (ISR). And let's face it: if your C code does not meet your performance requirements, your next likely step is to rewrite small parts of it in assembly language. If it comes to that, you will have to understand the architecture anyway, so you might as well understand it in the first place.

Table 1 summarizes the architectures of several popular RISC-processor families. Several aspects of a processor's architecture merit careful study, including its instruction set and how that set interacts with the registers. Because embedded applications rely heavily on interrupts, you also need to thoroughly understand the processor's exception-handling mechanisms. In fact, interrupt and exception handling is one of the areas in which RISC processors differ most. Understanding how your RTOS interacts with the processor in such areas as parameter passing, context switching, and setting up stack frames is also very important.
Keep the pipeline filled

To understand the differences among RISC architectures, you must realize that the goal of RISC-software design is to keep the pipeline filled, that is, to avoid pipeline stalls. Any instruction that requires multiple clock cycles to execute can potentially cause a stall. Memory loads and stores, branches, floating-point operations, and DSP operations such as multiply/accumulate, for example, can all take several clock cycles to complete.

Figure 1 shows what happens during a branch instruction in a well-behaved pipeline. In this case, the processor incurs a two-cycle latency between determining whether to take a branch and loading the branch-target instruction. (This is a five-stage pipeline; pipelines with different numbers of stages have different latencies.) These latency cycles are wasted time for the processor, something you should avoid for optimal performance.

Processor architectures reduce this latency in two ways. The MIPS, SH3, and 29K architectures use a delayed-branch scheme: the processor always executes the instruction that follows the branch (Figure 1, Instruction 2). Good optimizing compilers for these architectures try to place in this delay slot a useful instruction that does not affect the outcome of the branch decision.

Other architectures reduce latency by guessing whether the branch will be taken and filling the pipeline accordingly. Both the I960 and the PowerPC 603 use static branch prediction, which means that the compiler encodes the guess into the branch instruction itself. The processor then automatically loads the predicted path into the pipeline. If the guess is correct, there is no pipeline stall or latency. If the processor doesn't take the predicted path, however, the pipeline incurs the branch instruction's full latency.

The ARM 710 architecture is unique because you can make all of its instructions conditional.
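Conditional execution lets a compiler turn a short if/else into straight-line code, so no branch latency has to be hidden at all. The C sketch below imitates that effect portably with a branch-free select. The ARM instruction pair in the comment is an illustrative assumption of what a compiler might emit, not output from any particular toolchain, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: on a pipeline without prediction or delay slots,
 * a taken branch costs latency cycles. */
static int32_t max_branching(int32_t a, int32_t b)
{
    if (a < b)
        return b;
    return a;
}

/* Branch-free version: builds a mask from the comparison and blends the
 * operands, so control flow never changes. On the ARM, conditional
 * execution gives the same effect, for example (illustrative only,
 * a in r0, b in r1):
 *     CMP   r0, r1      ; set condition flags from a - b
 *     MOVLT r0, r1      ; executes only if a < b (signed)
 */
static int32_t max_branch_free(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);   /* all ones if a < b, else zero */
    return (a & ~mask) | (b & mask);
}

/* Sanity check: both versions must agree for any inputs. */
static int32_t max_checked(int32_t a, int32_t b)
{
    int32_t r = max_branch_free(a, b);
    assert(r == max_branching(a, b));
    return r;
}
```

Whether the branch-free form is actually faster depends on the target pipeline, so check the compiler's assembly listing before hand-tuning either version.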
The ARM 710 has no branch prediction or delayed branching; instead, it uses a three-stage pipeline to keep branch latency low.

Check the effect of interrupts

You must consider interrupts when you change RISC architectures, because embedded systems live and breathe by interrupts. The real world seldom delivers external events when the processor is looking for them. The irony is that you use interrupts to provide rapid response to external events, yet interrupts cause the largest disruptions to a processor's pipeline and therefore the largest latency in handling those interrupts.

Interrupts and exceptions disrupt the pipeline in the same way a branch instruction does, but the similarities end there. You cannot predict interrupts, and they place operational constraints on which instructions can occupy a delay slot. An interrupt can cause the execution of a delay-slot instruction after a branch. If the delay-slot instruction happens to be one that an interrupt from an interlock signal was supposed to prevent, the situation can wreak havoc on your design.

There are as many ways to handle interrupts as there are processors, but you should be particularly aware of certain behaviors. All processors save the machine state and some version of the program counter so that normal execution can resume after the interrupt has been handled. When an asynchronous interrupt occurs, some processors, such as the PowerPC 603 and ARM, complete the instructions in the execution units before servicing the interrupt. The MIPS, 29K, SH3, and I960 all abort the instruction currently executing.

Another important interrupt behavior is what happens to any in-process load or store operations. Typically, the processor completes the load or store before handling the interrupt, but what happens if the operation is a multiple-load or -store operation? Is it canceled, suspended, or run to completion?
The data books do not always address these questions, so you should contact the vendor for the answers.

One of the main software issues in dealing with interrupts is preserving the processor's previous context. Pushing registers onto the stack can be time-consuming and can add significantly to the interrupt handler's latency. RISC processors therefore typically save few registers automatically. To avoid time-consuming register saves to memory, for example, the ARM and SH3 processors swap parts of the general-purpose register bank with secondary registers during interrupts.

Because RISC processors save so few registers, the software developer must be careful when dealing with interrupts, especially nested interrupts. A common mistake with nested interrupts is forgetting to push the previously saved machine-state registers and program counter onto the stack before re-enabling interrupts within an ISR. If a nested interrupt then occurs, the processor may overwrite the data that was not pushed onto the stack.

Check register usage for conflicts

You cannot write efficient, high-performance C code and move it to a new processor architecture without understanding the processor's register set and the parameter-passing and function-return conventions your compiler uses. For example, how would you map the 29K family's 192 global and local registers onto the PowerPC's 32 general-purpose registers? The 29K follows the Berkeley RISC architecture, which has large register files and register windows, whereas other architectures use a single, smaller register file.

When exploring RISC-processor registers, you find that many "general-purpose" registers actually have specific uses, either by software convention or by hardware design. For instance, in the MIPS architecture, general register R0 is hardwired to zero, and general register R31 is a link register that holds the return address for jump-and-link instructions.
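A single link register such as MIPS R31 creates the same hazard as the nested-interrupt mistake described earlier: a nested jump-and-link overwrites the return address unless software saves it first, just as a nested exception overwrites the saved program counter (EPC on MIPS, SRR0 on the PowerPC). The toy model below is a hypothetical simulation, not processor code; the struct cpu type, the addresses, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Hypothetical toy machine: one hardware link register (like MIPS R31)
 * plus a small software stack in memory. */
struct cpu {
    uint32_t ra;                   /* link register */
    uint32_t stack[STACK_WORDS];   /* software stack */
    int sp;                        /* words currently pushed */
};

/* A jump-and-link always overwrites the link register. */
static void jal(struct cpu *c, uint32_t return_addr)
{
    c->ra = return_addr;
}

static void push(struct cpu *c, uint32_t v)
{
    assert(c->sp < STACK_WORDS);
    c->stack[c->sp++] = v;
}

static uint32_t pop(struct cpu *c)
{
    assert(c->sp > 0);
    return c->stack[--c->sp];
}

/* Caller jumps-and-links to outer (return address 0x100); outer then
 * jumps-and-links to inner (return address 0x200); inner returns; outer
 * returns through ra. Returns the address outer would jump back to. */
static uint32_t outer_return_addr(struct cpu *c, int save_ra)
{
    jal(c, 0x100);              /* caller -> outer */
    if (save_ra)
        push(c, c->ra);         /* prologue: save the link register */
    jal(c, 0x200);              /* outer -> inner: clobbers ra */
    /* ... inner runs and returns to 0x200 ... */
    if (save_ra)
        c->ra = pop(c);         /* epilogue: restore the link register */
    return c->ra;
}
```

With save_ra set, outer returns to 0x100 as intended. Without it, ra still holds 0x200, so the "return" jumps back into the middle of outer. Non-leaf functions and reentrant ISRs need the same save-before-nesting discipline.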
To avoid problems, the designer should consider some registers in each processor off-limits (Table 2). When writing assembly-language functions, you also need to understand which registers you can change and which registers the calling function expects to remain intact. If your compiler vendor does not document its parameter-passing conventions, write some small functions that pass and return different types of parameters. By examining the resulting assembly listings (yes, in these days of source-level debuggers, you can still get assembly listings), you can determine which registers the compiler uses for parameters and return values. Some standards, such as the Embedded Application Binary Interface (EABI) for the PowerPC and the Host Interface (HIF) specification for the 29K family, also specify the registers for parameters and return values. Table 3 lists EABI-register conventions.

In addition to the software issues that arise when you change RISC architectures, significant hardware issues can emerge. Many high-performance embedded applications rely on ASICs to reduce system size and cost. If your ASIC was designed for the 29K bus, switching to a MIPS processor with a different bus interface can have severe ramifications. (The thought of losing hundreds of thousands of dollars in NRE charges for that new image coprocessor is enough to give your accounting department nightmares for a year.)

Bus interfaces are not specific to an architecture family, however. Some members of a family have the standard bus interface, and others may have a built-in memory controller. Yet even devices with memory controllers typically have a mechanism that lets an external device signal when it is ready to latch data in or when its output data is valid. (Using this mechanism typically results in better performance than programming a worst-case number of wait states into a memory-controller register.)
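The performance argument for the ready mechanism is simple arithmetic. With a fixed setting, every access to a device pays the worst-case wait states; with a ready handshake, each access pays only what it actually needs. The sketch below assumes a hypothetical access mix and a fixed base cycle count per access, and it ignores any synchronization overhead the handshake itself might add.

```c
#include <assert.h>
#include <stddef.h>

/* needed[i] is the number of wait states access i actually requires
 * (hypothetical values); base is the minimum cycles per access. */

/* Fixed wait states: the controller is programmed for the worst case,
 * so every access pays the maximum. */
static unsigned long cycles_fixed(const unsigned *needed, size_t n, unsigned base)
{
    unsigned worst = 0;
    assert(needed != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (needed[i] > worst)
            worst = needed[i];
    return (unsigned long)n * (base + worst);
}

/* Ready handshake: the device signals when it is ready, so each access
 * pays only its own wait states (handshake overhead ignored). */
static unsigned long cycles_ready(const unsigned *needed, size_t n, unsigned base)
{
    unsigned long total = 0;
    assert(needed != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += (unsigned long)base + needed[i];
    return total;
}
```

For an access mix of {0, 0, 1, 6} wait states and a base of 2 cycles, the fixed setting costs 4 × (2 + 6) = 32 cycles, whereas the handshake costs 2 + 2 + 3 + 8 = 15.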
In some cases, therefore, adding a few dollars' worth of PLD to create a bus translator could save your ASIC investment and spare you a redesign, at least until you find the next bug.

Translation hardware preserves designs

The first step in any bus-translator design is to gather all the available data on the two processors' signal lines. Knowing how processors handle various signals can help you select a new processor that minimizes design effort. Typically, you also need to place some constraints on the design to make the task more manageable.

Consider, for example, the data-bus and instruction widths of the processors in Table 1. The SH3 has 16-bit instructions and a 32-bit data bus, which allows it to load two instructions at a time. Similarly, the 603 has 32-bit instructions and a 64-bit data bus that you can set to use only 32 bits. Planning your translator to use only the mode that is compatible with the original design helps keep the task reasonable (see box, "Switching from the 29K to the PowerPC").

In addition to such physical differences, you should carefully consider how the signals behave. How a processor handles arbitration, simple and burst read/write transactions, resets, and interrupts can affect the translator design.

Processors treat arbitration in two ways. The I960, the MIPS 3xxx, and the SH3 series can act as bus arbiters; that is, they assume that they have control of the bus and give it up only when an external device requests it. The SH3 can also operate in slave mode, in which it must request the bus from an external arbiter. The PowerPC 603 and the AMD 29K processors work in a way similar to slave mode. For instance, when the 29K wants the bus, it asserts the bus-request (~BREQ) signal and waits for the bus-grant (~BGRT) signal to go low. Once ~BGRT goes low, the processor completes the bus transaction while holding ~BREQ low.
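The 29K request/grant sequence can be sketched as a small cycle-level state machine, roughly the shape a translator's arbitration logic takes. This is a simplified model under stated assumptions: active-low signals are represented as bools where false means "driven low," the state names and transaction length are hypothetical, and real setup, hold, and grant timing must come from the 29K data book.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified cycle-level model of a 29K-style request/grant handshake.
 * Active-low signals: false means driven low (asserted).
 * State names and transaction length are hypothetical. */
enum req_state { REQ_IDLE, REQ_WAIT_GRANT, REQ_TRANSFER };

struct requester {
    enum req_state state;
    bool breq_n;     /* ~BREQ output: false while requesting the bus */
    int beats_left;  /* cycles remaining in the current transaction */
};

/* Advance one clock. want_bus is the processor's internal request,
 * bgrt_n is the ~BGRT input, and beats is the transaction length.
 * Returns true if a data transfer occurs in this cycle. */
static bool requester_clock(struct requester *r, bool want_bus,
                            bool bgrt_n, int beats)
{
    switch (r->state) {
    case REQ_IDLE:
        if (want_bus) {
            r->breq_n = false;          /* assert ~BREQ */
            r->state = REQ_WAIT_GRANT;
        }
        return false;
    case REQ_WAIT_GRANT:
        if (!bgrt_n) {                  /* ~BGRT low: bus granted */
            assert(beats > 0);
            r->beats_left = beats;
            r->state = REQ_TRANSFER;
        }
        return false;
    case REQ_TRANSFER:
        /* ~BREQ stays low through the transaction; release on last beat */
        if (--r->beats_left == 0) {
            r->breq_n = true;           /* deassert ~BREQ */
            r->state = REQ_IDLE;
        }
        return true;
    }
    return false;
}
```

Driving the model with a hypothetical arbiter that grants from cycle 3 onward gives a request in cycle 0, the grant in cycle 3, four transfer cycles, and ~BREQ released on the last of them. A translator for a processor that releases its request before the transaction ends would need a different state sequence.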
The PowerPC 603 follows a similar method but uses other signals to determine that the bus is free. Also, the 603 does not hold its bus request low during the entire transaction, which allows another arbitration phase to begin while the current transaction is in progress. When the I960 wants to start a bus transaction, it checks its HOLD input pin. If HOLD is not asserted, the processor begins its bus cycle immediately. If HOLD is asserted, an external device has requested that the processor give up the bus, and the processor's hold-acknowledge (HOLDA) output signals that it has released control of the bus.

After arbitration comes the bus transaction. Simple transactions perform a single read or write to an address location; the access may be a single byte, a half-word, or a full word. A generic simple bus transaction has an address phase and a data phase. In the address phase, the processor puts the address onto the address bus and then asserts an address-valid signal. In the data phase, either the processor or the peripheral drives the data bus and then asserts a data-ready signal or deasserts a wait signal. Table 4 lists the common bus-related signals for various processor families.

All of the buses in this article are synchronous. If you use FPGAs or complex PLDs (CPLDs) in the bus-interface design, take extra care with the R3xxx processors. The R3xxx family's bus interface uses both edges of the system clock, which many FPGAs and CPLDs do not support. IDT (Santa Clara, CA) does offer a work-around to the dual-edge design, but the work-around requires faster devices than a comparable single-edge design does.

Burst transactions complicate bus translation

Burst transactions can be difficult when you design a bus translator.
The data-acknowledge signal for a burst can differ from that of a single-beat cycle, the maximum number of beats in a burst may vary, the address may or may not be incremented automatically, and so on. Table 4 compares some burst-related signals for each processor.

In many cases, you can implement the bus-conversion logic with some simple combinatorial logic and a few signal latches. In those cases, the biggest issue is timing, and a good timing-analysis tool can save you many hours of engineering. Some designs may need to be more complicated, however. The example in the box uses several state machines. The key for the main state machine is to use the transfer-attribute signals to identify which type of access the 603 is performing. You can handle single-beat accesses with combinatorial logic, but burst and two-beat accesses must use a second state machine that increments the addresses accordingly.

Don't forget resets and interrupts

Resets and interrupts are important signals that might need alteration, yet they are easy to overlook in a bus-translator design. Some processors, such as those in the 29K family, require the reset line to stay low for only a few clock cycles. Others, such as the 603, require reset to stay low for several hundred clock cycles. You should also pay attention to the state of all pins during a reset.

You must also investigate interrupts and interrupt mapping when you design a translator for a new processor. Families and family members can have differing numbers of interrupt inputs. Also, some processors have both synchronous and asynchronous interrupts, which means that, in some cases, you must make sure that you meet the setup times of the interrupt inputs.

These hardware and software issues only skim the surface of what selecting a new processor involves. The myriad family members that exist for each architecture make the selection process more difficult; the combinations and permutations are seemingly endless.
Yet changing to a new processor does not have to be an engineering nightmare. Although there are no blanket solutions, you can start your investigations with the issues raised here. Armed with a proper implementation plan and a good understanding of the relevant issues, you can change RISC architectures and still retain most of what you have already designed.
Copyright © 1995 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.