Code optimization tips for 8-bit microcontrollers
You are the project leader for a highly successful consumer product. One day you are asked to upgrade all of the devices in the field to add the latest "must have" feature. However, when you attempt to compile the new code you see the dreaded message "Program Memory Overflow". You've already turned on all of the optimizations in the compiler to make the last upgrade fit. Now you're stuck. Here are optimization techniques that have saved designers up to 10% of their code size, allowing new features and bug fixes to fit into program memory that is always just a bit too small.
Many programmers learn to write software on 32-bit processors, like the Intel Pentium or one of the ARM platforms. A different mindset, however, is required to work in the embedded world. On a 32-bit CPU, the best way to store a bit is often to use a 32-bit variable; on an 8-bit processor, the best way is to use a single byte. Some processors, like enhanced 8051s, even offer special 1-bit variables.
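A minimal sketch of the idea in portable C (the flag names and accessors here are invented for illustration; on an enhanced 8051 under a compiler such as Keil C51, you could go further and declare `bit ready;` to use the BIT space directly):

```c
#include <stdint.h>

/* On a 32-bit CPU each flag is often given a whole int; on an
   8-bit part, eight flags pack into one byte of on-chip RAM. */
#define FLAG_READY 0x01u
#define FLAG_ERROR 0x02u

static uint8_t flags; /* eight 1-bit flags in a single byte */

void set_ready(void)   { flags |= FLAG_READY; }
void clear_error(void) { flags &= (uint8_t)~FLAG_ERROR; }
int  is_ready(void)    { return (flags & FLAG_READY) != 0; }
int  is_error(void)    { return (flags & FLAG_ERROR) != 0; }
```

The masking operations compile to single AND/OR instructions on most 8-bit cores, so the packing costs little time while saving RAM.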
Embedded processors often go beyond the standard Harvard architecture by separating memory into different memory spaces, some overlapping, some discrete. Common spaces in an 8051, for example, are CODE, XDATA, DATA, IDATA, BIT, and the registers. It is important to learn the strengths and weaknesses of each memory space when deciding where to place variables, especially as each space is limited in size. The IDATA space, for example, may only run 256 bytes, but it is optimized for indirect access. The DATA space may also only run 256 bytes, and it also contains the BIT-addressable space and the registers. While CODE and XDATA are only accessible via slower indirect access mechanisms, they can address up to 64K.
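Placement is expressed with memory-space keywords in most 8051 compilers. The sketch below uses Keil C51's keywords, stubbed out with macros so the fragment also compiles on a host; the variable names and sizes are invented for illustration:

```c
/* Keil C51 memory-space keywords, stubbed so this also compiles
   on a host PC.  Keil defines __C51__ when building for target. */
#ifndef __C51__
#define DATA_SPACE   /* direct-addressed, fastest, very scarce   */
#define IDATA_SPACE  /* indirect-addressed, 256 bytes            */
#define XDATA_SPACE  /* external RAM, up to 64K, slow MOVX access */
#define CODE_SPACE const /* program memory, read-only            */
#else
#define DATA_SPACE  data
#define IDATA_SPACE idata
#define XDATA_SPACE xdata
#define CODE_SPACE  code
#endif

unsigned char DATA_SPACE  isr_count;      /* hot variable: DATA   */
unsigned char IDATA_SPACE ring_buf[32];   /* bulk, still on-chip  */
unsigned char XDATA_SPACE log_buf[1024];  /* large, off-chip RAM  */
CODE_SPACE unsigned char  crc_nibble[4] = {0, 7, 14, 9}; /* ROM   */
```

Putting the lookup table in CODE and the big log buffer in XDATA frees the scarce DATA space for the variables that are touched most often.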
Many of the compilers for 8-bit CPUs contain excellent optimizers. However, these optimizers have limits, so simplify expressions when you can. For example, the executable code for Figure 1a will be larger than that for Figure 1b because the compiler cannot combine the two constants into a single expression on its own.
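Figures 1a and 1b are not reproduced here, but the kind of hand-simplification meant can be sketched as follows (function and constant values invented):

```c
/* Figure-1a style: two constants separated by intervening
   arithmetic the optimizer may not see through. */
unsigned int timeout_1a(unsigned int base)
{
    unsigned int t = base + 5;   /* first constant   */
    t = t * 2;                   /* intervening work */
    t = t + 10;                  /* second constant  */
    return t;
}

/* Figure-1b style: constants pre-combined by hand, since
   (base + 5) * 2 + 10  ==  base * 2 + 20.  The compiler now
   emits one multiply and one add instead of three operations. */
unsigned int timeout_1b(unsigned int base)
{
    return base * 2 + 20;
}
```

Both functions return the same value for every input; only the amount of generated code differs.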
Experienced carpenters know to "measure twice, cut once". Embedded firmware engineers should follow the same principle. Every embedded compiler provides a map file that can offer useful insights (Figure 2). The example map file provides useful information about the sample code used for this article: the library (LIB_CODE) is using over 1 kbyte of space, and the startup code (c51startup) is using over 140 bytes.
Another reason to optimize is to save processing time. Here it is even more important to measure your program's performance before attempting optimization. While it's easy to see that a large source file is likely to consume a lot of memory, it's difficult to determine which critical sections of code are the ones burning precious MIPS. Profiling is an important tool during this process. You can perform profiling with a single, unused output pin, but it is easier with more outputs. Create a macro that sets your profiling outputs and place this macro at the beginning and end of each routine (Figure 3).
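Figure 3 is not reproduced here, but a profiling macro of the kind described can be sketched like this. On the target, the "port" would be a real SFR such as P1 on an 8051; a plain variable stands in so the fragment compiles anywhere, and the routine being profiled is invented:

```c
/* Profiling by output pin: raise a pin on entry, drop it on
   exit, and watch the pin on a scope or logic analyzer.  On an
   8051, fake_port would be a real SFR such as P1. */
unsigned char fake_port;

#define PROFILE_PIN      0x01u
#define PROFILE_ENTER()  (fake_port |= PROFILE_PIN)
#define PROFILE_EXIT()   (fake_port &= (unsigned char)~PROFILE_PIN)

unsigned int checksum(const unsigned char *buf, unsigned char len)
{
    unsigned int sum = 0;
    unsigned char i;
    PROFILE_ENTER();            /* pin high: routine running */
    for (i = 0; i < len; i++)
        sum += buf[i];
    PROFILE_EXIT();             /* pin low: routine done */
    return sum;
}
```

With more than one spare pin, each routine of interest can be assigned its own bit, so overlapping activity is visible on a logic analyzer at a glance.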
Know what you're paying for
In the map file in Figure 2, we saw that the library was using 1 kbyte of our precious memory. A deeper look into the map file, plus a little Excel magic, gives us the results shown in Figure 4. The smaller library functions were removed from the graph. Although the function names are somewhat cryptic, you can look them up in the library reference. The first library function, ULDIV, is unsigned long division, and the second library function of interest in the chart is long multiplication.
The cross-reference in the .map file shows that we got lucky: these two functions are only used in one file. The .lst file shows two uses of the long division function and one use of the long multiplication function (). In this particular case, we know that the number of zones is a power of two and the other two values are constants. We can therefore replace the long multiply with a left shift that gets repeated up to 8 times (). Even though this routine is fairly large, it still reduces our library usage and overall code size.
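The original routine is not reproduced here, but the shift-for-multiply substitution can be sketched as follows (function name and the "zones" interpretation are assumptions based on the description above):

```c
#include <stdint.h>

/* When one factor is known to be a power of two (here, the zone
   count), the long-multiply library call can be replaced by a
   loop of left shifts -- at most 8 iterations for an 8-bit
   zone count. */
uint32_t scale_by_zones(uint32_t value, uint8_t zones /* 1,2,4,... */)
{
    while (zones > 1) {
        value <<= 1;    /* one shift per doubling */
        zones >>= 1;
    }
    return value;
}
```

A shift loop like this costs a few bytes of inline code but removes the entire long-multiply helper from the library footprint.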
Mature 8-bit compilers contain well-written, optimized library functions. However, these functions must allow for corner cases that you may be able to rule out by your knowledge of the data. The biggest library function shown in our map file is a prime example of this. The ULDIV routine is called twice to obtain the quotient and remainder of an input value divided by a constant (). Because we know more than the compiler about the expected values here, we can convince the compiler to eliminate the expensive long-division function and replace it with a lighter-weight 16-bit version. If you are an aggressive optimizer, you might even want to implement your own binary long division routine.
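One way to "convince" the compiler is a cast, as in this sketch (the function name and constant are invented; the key assumption, which must be guaranteed by the surrounding design, is that the dividend always fits in 16 bits):

```c
#include <stdint.h>

#define TICKS_PER_SEC 625u   /* invented constant for illustration */

/* The input arrives as a long, but we know by design that it
   never exceeds 16 bits, so casting lets the compiler call its
   cheaper 16-bit divide helper instead of ULDIV. */
void split_ticks(uint32_t ticks, uint16_t *sec, uint16_t *rem)
{
    uint16_t t = (uint16_t)ticks;  /* safe only if ticks < 65536 */
    *sec = t / TICKS_PER_SEC;
    *rem = t % TICKS_PER_SEC;
}
```

Most compilers compute the quotient and remainder in a single helper call when both appear together like this, so the division is paid for once, not twice.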
Passing arguments to functions is good coding practice. In C, arguments are passed by value, so you can be absolutely certain that the subroutine you call will not modify them, and the compiler handles the memory management for you. However, this costs time and space that you may not be able to afford. Consider the code fragment in Figure 7. Because our variable is declared in main(), the only real difference between it and a true global variable is scope. However, every time foo() is called, the compiler must copy effectiveGlobal to a new location. Declaring a true global saves the code and data overhead consumed by the call.
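Figure 7 is not reproduced here, but the pattern can be sketched like this (function names invented; `effectiveGlobal` is the name used in the text):

```c
/* Passing the value copies it into argument storage on every
   call... */
static unsigned char effectiveGlobal;  /* lives for all of main() */

unsigned char foo_by_arg(unsigned char v)
{
    return (unsigned char)(v + 1);     /* compiler must copy v in */
}

/* ...while a true global is read in place, saving the copy and
   the argument-passing code at every call site. */
unsigned char foo_by_global(void)
{
    return (unsigned char)(effectiveGlobal + 1);
}
```

The trade-off is the usual one: the global version is smaller and faster, but harder to reason about and to reuse, so reserve it for the calls that profiling shows actually matter.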
Tell the compiler as much as you can
The 8051 has several memory spaces: XDATA is a 64K address space, IDATA is a 256-byte stack and indirectly addressable space, and DATA is a 256-byte directly addressable space. In most cases, the code author knows which memory space a pointer addresses. If the memory space is specified by the programmer, the compiler doesn't have to include code to handle all three types of memory in a routine; it can just use one. This also saves data space, since the pointer doesn't have to carry memory-space information.
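Under Keil C51, for instance, a generic pointer is 3 bytes (a space tag plus a 16-bit address) and every dereference goes through a library helper, while a space-qualified pointer is 1 or 2 bytes and dereferences inline. A sketch, with the `xdata` keyword stubbed so it also compiles on a host:

```c
/* Keil C51 defines __C51__ when building for the target; on a
   host, xdata simply disappears and this is ordinary C. */
#ifndef __C51__
#define xdata
#endif

/* The xdata qualifier tells the compiler this pointer only ever
   addresses external RAM, so it can emit direct MOVX accesses
   instead of calling a generic-pointer helper. */
unsigned char sum_xdata(unsigned char xdata *p, unsigned char len)
{
    unsigned char sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}
```
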
In my 8051 compiler, such generic pointers are accessed by library routines whose names contain the string OPTR. Searching my listing and library files for OPTR references revealed many generic-pointer uses, some of them causing bugs due to assumptions in the code about pointer sizes.
Using the const keyword in variable declarations gives you two optimizations: the compiler doesn't have to store the initialization value for the variable, and the compiler can perform some math at compile time rather than at execution time. Make sure to check the compiler's output for a sample program to see if it truly treats const the same as #define. I test my code using the declarations and code in Figure 8. The resulting output tells me that my compiler is not aware of the values of const variables.
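Figure 8 is not reproduced here, but a test of this kind can be sketched as follows (names and the constant value are invented):

```c
#define DEF_SCALE 3                         /* always folded */
static const unsigned char constScale = 3;  /* maybe folded  */

unsigned int by_define(unsigned int x) { return x * DEF_SCALE; }
unsigned int by_const(unsigned int x)  { return x * constScale; }

/* Compare the generated code (or the .lst file) for the two
   functions: if by_const() loads constScale from memory instead
   of using an immediate operand, the compiler is not folding
   const values, and #define is the cheaper choice. */
```
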
Many embedded firmware engineers swear that they can always do a better job than the compiler, and that they should just recode everything in assembly language. However, anyone working in professional software or firmware development can pick up a C program and read it. If you need to pass the code to another developer, they can start changing it without learning all of the tricks required to get maximum efficiency from your particular assembly language. One of the original purposes of C was to provide a language that was abstract enough that it could be used on many different processors. This goal is still important today.
Additionally, modern compilers provide several features that even the odds with humans. For example, some 8-bit processors do not have an efficient mechanism to access variables on the stack. The common solution is to build a call tree and overlay variables between functions that never call each other. Maintaining this structure by hand in an assembly program is difficult and error-prone.
Many 8-bit compilers include the ability to optimize at link time. This allows the compiler to perform many of the optimizations that humans can do, plus several that they cannot. For example, many compilers now search for strings of code that are common to different functions and combine them into a new function. It's impossible for a human to remember all of the detail required to perform this task on every compile cycle.
In spite of all these reasons, assembly language still has its place. Make sure to consider all of the factors above before you resort to using it. In the course of writing this article, I reduced the size of a mature program from over 0x6000 bytes (24,576) to 0x5F2B bytes (24,363), a savings of over 200 bytes. This program has been the target of numerous size-reduction efforts in the past. Hopefully these tips will help you achieve even greater savings!