Managing the 8- to 32-bit processor migration
Back when I started in electronics, working on discrete, 4-bit processors, I couldn’t have known I would one day have to worry about how big an integer was or discuss processors in a Gulliver’s Travels context. As geometries shrank and prices dwindled, however, there was a great migration of applications from 8- to 16- and then to 32-bit processors. Along the way, tools evolved to bring code generation and application development to new levels of efficiency—generating more headaches in the process.
The problem began with the engineers who worked with the first microcontrollers and assumed that 16 bits for an integer would be “good enough.” Indeed, the early mainframe and minicomputer architectures differed in word length as well as in bit and byte ordering; the number of bits in an integer was tied to the architecture’s word length and varied from machine to machine.
With apologies to Jonathan Swift (Reference 1), engineers have revised the Lilliputians’ argument to debate which end of a number—the largest (big-endian) or smallest (little-endian)—should come first in memory. There are valid arguments on both sides of the “endianness,” or byte-order, debate (Reference 2), but this article focuses on the ramifications for developing applications using C code.
An engineer embarking on a project using a 32-bit, little-endian processor, for example, might dismiss discussions of native processor size and endianness as merely academic for a new design. Then, once the project is under way, the engineer might discover that:
- The company has code intellectual property that is “tried and true” but has never been tested on a 32-bit, little-endian processor and must be recompiled from C source code;
- The code IP must be reusable by another internal group;
- The little-endian processor will pass data structures over the communications link to a big-endian processor; and
- The project must also use a 16-bit, big-endian memory-mapped device that the company’s ASIC group has provided.
Any of those scenarios can cause problems if the engineer does not keep in mind the native size and endianness of the processor.
Consider what happens when our hypothetical engineer is given code that appears to be a very simple structure for the calendar portion of a real-time clock:
The code builds without errors, but it does not produce the expected results. The structure is so simple; what could have gone wrong? The engineer looks at the variables in the watch window, which shows the following:
It turns out the code had been used only on 16-bit processors up to this point. The first problem is that ANSI C does not fix the number of bits in an integer; it guarantees only a minimum of 16, and the actual width is typically related to the native size of the accumulator in the processor.
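To make the point concrete, here is a minimal sketch of code that quietly depends on the width of an integer. The function names are hypothetical and are not taken from the engineer's IP; the counter simply stands in for any arithmetic that was written with a 16-bit machine in mind.

```c
#include <limits.h>

/* Storage bits in an unsigned int on this compiler: 16 on many small
   microcontrollers, 32 on most 32-bit parts. */
static unsigned int_width_bits(void)
{
    return (unsigned)(sizeof(unsigned int) * CHAR_BIT);
}

/* Hypothetical counter written for a 16-bit int: the author expected
   0xFFFF + 1 to wrap to 0. On a 32-bit int it becomes 0x10000 instead. */
static unsigned next_tick(unsigned tick)
{
    return tick + 1u;
}

/* Same counter with the 16-bit wrap made explicit, so the result no
   longer depends on the native width of int. */
static unsigned next_tick_16(unsigned tick)
{
    return (tick + 1u) & 0xFFFFu;
}
```

Calling `next_tick(0xFFFFu)` gives a different answer on a 16-bit and a 32-bit compiler, while `next_tick_16(0xFFFFu)` wraps to 0 on both.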
Thus, the first point is to write your code so that the size of the variables is known. The C99 standard addresses this issue through fixed-width integer support (Reference 5). So, our hypothetical engineer plugs in his solution to correct the variable size (he happens to use the C99 conventions, so if his compiler were C99-compliant he could have just included
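The fixed-width types that C99 provides live in the standard header `<stdint.h>`. The sketch below is an illustration rather than the article's original listing: the pre-C99 fallback typedefs assume an 8-bit `char`, a 16-bit `short`, and a 32-bit `long` on the target compiler, and the `rtc_date_t` structure, its field names, and `make_date` are hypothetical.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>              /* C99: exact-width types are built in */
#else
/* Pre-C99 fallback using the C99 names. ASSUMPTION: char is 8 bits,
   short is 16 bits, and long is 32 bits on this compiler. */
typedef unsigned char  uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long  uint32_t;
#endif

/* Hypothetical calendar record for a real-time clock. Every field has
   the same width on 8-, 16-, and 32-bit targets. */
typedef struct {
    uint16_t year;   /* for example, 2012 */
    uint8_t  month;  /* 1..12 */
    uint8_t  day;    /* 1..31 */
} rtc_date_t;

/* Build a date from its parts. */
static rtc_date_t make_date(uint16_t year, uint8_t month, uint8_t day)
{
    rtc_date_t d;
    d.year  = year;
    d.month = month;
    d.day   = day;
    return d;
}
```

Fixing the field widths settles how many bits each member holds; it does not settle the order in which those bytes are stored, nor any padding the compiler adds between members.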
Endianness and portability
The engineer then uses those definitions throughout the “tested” IP and runs the code. Now the watch window, which should show Oct 22, 2012, instead shows the following:
The 16-bit processors on which the code had been used earlier were big-endian; thus, the initialization string is being put into memory in the wrong order. Our engineer, bitten once, revisits the four bullet points and vows to solve all his problems by supporting both big- and little-endian structures and making the code portable.
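A short sketch shows why the same initialization lands differently on the two kinds of machine, and one way to take the host's byte order out of the picture when data crosses a communications link or reaches a big-endian memory-mapped device. The helper names are hypothetical, and the code assumes 8-bit bytes.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host, by
   inspecting the byte stored at the lowest address of a 16-bit value. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 0x0001u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0x01u;
}

/* Store a 16-bit value most-significant byte first (big-endian),
   regardless of the host's native byte order. */
static void store_be16(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)(value & 0xFFu);
}

/* Rebuild a 16-bit value from two big-endian bytes. */
static uint16_t load_be16(const uint8_t *src)
{
    return (uint16_t)(((uint16_t)src[0] << 8) | src[1]);
}
```

Because `store_be16` and `load_be16` work with shifts on the value rather than with its layout in memory, the same source produces the same byte stream on either kind of processor.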
That decision, however, has its own implications: Efforts to make code portable between machines of different endianness can uncover further, more obscure issues.
A common coding practice is to cast a pointer and dereference it in the same expression. Our engineer writes some new code and passes it to a colleague on another project (recall the second bullet point above); he remembers to use his new definitions, in case the other project is using a 16-bit machine:
The colleague from the other group soon comes back in a huff and tells the engineer the code doesn’t work. The watch window shows a value of 0x12 for c_data, when clearly the lower byte of the data is 0x34.
The problem now is that the combination of casting and dereferencing causes the pointer to look at the wrong byte when used on a big-endian processor. The take-away is to use casting with caution, especially when using pointers to reference the data. There are a number of ways to skin this cat; Listing 1 shows our engineer’s solution.
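One such approach, sketched here as an illustration and not as a reproduction of Listing 1, is to extract the byte by value with a mask or shift instead of through a cast pointer. The function names are hypothetical; the 0x1234 test value matches the data described above.

```c
#include <stdint.h>

/* Fragile: reads whichever byte sits at the lowest address.
   For 0x1234 that is 0x34 on a little-endian processor but 0x12 on a
   big-endian one. */
static uint8_t first_byte_in_memory(const uint16_t *p)
{
    return *(const uint8_t *)p;
}

/* Portable: the low-order byte, taken from the value itself. */
static uint8_t low_byte(uint16_t value)
{
    return (uint8_t)(value & 0xFFu);
}

/* Portable: the high-order byte, taken from the value itself. */
static uint8_t high_byte(uint16_t value)
{
    return (uint8_t)(value >> 8);
}
```

With `uint16_t i_data = 0x1234;`, `low_byte(i_data)` is 0x34 on both kinds of processor, whereas `first_byte_in_memory(&i_data)` depends on the byte order of the machine running it.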