Efficient coding for ARM platforms
I am sure many of you will be familiar with Donald Knuth’s oft-quoted sentiment:
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
How many of you have seen the second sentence?
“Yet we should not pass up our opportunities in that critical 3%.”
Both parts are important. We all realize that optimization is a valuable activity, but it delivers maximum payback for minimum effort only when we apply the available effort carefully, in exactly the right place.
The business of optimization
When writing, modifying, testing, debugging and finally optimizing code, the coding standard should rule in the vast majority of cases. However, any sensible coding standard will contain enough loopholes to let you choose performance over, say, readability in critical cases. Identifying those critical cases is where we must spend our time first.
Never forget the 90/10 rule, which says that 90% of execution time is spent in 10% of the code. Before you start looking over the code with your optimization spectacles on, you need to spend significant time identifying that 10%. Time spent here is truly well spent. Profilers and other tools are invaluable for this and can pinpoint the trouble spots very quickly. But, as Rob Pike says:
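A profiler is the right tool for finding the 10%. Once you suspect a particular function, though, a crude timer can confirm whether a change actually helped. The sketch below is one such helper, built only on the standard C `clock()` function; the names `cpu_seconds_per_call` and `sample_workload` are hypothetical, and the result is only as precise as your library's `clock()` resolution.

```c
#include <time.h>

/* Hypothetical workload, used only to exercise the helper below.
   The volatile store stops the compiler from deleting the loop. */
static volatile unsigned long sink;

void sample_workload(void)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < 10000ul; i++) {
        total += i * i;  /* unsigned arithmetic: wrap-around is well defined */
    }
    sink = total;
}

/* Hypothetical helper: returns the average processor time, in seconds,
   per call of fn over reps calls. clock() has implementation-defined
   resolution, so choose reps large enough to span many clock ticks.
   Returns 0.0 when reps is 0 or when clock() cannot report a time. */
double cpu_seconds_per_call(void (*fn)(void), unsigned reps)
{
    if (reps == 0u) {
        return 0.0;
    }
    clock_t start = clock();
    for (unsigned i = 0; i < reps; i++) {
        fn();
    }
    clock_t end = clock();
    if (start == (clock_t)-1 || end == (clock_t)-1) {
        return 0.0;  /* processor time not available */
    }
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Measure before and after an optimization with the same `reps` and build settings; if the difference is within the noise of repeated runs, the "optimization" has not been proven.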
“Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is.”
A many-dimensional problem
Coding is an activity which operates within an interlinked set of constraints.
The items around the outside (robustness, performance, security, etc.) are the issues we have to care about in our program; the items surrounding the code are the constraints within which we must operate. We must work within a given language, on the platform provided by our esteemed hardware-designing colleagues, using prescribed tools, and so on.
The rest of this paper concentrates on the four constraints in the diagram: Language, Hardware, Tools and Platform.
Remember that a short program (in terms of lines of code) is not necessarily a faster one. Writing a complex expression on a single line is no faster than writing it as a series of sub-expressions using temporary variables; in some cases it may even be slower. In almost all cases, though, it will be significantly less readable and maintainable. Always favor readability over conciseness.
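As a hypothetical illustration, here is the same calculation (blending two 8-bit pixel values with rounding) written both ways. The function names are invented for this sketch. With optimization enabled, a modern compiler will usually produce equivalent code for both; if it matters on your target, check the generated assembly rather than assuming.

```c
#include <stdint.h>

/* Dense form: the whole blend in one expression.
   alpha = 255 selects a entirely; alpha = 0 selects b entirely. */
uint8_t blend_dense(uint8_t a, uint8_t b, uint8_t alpha)
{
    return (uint8_t)(((uint32_t)a * alpha + (uint32_t)b * (255u - alpha) + 127u) / 255u);
}

/* Same arithmetic, split into named sub-expressions. */
uint8_t blend_readable(uint8_t a, uint8_t b, uint8_t alpha)
{
    uint32_t weight_b   = 255u - alpha;           /* complementary weight */
    uint32_t weighted_a = (uint32_t)a * alpha;
    uint32_t weighted_b = (uint32_t)b * weight_b;
    uint32_t rounded    = weighted_a + weighted_b + 127u;  /* +127 rounds to nearest */
    return (uint8_t)(rounded / 255u);              /* at most 255, fits in uint8_t */
}
```

The second version costs a few more lines, but each intermediate value has a name you can inspect in a debugger.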
Ambiguity and Flexibility
When trying to be clever, remember that the language has limitations and ambiguities, some of which are deliberate. The behavior of expressions involving variables of plain char type, for instance, depends on whether your tools (and sometimes the ABI you are using) treat char as signed or unsigned: the Procedure Call Standard for the ARM Architecture specifies an unsigned char, whereas most x86 toolchains use a signed one. Be careful, and be aware of the rules that apply in your case.
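A minimal sketch of defensive handling, assuming you are scanning a byte buffer. A test such as `buf[i] >= 0x80` on a plain char is never true where char is signed, so the code converts each element to `unsigned char` first. The function names are hypothetical. (With GCC you can also force the choice with `-fsigned-char` or `-funsigned-char`, but code that does not depend on the default is more robust.)

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Reports whether plain char is signed for this compiler/ABI:
   CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Counts bytes with the top bit set. Converting to unsigned char gives
   a value in 0..255 regardless of the signedness of plain char. */
size_t count_high_bytes(const char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char byte = (unsigned char)buf[i];
        if (byte >= 0x80u) {
            count++;
        }
    }
    return count;
}
```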
A more subtle case is the behavior of shifting a variable by a number of bits equal to or greater than its width. The result depends on a combination of the compiler and the underlying hardware. Intel x86 processors use only the low five bits of the shift count for a 32-bit operand, so a shift by 32 generally leaves the value unchanged, while a register-specified shift in the 32-bit ARM instruction set moves the value right out of the register and leaves a zero result. The C language leaves the behavior undefined, so in practice your environment decides. The compiler will often warn about this, but its ability to do so is limited when the shift distance is not a compile-time constant.
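If a shift distance might reach the width of the type, one option is to make the intended behavior explicit so that it no longer depends on the compiler or the hardware. The two hypothetical helpers below pick the two conventions described above; which one is "right" depends on what your algorithm needs.

```c
#include <stdint.h>

/* Left shift with a defined result for every n:
   counts of 32 or more give 0 (every bit shifted out). */
uint32_t shl32_or_zero(uint32_t value, unsigned n)
{
    return (n >= 32u) ? 0u : (value << n);
}

/* Left shift that uses only the low five bits of the count,
   so a shift by 32 leaves the value unchanged. */
uint32_t shl32_masked(uint32_t value, unsigned n)
{
    return value << (n & 31u);
}
```

Either version is well-defined C for any `n`. When `n` is a compile-time constant below 32, a compiler will normally reduce both to a single shift instruction.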
The C language is gloriously, wonderfully flexible. Let’s face it: this is the reason most of us love it so! But its very flexibility leads to some Achilles’ heels. Most notable is “pointer aliasing”: the compiler must assume that any pointer may address any data item whose address is known. In many cases this restricts the compiler’s freedom to optimize effectively. You, as the programmer, can use your meta-knowledge about what the program is trying to do to make the compiler’s job easier.
Writing this instead…
…may not look so nice, but it makes explicit that each of the input values needs to be loaded only once during the sequence. Many compilers support the C99 “restrict” keyword, which lets the programmer tell the compiler that the object a pointer refers to is not accessed through any other pointer within that pointer’s scope. Using temporary variables in cases like this, though, is more portable and may therefore be preferable.
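A hypothetical loop shows the same effect. Because `out` has the same element type as `*scale` and `*offset`, the compiler must allow for `out[i]` overlapping them and reload both values after every store. The second and third functions remove that assumption, using temporaries and `restrict` respectively. All names are invented for this sketch, and both rewrites are only correct if the caller never passes an `out` that overlaps `scale` or `offset`: that guarantee is exactly the meta-knowledge the programmer contributes.

```c
#include <stddef.h>

/* out[i] may alias *scale or *offset, so both are re-read each iteration. */
void scale_offset_aliased(int *out, const int *in, const int *scale,
                          const int *offset, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * *scale + *offset;
    }
}

/* Portable fix: copy the values into locals so each is loaded once. */
void scale_offset_temps(int *out, const int *in, const int *scale,
                        const int *offset, size_t n)
{
    const int s = *scale;   /* loaded once, before the loop */
    const int o = *offset;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * s + o;
    }
}

/* C99 alternative: restrict promises that, inside this function, each
   pointed-to object is accessed only through its own pointer. */
void scale_offset_restrict(int *restrict out, const int *restrict in,
                           const int *restrict scale,
                           const int *restrict offset, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * *scale + *offset;
    }
}
```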