Sweating blood over hardware-interface routines
In the 1980s, I obtained a number of consulting contracts to put the Vinten aerial photo camera and various other sensing devices into fixed-wing and helicopter aircraft. Along the way, I developed a computer-control system for the cameras that would adjust the trigger interval to obtain the desired overlap from frame to frame. It used a Commodore 64 with modified video-overlay circuitry.
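The interval calculation itself is not described here. As a rough sketch, assuming the usual photogrammetric relation (the ground footprint scales with altitude times frame length over focal length), the trigger interval might be computed as below. The function name, the parameters, and the numbers in the example are all hypothetical and are not specifications of the Vinten camera.

```python
def trigger_interval_s(altitude_m, ground_speed_mps, focal_length_mm,
                       frame_length_mm, overlap):
    """Seconds between exposures for a forward overlap in the range [0, 1)."""
    if ground_speed_mps <= 0:
        raise ValueError("ground speed must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Along-track ground distance covered by one frame (similar triangles).
    footprint_m = altitude_m * frame_length_mm / focal_length_mm
    # Distance the aircraft may advance before the next exposure is due.
    advance_m = footprint_m * (1.0 - overlap)
    return advance_m / ground_speed_mps


# Hypothetical numbers, chosen only to exercise the formula.
interval = trigger_interval_s(altitude_m=300.0, ground_speed_mps=50.0,
                              focal_length_mm=100.0, frame_length_mm=50.0,
                              overlap=0.6)
```

With these made-up inputs, each frame covers 150 m of ground, the aircraft may advance 60 m between exposures, and the interval comes out at roughly 1.2 seconds. A corrupted ground-speed reading feeds straight into the denominator, which is how a bad read can change the firing rate.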
One of these jobs was for a government agency in Edmonton, AB, Canada. There was a lot to do. The cameras and sensors went into a pod on the bottom of a Jet-Ranger helicopter, and a rack of control equipment was mounted on seat rails in the cockpit. The entire system could run from either 117V ac or large gel-cell batteries.
The system came together in the workshop just days before we were due to ship it. All the subsystems worked correctly, so we moved on to operating the complete system. It ran flawlessly for an hour or so, the cameras triggering correctly every few seconds. Then, in one of those heart-stopping moments that every engineer has experienced, an anomaly: The cameras fired off a burst of frames at high speed and then returned to the original interval.
The Vinten camera draws 20A for a few milliseconds every time it starts, so it's a major source of electrical noise. I had carefully designed the electronic interface between the computer and the cameras to minimize the effect but, I thought, maybe not carefully enough. The computer interface was an input/output shift-register pair attached to the parallel port. The shift-register parallel connections drove the camera relays, read back the radar altimeter, recorded video ground speed, and controlled other devices. I spent days peering into my scope, looking for noise spikes. Nothing showed. I installed various noise-control circuits, but the problem recurred at random.
Delivery was now late. The client insisted that the hardware appear in Edmonton. Reluctantly, I advised him of the problem—promising that I'd find it eventually—and shipped everything off. A week passed, and I still had no idea where to look.
The time came to fly to Edmonton. I packed every diagnostic tool I owned and headed out to the airport for a morning flight. Then, as we were watching the in-flight movie—Back to the Future—it came to me.
In Edmonton, the client was not happy. I sat down with the equipment, typed in two lines of assembly code, and held my breath. The system operated perfectly.
The problem: Two software routines were accessing the interface hardware. One routine, to read the ground speed, was part of the main loop. Another routine accessed the interface as part of the 60-Hz timing interrupt. Normally, these routines didn't get in each other's way. However, if the main-loop routine was reading the ground speed when a 60-Hz interrupt occurred, it would resume with corrupted ground-speed data, which then misadjusted the camera interval. The solution: Two instructions to disable the 60-Hz interrupt while reading ground speed.
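The failure can be reproduced in a small simulation. The following is a minimal sketch in Python, not the original assembly code. The class names, the 8-bit values, and the point at which the interrupt fires are all assumptions made for illustration. On the Commodore 64's 6502-family processor, the natural candidates for the two instructions are SEI and CLI, although they are not named above.

```python
class ShiftRegisterPort:
    """Toy serial-in interface: latch a source, then clock out bits MSB first."""

    def __init__(self, sources):
        self.sources = sources  # channel name -> 8-bit value
        self.shift = 0

    def latch(self, name):
        self.shift = self.sources[name] & 0xFF

    def clock_bit(self):
        bit = (self.shift >> 7) & 1
        self.shift = (self.shift << 1) & 0xFF
        return bit


class Cpu:
    """Toy processor with one maskable timer interrupt."""

    def __init__(self, port):
        self.port = port
        self.irq_masked = False
        self.irq_pending = False
        self.isr_runs = 0

    def raise_irq(self):
        # The timer fires: run the handler now, or defer it while masked.
        if self.irq_masked:
            self.irq_pending = True
        else:
            self.timer_isr()

    def disable_irq(self):
        self.irq_masked = True

    def enable_irq(self):
        self.irq_masked = False
        if self.irq_pending:
            self.irq_pending = False
            self.timer_isr()

    def timer_isr(self):
        # The handler uses the same port and clobbers the shift register.
        self.isr_runs += 1
        self.port.latch("altimeter")
        for _ in range(8):
            self.port.clock_bit()


def read_ground_speed(cpu, irq_at_bit=None, protect=False):
    """Main-loop read; optionally fire the timer interrupt partway through."""
    if protect:
        cpu.disable_irq()
    cpu.port.latch("ground_speed")
    value = 0
    for i in range(8):
        if i == irq_at_bit:
            cpu.raise_irq()
        value = (value << 1) | cpu.port.clock_bit()
    if protect:
        cpu.enable_irq()
    return value


port = ShiftRegisterPort({"ground_speed": 0xA5, "altimeter": 0x3C})  # made-up data
cpu = Cpu(port)
clean = read_ground_speed(cpu)                                   # no interrupt
corrupted = read_ground_speed(cpu, irq_at_bit=4)                 # interrupt mid-read
protected = read_ground_speed(cpu, irq_at_bit=4, protect=True)   # interrupt deferred
```

In this model, the uninterrupted read returns 0xA5 and the interrupted read returns 0xA0, because the handler leaves the shift register empty for the last four bits. The protected read returns 0xA5 again, and the handler still runs once interrupts are re-enabled.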
This experience taught me that it's asking for trouble to access hardware from more than one point in the program. There should be a single interface routine that controls access to the hardware. Furthermore, it's important to realize that any section of the code can be interrupted, and the interrupt handler may corrupt state that the interrupted code depends on. To prevent this situation, you need to disable interrupts around the critical section and re-enable them afterward, or have the interrupt-service routine save and restore the additional state it disturbs. This type of problem is nasty to debug, because it occurs at random intervals. So, you must engineer the system properly from the start.
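One way to picture the single-interface rule is sketched below, again in Python rather than on real hardware. Two threads stand in for the main loop and the interrupt handler, and a lock stands in for the interrupt mask. That is an assumption of the simulation: a real interrupt handler cannot wait on a lock, so on a microcontroller the guard would be the disable/enable pair. The class name, the channel names, and the values are hypothetical.

```python
import threading


class HardwareInterface:
    """Sole owner of the simulated shift-register hardware."""

    def __init__(self, sources):
        self._sources = dict(sources)   # channel name -> 8-bit value
        self._shift = 0
        self._lock = threading.Lock()   # stand-in for masking interrupts

    def read(self, channel):
        """The one routine allowed to touch the hardware."""
        with self._lock:
            self._shift = self._sources[channel] & 0xFF   # latch the source
            value = 0
            for _ in range(8):                            # clock out MSB first
                value = (value << 1) | ((self._shift >> 7) & 1)
                self._shift = (self._shift << 1) & 0xFF
            return value


def hammer(interface, channel, expected, count, errors):
    """Read one channel repeatedly and record any corrupted result."""
    for _ in range(count):
        if interface.read(channel) != expected:
            errors.append(channel)


sources = {"ground_speed": 0xA5, "altimeter": 0x3C}  # made-up readings
interface = HardwareInterface(sources)
errors = []
workers = [
    threading.Thread(target=hammer, args=(interface, name, value, 2000, errors))
    for name, value in sources.items()
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Because every caller goes through `read`, the guard lives in exactly one place. No caller can forget it, and the `errors` list stays empty however the two threads interleave.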
Testing and test equipment don't always help. As the saying goes, sometimes you have to think about it until little drops of blood appear on your forehead; then, it will come to you.