Disappearing data

August 13, 2013

My boss once told me that finding and fixing someone else's software bugs is worse than fixing your own. He was right.
Years later, I was working as a project engineer for another company. I had my own development project and very little to do with the old products that were well settled in production. One day a customer called and said a meter he had just bought was not working. The needle wouldn't move. Older units he had bought years before were still fine. Then another customer called, and another, with the same problem.

Since these units were powered by a 9V battery, we asked the customers to replace the battery. Nothing. A few other attempts to "press this" and then "press that" also failed, so we had the meters sent back.
Initial evaluation showed that the customers were right; the meters were not working. Upon recalibration though, they worked just fine. Although production records showed the units had been originally calibrated, it was decided that it was just a quality control problem. The units were all recalibrated and shipped back to the customers.
But then about half of the customers complained again about the same problem. Now it was obvious that somehow the calibration data was lost. And that's when engineering got involved and my problems started.
The bad news was that the product had been designed years earlier by an external consultant, and little data and few tools were available. I had the code, but no emulator or development system, so how would I debug? With just a needle indicator, no data communication, and no display, there was no way to read the calibration data, and no way to put in a breakpoint and check the trace buffer.
Luckily, there was an LED on board. My first guess was that the calibration routine ran randomly and overwrote my good data, so I modified the code to make the LED blink only when the Write_to_EEprom() function was executed.
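The LED trick can be sketched on a host machine as follows. This is only an illustration, not the meter's firmware: the eeprom array, the led_toggle() helper, the led_toggles counter, and the parameters of Write_to_EEprom() are all assumptions standing in for hardware and code the article does not show.

```c
#include <stddef.h>
#include <stdint.h>

#define EEPROM_SIZE 16u

/* Hypothetical host-side stand-ins: an array for the EEPROM device
 * and a counter in place of the LED pin. */
static uint8_t eeprom[EEPROM_SIZE];
static unsigned led_toggles;

/* On real hardware this would flip the GPIO that drives the LED. */
static void led_toggle(void)
{
    led_toggles++;
}

/* The only place the LED is touched: any blink means a write happened. */
void Write_to_EEprom(size_t addr, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len && addr + i < EEPROM_SIZE; i++) {
        eeprom[addr + i] = data[i];
        led_toggle(); /* one visible toggle per byte written */
    }
}
```

Because nothing else drives the LED, a blink during any test is direct evidence that the write routine ran, with no debugger attached.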

I programmed a unit and started to play with it. I let it soak at +55°C and then at −40°C. I let it sit overnight in a humidity room. I gently dropped it. I pressed the buttons in crazy combinations and still no accidental calibration. I then removed the battery and reinserted it in the slot.

Eureka! My red LED started to blink. And sure enough, after I pressed the "measure" button, nothing happened. All my calibration data was gone, and my experiment was repeatable.
What happened? The phenomenon was relatively recent; it appeared only after the original (obsolete) 9V connector was replaced with a different one. The original connector would have you connect one polarity first, and then the other, so the connection to the power was “clean.” The new one was rigid and had both + and − connect at the same time, which would sometimes require several tries to get it all the way in.

The circuitry was designed in such a way that Vdd was applied to the micro as soon as the battery was connected, and no delay circuit was designed in to allow the voltage to settle. Nor was there an “ON” button to enable the power to the microcontroller. In the absence of a brown-out detector, the voltage bounce generated as the battery was plugged into the new connector would drive the micro crazy, and execution would sometimes jump to the save_calibration() routine, overwriting the data with zeros.
Now that I knew what the problem was, the next step was to solve it. Any hardware change was out of the question; time was critical, production was on hold, and the company could not afford the time or cost to redesign the board. So the fix had to be in software.

I defined a new variable, do_the_calibration, initialized to zero. The variable was set only when a "valid" calibration was initiated and the calibration data was not all zeros. In my Write_to_EEprom() routine, I checked the variable right before the "Write" command was to be sent to the EEPROM device, and allowed the write only if do_the_calibration was set. After the write, the variable was cleared again. With this new firmware, the meter would still attempt to write the EEPROM when the battery was connected, but the protection was there and never allowed it.
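A minimal host-side sketch of that guard might look like the code below. Only do_the_calibration and Write_to_EEprom() are named in the article; the eeprom array, CAL_SIZE, the all_zero() and request_calibration() helpers, and the function signatures are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAL_SIZE 4u /* assumed size of the calibration record */

/* Hypothetical stand-in for the EEPROM, preloaded with good calibration. */
static uint8_t eeprom[CAL_SIZE] = { 0x12, 0x34, 0x56, 0x78 };

/* Guard flag from the article: starts at zero (false). */
static volatile bool do_the_calibration = false;

static bool all_zero(const uint8_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (d[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Called only from the legitimate calibration path (assumed name).
 * Arms the guard only when the data is not all zeros. */
void request_calibration(const uint8_t *cal)
{
    if (!all_zero(cal, CAL_SIZE)) {
        do_the_calibration = true;
    }
}

/* Writes only when the guard is armed, then disarms it again.
 * Returns true if the EEPROM was actually written. */
bool Write_to_EEprom(const uint8_t *cal)
{
    if (!do_the_calibration) {
        return false; /* stray call, e.g. after a power glitch */
    }
    memcpy(eeprom, cal, CAL_SIZE);
    do_the_calibration = false;
    return true;
}
```

A stray jump straight into Write_to_EEprom() finds the flag clear and leaves the stored data alone, while a write that goes through request_calibration() with nonzero data succeeds exactly once.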
So simple, right?
Mihaela Costin is a senior electrical engineer at Curtis Instruments.
