
Intel researchers suggest new approach to error detection in software execution

September 27, 2007

The problem of detecting when execution of code is not following the path you had in mind has been an issue since the dawn of the electronic computer. In the really good old days—please don’t ask me why I know this first-hand—the Burroughs 205 computer, a mighty colossus with rotating drum memory, vacuum-tube logic and 9-track tape for on-line storage, had a CRT on the operator’s console. The address bus for the drum memory was split into high- and low-order almost-bytes, and each byte was fed through a DAC to the deflection amplifier on an axis of the CRT. The result was a striking visual display of the sequence of addresses fetched by the executing program—a trace, drawn out in two dimensions. Skilled operators could watch the CRT and tell at once if a familiar program—often the Algol compiler was the culprit—had wandered into never-never land.

The problem is somewhat more challenging for today’s multiprocessor server chips. With clocks in the GHz instead of the kHz, and with the potential for several processors working simultaneously on closely-coupled threads, just discovering that something has gone wrong can be a serious undertaking, even in retrospect. Finding out in time to prevent damage is even harder.

And the stakes are getting higher. In network servers, for example, a deviation from expected execution most likely means an attack from an intruder has succeeded, and the next thing that will happen will be the planting of a Trojan Horse or other pernicious act. Prompt response is vital.

In the near future, the problem will be just as critical in the embedded world, as multiprocessing begins to replace dedicated hardware accelerators in mission-critical applications such as engine management, vehicle safety and attitude control, and robotics. (At this point I may want to stop driving, but that’s another story.) Here, a deviation in the expected trajectory of the software is likely due either to a sensor out of limits or a bug. In either case, the result is likely to be an undesired trajectory for a very fast, heavy piece of equipment in near proximity to humans. Again, early intervention is vital.

Such thoughts have caused two researchers at Intel, software engineer Michael Ryan and research scientist Shimin Chen, to suggest a novel way of detecting these excursions. The two described and demonstrated their concept at the Intel Developer Forum last week.

The idea is not unlike what in-circuit emulators used to do in the early days of microprocessors, to reach again into uncomfortably distant history. Ryan and Chen propose inserting a hardware recording and monitoring circuit into the instruction retirement engine of a CPU: the point at which the instruction’s decoded op-code and effective addresses are most readily available. The unit would record each op-code and address, compress this data using an on-the-fly lossless compression algorithm, and transmit it to a FIFO-like execution-log memory.
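
As a rough software analogy of that recording path, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the 9-byte record layout, the block size, the `ExecutionLog` class and its method names are hypothetical, and zlib merely stands in for whatever lossless scheme the researchers actually use, which the talk as reported here does not specify.

```python
import struct
import zlib
from collections import deque

# Hypothetical record layout: 1-byte op-code followed by an 8-byte address.
RECORD = struct.Struct("<BQ")


class ExecutionLog:
    """Bounded FIFO of losslessly compressed blocks of retired-instruction records."""

    def __init__(self, block_records=4, max_blocks=8):
        self.block_records = block_records
        self.pending = []                     # records not yet compressed
        self.fifo = deque(maxlen=max_blocks)  # oldest block is dropped when full

    def retire(self, opcode, address):
        """Called once per retired instruction."""
        self.pending.append((opcode, address))
        if len(self.pending) == self.block_records:
            self.flush()

    def flush(self):
        """Compress the pending records and append the block to the FIFO."""
        if not self.pending:
            return
        raw = b"".join(RECORD.pack(op, addr) for op, addr in self.pending)
        self.fifo.append(zlib.compress(raw))  # zlib is a stand-in, not the real scheme
        self.pending = []

    def drain(self):
        """Consumer side: yield (op-code, address) pairs in retirement order."""
        while self.fifo:
            raw = zlib.decompress(self.fifo.popleft())
            for op, addr in RECORD.iter_unpack(raw):
                yield op, addr


# Made-up trace: op-code values and addresses are illustrative only.
trace = [(0x90, 0x1000), (0xE8, 0x1001), (0x55, 0x2000), (0xC3, 0x2001), (0x90, 0x1006)]
log = ExecutionLog()
for op, addr in trace:
    log.retire(op, addr)
log.flush()
recovered = list(log.drain())
```

Because the compression is lossless, `recovered` matches `trace` record for record. The bounded deque mimics a fixed-size log memory: if the consumer falls behind, the oldest blocks are overwritten, which is one of the trade-offs a real design would have to settle.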

A task on a second CPU then traverses the log, perhaps with its own hardware assistance, and uses something not unlike a routing table to determine whether execution is remaining within expected limits. Ideally, this inspector would be able to continuously test the log contents against a set of assertions designed to detect errors before they could have consequences. If something does go wrong, the second CPU can use the log information to in effect rewind the failed task to before the error, repair damage and start over.
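
The inspection step can be sketched in the same spirit. The table below is a hypothetical stand-in for the "routing table" idea: for each instruction address it lists the addresses allowed to retire next. The addresses, the table contents and the function names are all invented for illustration; the real assertions would presumably be much richer.

```python
# Hypothetical table: for each instruction address, the set of addresses
# that may legitimately retire next.
ALLOWED_NEXT = {
    0x1000: {0x1001},
    0x1001: {0x2000},  # call into a known routine
    0x2000: {0x2001},
    0x2001: {0x1006},  # return to the call site
}


def first_violation(log, allowed_next):
    """Return the index of the first record whose address is not an allowed
    successor of the previous record's address, or None if the log is clean.
    An address with no table entry is treated as having no valid successor."""
    for i in range(1, len(log)):
        prev_addr = log[i - 1][1]
        addr = log[i][1]
        if addr not in allowed_next.get(prev_addr, set()):
            return i
    return None


def last_good_prefix(log, allowed_next):
    """Records before the first violation: a stand-in for the rewind point."""
    bad = first_violation(log, allowed_next)
    return log if bad is None else log[:bad]


# Made-up logs of (op-code, address) pairs.
good = [(0x90, 0x1000), (0xE8, 0x1001), (0x55, 0x2000), (0xC3, 0x2001), (0x90, 0x1006)]
hijacked = good[:4] + [(0x90, 0x7F00)]  # the return lands at an unexpected address
```

Here `first_violation(good, ALLOWED_NEXT)` finds nothing, while the hijacked log is flagged at its last record, and `last_good_prefix` returns the four records that precede it. Restoring machine state to that point is the hard part and is not modeled here.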

At this point the work is being done only in simulation, and there has not been a great deal of work on the implications for hard real-time systems, just servers. But the researchers are already doing feasibility studies on adding the necessary hardware to an Intel CPU and trying it out in reality. Given the growing importance of the problem, it might be a very good use of a little silicon real estate.

Posted by Ron Wilson on September 27, 2007 | Comments (1)

September 27, 2007
In response to: Intel researchers suggest new approach to error detection in software execution
Bing Huang commented:

Many years ago, as a hobbyist, I wrote an x86 simulator/emulator on an x86 system. For every instruction it recorded the opcode, its address, and the CPU state to a file. With that it is possible to determine exactly what happened when a program executes. It took about 100 instructions to emulate ONE target instruction and generated lots of output. The target instruction is actually executed by the CPU and not interpreted. The extra instructions are used for recording and housekeeping. It is important to save the CPU state (e.g. flags, etc.) as an opcode stream itself does not tell the whole story.
