Mentor announces synthesis for rad-tolerant design in FPGAs
There is almost a love-hate relationship between designers of radiation-tolerant and high-reliability systems and the vendors of FPGAs. On one hand, these systems are often rich in digital logic and low in production volume, the perfect sweet spot for FPGA use. On the other hand, FPGAs present a nightmare scenario for reliability experts. The chips are far more complex than the designs they implement. Critical parts of the devices cannot be implemented in a redundant fashion. The core voltages are typically as low as the silicon process will allow. And a single-event upset can alter not just the state but also the topology of a circuit.
Still, anyone who knows humans will predict that most often the designers will win out over the reliability experts. And so over the last few years a subculture has grown up around the use of FPGAs in moderate space conditions, from low-earth-orbit satellites to Mars rovers, and in high-reliability applications such as human-safety-critical control systems.
There have been two schools of thought within the subculture, in the view of Mentor Graphics product line director for FPGA synthesis Daniel Platzker. One school, exemplified by Actel’s rad-tolerant FPGAs, takes a hardware approach, employing hardened processes and using upset-resistant circuit designs for programming cells, SRAM bits, and registers. Even the two big guys attend this school to some extent. While the primary concerns for Altera and Xilinx have to be density and power (and, to a decreasing extent, speed), they also spend a lot of time designing upset resistance into their configuration SRAM cells.
The second school takes the silicon as the vendors provide it, and attempts to create logic designs that are inherently resistant to upset. This school uses continuous or periodic scrubbing of configuration memory to detect and correct configuration upsets. It employs failure-resistant state machines and triple modular redundancy (TMR) to prevent upsets or transients in signal paths from propagating into system errors. And this school does the vast majority of its work by manually coding these special structures in RTL.
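As a rough illustration of the scrubbing idea, the Python sketch below models configuration memory as a list of frames and compares each frame's CRC against one computed from a known-good ("golden") copy, rewriting any frame that differs. The frame layout, the use of CRC-32, and the name `scrub` are assumptions made for this example; real devices use vendor-specific readback and error-detection mechanisms that the article does not describe.

```python
import zlib


def scrub(config_memory: list[bytearray], golden: list[bytes]) -> list[int]:
    """Compare each frame with its golden copy and repair any mismatch.

    Returns the indices of the frames that were rewritten.
    Hypothetical model: frames are plain byte buffers and CRC-32 is the check.
    """
    golden_crcs = [zlib.crc32(frame) for frame in golden]
    repaired = []
    for index, frame in enumerate(config_memory):
        if zlib.crc32(frame) != golden_crcs[index]:
            frame[:] = golden[index]  # overwrite the upset frame in place
            repaired.append(index)
    return repaired


# Usage: four 8-byte frames, with one bit flipped in frame 2.
golden_frames = [bytes([i] * 8) for i in range(4)]
memory = [bytearray(frame) for frame in golden_frames]
memory[2][5] ^= 0x10  # simulate a single-event upset
print(scrub(memory, golden_frames))
```

Running `scrub` periodically bounds how long an upset frame can stay wrong, which is the point of the technique; it does nothing for errors that propagate before the next pass.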
That task is time-consuming and error-prone. And even after the RTL structures are in place and correct, the designers still have to protect them from synthesis and mapping optimizations that will joyfully take the structures apart, rendering them useless. It’s a huge task, and it can deprive the design of useful optimizations.
With these issues in mind, NASA began working with Mentor’s Precision Synthesis team to address the problem. Instead of fighting the synthesis tool, they reasoned, why not create a synthesis tool that understands and generates upset-tolerant structures? The result of the project is Precision Rad-Tolerant.
Platzker explains that two services distinguish the new tool from previous synthesis engines. One is a safe state-machine compiler. This package can generate two types of state machines. The first uses two parity bits to detect an incorrect state transition (even an incorrect transition to a valid state) and to drive itself to a neutral state from which the system can do error recovery. Unlike some previous tools, the parity state machine generator is not limited to one-hot configurations. The second option is a fully fault-tolerant state machine that employs Hamming distance-three coding to correct a single upset on the fly, without loss of timing.
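To show why a Hamming distance of three is enough to correct a single upset, here is a minimal Python sketch. It assigns each state a Hamming(7,4) codeword, so any two state codes differ in at least three bits, and a register value with one flipped bit is still closest to exactly one valid state. The state names and the functions `hamming74_encode` and `correct_state` are hypothetical; the article does not say which code or encoding Precision Rad-Tolerant actually generates.

```python
from typing import Optional

STATES = ["IDLE", "LOAD", "RUN", "DONE"]  # hypothetical state names


def hamming74_encode(value: int) -> int:
    """Encode a 4-bit value as a 7-bit Hamming(7,4) codeword (minimum distance 3)."""
    d0, d1, d2, d3 = ((value >> i) & 1 for i in range(4))
    p1 = d0 ^ d1 ^ d3
    p2 = d0 ^ d2 ^ d3
    p3 = d1 ^ d2 ^ d3
    bits = [p1, p2, d0, p3, d1, d2, d3]  # codeword positions 1..7
    return sum(bit << i for i, bit in enumerate(bits))


CODES = {name: hamming74_encode(i) for i, name in enumerate(STATES)}


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def correct_state(register: int) -> Optional[str]:
    """Return the state whose codeword is within one bit of the register value."""
    for name, code in CODES.items():
        if hamming_distance(register, code) <= 1:
            return name
    return None  # no valid state within one bit: more than one upset


# Usage: flip one bit of the RUN codeword and recover the intended state.
upset_register = CODES["RUN"] ^ (1 << 3)
print(correct_state(upset_register))
```

In hardware the correction would be combinational logic in front of the next-state function rather than a search loop; the loop here only demonstrates the distance argument.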
The second synthesis service automatically generates TMR on a specified portion of the design. The user has three options. “Local” redundancy generates only triple-redundant registers. “Distributed” redundancy generates three sets of registers and of the logic cones that feed them. And “Global TMR” adds in triple redundancy on clock and reset nets. In all three modes the synthesis tool also generates the voting circuitry that brings the three redundant paths back together.
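The voting step can be sketched in a few lines of Python. The example models a "Local"-style triplicated register and a bitwise two-out-of-three majority voter; the class `TmrRegister` and its methods are hypothetical names for this illustration, not structures taken from the tool.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote across three redundant copies."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """A register stored as three copies and read through a majority voter."""

    def __init__(self, value: int = 0) -> None:
        self.copies = [value, value, value]

    def load(self, value: int) -> None:
        self.copies = [value, value, value]

    def upset(self, copy: int, bit: int) -> None:
        """Simulate a single-event upset by flipping one bit in one copy."""
        self.copies[copy] ^= 1 << bit

    def read(self) -> int:
        return majority(*self.copies)


# Usage: one upset copy is outvoted by the other two.
reg = TmrRegister()
reg.load(0b1011)
reg.upset(copy=1, bit=2)
print(bin(reg.read()))
```

The voter masks an upset in any one copy, but two copies upset in the same bit position outvote the good one. In this model that is also why the voter itself, and shared clock or reset nets, remain single points of failure unless they are triplicated as well.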
Precision Rad-Tolerant should alleviate much of the drudgery and opportunity for error in creating upset-tolerant designs for FPGAs. It cannot, however, help the user select an overall safe architecture, select the best techniques for individual blocks, or shepherd the resulting netlist safely through the gauntlet of the FPGA vendor’s mapping and routing tools. That responsibility remains with the high-rel design experts. With the requirements for upset-tolerant design spreading rapidly from its roots in space-based and mil-aero systems to transportation, medical equipment, and networking, these designers look to become seriously sought-after.
Leyla commented:
CUDA is very much like OpenMP. A data-parallel version of OpenMP.
FPGAs' issues are cost and development tools. Costs can't be worked around; their pricing model is not set up for HPC. It is still organized on the principle of value, and not on the concept of volume. HPC is moving rapidly in the volume direction.
Also, at the end of the year, we are going to have 1 x 10^8 CUDA-enabled GPUs shipped, likely more than that. I just don't see how FPGAs will be able to overcome a lead like this. Or even achieve parity with a lead like this. Or even 1/10th or 1/100th of a lead like this.
The architectural and other benefits won't matter much against this sort of head start. GPUs are winning in designs, as developing on them is not very hard, and your code can run right away on your laptop, your desktop, and your cluster (with CUDA GPUs). FPGAs don't have the concept of a portable bitfile. So I can't take something architected for an Alphadata card and move it to a Nallatech card or to another vendor's card.
Don't get me wrong, GPUs aren't good for everything, but they are pretty good for a range of apps which are interesting to ISVs and end users.
The tools you are working on should be independent of the underlying accelerators, and we are looking forward to working with them. But for other coders, I am seeing (and have been for the past month or two) a massive surge in GPU interest. I am not convinced it is a fad.