Kintex-7, SEU mitigation using an isolated-design flow, part 1
Smaller transistors and lower supply voltages have reduced the critical charge needed to cause a single-event upset (SEU), and higher gate densities have made FPGAs more susceptible to multi-bit errors from a single ion strike. Many space applications do not need all of the logic and routing resources, which opens up the possibility of enhancing SEU mitigation by replicating redundant blocks and isolating them within the fabric.
Although the reliability of FPGAs has improved considerably, Xilinx's Isolation Design Flow (IDF) offers additional fault tolerance by confining errors to a single module, thus preventing their propagation. IDF was developed to allow independent functions to operate on a single chip, providing logical and physical separation of hierarchical designs as shown below:
If system failure can be made dependent on multiple, independent errors, reliability can be improved by many orders of magnitude. For example, a system comprising two redundant sub-systems in parallel fails only if both fail, so its probability of failure equals the product of the individual failure probabilities. If each has a probability of failure of 10^-9, then the system has a probability of failure of 10^-9 × 10^-9 = 10^-18, nine orders of magnitude lower than that of a single sub-system. This assumes the failures are independent, i.e. the sub-systems do not share a single point of failure or a common failure mode.
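The arithmetic above can be reproduced with a short Python sketch. The function names and the common-cause term are illustrative assumptions, not part of any Xilinx tool; the second function is a deliberately simple model, included only to show how a shared single point of failure would dominate the result if the independence assumption did not hold.

```python
import math


def parallel_failure_probability(p_fail):
    """Probability that ALL redundant sub-systems fail, assuming independence."""
    for p in p_fail:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(p_fail)


def failure_with_common_cause(p_fail, p_common):
    """Illustrative model (an assumption): the system fails if a shared
    common-cause event occurs, or otherwise if every sub-system fails
    independently."""
    if not 0.0 <= p_common <= 1.0:
        raise ValueError(f"not a probability: {p_common}")
    return p_common + (1.0 - p_common) * parallel_failure_probability(p_fail)


p_sub = 1e-9  # per-sub-system failure probability used in the text

independent = parallel_failure_probability([p_sub, p_sub])
shared = failure_with_common_cause([p_sub, p_sub], p_common=1e-12)

print(f"independent redundancy: {independent:.3e}")
print(f"with a 1e-12 common-cause failure: {shared:.3e}")
```

Even a common-cause probability a thousand times smaller than that of one sub-system swamps the 10^-18 figure, which is why removing single points of failure between the redundant modules matters as much as the redundancy itself.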
The IDF implements routing constraints that construct a 'fence' to isolate design modules. The Isolation Verification Tool, an independently implemented design-rule checker, reports whether a design partitioned into isolated regions meets the stringent standards for fail-safe design. IDF provides a transparent way to eliminate single points of failure between redundant modules within a single FPGA; however, fault-tolerant design must still be considered at the system level.
To partition floor-planned logic into isolated regions, a fence is needed to separate them. The fence is not drawn explicitly: it is what is left over between blocks, i.e. unused tiles in which no routing or logic is present. Fences may comprise CLBs, BRAM, DSP, I/O blocks, GTX transceivers etc. as shown below:
The partitioned floor-plan of a single-chip cryptographic design comprising two redundant AES-encryption modules, and its subsequent implementation, are shown below:
Figure 4: The implemented, isolated design. (Image courtesy of Xilinx)
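The idea that the fence is simply what is left over between blocks can be illustrated with a toy Python model. Everything here is a hypothetical simplification: the tile grid, the rectangular `Region` type, the region coordinates and the `min_fence` parameter are assumptions made for illustration, and the real fence rules, which depend on the tile type, are enforced by the Xilinx tools rather than by anything like this.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical isolated region: an inclusive rectangle of tile coordinates."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int

    def tiles(self):
        """All (x, y) tiles covered by this region."""
        return {(x, y)
                for x in range(self.x0, self.x1 + 1)
                for y in range(self.y0, self.y1 + 1)}


def fence_tiles(width, height, regions):
    """Return the left-over tiles, i.e. those that belong to no region."""
    used = set()
    for region in regions:
        used |= region.tiles()
    return {(x, y) for x in range(width) for y in range(height)} - used


def separation(a, b):
    """Unused tiles between two regions along the axis on which they are
    furthest apart. Zero means they touch (edge or corner); a negative
    value means they overlap."""
    gap_x = max(b.x0 - a.x1, a.x0 - b.x1) - 1
    gap_y = max(b.y0 - a.y1, a.y0 - b.y1) - 1
    return max(gap_x, gap_y)


def is_fenced(a, b, min_fence=1):
    """True if at least min_fence unused tiles separate the two regions."""
    return separation(a, b) >= min_fence


# Two redundant modules on a 10 x 10 toy grid (coordinates are made up).
aes0 = Region("aes0", 0, 0, 3, 9)
aes1 = Region("aes1", 5, 0, 8, 9)

fence = fence_tiles(10, 10, [aes0, aes1])
print(len(fence), separation(aes0, aes1), is_fenced(aes0, aes1))
```

Nothing in the model ever defines the fence directly: it emerges from where the two regions are placed, and moving `aes1` one column to the left would leave the modules touching with no fence between them.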
The next article in this series will demonstrate the use of Xilinx's Isolation Design Flow within the latest release of the Vivado IDE, 2015.1. The Soft-Error Mitigation (SEM) IP allows space users to assess the criticality of SEUs on the implemented design, with fault injection into configuration memory allowing users to characterise and manage SEFI behaviour. Logic partitioning offers the potential to protect against SEUs, increasing device availability and overall system reliability.
On the subject of soft errors, one of our colleagues, Maxim Malyy, from the Quazar Space Company based in St. Petersburg, Russia, is considering using a highly elliptical orbit (HEO) to characterise the radiation sensitivity of components. The dose and nature of the Van Allen belts are not fully understood, and cyclotrons are not truly representative of the harsh environment of space. Quazar's complementary approach of using a 3U CubeSat platform in HEO, together with redundancy and passive z-shielding, to understand the radiation sensitivity of a payload will improve knowledge for its customers as well as our industry. What are your views on Quazar's plans?
Until next month, stay SEE free!