A comparison of space-grade FPGAs, Part 3
Non-volatile FPGAs offer the space industry many advantages regarding the integrity of the configuration data. In fact, most FPGAs in orbit today are antifuse devices, which eliminate the need for external boot memory and reduce the number of parts on the BOM, PCB space, power-supply requirements and design effort, while improving the overall reliability of a system.
For many missions, non-volatile flash FPGAs represent the optimum digital solution for space applications, combining all of the above advantages with the ability to be re-programmed during system prototyping and in orbit.
SRAM-based FPGAs have exploited the advantages offered by continued CMOS scaling. Their leading performance, increased logic density and large I/O count, as well as the range of resources offered by the latest parts, enable new processing opportunities and missions. The ability to re-program devices during system development and also in orbit has resulted in SRAM-based FPGAs being used on key programs, e.g., the Iridium LEO telecommunication constellation and NASA’s Mars rovers.
The volatile configuration memory in SRAM-based FPGAs stores both the user-defined functional logic and the programmable routing. For the latest deep-submicron parts, configuration represents the largest single block of storage within the device, and in space-grade FPGAs its cells can be sensitive to cosmic radiation, potentially leading to an unintended change of functionality.
Typically only a percentage of the configuration bits are actually used to implement a design, and a single-event upset (SEU) that flips an unused bit will not have any effect on functionality. However, in some cases, a change to an essential bit could be catastrophic, leading to complete system failure. Although the soft-error rate is low, for high-reliability applications where continuous availability of service is mandatory, SEU mitigation of the configuration memory will be required to meet a mission’s reliability needs.
To protect against modification of an SRAM FPGA’s configuration, manufacturers have developed novel methods to protect, monitor and scrub this memory. One such technique has been to harden cells by exploiting process and layout features, which has made registers 1000 times less sensitive than the standard storage elements used on equivalent commercial parts. The associated configuration-control logic has also been triplicated, reducing upset rates to less than one every 10,000 years.
Space-grade, SRAM-based FPGAs also interleave the layout of the configuration memory so that physically adjacent cells upset by a single strike are separated in the logical memory map and appear as single or multiple-bit upsets. For the latest high-density FPGAs, these are detected by reading back the configuration. However, where physically adjacent upset cells are distributed across different areas of memory, the SEU will appear as single or multiple upsets in more than one logical unit.
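To make the effect of interleaving concrete, here is a minimal sketch. The real physical-to-logical mappings are device-specific and not described in this article, so the simple modulo mapping, the function names and the frame count below are all assumptions chosen purely for illustration.

```python
def physical_to_logical(cell, n_frames):
    """Map a physical cell index to (frame, bit-within-frame).

    Assumed toy scheme: consecutive physical cells are dealt out to
    consecutive frames, so neighbours never share a frame.
    """
    return cell % n_frames, cell // n_frames


def upsets_per_frame(struck_cells, n_frames):
    """Count how many upset bits each logical frame sees after a strike."""
    counts = {}
    for cell in struck_cells:
        frame, _ = physical_to_logical(cell, n_frames)
        counts[frame] = counts.get(frame, 0) + 1
    return counts


# One strike upsets three physically adjacent cells (hypothetical indices).
print(upsets_per_frame([100, 101, 102], n_frames=8))  # {4: 1, 5: 1, 6: 1}
```

With this toy mapping, a three-cell cluster shows up as one flipped bit in each of three different frames rather than as a three-bit error in a single frame.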
Not all configuration memory is critical to FPGA operation (typically only 20 to 40% is), and another mitigation technique classifies regions within the design as either essential or non-essential for operation. Soft-error detection and correction then focuses only on those bits necessary to implement the intended functionality.
Partitioning the configuration memory into essential and non-essential bits increases the availability and reliability of a design, as the system no longer has to treat every upset as a critical failure. Proprietary IP corrects the error and also determines whether the SEU has affected any dependent resources.
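The decision logic can be sketched in a few lines. This is not the vendor's proprietary IP: the `Upset` type, the set-based essential-bit map and the action names are hypothetical, and simply illustrate the idea that every upset is corrected but only essential ones are escalated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upset:
    frame: int  # logical frame address of the flipped bit
    bit: int    # bit offset within that frame


def classify(upset, essential_bits):
    """Return 'essential' if the flipped bit is used by the design."""
    if (upset.frame, upset.bit) in essential_bits:
        return "essential"
    return "non-essential"


def respond(upset, essential_bits):
    """Always correct the bit; escalate only when the design is affected."""
    actions = ["correct_bit"]
    if classify(upset, essential_bits) == "essential":
        actions.append("signal_system_recovery")
    return actions


# Hypothetical essential-bit map for a tiny design.
essential = {(3, 17), (3, 18), (7, 2)}
print(respond(Upset(frame=3, bit=17), essential))  # ['correct_bit', 'signal_system_recovery']
print(respond(Upset(frame=5, bit=0), essential))   # ['correct_bit']
```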
Some space-grade, SRAM-based FPGAs calculate a cyclic redundancy check (CRC) value for each frame as the configuration bit stream is generated. As the design is loaded into the FPGA, hardened logic within the device computes the CRC for each frame and compares it with the corresponding value embedded within the bit stream. If there is a mismatch, an error is flagged, whose location can be extracted from the CRC data and then corrected.
During normal operation, the same error detection runs continuously comparing the CRC value for each configuration frame with the corresponding value contained within the bit stream. This monitoring process takes place in the background without interrupting normal operation and scrubbing corrects any upsets without having to reconfigure the FPGA. The user can enable and control the frequency of the error-detection algorithm, and experience suggests that this capability should be offered on all space-grade, SRAM-based FPGAs for missions that require continuous availability, e.g., revenue-generating, telecommunication satellites.
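As a rough software model of this detect-and-scrub loop, consider the sketch below. It rests on several assumptions: `zlib.crc32` stands in for whatever check the hardened logic actually computes, frames are plain byte arrays, and a corrupted frame is repaired by rewriting it from a stored golden copy rather than by locating the individual bit from the check data as described above.

```python
import zlib


def build_bitstream(frames):
    """Pair each golden frame with the CRC computed at generation time."""
    return [(bytes(f), zlib.crc32(bytes(f))) for f in frames]


def detect(config_memory, bitstream):
    """Return indices of frames whose current CRC differs from the stored one."""
    return [
        i
        for i, (frame, (_, stored_crc)) in enumerate(zip(config_memory, bitstream))
        if zlib.crc32(bytes(frame)) != stored_crc
    ]


def scrub(config_memory, bitstream):
    """Rewrite only the corrupted frames; the rest of the memory is untouched."""
    corrupted = detect(config_memory, bitstream)
    for i in corrupted:
        config_memory[i][:] = bitstream[i][0]
    return corrupted


# Four toy frames of eight bytes each.
golden = [bytes([n] * 8) for n in range(4)]
bitstream = build_bitstream(golden)
memory = [bytearray(f) for f in golden]

memory[2][5] ^= 0x10             # simulate an SEU flipping one bit in frame 2
print(scrub(memory, bitstream))  # [2]
print(detect(memory, bitstream)) # []
```

Only the affected frame is rewritten, which mirrors the point that scrubbing does not require a full reconfiguration of the device.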
In my two previous blogs in this series, I compared the performance of a 20-stage linear-feedback shift register (LFSR) implemented on the various space-grade FPGAs, with the resulting gate-level netlists comprising twenty flip-flops and one two-input XOR gate. For SRAM-based devices, if the feedback routing from the tapped registers or the selection of the XOR gate is modified by an unintentional change in the configuration, the 2^20-1 pseudo-random sequence would no longer be generated.
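A short simulation shows how sensitive the sequence length is to the feedback routing. The tap positions are not given in this series, so the sketch assumes the commonly tabulated maximal-length taps for a 20-stage register (stages 20 and 17), and it models a configuration upset as the second tap moving from stage 17 to stage 18.

```python
def lfsr_period(taps, width=20, seed=1):
    """Count the steps until a Fibonacci LFSR returns to its seed state.

    `taps` are 1-indexed stage numbers whose outputs are XORed together
    and fed back into stage 1. Returns None if the seed never recurs.
    """
    mask = (1 << width) - 1
    state = seed
    for step in range(1, 1 << width):
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
        if state == seed:
            return step
    return None


# Intended design: one two-input XOR fed from stages 20 and 17 (assumed taps).
print(lfsr_period((20, 17)) == 2**20 - 1)  # True

# Same circuit after a hypothetical routing upset moves a tap to stage 18.
print(lfsr_period((20, 18)) == 2**20 - 1)  # False
```

The register still shifts and still produces a bit stream after the upset, but it repeats far sooner than the intended maximal-length sequence.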
To assess the effectiveness of the various mitigation techniques, some manufacturers of SRAM-based FPGAs provide a fault-injection capability that lets users deliberately place errors within the configuration memory. Single, adjacent-double and multi-bit errors can be introduced to characterise potential single-event functional interrupt (SEFI) behaviour, allowing designers to understand the impact of changes to device configuration, plan and validate an FPGA or system-recovery response, and calculate soft-error rates. In terms of logistics and cost, SEU fault injection is a significantly cheaper option than accelerated-beam testing at a cyclotron.
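The principle of a fault-injection campaign can be illustrated with a toy model. Everything here is an assumption made for the example: the "configuration" is a single 64-bit word whose low 20 bits select the LFSR feedback taps (stages 20 and 17, which this series does not specify) and whose remaining bits are unused, and an injected bit counts as critical if the first 64 register states differ from the golden run.

```python
WIDTH = 20
CONFIG_BITS = 64                     # toy configuration word; bits 20..63 are unused
GOLDEN_TAPS = (1 << 19) | (1 << 16)  # taps on stages 20 and 17 (assumed)


def run_lfsr(config, steps=64, seed=1):
    """Return the state trace of an LFSR whose taps come from the config word."""
    mask = (1 << WIDTH) - 1
    tap_mask = config & mask         # only the low 20 bits drive the design
    state, trace = seed, []
    for _ in range(steps):
        feedback = bin(state & tap_mask).count("1") & 1  # parity of tapped stages
        state = ((state << 1) | feedback) & mask
        trace.append(state)
    return trace


def injection_campaign(golden_config):
    """Flip each configuration bit in turn and record those that change behaviour."""
    golden_trace = run_lfsr(golden_config)
    return [
        bit
        for bit in range(CONFIG_BITS)
        if run_lfsr(golden_config ^ (1 << bit)) != golden_trace
    ]


critical = injection_campaign(GOLDEN_TAPS)
print(len(critical), "of", CONFIG_BITS, "bits are critical")  # 20 of 64 bits are critical
```

In this toy case, only flips in the 20 tap-select bits alter the output, so a soft-error estimate for the design would be derated by the ratio of critical to total bits rather than assuming that every upset causes a failure.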
One manufacturer generates a map of the configuration memory that defines the upset sensitivity of various regions within the intended logic. During an SEU, proprietary IP searches the map to determine if the affected bit is essential to operation, as shown below:
Figure 3: Sensitivity map of configuration memory.
Ultimately for systems that contain SRAM-based FPGAs, the type of configuration-memory mitigation that you use will depend on the overall mission reliability and availability requirements. An external, scrubbing FPGA is not always necessary!
Future articles in this series will compare the overall systems performance of space-grade FPGAs, and for SRAM-based devices, will include the percentage of configuration memory essential for operation. Until next month, enjoy the summer break and don't forget to protect your critical bits!