Low-voltage tests uncover low-temperature IC problems
At the beginning of any chip design, engineers rarely discuss reliability and operation at the extremes of the device's temperature range. If parts have not been properly characterized, failures can occur once the devices go into high-volume production. Your customers will start to see problem devices, and RMAs (return material authorizations) will follow. Here's how we learned to screen for NVM (non-volatile memory) failures that occurred only at subfreezing temperatures. We'll explain how testing at a reduced power-supply voltage can stand in for a cold-temperature screen, which lets us test at room temperature.
During production of a device that had NVM installed, we discovered an issue where a few ppm (parts per million) of the devices were failing memory retrieval at temperatures around −40°C. As this was an automotive device, reliability was a prime concern.
In production, these devices were originally tested at the minimum supply voltage (2.2V) and there were a few ppm of fallout at 35°C. Because we had an automotive IC, we had to screen the devices and prevent bad ones from getting into the field.
The obvious first step was to introduce a cold-temperature screen for this product. It immediately identified the problem parts. Figure 1 shows failed devices at −40°C.
Figure 1 Some devices failed when tested at −40°C.
After we had run a few lots, the customer reported a significant percentage of parts failing on its line. After some experiments, we found that by dropping the supply voltage to 2.0V, we could detect a process issue that the 2.2V test at 35°C was missing. Figure 2 illustrates this reticle issue.
Figure 2 The red dots show OTP failures at 35°C at 2V.
We discovered that the reduction in voltage seemed to stress the part and cause a failure that normally wouldn't be seen at room temperature, but becomes apparent at low temperatures. The question then became, "Why?"
In this case, production ran well with no major issues, except that yield was now low for some lots. Figure 3 shows an example of a low-yielding wafer.
Figure 3 A low-yielding wafer has many failed devices, shown in red.
We were then faced with a tradeoff between yield and reliability. But how could we prove which test condition was correct? In this case, we had a proven cold-temperature flow that caught the failing devices. So we could run a full characterization at different supply voltages to see which one captured the most failures with the fewest escapes, and hence correlated best with the cold-temperature screen.
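The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the production test program: the function names, the die coordinates, and the rule of preferring the fewest escapes and then the least overkill are all assumptions made for the example.

```python
def compare_screen(cold_fails, room_fails):
    """Compare a room-temperature screen against the cold reference screen."""
    cold_fails, room_fails = set(cold_fails), set(room_fails)
    return {
        "caught": len(cold_fails & room_fails),    # fail both screens
        "escapes": len(cold_fails - room_fails),   # fail cold, pass at room temperature
        "overkill": len(room_fails - cold_fails),  # fail at room temperature only (yield loss)
    }


def pick_voltage(cold_fails, room_fails_by_voltage):
    """Return the voltage with the fewest escapes, then the least overkill."""
    def key(voltage):
        result = compare_screen(cold_fails, room_fails_by_voltage[voltage])
        return (result["escapes"], result["overkill"])
    return min(room_fails_by_voltage, key=key)


# Hypothetical die coordinates (x, y), for illustration only.
cold = {(3, 4), (7, 2)}
sweep = {
    2.2: set(),
    2.0: {(3, 4)},
    1.8: {(3, 4), (7, 2), (5, 5)},
}
best = pick_voltage(cold, sweep)
```

With this made-up data, the 1.8V setting is selected because it is the only one with no escapes, at the cost of one extra rejected die.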
For good wafers, which had no major yield issues, we found a good correlation by powering the devices at 1.8V. Figure 4 shows a test escape: a wafer with a die that passed at room temperature but failed at −40°C.
Figure 4 We identified a bad die that failed at −40°C but passed the test at room temperature (a room-temperature test escape).
By lowering the test voltage to 1.8V, we found dies that also failed at −40°C, as shown in Figure 5.
Figure 5 The same die from Figure 4 that failed at −40°C also failed at room temperature at the lower voltage (1.8V).
Figure 6 shows the results of raising the test voltage to 1.825V. The device passed. That told us that testing even at 1.825V would result in test escapes.
Figure 6 At 1.825V, the device passed, resulting in a test escape.
When we had a particularly bad lot, the test at 35°C captured more failures than the test at −40°C. We raised the test voltage back to 2.2V. See Figures 7, 8, and 9.
On top of this, to ensure good reliability, we had to remove the nearest neighbors of the failing dies, so there was a kind of double-filtering effect at the expense of yield. We therefore needed to check the suspected nearest-neighbor failures.
Figure 7 This wafer map shows multiple −40°C failures at 2.2V.
Figure 8 Failures at −40°C occurred while doing a full wafer sort at 2.2V.
Figure 9 Failures at 35°C while doing a full wafer sort at 1.8V.
Having built up this set of suspect dies, we tested them and checked their reliability. They were indeed shown to be bad parts, so the effort of electronic inking (Figure 10) was worthwhile. We now have a correlated and proven method for capturing these devices with low-voltage testing, which gives us a reliable way of screening out the bad devices. (Electronic inking is a manual post-processing step on a wafer map in which you electronically select the dies you don't want to go into the field. It is equivalent to using a prober to ink out areas physically. It's electronic because the wafer maps are sent to the packaging houses, which use them to select the dies for assembly. Dies shown in yellow are electronically inked.)
Figure 10 The new electronic inking strategy helped us regain the needed yield.
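As a rough illustration of nearest-neighbor removal on a wafer map, the following Python sketch marks every passing die that touches a failing die. The eight-neighbor rule, the `radius` parameter, and the coordinate scheme are assumptions made for the example; the article does not specify the actual inking rule used in production.

```python
def ink_neighbors(fails, valid_dies, radius=1):
    """Return the passing dies to ink because they sit next to a failing die."""
    fails = set(fails)
    inked = set()
    for (x, y) in fails:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                neighbor = (x + dx, y + dy)
                # Skip positions off the wafer and dies that already failed.
                if neighbor in valid_dies and neighbor not in fails:
                    inked.add(neighbor)
    return inked


# Hypothetical 3x3 block of dies, for illustration only.
dies = {(x, y) for x in range(3) for y in range(3)}
center_inked = ink_neighbors({(1, 1)}, dies)   # failure in the middle
corner_inked = ink_neighbors({(0, 0)}, dies)   # failure in a corner
```

A failure in the middle of the block inks all eight surrounding dies, while a corner failure inks only the three neighbors that exist. The yield cost therefore grows quickly as failures cluster, which is why the inked dies are worth verifying.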
After some investigation, we found that the root cause of the fuse failures was the n-well in the mask layout. It didn't provide a large enough process window to tolerate marginal mask alignment in the fabrication process, which led to the step-field issues seen in production. The misalignment affected the oxide breakdown, resulting in resistive bridges rather than shorts for programmed fuses. By increasing the n-well-to-active-area margin, we attained more tolerance for mask misalignment, thereby eliminating the issue.
Once we received wafers with the modified layout, we subjected them to the standard flow that had been developed to identify the issues at cold temperature. Figure 11 shows the results from the cold screen of a wafer with the modified process. All traces of a step field are gone, and the failures are now randomly distributed, as expected.
Figure 11 The −40°C probe of the OTP cell with the new n-well design, tested at 2.2V, resulted in randomly distributed failures.
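One way to tell a step-field pattern from random failures, short of inspecting the wafer map by eye, is to fold every failing die's coordinates into a single reticle field and see whether the failures pile up at one in-field position. The Python sketch below illustrates the idea. The field size, the assumption that die coordinates are aligned to the field origin, and the 50% threshold are all hypothetical choices for the example, not values from the production flow.

```python
from collections import Counter


def fold_into_field(fails, field_cols, field_rows):
    """Count failures per in-field die position by folding wafer coordinates."""
    return Counter((x % field_cols, y % field_rows) for (x, y) in fails)


def hot_positions(fails, field_cols, field_rows, min_share=0.5):
    """Return in-field positions that hold at least min_share of all failures."""
    counts = fold_into_field(fails, field_cols, field_rows)
    total = sum(counts.values())
    return {pos for pos, n in counts.items() if total and n / total >= min_share}


# Hypothetical failure coordinates on a wafer stepped with a 2x2-die field.
systematic = [(1, 0), (3, 0), (5, 2), (1, 4), (2, 3)]
scattered = [(0, 0), (1, 1), (2, 3), (3, 2)]

step_field = hot_positions(systematic, 2, 2)
no_pattern = hot_positions(scattered, 2, 2)
```

In the first data set, four of the five failures fold onto in-field position (1, 0), which points to a reticle or stepping problem. In the second, the failures spread evenly over the field and no position is flagged.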
Although the results from the cold screen are very good, it would be ideal to skip the cold screen completely, which would significantly reduce production costs. All the cold failures are also caught by the very low supply voltage screen. These dies are marked with blue circles in Figure 12.
Figure 12 All failures identified at minus temperatures are now also captured at room temperature.
We concluded that the cold screen can now be considered for complete removal as the very low supply voltage screen captures all the cold failures at 35°C. We can only do this, however, after testing multiple wafer lots with no failures during the cold screen.
Figure 13 shows a stacked wafer map for 30 lots of 25 wafers each. The low temp OTP fails are randomly distributed across the wafers. Based on the fails at −40°C there is a 3 ppm escape rate from the room temperature screen.
Figure 13 A stacked wafer map shows randomly distributed failures from 750 wafers.
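For readers who want to reproduce the arithmetic behind an escape rate, here is a minimal Python sketch. The wafer count follows from the 30 lots of 25 wafers quoted above. The die counts in the example call are purely hypothetical, since the article does not state how many dies were screened.

```python
def escape_rate_ppm(escapes, dies_screened):
    """Escapes per million dies that passed through the screen."""
    if dies_screened <= 0:
        raise ValueError("dies_screened must be positive")
    return escapes * 1_000_000 / dies_screened


wafers = 30 * 25  # 30 lots of 25 wafers each

# Hypothetical counts, for illustration only: 3 escapes in 1,000,000 dies.
example_ppm = escape_rate_ppm(3, 1_000_000)
```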
Using a very low supply voltage lets us screen for parameters that are sensitive to cold temperature. With it, a reliable and effective technique can be developed to remove problem devices that result either from process issues or from expected random defects. In combination with intelligent nearest-neighbor removal, users of this technique can have a high degree of confidence in shipping high-quality, reliable dies even when suspect wafers have been produced.