First-silicon encounters
Frank Sauk, Presto Engineering Inc - December 9, 2010
New-product ramp-up to production is sometimes smooth and easy, with all that is promised by "right first time" design methodologies. But sometimes first silicon comes back from the foundry with issues. Obviously, it's critical to characterize, isolate, and debug the device to identify the root cause or causes and fix the problems in a timely manner.
Four of the most common problems with first silicon are scan-chain failures, excess leakage current, signal integrity, and timing. Presto Engineering has developed techniques and procedures and acquired specialized equipment, including ATE, laser-scanning microscopes, e-beams, and FIBs (focused ion beams), to isolate issues and allow its customers to move forward quickly with their product releases.
General process flow
The analytical flow for getting at the root cause of an issue proceeds from fault isolation to localization and, finally, to diagnosis (Figure 1). Fault isolation is primarily a test activity including both functional test results from ATE and testing designed specifically to identify the node or nodes of interest. Localization proceeds from this identification using a variety of techniques to determine the physical location of the fault. Diagnosis then attempts to relate the fault to some aspect of the circuit design or process performance.

Sample preparation requirements vary greatly and depend on the techniques used in a particular analysis. Some of these require the package to be opened to expose the die for further examination. In most cases it is important to preserve the electrical integrity of the opened package so the circuit can be observed in operation. Two industry trends have affected analytical access to the circuit: the proliferation of metal interconnect layers and the use of flip-chip/CSP (chip-scale packaging).
Any failure-analysis technique requires either decapping the sample, if it is a wire-bond package, or sample preparation for a flip-chip/CSP. Sample preparation takes a couple of hours, including thinning the sample to 50 to 200 microns. It's similar in complexity and risk to decapping a part. You can almost count on all of the units working without any damage after that preparation.
Overlying interconnect layers obscure lower layers and the transistors from optical-localization techniques. They also make it difficult to create electrical contacts using a FIB. Circuits with many interconnect layers are often more easily analyzed from the backside (substrate side) because the silicon substrate is transparent to infrared light. Providing that backside access is the value of sample preparation.
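As a rough illustration of why backside imaging works, the wavelength beyond which silicon stops absorbing strongly can be estimated from its bandgap. The sketch below assumes the textbook room-temperature bandgap of about 1.12 eV, a value the article does not state, and it ignores free-carrier absorption in heavily doped substrates, which is one reason thinning the die still helps.

```python
# Photon energy (eV) and wavelength (nm) are related by E = hc / wavelength,
# where hc is approximately 1239.84 eV*nm.
HC_EV_NM = 1239.84

# Assumed textbook value for silicon near room temperature (not from the article).
SILICON_BANDGAP_EV = 1.12


def cutoff_wavelength_nm(bandgap_ev: float) -> float:
    """Wavelength above which photons lack the energy to cross the bandgap."""
    return HC_EV_NM / bandgap_ev


if __name__ == "__main__":
    cutoff = cutoff_wavelength_nm(SILICON_BANDGAP_EV)
    print(f"Silicon absorbs only weakly beyond roughly {cutoff:.0f} nm")
```

Under that assumption the cutoff falls in the near infrared, just above 1.1 microns, which is the region where infrared-sensitive cameras and lasers can work through the substrate.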
First-silicon issues
In scan-chain failures, critical problems arise with sensitivity to Vdd levels, blocked chains, and hold-time issues. Today's devices utilize scan for production testing to increase fault coverage and reduce test time. When the devices fail, fault coverage falls dramatically and release to production is blocked, even when a device operates perfectly in its target mission mode.
One customer had a blocked-chain problem and was struggling to determine the root cause. With a blocked chain, a very large number, typically thousands, of scan flip-flops and the logic they cover are inaccessible. Typical test patterns are not able to determine root cause or even localize it because the "window" designed for this is broken. Visibility into the root cause is possible, however, by in essence looking "around" the broken chains using the working chains.
The Ocelot scan-based tester has unique software to analyze blocked chains by using passing chains and the logic that connects all chains together. After new DUT hardware was built, all broken chains on the device were analyzed and the exact locations of the breaks were identified within hours. This process enabled the customer to work with its foundry and resolve the issue, which was determined to be due to the process.
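One generic way to look "around" a blocked chain can be sketched in a few lines. This is not the Ocelot software's algorithm, which the article does not describe. It is a simplified model that assumes a single stuck-at-0 break, assumes known values can be captured into every flop through logic fed by the working chains, and uses hypothetical names (`unload`, `locate_break`). Cells downstream of the break still unload correctly, so the boundary between good and corrupted data brackets the break.

```python
from typing import List, Sequence, Tuple


def unload(captured: Sequence[int], break_at: int, stuck: int = 0) -> List[int]:
    """Model unloading a chain whose scan path is broken just before cell `break_at`.

    Cell 0 is nearest scan-in; the last cell is nearest scan-out. Values held in
    cells upstream of the break must shift through it and arrive as `stuck`.
    """
    return [v if i >= break_at else stuck for i, v in enumerate(captured)]


def locate_break(expected: Sequence[Sequence[int]],
                 observed: Sequence[Sequence[int]],
                 stuck: int = 0) -> Tuple[int, int]:
    """Return (lo, hi): the break lies just before some cell index in lo..hi."""
    lo, hi = 0, len(expected[0])
    for exp, obs in zip(expected, observed):
        for i, (e, o) in enumerate(zip(exp, obs)):
            if e == stuck:
                continue              # expected equals the stuck value: no information
            if o == e:
                hi = min(hi, i)       # cell i unloaded intact, so it is downstream
            else:
                lo = max(lo, i + 1)   # cell i was corrupted, so it is upstream
    return lo, hi


if __name__ == "__main__":
    chain_length, true_break = 4000, 1234  # hypothetical chain and break position
    patterns = [[1] * chain_length,
                [i % 2 for i in range(chain_length)]]
    results = [unload(p, true_break) for p in patterns]
    print(locate_break(patterns, results))
```

Each additional captured pattern can only tighten the bracket. A real diagnosis would also have to handle stuck-at-1 breaks, intermittent behavior, and more than one break per chain.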
Another customer had a different type of scan failure: sensitivity to high Vdd levels. This is a typical signature for hold-time problems. The solution involved building new hardware to run the device on the Ocelot tester and duplicate what the customer had determined were faulty flip-flops. Simulation checks using SPICE and static-timing analysis clearly showed hold time on these flops was well within margin. Why were these few flops consistently failing when others with similar or even less margin were passing? Internal-timing measurements were needed.
Since this was a flip-chip, a backside e-beam was selected as the preferred technique. This is a recent evolution of a tool set that has been in use for many years. Traditionally it is used on wire-bond designs where waveforms are measured on metal lines. Using an extension of this on flip-chip packages produces good results. Transistor active areas (gates and drains) are exposed using a flip-chip FIB. No vias are required, which reduces complexity and eliminates any parasitic capacitance.
During the timing measurements, it was determined that there was a signal-integrity issue: One of the scan-clock branches was poorly terminated. A reflection was creating excessive fall time, which was worse at high Vdd.
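Why failures that appear only at high Vdd point toward hold time can be illustrated with a toy model. The sketch below is not the customer's analysis. It assumes the alpha-power-law approximation for gate delay, and every number in it (threshold voltage, exponent, delays, skew) is a hypothetical value chosen for illustration. Gate-dominated delays shrink as the supply rises, while the skew term is held constant to stand in for anything that does not speed up with the supply.

```python
def gate_delay_scale(vdd: float, vdd_nom: float = 1.0,
                     vt: float = 0.35, alpha: float = 1.3) -> float:
    """Gate delay relative to nominal, using delay ~ Vdd / (Vdd - Vt)**alpha.

    All parameter values are illustrative assumptions.
    """
    def delay(v: float) -> float:
        return v / (v - vt) ** alpha
    return delay(vdd) / delay(vdd_nom)


def hold_slack_ps(vdd: float, clk_to_q: float = 60.0, data_path: float = 40.0,
                  hold: float = 20.0, fixed_skew: float = 70.0) -> float:
    """Hold slack = earliest data arrival - (hold requirement + clock skew)."""
    s = gate_delay_scale(vdd)
    # Gate-dominated terms scale with the supply; the skew term does not.
    return (clk_to_q + data_path) * s - (hold * s + fixed_skew)


if __name__ == "__main__":
    for vdd in (0.9, 1.0, 1.1, 1.2):
        print(f"Vdd={vdd:.1f} V  hold slack={hold_slack_ps(vdd):6.1f} ps")
```

With these made-up numbers the slack is positive at the nominal supply and turns negative at the highest one, which is the pattern a hold-time problem tends to show on the tester.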
Excess leakage current is a nonstarter, especially in the consumer market, which demands extremely low-power devices. Go-to-market strategies and timetables may be derailed if devices suffer from "insomnia," which presilicon simulations showed would never happen.
A customer had a 65-nm device with excessive standby current. Upon inspection of the type of package used and the stimulus available, which was an application board, it was determined that a simple and fast approach was to just decap the part and look at emission from the topside, the wire-bond side.
That step took a day or so. But there were so many metal layers that no emission was visible on the silicon topside. The customer could not go to market with the device as it was, so it was critical to know the root cause. An alternative plan was to repackage the part: Take some bare die and put them in a backside silicon debug-ready package (Figure 2) that exposes the substrate. The process of creating a bonding diagram and getting the parts back took about a week.

A great deal of emission was coming from the backside silicon. We were able to isolate down to a certain cell in the memory where the customer's problem was located. This emission was coming from a memory sense amplifier in multiple instances across the die. This situation correlated very clearly to the setups the customer was using, so the customer took it from there, working with its IP supplier to correct the issue.
Another leakage-current project, an ESD failure on a 40-nm device, was a very different example of this failure category. The silicon backside was available in the native flip-chip package, so an InGaAs camera along with several iterations of a backside FIB allowed us to see when the leakage created by the ESD event would appear.
The flip-chip part worked well, but the customer got a failure when it performed ESD testing. The customer first went with backside FIB alone, but after a few iterations it still wasn't clear which device had failed during the ESD event. So we went to the emission system, and even though the leakage was only a milliamp or less, the emission was very obvious and we were able to identify which device was bad. Knowing this was very important for the customer to sign off on the mask change, commit to an expensive mask set, and go to full production.
Signal integrity, also known as brownouts or on-chip voltage droop, can be extremely difficult to track down and isolate, especially for DSP circuitry utilizing DLLs or PLLs. A customer had a DLL that was losing lock. That was a fairly simple fix with a FIB edit, and when it was done the customer went through a mask change. Unfortunately, when its customer was given the chip with the FIB repair, it discovered a different problem that also caused the device to lose lock. It couldn't solve that problem with FIB, so other remedies were sought.
Our staff considered e-beam for timing measurements, but that wasn't workable because the failure was erratic and e-beam requires a consistent pattern or sequence of stimuli to make a clean measurement. So we settled on mechanical probing. As with the other jobs, we built hardware to put this part on the Ocelot scan-based tester and were able to correlate quickly. We then created mechanical probe points with FIB (Figure 3) and began measuring with very low-loading active probes. The mechanical probes in this case presented an extremely low 20 fF of loading, which meant the part would still work in normal operation.
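To see why such a small probe capacitance leaves the circuit functional, the extra delay it adds to a node can be estimated with a single-pole RC model. The sketch below is only a first-order approximation. The 2-kilohm driver resistance and the 1-pF comparison probe are hypothetical values, not figures from this project; only the 20-fF loading comes from the text.

```python
import math


def added_delay_ps(driver_resistance_ohm: float, probe_capacitance_f: float) -> float:
    """Extra 50%-point delay of a single-pole RC node, ln(2) * R * C, in picoseconds."""
    return math.log(2) * driver_resistance_ohm * probe_capacitance_f * 1e12


if __name__ == "__main__":
    driver_ohm = 2_000.0  # hypothetical on-chip driver resistance
    for label, cap_f in (("20 fF active probe", 20e-15),
                         ("1 pF probe (hypothetical)", 1e-12)):
        print(f"{label}: about {added_delay_ps(driver_ohm, cap_f):.0f} ps added")
```

In this model the added delay scales linearly with capacitance, so a probe with 50 times the loading would add 50 times the delay to the same node.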
We went through eight different probe points to identify what the customer thought was a timing issue. Surprisingly, it turned out to be power-supply droop. When a certain sequence of events occurred, the supply consistently dropped more than a volt, with a 3V supply going down to 2V. This dramatic drop turned out to be the root cause of the failure.
This was a very good result for the customer because until then it thought it was going to have to scrap the product that was built with the FIB fix and order another mask change. But because it was a supply droop, the customer was able to fix it with an application note, telling its customer to add more capacitance to the board where it used this device.
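The reasoning behind a capacitance fix can be sketched with the charge-balance relation dV = I * dt / C. The example below is a first-order estimate only: it ignores capacitor ESR and ESL and any regulator response, and the current step, its duration, and the capacitor values are hypothetical numbers picked so that the starting case shows a droop of about a volt, like the one measured here.

```python
def droop_volts(current_step_a: float, duration_s: float, capacitance_f: float) -> float:
    """Supply droop when a capacitor alone must supply a current step: I * dt / C."""
    return current_step_a * duration_s / capacitance_f


def capacitance_for_droop(current_step_a: float, duration_s: float,
                          allowed_droop_v: float) -> float:
    """Capacitance needed to hold the droop to `allowed_droop_v` for the same step."""
    return current_step_a * duration_s / allowed_droop_v


if __name__ == "__main__":
    step_a, duration_s = 0.5, 100e-9  # hypothetical 0.5-A step lasting 100 ns
    print(f"Droop with 50 nF: {droop_volts(step_a, duration_s, 50e-9):.2f} V")
    needed = capacitance_for_droop(step_a, duration_s, 0.1)
    print(f"Capacitance for 0.1 V droop: {needed * 1e6:.2f} uF")
```

In this simplified picture, cutting the droop by a factor of ten takes ten times the local capacitance, which is why a board-level change can be enough when the silicon itself is sound.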
Timing is a challenge for customers because even with all of the focus on timing closure prior to tape-out, the impact of variations across actual silicon lots still can't be completely covered with pre-silicon verification tools. Figure 4 lists the industry-standard methods for capturing in-silicon timing. The three primary techniques for timing measurement are as follows:
- Tried-and-true mechanical probing (still the best technique in certain situations, as detailed below)
- Electron-beam probing, which has been used for nearly 20 years
- Laser timing through the backside of silicon, which has become available in the past two years
The advantage of mechanical probing is voltage measurement, which is important for looking at noise and dc levels. It is the only technique for measuring dc levels. But it also is the most challenging because it requires a FIB probe point; a metal line needs to be exposed and a pad built up so that a probe can be landed. That requires a very steady hand because the probe pads are ~10 μm on a side. The result of this handiwork is microvolt resolution. That's the really big plus for mechanical probing.
E-beam requires less setup. There is some FIB work to expose metal and, in some cases, build up a smaller pad. The advantage of e-beam is that it can be used for very-high-speed measurements, up to 10 GHz. It is always used in a vacuum chamber, unlike the other techniques, and in some cases requires some components to be moved or fixtures to be built. Figure 5 shows an in-silicon debug session with the scan-based tester docked to the e-beam, enabling the device to be controlled by test vectors with the timing-signal measurements taken during device operation.
Laser timing is much easier to use than either mechanical probing or e-beam. It measures through the backside of silicon using optical techniques (the laser part of the timing probe). It provides excellent time resolution: 10 GHz today, with 20 GHz coming soon and 40 GHz in sight.
The other key advantage to laser timing is that it can do frequency mapping, which is a very effective way to localize signal issues. It can be used on the whole device, or some section of it, to map an image and overlay on it any locations where a spectrum analyzer has detected a particular frequency. Take a scan-chain failure, for example. Scan chains wander throughout the device at a particular frequency. If you set the spectrum analyzer to that frequency, you can see where the activity is, so the places where there is no activity jump out at you. At that point, the system can be used in timing mode to make waveform measurements.
Combining frequency mapping, precision-timing measurements, and the ability to go anywhere on the die without requiring a FIB makes laser timing a simple yet powerful tool. In addition, it is well-suited for 32-nm devices because it has a SIL (solid immersion lens) that touches the silicon with no gap, liquid, or air between the silicon and the lens. The lens itself is made from semiconductor material that provides a fivefold improvement in both resolution and signal collection compared with air-gap lenses. Together, that's a 25-fold improvement in capability over the best air-gap lenses. Figure 6 details a laser-timing lab setup, coordinating GDS navigation, scan-tester docking, and the resulting in-silicon waveform acquisition.
Next-generation first-silicon encounters
The coming generation of 40-nm products will challenge the semiconductor-industry value chain to develop innovative characterization and failure-analysis techniques. The foundries will naturally focus on manufacturing excellence with respect to process parameters, transistor performance, and 3-D interconnect methods, leaving the foundries' customers with the tasks of sorting out how best to get the product to market and dealing with the complexities and uncertainties of misbehaving first silicon. Successful silicon companies will recognize that early collaboration with specialized product-engineering service labs will be vital to getting their products to market.
