Playing it cool
The semiconductor industry has long relied on scan ATPG (automatic test pattern generation) tools instead of functional test to create stimulus-response patterns with very high fault coverage. But ATPG patterns are designed to converge to high fault coverage in as few patterns as possible, making them comparatively power-hungry: Dynamic power consumption is much higher during scan testing than during normal operation. The higher-than-normal power consumption can exceed the power rating of devices and damage them during production testing or can cause false failures that require significant time and effort to diagnose. To compensate, many designers are now turning to power-aware ATPG technology to manage power during test.
When a logic state transition occurs in a device, numerous parasitic capacitors charge and discharge. The more state transitions that occur during a small instant of time—such as the portion of a clock cycle immediately following the rising (or falling) edge of the system clock—the higher the capacitive switching and the larger the transient currents. These instantaneous switching currents contribute to voltage drops along power rails that can add undesirable circuit delays.
Proper design of the power rails ensures that IR-drop delays arising from switching currents are within the allowable range during normal device operation. Scan ATPG patterns, however, can increase the magnitude of switching currents in a device by up to 10 times that of mission-mode patterns.
ATPG begins by targeting a primary fault with stimulus-pattern care bits that set up the conditions to sensitize and propagate the fault to a scan flop or primary output. The pattern generator then targets additional faults, called secondary faults, by assigning more care bits to the same stimulus pattern. Eventually, the pattern generator stops targeting secondary faults and just assigns random values to the remaining bits in the pattern to detect additional “bonus” faults not explicitly targeted by the care bits.
Both care bits and random-fill bits in each scan pattern create a large number of logic state transitions that lead to an increase in the magnitude of instantaneous switching currents in a device relative to levels that occur when the circuit operates under normal conditions. But the effects of this increase in magnitude vary depending on whether the device is in scan-shift mode or capture mode; the switching currents affect the dynamic power consumption, which is the current-voltage product measured over time.
Dynamic power consumption averaged over a large number of clock cycles (such as the hundreds or thousands of cycles needed to scan a single stimulus pattern into a design while scanning out the response to the previous pattern) can lead to excessive average power, which in turn can lead to thermal problems such as hot spots on the die that can damage the device.
Power rail collapse is another shift-mode problem related to high dynamic power consumption. If transient currents are excessive during scan shifting, bits shifted into a circuit along a scan chain will be dropped, resulting in pattern mismatches on the tester. In power rail droop, which is less severe, the IR-drop delays prevent scan data from propagating to the next stage in the scan chain at the target scan-shift frequency, also resulting in test program failure.
Both excessive average power and power rail droop during scan shifting can be addressed by lowering the shift frequency sufficiently, in the latter case to allow enough time for scan signals to meet the shift cycle timing under corner conditions. The downside of reducing scan shift frequency is that it increases the time spent testing each device.
A better approach to these problems is to reduce the number of state transitions during shift. One case study of IR-drop behavior in fabricated devices showed that reducing flop switching activity during scan testing is an effective way to avoid power-related failures (Ref. 1). Methods that reduce flop switching during shift, such as the adjacent fill technique, take advantage of the fact that typically less than 10% of bits in a scan pattern are actually used to sensitize and propagate fault effects.
Figure 1. Adjacent fill replicates care bits to their nearest neighbors in the scan chain.
The adjacent fill technique, instead of random-filling the remaining bits, replicates the value of each care bit to succeeding bits in the scan chain up to the next care bit of opposite value (Figure 1). Replicating care-bit values reduces the total flop-state transitions during scan-in sequences across an entire pattern set by 85% to 98% or more (Ref. 2). After capture, when data is scanned out, the flop switching activity is still significantly lower than with random fill, resulting in a total combined average power reduction that is typically about 50%.
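The fill rule can be illustrated with a short Python sketch. It is a toy model, not the algorithm of any ATPG tool: the pattern string, the 'X' notation for unassigned bits, and the function names are all made up for illustration, and the handling of unassigned bits ahead of the first care bit (they copy that care bit) is an assumption the article does not address. Counting value changes between neighboring bits is used here as a rough proxy for flop toggling as the pattern shifts along the chain.

```python
import random


def adjacent_fill(pattern: str) -> str:
    """Fill unassigned bits ('X') by repeating the most recent care bit."""
    care_bits = [b for b in pattern if b in "01"]
    # Assumption: X bits ahead of the first care bit copy that care bit,
    # and a pattern with no care bits at all is filled with 0.
    last = care_bits[0] if care_bits else "0"
    filled = []
    for bit in pattern:
        if bit in "01":
            last = bit          # a care bit always keeps its own value
        filled.append(last)     # an X inherits the preceding care bit
    return "".join(filled)


def random_fill(pattern: str, rng: random.Random) -> str:
    """Fill unassigned bits with random values (conventional ATPG fill)."""
    return "".join(b if b in "01" else rng.choice("01") for b in pattern)


def transitions(bits: str) -> int:
    """Count value changes between neighboring bits in the chain."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


# Hypothetical 20-bit scan pattern with four care bits.
pattern = "XX1XXXX0XXX1XXXXXX0X"
low_power = adjacent_fill(pattern)
baseline = random_fill(pattern, random.Random(0))

print(low_power, transitions(low_power))   # 11111110000111111100 3
print(baseline, transitions(baseline))
```

For a fixed set of care bits, no fill can produce fewer neighbor-to-neighbor transitions than this rule, because every change between consecutive care bits of opposite value has to occur somewhere between them.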
The most subtle and intractable power problem occurs during scan capture. Although the phenomenon is associated with both stuck-at and transition-delay ATPG patterns, it is more common in delay-sensitive at-speed testing.
During scan testing, after a pattern has been shifted in, the test clock is pulsed while the scan enable is deactivated (depending on the ATPG technique, the capture mode may include a launch sequence followed by one or more capture sequences). Excessive flop switching during the capture mode can result in enough IR-drop delay that logic values fail to transition within the capture window, causing an otherwise “good” device to produce incorrect responses. There are few easy workarounds to resolve this problem at the tester, and tracing the source of the false-positive device failures is not a trivial task.
Figure 2. This typical profile of switching activity in capture mode illustrates two characteristics: First, although the switching peaks are higher near the beginning of the pattern set when patterns have more care bits, there is no letup in the switching “spikes” even as the fault coverage converges to its maximum. Second, adjacent fill, while effective at lowering switching activity during shift, does not limit it during capture.
To illustrate the type of flop switching activity that occurs during capture mode, Figure 2 displays results from an ATPG run on a relatively small industrial design with a single clock and adjacent fill enabled during scan shifting. The graph plots both fault coverage and flop switching activity during capture (as a percentage of total flops in the design) versus pattern count, based on the power-analysis summary report produced by Synopsys’ TetraMAX ATPG product (Ref. 3).
There are two characteristics worth noting. First, although the switching peaks are higher near the beginning of the pattern set when patterns have more care bits, there is no letup in the switching “spikes” even as the fault coverage converges to its maximum. Second, adjacent fill, while effective at lowering switching activity during shift, does not limit it during capture.
In fact, high flop switching activity can occur whenever a large number of logic states change simultaneously. Regardless of whether there are few or many undetected faults remaining—or whether random or low-power fill techniques are used during shifting—the transition from scan mode to capture mode results in the most state transitions during ATPG testing. This is because the scanned-in state is almost completely unconstrained, whereas the first clock pulse in capture mode results in a state that is highly constrained by the circuit’s state machines. Subsequent capture clock pulses, if any, almost always result in fewer state changes.
The best approach to alleviating this peak power problem is to limit the number of state transitions that occur when first entering capture mode and to use low-power fill techniques to reduce switching activity during shift mode. The challenge for test architects is to accomplish this with minimal impact on DFT (design-for-test) logic, ATPG run time, and pattern count.
To avoid an explosion in the number of test patterns and a substantial increase in ATPG run time, you do not need to minimize switching activity during capture; in fact, it’s essential that you not try to do so. Instead, your goal should be only to reduce peak switching to levels commensurate with the switching rates observed when a device operates in mission mode. By doing this, you will suffer no unnecessary yield loss when applying the power-aware ATPG tests, even under corner conditions. If the designer can determine the peak flop switching activity of a design based on simulations of mission-mode patterns, the ATPG tool can then apply constraints so the peak switching during capture mode doesn’t exceed this switching budget.
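As a rough illustration of what a switching budget means, the Python sketch below measures capture switching as the fraction of flops whose value changes across the first capture pulse and flags patterns that exceed a budget. The 10-flop state vectors, the 20% budget, and the function names are hypothetical; a real flow would take these figures from the ATPG tool's power reports and from mission-mode simulation.

```python
def capture_switching(before: str, after: str) -> float:
    """Fraction of flops whose value differs across the capture pulse."""
    if len(before) != len(after):
        raise ValueError("state vectors must cover the same flops")
    return sum(a != b for a, b in zip(before, after)) / len(before)


def over_budget(patterns, budget: float):
    """Indices of patterns whose capture switching exceeds the budget."""
    return [
        i for i, (before, after) in enumerate(patterns)
        if capture_switching(before, after) > budget
    ]


# Hypothetical data: (scanned-in state, state after the first capture pulse).
patterns = [
    ("0000011111", "0000011110"),  # 1 of 10 flops toggles
    ("0101010101", "1010101010"),  # all 10 flops toggle
    ("1110000000", "1111100000"),  # 2 of 10 flops toggle
]

# Assumed budget: peak switching seen in mission-mode simulation (made up).
BUDGET = 0.20

print(over_budget(patterns, BUDGET))  # [1]
```

The check only compares against the budget. It makes no attempt to drive switching lower than the budget, which matches the goal of staying within mission-mode levels rather than minimizing activity.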
Figure 3. This design includes six internal clocks: φ0 and φ1 directly fan out to scan flops, while φ2 through φ5 are each driven by a clock-gated latch, referred to in this context as a CGC (clock gating cell).
Consider the example in Figure 3 depicting a small design with multiple internally gated clocks that fan out to the indicated number of scan flops. The clocks CLK1 and CLK2 can either be external or PLL-derived. There are six internal clocks: φ0 and φ1 directly fan out to scan flops, while φ2 through φ5 are each driven by a clock-gated latch, referred to in this context as a CGC (clock gating cell). The control logic enabling the CGCs is not shown, but it is assumed that clocks φ2 and φ3 can be independently enabled and controlled by ATPG via scan-chain care bits. The two CGCs at the bottom indicate their control logic is constrained so they’re mutually dependent: Whenever φ4 and φ5 are activated by ATPG, they’re activated simultaneously. In addition, since CLK2 feeds all the internally gated clocks, they are each dependent on φ1 (though not mutually dependent).
Because of these clock dependencies, it’s necessary to distinguish between the internal clocks and their corresponding clock domains, listed in the second column of the table in Figure 3. Each of these five primary clock domains can be considered independently controllable by ATPG, and each fans out to its own unique bank of scan flops with fan-out calculated in the third column. Each entry represents the maximum switching level for that primary domain since the number of scan flops that can change state is less than or equal to the fan-out.
While it’s essential to activate all five primary clock domains to achieve maximum coverage, ATPG needs to activate only one at a time to target (and detect) any given primary fault in the design (an exception to this observation is described under “DFT requirements and exception-handling”). Activating more than one primary domain simultaneously to target more secondary faults with the same pattern is purely discretionary. The table in Figure 3 shows several domain combinations, referred to as discretionary domains. For example, φ123 represents the combination in which primary domains φ12 and φ13, corresponding to internal clocks φ1, φ2, and φ3, are activated at the same time. Nine additional discretionary clock domains (not shown) have fan-outs ranging from 650 to 1250.
Without a peak switching budget, ATPG is free to target as many secondary faults per pattern as possible by activating any combination of domains. This is the way ATPG normally behaves so it can reach the maximum fault coverage with as few patterns as possible. But with a switching budget of 600 flops, for example, the combinations are limited: Only the three discretionary clock domains shown in the table will be activated in addition to the primary domains.
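The effect of a budget on domain selection can be sketched in Python. The per-clock flop counts below are hypothetical: they are not read from the table in Figure 3, only chosen to agree with the numbers quoted in the text (1250 flops in total, a largest primary domain of 400 flops, three discretionary domains within a 600-flop budget, and nine more ranging from 650 to 1250). The clocks φ4 and φ5 are lumped into one entry because they always fire together.

```python
from itertools import combinations

# Hypothetical flop counts per internal clock; NOT taken from Figure 3.
FLOPS = {"phi0": 400, "phi1": 100, "phi2": 200, "phi3": 250, "phi45": 300}
GATED = {"phi2", "phi3", "phi45"}  # gated clocks, each dependent on phi1


def clock_domains():
    """Return every distinct set of clocks that can be active together."""
    domains = set()
    for r in range(1, len(FLOPS) + 1):
        for combo in combinations(FLOPS, r):
            clocks = set(combo)
            if clocks & GATED:
                clocks.add("phi1")  # a gated clock cannot pulse without phi1
            domains.add(frozenset(clocks))
    return domains


def fan_out(domain) -> int:
    """Upper bound on the flops that can switch when the domain is active."""
    return sum(FLOPS[clock] for clock in domain)


def is_primary(domain) -> bool:
    """Primary domains: phi0 alone, or phi1 with at most one gated group."""
    return domain == {"phi0"} or (
        "phi0" not in domain and len(domain & GATED) <= 1
    )


BUDGET = 600  # switching budget in flops, as in the text's example

domains = clock_domains()
primary = [d for d in domains if is_primary(d)]
discretionary = [d for d in domains if not is_primary(d)]
allowed = [d for d in discretionary if fan_out(d) <= BUDGET]
rejected = [d for d in discretionary if fan_out(d) > BUDGET]

for d in sorted(allowed, key=fan_out):
    print(sorted(d), fan_out(d))
```

With these assumed counts, the enumeration yields five primary and twelve discretionary domains, and only three discretionary domains (500, 550, and 600 flops) fit the budget. Every primary domain stays available, so coverage is not sacrificed.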
ATPG assumes there are enough independently controllable primary clock domains to meet the designer’s switching budget. In general, ATPG will meet the power budget if
(n/F) ≤ B    (1)
where B is the designer’s switching budget expressed as a percentage of the total number of scan flops F in the design, and n is the number of scan flops in the primary clock domain that has the highest fan-out.
The fraction n/F is the theoretical maximum ATPG switching activity rate, CMAX. The design in Figure 3, for example, has a maximum fan-out of n = 400 flops (primary domains φ0 and φ145). The theoretical maximum CMAX = n/F = 400/1250 = 32%, so if B ≥ 32%, the power budget will certainly be met.
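The arithmetic for the Figure 3 example can be checked with a few lines of Python. The function names are illustrative only and do not correspond to any ATPG tool's interface.

```python
def theoretical_max_switching(n: int, total_flops: int) -> float:
    """CMAX = n / F: largest primary-domain fan-out over all scan flops."""
    return n / total_flops


def budget_is_guaranteed(n: int, total_flops: int, budget: float) -> bool:
    """True when n / F <= B, so the switching budget is certain to be met."""
    return theoretical_max_switching(n, total_flops) <= budget


# Figure 3 example: n = 400 flops, F = 1250 flops.
cmax = theoretical_max_switching(400, 1250)
print(f"CMAX = {cmax:.0%}")                   # CMAX = 32%

print(budget_is_guaranteed(400, 1250, 0.32))  # True
# False means only that this test alone does not guarantee the budget.
print(budget_is_guaranteed(400, 1250, 0.30))  # False
```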
Figure 4. In these profiles of flop switching activity during the capture mode for an industrial design with many internally gated clocks, the red data is associated with standard transition delay patterns, whereas the blue data reflects power-aware transition delay patterns.
Figure 4 displays two profiles of flop switching activity during capture mode for an industrial design with many internally gated clocks. The red data is associated with standard transition delay patterns, whereas the blue data reflects TetraMAX power-aware transition delay patterns. The design comprises approximately F = 262,700 scan flops, and the clock domain with the highest fan-out has n = 14,545 scan flops. If the design’s mission-mode patterns reach a peak switching level of 6%, then standard ATPG generates 1145 patterns that switch higher than this level. So, the device is at risk for incurring additional IR-drop delays that could cause scan test pattern mismatches on the tester.
In contrast, with a designer-specified flop switching budget of B = 6%, power-aware ATPG generates patterns that all switch below this level, thereby avoiding unnecessary yield loss. You would expect to meet the switching budget since, from equation (1):
CMAX = 14,545/262,700 = 5.5% < 6%
Actual peak switching
It turns out that the actual ATPG peak switching rate CPEAK for the industrial design was 5.3% with no loss in test coverage. CPEAK can be less than the theoretical maximum because only a fraction α of the maximum flop fan-out n may ultimately change state. This means that ATPG could still meet a budget even if it’s less than n/F. The fraction α depends on the design and the type of fault model used, and it tends to be lower for transition delay tests than for stuck-at tests.
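The figures quoted for the industrial design can be tied together with a short Python calculation. The value of α computed here is an inference, not a number reported in the article: it assumes that the peak pattern activates only the largest primary clock domain, so that CPEAK = αn/F.

```python
# Figures quoted in the text for the industrial design.
F = 262_700     # approximate number of scan flops
n = 14_545      # scan flops in the largest primary clock domain
BUDGET = 0.06   # designer-specified switching budget B
C_PEAK = 0.053  # observed ATPG peak switching rate

c_max = n / F
print(f"CMAX = {c_max:.1%}")   # CMAX = 5.5%
print(c_max <= BUDGET)         # True: the budget is guaranteed to be met

# Assumption: the peak pattern pulses only the largest primary domain,
# so CPEAK = alpha * n / F and alpha can be backed out from the two rates.
alpha = C_PEAK / c_max
print(f"implied alpha = {alpha:.2f}")
```

Under that assumption, roughly 96% of the flops in the largest domain change state in the worst-case pattern, which is why a budget slightly below n/F could still be met.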
Figure 5. The likelihood that ATPG will exceed the switching budget diminishes with an increase in the number of clock domains.
Assuming a switching budget reflects the actual peak switching level of mission-mode patterns, is it possible that ATPG’s peak switching rate CPEAK could exceed this budget? The answer depends on the number of independently controllable primary clock domains in the design. In the example illustrated in Figure 5, assume that α = 90% and the probability that CPEAK > B is 60% in a single-domain scenario (as presented in Figure 2). As the number of domains increases, peak switching CPEAK = αn/F decreases, as does the probability that it could exceed the switching budget.
In other words, it is highly unlikely that a device having a relatively large number of primary clock domains will have peak switching levels that exceed the peak level that can be achieved by ATPG in full control of activating all these domains. Furthermore, under these conditions, there are enough discretionary domains such that CPEAK approaches the theoretical switching maximum n/F. That is, CPEAK − CMAX → 0.
DFT requirements and exception-handling
The advantage of a power-aware ATPG methodology based on control of internally gated clocks is that no special DFT logic needs to be added to a design and no special DFT flows are required. Unlike ad hoc methods that require partitioning a design into separate blocks with different scan enables, control of peak power consumption is fully automated within ATPG.
Still, there are certain classes of faults that can’t be detected exclusively by clock gating, and these require special treatment by ATPG to comply with switching requirements. For example, some faults can only be detected by asserting an asynchronous global reset. For these, ATPG must assign the reset value to enough flops to ensure that when the reset is asserted in the pattern, switching doesn’t exceed the budget. Power-aware ATPG technology is rapidly evolving to accommodate a range of “exceptions” like this that are, in fact, routinely encountered in the design of today’s complex systems-on-a-chip.