Product How To: DFT strategy for ARM processor-based designs
This is an important consideration at Samsung Electronics, which designs a variety of SoCs containing ARM multicore processors. Simply adding chip-level test pins conflicts with packaging constraints and can undermine other cost-saving techniques that rely on using fewer pins (see sidebar on multisite testing). What is needed instead is a DFT strategy optimized for multicore processor designs: one in which the architecture and the automation work in tandem to lower test cost without compromising test quality or significantly increasing automatic test pattern generation (ATPG) runtime.
This article provides an example of an optimized DFT architecture, referred to as “shared I/O.” It is enabled by Synopsys’ synthesis-based test solution, which has been used successfully in Samsung’s multicore processor designs. The experience demonstrates that shared I/O is a better approach than the standard DFT architecture for testing multicore designs: it uses fewer pins, which lowers test cost, while providing the same or better test time reduction.
Dedicated test pins
Figure 1 shows a standard DFT architecture for a design containing four processor cores plus user-defined logic at the top level. The cores need not be identical, but every block contains an embedded scan compressor-decompressor (CODEC) that reduces the number of tester cycles required to achieve high test coverage, which in turn reduces test cost [1, 2].
The amount of compression implemented for a particular CODEC determines the number and length of its scan chains and is chosen to give an approximately uniform scan chain length L across all the digital logic in the design. More compression shortens the chains and reduces test time further, but in practice it is limited by the minimum number of scan inputs a CODEC requires and by routing considerations. Because each CODEC in this architecture has its own dedicated scan I/O, the total number of test pins grows with the number of cores, so it becomes essential to keep the scan I/O per CODEC small to avoid exceeding the chip-level pins available for testing.
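The pin and cycle arithmetic behind the dedicated-pin architecture can be sketched in a few lines. This is a minimal illustration, assuming hypothetical flop counts, chain counts, and pin widths (none of these numbers come from the Samsung design), and a simple shift-cycle model in which unloading one pattern overlaps loading the next.

```python
import math
from dataclasses import dataclass


@dataclass
class Codec:
    """Hypothetical summary of one CODEC-wrapped block (numbers are illustrative)."""
    name: str
    scan_flops: int       # scan flip-flops behind this CODEC
    internal_chains: int  # internal scan chains driven by the decompressor
    scan_inputs: int      # compressed scan-in pins
    scan_outputs: int     # compressed scan-out pins

    def chain_length(self) -> int:
        # Longest internal chain when flops are spread evenly over the chains.
        return math.ceil(self.scan_flops / self.internal_chains)


def dedicated_pin_count(codecs: list[Codec]) -> int:
    # Standard architecture: each CODEC has its own scan-in and scan-out pins.
    return sum(c.scan_inputs + c.scan_outputs for c in codecs)


def shift_cycles(patterns: int, chain_length: int) -> int:
    # Initial load of L bits, one capture cycle per pattern, an overlapped
    # unload/load of L bits between patterns, and a final unload of L bits.
    return patterns * (chain_length + 1) + chain_length


# Hypothetical design: four cores plus top-level user-defined logic (UDL),
# with compression chosen so every block has the same chain length L = 500.
cores = [Codec(f"cpu{i}", 200_000, 400, 8, 8) for i in range(4)]
udl = Codec("udl", 50_000, 100, 4, 4)
design = cores + [udl]

print([c.chain_length() for c in design])  # [500, 500, 500, 500, 500]
print(dedicated_pin_count(design))         # 72
print(shift_cycles(1_000, 500))            # 501500
```

Adding a fifth core with the same CODEC configuration would add another 16 pins in this model, which is the scaling problem the shared I/O architecture addresses.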
Shared test pins
The optimized DFT architecture is illustrated in Figure 2. The chip-level test pins are connected uniformly to all the CODECs in the design, and an integration block added at the CODEC outputs provides full observability of the scan chains. For a design containing N CODECs, log2(N) test input pins (rounded up when N is not a power of two) select which logic is observed on the output pins.
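The shared-pin arrangement can be modeled in a similarly small sketch. It assumes, for illustration only, that the integration block behaves like a simple multiplexer steered by the select pins and that every CODEC uses the same hypothetical scan-in and scan-out width; the actual DFTMAX integration logic may be more elaborate. The function names are hypothetical.

```python
def select_pin_count(n_codecs: int) -> int:
    # log2(N) select pins, rounded up so that N need not be a power of two.
    return (n_codecs - 1).bit_length()


def shared_pin_count(n_codecs: int, scan_inputs: int, scan_outputs: int) -> int:
    # All CODECs share the same scan-in and scan-out pins; the only extra
    # pins are the selects that choose which CODEC drives the outputs.
    return scan_inputs + scan_outputs + select_pin_count(n_codecs)


def broadcast(scan_in: list[int], n_codecs: int) -> list[list[int]]:
    # The same stimulus word is delivered to every CODEC's decompressor.
    return [list(scan_in) for _ in range(n_codecs)]


def observe(codec_outputs: list[list[int]], select: int) -> list[int]:
    # Simplified integration block: a mux driven by the select pins.
    if not 0 <= select < len(codec_outputs):
        raise ValueError("select value does not address a CODEC")
    return codec_outputs[select]


# Hypothetical design with five CODECs (four cores plus top-level logic).
print(select_pin_count(4), select_pin_count(5))  # 2 3
print(shared_pin_count(5, 8, 8))                 # 19
stimulus = broadcast([1, 0, 1, 1], 5)
print(stimulus[3])                               # [1, 0, 1, 1]
responses = [[i, i] for i in range(5)]           # dummy per-CODEC responses
print(observe(responses, 2))                     # [2, 2]
```

In this model the pin count grows only logarithmically with the number of CODECs, through the select pins, rather than linearly as in the dedicated-pin case.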
Synopsys’ DFTMAX™ compression was used to implement both DFT architectures for a 20-nm design containing four identical ARM processor cores (one of the latest versions of the Cortex-A series) plus user-defined logic. Only a few modifications to the original DFT scripts were needed to implement the optimized architecture. Table 1 compares the test pin count and the normalized TetraMAX® ATPG stuck-at pattern count for the standard and shared I/O architectures, using equivalent chain length and high fault coverage in both cases.
Despite using half as many test pins, shared I/O required substantially fewer patterns (and therefore tester cycles): 30% fewer for the power-aware patterns, which are generated to avoid false failures during production testing. The decrease in pattern count can be explained in part by an increase in ATPG efficiency that comes “for free” when the scan inputs are shared among multiple cores. In this application, however, the designers also used additional capabilities of the Synopsys tools, available when the processor cores are identical, that improve both pattern efficiency and the ability to isolate defective parts.
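Because both architectures were compared at equivalent chain length, tester cycles track pattern count almost one-for-one. The sketch below illustrates this with a simple shift-cycle model and hypothetical absolute pattern counts (1,000 and 700) chosen only to reflect a 30% reduction; Table 1 reports normalized values, not these numbers.

```python
def shift_cycles(patterns: int, chain_length: int) -> int:
    # Initial load of L bits, one capture cycle per pattern, an overlapped
    # unload/load of L bits between patterns, and a final unload of L bits.
    return patterns * (chain_length + 1) + chain_length


L = 500                    # hypothetical chain length, equal in both architectures
standard_patterns = 1_000  # hypothetical baseline pattern count
shared_patterns = 700      # hypothetical count with 30% fewer patterns

ratio = shift_cycles(shared_patterns, L) / shift_cycles(standard_patterns, L)
print(f"{ratio:.4f}")  # 0.7003
```

The small deviation from exactly 0.7 comes from the fixed initial-load term, which becomes negligible as the pattern count grows.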