Product How To: DFT strategy for ARM processor-based designs

January 22, 2013

One of the most significant design trends of the decade is the widespread use of ARM® multicore processors in systems-on-chip (SoCs). Designers’ ability to easily and cost-effectively employ multiple high-performance embedded processors, as needed to meet the computational requirements of the end application, has helped fuel the explosive growth in mobile computing, networking infrastructure, and digital home entertainment systems. But from the design-for-test (DFT) perspective, is there a strategy for testing multicore designs just as easily and cost-effectively? A key challenge is already emerging: as the number of processor cores increases, it is becoming increasingly difficult to maintain high test quality without a corresponding increase in cost, because substantially more pins must be allocated for digital test.

This is an important consideration at Samsung Electronics, which designs a variety of SoCs containing ARM multicore processors. Simply adding more chip-level pins for testing conflicts with packaging constraints and can potentially undermine other cost-saving techniques that rely on utilizing fewer pins (see sidebar on multisite testing). What is needed instead is a DFT strategy optimized for designs that use multicore processors—a strategy in which the architecture and automation elements work in tandem to lower test cost without compromising test quality or significantly increasing automatic test pattern generation (ATPG) runtime.
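To see why per-device pin count matters for multisite testing, consider a deliberately simplified model in which every digital test pin on the device consumes one dedicated tester channel. The Python sketch below assumes a hypothetical 256-channel tester; the function name max_sites and all numbers are illustrative and are not taken from Samsung's flow or from any tester specification.

```python
# Simplified, hypothetical model: each digital test pin on the device
# needs one dedicated tester channel. Numbers are illustrative only.

def max_sites(tester_channels: int, pins_per_device: int) -> int:
    """Devices that can be tested in parallel under the one-channel-per-pin assumption."""
    if pins_per_device <= 0:
        raise ValueError("pins_per_device must be positive")
    return tester_channels // pins_per_device


if __name__ == "__main__":
    channels = 256  # assumed number of digital tester channels
    for pins in (16, 32, 64):
        print(f"{pins:3d} test pins/device -> {max_sites(channels, pins)} sites")
```

Under these assumptions, halving the test pins per device doubles the number of sites that can be tested in parallel (16, 8, and 4 sites for 16, 32, and 64 pins).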

This article provides an example of an optimized DFT architecture, referred to as “shared I/O.” It is enabled by Synopsys’ synthesis-based test solution, which has been used successfully in Samsung’s multicore processor designs. The experience demonstrates that shared I/O is a better approach than the standard DFT architecture for testing multicore designs since it reduces test costs by utilizing fewer pins while providing the same or better test time reduction.

Dedicated test pins
Figure 1 shows a standard DFT architecture for a design with four processor cores plus user-defined logic at the top level. The cores need not be identical, but every block contains an embedded scan compressor-decompressor (CODEC) that reduces the number of tester cycles required to achieve high test coverage, which in turn reduces test cost [1, 2].

Figure 1. The standard DFT architecture for a quad-core design with CODECs embedded in each core and at the top level. Because each CODEC uses its own dedicated pins, the number of pins needed to test the design grows large as more processor cores are added.

The amount of compression implemented in a particular CODEC determines the number and length of its scan chains and is chosen so that all the digital logic in the design has an approximately uniform scan chain length L. Although more compression shortens the chains and achieves greater test time reduction, in practice the compression applied is limited by the minimum number of scan inputs each CODEC requires and by routing considerations. Because every CODEC also needs its own dedicated scan I/O, the number of scan pins allotted to each CODEC must be kept small as the number of cores increases, or the total will exceed the chip-level pins available for testing.
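The relationship between chain length and dedicated pins can be made concrete with a small calculation. The Python sketch below assumes hypothetical flip-flop counts and a fixed 8 scan-in and 8 scan-out pins per CODEC; the names CodecBlock, internal_chains, and dedicated_pin_count are illustrative, not part of any tool. It computes how many internal chains each CODEC needs to meet a target chain length L and totals the dedicated chip-level scan pins in the standard architecture.

```python
import math
from dataclasses import dataclass


@dataclass
class CodecBlock:
    """One block (core or top level) with its own embedded CODEC.
    All values used below are hypothetical."""
    name: str
    scan_flops: int    # scan flip-flops inside the block
    scan_inputs: int   # dedicated chip-level scan-in pins for this CODEC
    scan_outputs: int  # dedicated chip-level scan-out pins for this CODEC


def internal_chains(block: CodecBlock, chain_length: int) -> int:
    """Internal chains needed so that no chain is longer than the target L."""
    return math.ceil(block.scan_flops / chain_length)


def dedicated_pin_count(blocks: list[CodecBlock]) -> int:
    """Standard architecture: every CODEC brings its own scan I/O pins."""
    return sum(b.scan_inputs + b.scan_outputs for b in blocks)


if __name__ == "__main__":
    L = 500  # assumed uniform target chain length
    for n_cores in (2, 4, 8):
        blocks = [CodecBlock(f"core{i}", 400_000, 8, 8) for i in range(n_cores)]
        blocks.append(CodecBlock("top", 200_000, 8, 8))  # top-level CODEC
        chains = [internal_chains(b, L) for b in blocks]
        print(n_cores, "cores:", dedicated_pin_count(blocks), "pins, chains per block:", chains)
```

With these made-up numbers the dedicated pin count is 48, 80, and 144 for 2, 4, and 8 cores (each plus a top-level CODEC): it grows linearly with the number of CODECs even though the chain length stays at L.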

Shared test pins
The optimized DFT architecture is illustrated in Figure 2. The chip-level test input pins are connected uniformly to all the CODECs in the design, and an integration block added at the CODEC outputs enables full observability of the scan chains. For a design containing N CODECs, log2(N) additional test input pins (rounded up when N is not a power of two) select which CODEC's outputs are observed on the output pins.
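The shared I/O pin budget and the output selection can be sketched in the same spirit. The Python model below assumes the integration block behaves like a multiplexer driven by the select pins, which is one plausible reading of the description rather than the actual logic; the function names and the 8-in/8-out pin counts are hypothetical.

```python
def select_pins(n_codecs: int) -> int:
    """Select inputs needed to address one of N CODECs, i.e. ceil(log2 N)."""
    if n_codecs < 1:
        raise ValueError("need at least one CODEC")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, with no float rounding
    return (n_codecs - 1).bit_length()


def shared_pin_count(n_codecs: int, scan_inputs: int, scan_outputs: int) -> int:
    """Shared I/O: one set of scan pins for all CODECs plus the select pins."""
    return scan_inputs + scan_outputs + select_pins(n_codecs)


def observe(codec_outputs: list[list[int]], select: int) -> list[int]:
    """Assumed mux-style integration logic: drive the shared scan-out pins
    with the outputs of the CODEC addressed by the select code."""
    if not 0 <= select < len(codec_outputs):
        raise ValueError("select code out of range")
    return codec_outputs[select]


if __name__ == "__main__":
    for n in (2, 5, 9):  # e.g. 1, 4, or 8 cores plus one top-level CODEC
        print(n, "CODECs:", shared_pin_count(n, 8, 8), "shared test pins")
```

With 8 scan inputs and 8 scan outputs, 2, 5, and 9 CODECs need 17, 19, and 20 test pins respectively, compared with 16 pins per CODEC if each CODEC had its own dedicated pins.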

Figure 2. The optimized architecture shares uniformly connected test input pins and uses integration logic to observe the CODEC outputs. As processor cores are added, the pin count grows by only log2(N) select pins, where N is the number of CODECs.

Synopsys’ DFTMAX™ compression was used to implement both DFT architectures for a 20-nm design containing four identical ARM processor cores, based on one of the latest versions of the Cortex-A series, plus user-defined logic. Only a few modifications to the original DFT scripts were required to implement the optimized architecture. Table 1 compares the test pin count and the normalized TetraMAX® ATPG stuck-at pattern count for the standard and shared I/O architectures, using equivalent chain lengths and high fault coverage in both cases.

Table 1. Shared I/O results in fewer ATPG patterns than the standard architecture even when only half as many pins are used.

Despite using only half as many test pins, shared I/O required substantially fewer patterns (and tester cycles): 30% fewer for the power-aware patterns, which are generated to avoid false failures during production testing [3]. Part of the decrease in pattern count can be explained by an increase in ATPG efficiency that comes “for free” when the scan inputs are shared among multiple cores. However, in this application the designers also used additional capabilities of the Synopsys tools, enabled when the processor cores are identical, that improve both pattern efficiency and the ability to isolate defective parts.
