Design Feature: February 17, 1994
Timing is critical for a digital system's success. For decades, designers have analyzed circuit timing using conservative worst-case methods. Such methods ensure reliable systems, but many are overly conservative. They also provide little insight into the risk of keeping a signal path whose analyzed delay exceeds its timing requirement. Statistical timing analysis can quantify the risk associated with such a path.
Formally stated, a signal path is "a series of interconnected nodes in a digital design that are defined to begin and end with a constrained device or pin" (Ref 1). Most digital designs have thousands of paths; each path has some inherent delay requirement. These requirements derive from system performance goals, circuit functionality, system clock rates, interface timing constraints, and other considerations.
You can identify signal paths at any number of conceptual levels. A system-level path, for example, comprises a group of similar individual paths. All signals in the group have the same basic structure and requirements. For the system to operate properly, the signals must function within their requirements.
Once you identify a signal path and its requirements, timing analysis can determine whether the path will meet them. As with the identification process, you can perform timing analysis at any of several levels. You can scrutinize individual paths, or you can take one or more paths from a system-level group and analyze them as representatives of all the paths in the group. Likewise, a number of analysis methods are available, depending on the information available and the accuracy desired.
You can make a good first-order timing estimate by adding the maximum delay values for each component in the path, which is called worst-case analysis. Most vendors provide data books that specify the maximum output propagation delays and input setup times for their devices. If the results of a worst-case analysis are within 20% of exceeding the path's timing requirement, that path is said to be critical and is usually subjected to more detailed analysis. If a path meets its requirements by 50% or more, it has a "safe margin," and no further analysis is necessary. Paths that fall somewhere between safe and critical may be treated either way but are usually reexamined along with the critical paths.
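As a minimal sketch of the screening rules above, the hypothetical function `classify_path` below assumes that both the 20% and the 50% thresholds are measured as slack (requirement minus worst-case delay) divided by the path requirement; the text does not spell out the denominator.

```python
def classify_path(worst_case_delay_ns: float, requirement_ns: float) -> str:
    """Screen a path by its worst-case slack as a fraction of the requirement.

    Assumed thresholds: slack under 20% -> "critical"; slack of 50% or more
    -> "safe"; anything in between -> "intermediate".
    """
    slack_fraction = (requirement_ns - worst_case_delay_ns) / requirement_ns
    if slack_fraction < 0.20:
        return "critical"       # also catches paths that already fail (negative slack)
    if slack_fraction >= 0.50:
        return "safe"           # no further analysis needed
    return "intermediate"       # usually reexamined along with the critical paths

# Sum of maximum data-book delays (nsec) along an illustrative path.
path_delays_ns = [12.0, 90.0, 7.0, 5.0]
print(classify_path(sum(path_delays_ns), 121.2))  # critical
```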
The worst-case-analysis method is simple to use, and electrical CAE tools often support it, so a majority of the signal paths in a design can receive a first-order check. Passing this test does not guarantee, however, that a path will function properly. The estimate does not account for factors such as interchip wiring delays or driver loading effects. Generally, these effects are small, but for critical paths, they may be fatal. For this reason, designers use more rigorous worst-case methods that account for loading effects. Such methods first estimate the physical properties of the interconnections (length, capacitance, and inductance), then adjust the timing delays through each component or wiring segment in the path. You can perform the analysis manually using loading information from vendor data books or with the aid of a computer-based circuit-analysis tool.
Early in the design process, you usually do not have enough information to estimate the loading delays for each path accurately. In such cases, the designer must make worst-case assumptions about the characteristics of a "typical node" (eg, all nodes have 35-pF load capacitance). Designers generally also set a design margin that applies equally to every path in the design (eg, all paths must meet requirements by 10 nsec or more to allow for interchip delays).
For an example of a ROM-circuit analysis using the worst-case approach, see box, "A worst-case circuit analysis." In the example, the analysis predicts a delay of 122.2 nsec vs a requirement of 121.2 nsec. If, as in the example, the results of the worst-case analysis indicate that the circuit will not operate reliably over the specified range of operating conditions, conservative design rules dictate that you replace one or more of the components with a faster device, change the design, or relax performance goals.
Individual paths include many permutations of the same basic structure: a single MMU output (an address or chip-enable line) propagating from the MMU clock input through a ROM and the data buffers and into the CPU. In this case, 16 address bits and one chip-enable line pass through each of three ROMs with eight data lines, for (16+1)×3×8=408 individual paths. A single system-level path from the MMU clock to the CPU data inputs represents all of the individual paths.
Table 1 shows typical requirements for the microprocessor circuit in Fig A. To meet the 4-MIPS requirement at the 33-MHz clock rate, a typical instruction can take no more than 33/4≈8 clock cycles. Instruction execution requires four clocks, so the ROMs must be accessed within four clock periods. Therefore, each of the 408 individual paths from the MMU clock input to the CPU data inputs must propagate in <4/33.0E+06=121.2 nsec.
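The budget arithmetic can be checked with a few lines of Python. This sketch simply restates the Table 1 figures as constants (the variable names are illustrative) and rounds the cycles available per instruction down to whole clocks.

```python
import math

clock_hz = 33.0e6      # system clock frequency (Table 1)
required_mips = 4.0    # average throughput requirement (Table 1)
exec_clocks = 4        # internal clocks per typical instruction (Table 1)

# Clock cycles available per instruction at 4 MIPS, rounded down to whole cycles.
cycles_per_instruction = math.floor(clock_hz / (required_mips * 1e6))

# The ROM access must complete within the four execution clocks.
rom_budget_ns = exec_clocks / clock_hz * 1e9

print(cycles_per_instruction)    # 8
print(round(rom_budget_ns, 1))   # 121.2
```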
Figure: Worst-case delays
In this example, the first-order worst-case analysis is straightforward. We have chosen to implement the MMU functions in a field-programmable gate array (FPGA). Using the vendor's postlayout timing-analysis tools, we find that the expected worst-case address delay is 12 nsec. The chip-select outputs are faster (8 nsec). According to the data book, the chip-select-to-data delay of a ROM (50 nsec) is also shorter than the address-to-data delay (90 nsec). Because the chip-select paths will always be faster than the address paths, we simplify the analysis by ignoring chip selects and concentrating on the worst-case address path.
The data buffer has a 7-nsec delay from input to output, and the CPU requires 5 nsec of setup time on data inputs. Therefore, the total worst-case delay is 12+90+7+5=114 nsec. Because the remaining slack, (121.2-114)/121.2≈6%, is within 20% of the requirement, consider the path critical. Thus, we need to examine wiring delays and loading effects.
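The first-order sum and the slack figure can be reproduced directly; the dictionary labels below are illustrative names for the four data-book delays in the example.

```python
# First-order worst-case delays from the example, in nsec.
delays_ns = {
    "MMU clock-to-address (FPGA)": 12.0,
    "ROM address-to-data": 90.0,
    "data buffer input-to-output": 7.0,
    "CPU data setup": 5.0,
}
requirement_ns = 121.2

total_ns = sum(delays_ns.values())
slack_ns = requirement_ns - total_ns
slack_pct = 100.0 * slack_ns / requirement_ns   # slack as a share of the requirement

print(f"total={total_ns:.1f} ns, slack={slack_ns:.1f} ns ({slack_pct:.1f}%)")
# total=114.0 ns, slack=7.2 ns (5.9%)
```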
In the case of our ROM circuit, the printed-wiring-board (PWB) vendor advertises typical capacitance of 4 pF/in. and inductance of 0.2 µH/in. of etch. An initial board layout suggests that each trace in the path has approximately 18 in. of etch. Computer models are not available for the CPU or FPGA pins, so we have to estimate their loading effects by hand. The ROM manufacturer provided a Spice analysis of the ROM drivers and the buffer circuit based on our anticipated loading. We estimate the combined worst-case delay through this portion of the path to be 102.3 nsec, compared with 97 nsec (90+7) using the first-order estimate.
For the MMU FPGA, the only loading information in the vendor's data book is a chart of delays vs capacitive loads. Inductive effects on these delays are small compared with capacitive loading, so, initially, we can ignore the inductive contribution and account only for capacitive effects. Calculate the load capacitance for a typical address line by adding the input capacitance of the three ROMs (12 pF×3=36 pF) and the anticipated capacitance of the signal trace (18 in.×4 pF/in.=72 pF) for a total of 108 pF.
The vendor-specified maximum delay of the output driver assumes a 50-pF load, and the vendor's loading chart recommends adding 0.05 nsec to this maximum for each picofarad above the rated load. The adjustment to the address delay through the MMU becomes 0.05×(108-50)=2.9 nsec. Therefore, total address delay through the MMU is 12+2.9=14.9 nsec. Because loading doesn't affect input setup times, the CPU setup time remains 5 nsec. Thus, total estimated delay for the ROM path is 14.9+102.3+5=122.2 nsec, and the path fails to meet its 121.2-nsec requirement.
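The loading correction follows the same arithmetic. In this sketch the variable names are hypothetical, and the `max(0.0, ...)` clamp is an assumption that loads lighter than the rated 50 pF earn no credit.

```python
rom_input_pf = 12.0          # input capacitance per ROM
num_roms = 3
trace_length_in = 18.0       # from the initial board layout
trace_pf_per_in = 4.0        # PWB vendor's typical value
rated_load_pf = 50.0         # load assumed by the FPGA's data-book maximum
derating_ns_per_pf = 0.05    # from the vendor's loading chart

load_pf = rom_input_pf * num_roms + trace_length_in * trace_pf_per_in
# Assumption: no adjustment is applied when the load is below the rated load.
mmu_adjust_ns = derating_ns_per_pf * max(0.0, load_pf - rated_load_pf)
mmu_delay_ns = 12.0 + mmu_adjust_ns

rom_and_buffer_ns = 102.3    # Spice-based estimate from the ROM vendor
cpu_setup_ns = 5.0           # setup time is not affected by loading

total_ns = mmu_delay_ns + rom_and_buffer_ns + cpu_setup_ns
print(load_pf, round(mmu_adjust_ns, 1), round(total_ns, 1))  # 108.0 2.9 122.2
print("fails" if total_ns > 121.2 else "passes")             # fails
```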
This conservative approach can be prohibitive, especially for high-performance systems: System-level requirements usually restrict architectural changes, reducing performance goals is self-defeating, and in many cases, the fastest available parts have already been specified. When you can find faster parts, they are often expensive and available only in limited quantity, which limits the design's producibility.
At this point in the design process, most designers begin to rationalize, thinking it's not likely that all of the components in the path will perform at worst case, or assuming that vendors always pad the worst-case specifications. Thus, a design team concludes that a marginal design will work fine in the real world. But how well the design will work is a question often left unanswered. The design team accepts the unknown risk and resolves to work it out on the prototype.
However, it is both possible and practical to quantify risks associated with a critical path. Statistical analysis offers detailed insight into path delays as well as a means for determining how well the design will work. The analysis methods use characterization data for each device, rather than maximum or minimum parameters, to derive a function that describes the probable delay for each device. Combining delay functions forms a joint probability distribution for the path's overall delay, which you then use to compute the probability that the path will either pass or fail its requirements. If all paths in the design receive a statistical analysis, you can estimate the design's expected manufacturing yield with respect to timing.
Few vendors actually publish characterization data for their products, yet this information is generally available through the vendor's technical-support staff. Most vendors collect product-characterization data to generate timing specifications for their parts and to provide metrics for monitoring their manufacturing processes. For military products, vendors must collect delay data to meet Mil-Std-883 requirements. If characterization data is not available from the vendor, however, you can collect it by measuring the delays through a large sample of devices (100 or more) over the design's desired range of temperatures and operating voltages. Such testing is usually expensive.
Using characterization data can be risky. Usually, characterization data is short-term data; for example, samples come from a single batch (lot) of devices produced over a short period of time. Short-term data does not account for the possibility that the manufacturer's processes may change over time or that multiple vendors (or multiple foundries owned by the same manufacturer) will produce the same end product with varying results. Also, characterization data can be wildly different from one manufacturer to another; and, currently, no standards exist for collecting this data.
The delay distribution function represents the probability that a randomly selected device will have a given delay under fixed operating conditions. In general, the distribution approximates a Gaussian, or normal, curve. The important statistical properties of the distribution are its mean (µ) and variance (σ²). This information is directly available from statistical characterization data. For tabular data, the µ and σ² values must be calculated using Eq 1 and Eq 2 (Ref 2), where X is the series of delay measurements under the desired operating conditions, and N is the number of measurements.
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i \tag{1}$$
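For tabular characterization data, Python's standard `statistics` module computes the mean of Eq 1 directly. The measurements below are made-up placeholders, not vendor data. Because the divisor used in Eq 2 is an assumption here, the sketch shows both the sample (N-1) and population (N) variance; for the 100-plus samples recommended above, the two differ by about 1% or less.

```python
import statistics

# Hypothetical delay measurements (nsec) for one device type at a single
# temperature/voltage condition; real data would come from the vendor or
# from measuring 100 or more parts.
measurements_ns = [5.6, 5.9, 6.1, 5.8, 5.7, 6.0, 5.5, 5.9]

mu = statistics.mean(measurements_ns)               # Eq 1: mean delay
var_sample = statistics.variance(measurements_ns)   # divides by N - 1
var_pop = statistics.pvariance(measurements_ns)     # divides by N
sigma = var_sample ** 0.5                           # standard deviation, nsec

print(f"mu={mu:.3f} ns, var={var_sample:.4f} ns^2, sigma={sigma:.3f} ns")
```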
Note that temperature or voltage variations may affect both the mean and variance values in a nonlinear fashion. Designers, therefore, must recalculate µ and σ² for each desired set of operating conditions rather than simply scaling them. Also, it is not generally valid to assume that a typical value in the vendor's data book represents the process mean or that the specified maximum values represent any specific variance from the process mean. Rather than making assumptions, designers should contact the vendor to obtain characterization data.
Most vendor-supplied characterization data is short-term data and does not account for the possibility that the vendor's manufacturing processes may change over time or that multiple factories or vendors may produce the product with varying results. Traditionally, vendors have accounted for such variations by specifying a guard band or design margin. Designers typically also include a design margin in their worst-case timing analysis by limiting the acceptable delay to some fixed value below the actual requirement. The trouble with these approaches is that they are arbitrary; it is virtually impossible to quantify risks associated with these methods.
Statistical analysis provides a more concise approach for handling short-term data. Motorola's Six Sigma approach offers two options: You either set the path goals such that the requirement is at least six standard deviations (6σ) greater than the mean delay for the path, or you account for possible process shifts by shifting the mean delay for each device by a factor of 1.5σ and then set the path goals such that no more than three defects are expected per million implementations (3 dpm) (Ref 3).
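The two Six Sigma options can be expressed as a pass/fail check. This is a sketch under stated assumptions: `meets_six_sigma` is a hypothetical helper, the normal tail area is computed with `math.erfc` rather than read from a printed table, and the 1.5σ shift is applied to the path mean using the path sigma (as the ROM-path example does) rather than to each device separately.

```python
import math

def upper_tail(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def meets_six_sigma(mu_ns, sigma_ns, requirement_ns, shift_sigmas=1.5, max_dpm=3.0):
    """Return (option1_ok, option2_ok) for the two Six Sigma options.

    Option 1: the requirement lies at least 6 sigma above the mean delay.
    Option 2: after shifting the mean by shift_sigmas * sigma, no more than
              max_dpm failures per million are expected.
    """
    z = (requirement_ns - mu_ns) / sigma_ns
    option1_ok = z >= 6.0
    dpm = upper_tail(z - shift_sigmas) * 1e6   # tail area beyond the shifted mean
    option2_ok = dpm <= max_dpm
    return option1_ok, option2_ok

print(meets_six_sigma(118.3, 0.53, 121.2))  # (False, False)
```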
Some vendors track their processes and collect long-term characterization data. This data includes a large sample of devices from many manufacturing lots, and it accounts for normal process shifts by sampling the process over time. Depending on the vendor's record for controlling its processes, long-term data may provide more insight into the vendor's process shift than simply scaling short-term data. You can use long-term data directly or, for a more conservative analysis, shift it in the same way as short-term data.
Some devices don't lend themselves to statistical modeling. Memory devices, for example, are produced in the same manner as most other electronic components but differ in how they are tested and sold. Most memory products are binned before shipment, meaning they are individually tested at the factory and sorted into groups that have similar performance characteristics. For example, suppose a vendor manufactures ROMs that have a nominal access time of 100 nsec, and the standard deviation for the process is 10 nsec. The vendor tests each part and sorts the parts into bins of <90 nsec, 90 to 110 nsec, and 110 to 130 nsec. Finally, the manufacturer marks each group of parts with an appropriate speed grade and sells the lots at different prices, based on market demand.
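To see how binning slices the process distribution, the sketch below computes the share of parts that would land in each bin of the example, assuming the process is exactly normal with a mean of 100 nsec and a standard deviation of 10 nsec. The range above 130 nsec is not one of the example's bins; it is included only so the shares sum to 1.

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative normal distribution; math.erf handles infinite arguments."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu_ns, sigma_ns = 100.0, 10.0
bins_ns = [(-math.inf, 90.0), (90.0, 110.0), (110.0, 130.0), (130.0, math.inf)]

# Each bin holds only a slice of the normal curve, so delays within a bin
# are not normally distributed.
shares = [normal_cdf(hi, mu_ns, sigma_ns) - normal_cdf(lo, mu_ns, sigma_ns)
          for lo, hi in bins_ns]

for (lo, hi), share in zip(bins_ns, shares):
    print(f"{lo:>6} to {hi:<6} nsec: {100 * share:5.2f}% of parts")
```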
Field-programmable devices and ASICs offer another twist. Vendors cannot gather characterization information for a device that doesn't yet exist. However, the vendor can characterize the manufacturing process and provide timing analysis based on that characterization, estimating the delay distribution using a static timing analyzer. Most static timing analyzers can be set to determine "best/strong," "nominal," and "worst/weak" process points at minimum, nominal, and maximum temperature and supply voltage. By analyzing a path through the design at each of the process points, while holding the temperature at maximum and the supply voltage at minimum, designers can derive the worst-case delay distribution for the path.
As with other components, the delay distribution for an ASIC will be a Gaussian curve (unless the device is screened or speed-graded). The mean delay for the device is the result of the "nominal-process" timing analysis, and σ is given by
$$\sigma = \frac{W - B}{2X} \tag{3}$$
where W is the result of the "worst-process" analysis, B is the result of the "best-process" analysis, and X is the vendor's sigma metric. For most vendors, the sigma metric is 3 (ie, they use 3σ data for process corners). Verify the X value with the vendor.
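Eq 3 reduces to one line of code. The sketch below, with the hypothetical helper `sigma_from_corners`, assumes the best and worst process corners lie X standard deviations on either side of the nominal result, and it plugs in the FPGA figures used in the ROM-path example.

```python
def sigma_from_corners(worst_ns: float, best_ns: float, sigma_metric: float = 3.0) -> float:
    """Eq 3: the best and worst corners span 2 * X standard deviations."""
    return (worst_ns - best_ns) / (2.0 * sigma_metric)

# FPGA address path A(16) at 125 C, 4.5 V: best, nominal, and worst process points.
best_ns, nominal_ns, worst_ns = 10.2, 11.1, 12.0
mu_ns = nominal_ns                            # mean = nominal-process result
sigma_ns = sigma_from_corners(worst_ns, best_ns)
print(mu_ns, round(sigma_ns, 2))              # 11.1 0.3
```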
Another consideration when using characterization data is the effect of output loading. Vendors typically perform characterization with a 50-pF test load, and ASIC timing-analysis estimates typically assume the same load. If the design loads are significantly more or less than the test load, an adjustment is necessary to account for loading. Like delays, loading adjustments can be modeled either as a worst-case value or as a statistical distribution based on variance in board layout and material properties. Currently, however, it is difficult to obtain statistical data for anticipated loading. To apply a worst-case loading adjustment to a statistical distribution, calculate the correction factor from the vendor's data book (the same method as for worst-case analysis) and add this factor to the mean (µ) device delay.
Once you've modeled the delays statistically, you can derive a joint delay distribution for a critical path. Obtain the exact distribution by combining the distributions using the convolution integral: Convolve the distribution function of the first delay in the path with the second, and then convolve the result with the next delay, etc, until you combine all of the delay components. The result is the joint delay distribution for the path. Unfortunately, the mathematics for convolution are very complex.
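The convolution does not have to be done symbolically; it can be approximated numerically on a sampled grid. The sketch below is a brute-force illustration, not a production method: it samples two normal densities (borrowing the buffer and CPU figures from the example), convolves them, and compares the resulting mean and standard deviation with the sum of the means and the root sum of squares of the sigmas. All helper names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def sampled_pdf(mu, sigma, dt, width=6.0):
    """Sample a normal density on a grid of spacing dt spanning mu +/- width*sigma."""
    start = round((mu - width * sigma) / dt) * dt   # snap grid start to a multiple of dt
    n = int(round(2.0 * width * sigma / dt)) + 1
    return start, [normal_pdf(start + k * dt, mu, sigma) for k in range(n)]

def convolve(a, b):
    """Discrete convolution of two equally spaced sample sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

dt = 0.01                                # grid spacing, nsec
components = [(5.8, 0.4), (4.5, 0.17)]   # (mean, sigma) in nsec

# Convolve the first density with the second, then the result with the next, etc.
start, density = sampled_pdf(*components[0], dt)
for mu, sigma in components[1:]:
    start_next, samples = sampled_pdf(mu, sigma, dt)
    density = [v * dt for v in convolve(density, samples)]  # Riemann sum of the integral
    start += start_next                                    # grid origins add under convolution

xs = [start + k * dt for k in range(len(density))]
mass = sum(density) * dt                 # should be very close to 1
mean = sum(x * p for x, p in zip(xs, density)) * dt / mass
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(xs, density)) * dt / mass)

rss_mean = sum(m for m, _ in components)
rss_sd = math.sqrt(sum(s * s for _, s in components))
print(f"numerical: mean={mean:.3f} ns, sigma={sd:.3f} ns")
print(f"RSS:       mean={rss_mean:.3f} ns, sigma={rss_sd:.3f} ns")
```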
Fortunately, the analysis can be simpler. For module-level paths that propagate through separate devices, the delay through each device usually can be considered to be independent of the other device delays (an exception would be a design composed of several identical devices that are all from the same manufacturing batch). In addition, each delay is randomly distributed within the delay distribution for that device. In this situation, the "root-sum-of-squares" (RSS) method allows use of the Central Limit Theorem in statistics to determine the joint distribution or overall delay characterization.
The Central Limit Theorem states that the sum of a set of such independent continuous random variables asymptotically approaches a normal distribution. This implies that the total path delay will have a Gaussian distribution with the mean equal to the sum of the individual path-delay means
$$\mu_{\text{path}} = \sum_{i=1}^{n} \mu_i \tag{4}$$
and standard deviation related to the individual delay distributions by
$$\sigma_{\text{path}} = \sqrt{\sum_{i=1}^{n} \sigma_i^2} \tag{5}$$
Because the individual component distributions are considered normal, the error of this approximation is small, even for paths with only three or four components.
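Eq 4 and Eq 5 translate directly into code. The component values in this sketch are hypothetical, and the comparison line assumes, purely for illustration, that each component's +3σ point stands in for its worst-case value; under that assumption the path's RSS 3σ point is noticeably lower than stacking every component at its own limit.

```python
import math

def rss_path(components):
    """Combine independent component delays given as (mean, sigma) pairs in nsec.

    Eq 4: the path mean is the sum of the component means.
    Eq 5: the path sigma is the root sum of squares of the component sigmas.
    """
    mu = sum(m for m, _ in components)
    sigma = math.sqrt(sum(s * s for _, s in components))
    return mu, sigma

# Hypothetical three-component path: (mean, sigma) in nsec.
path = [(20.0, 1.0), (50.0, 2.0), (8.0, 2.0)]
mu, sigma = rss_path(path)
stacked = sum(m + 3 * s for m, s in path)   # every component at its own +3 sigma point
print(mu, sigma)                             # 78.0 3.0
print(mu + 3 * sigma, stacked)               # 87.0 93.0
```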
To reexamine the ROM path example (see box) using the RSS method, start by analyzing the memory-management unit (MMU) FPGA design. An initial analysis of all of the address and chip-select paths indicates that the longest path is A(16). Because all of the paths within the FPGA are highly correlated, A(16) will always be the longest path, so you need to analyze only this one path in detail. Rerunning the static timing analysis at 125°C, 4.5V, and all three process points yields 10.2, 11.1, and 12 nsec. Using Eq 3, σ=(12-10.2)/(2×3)=0.3 nsec. Because the 108-pF loading on the address lines is greater than the 50 pF assumed by the timing analyzer, you must adjust µ accordingly. The vendor's data book recommends adding 0.05 nsec/pF for loads greater than 50 pF, so

$$\mu = 11.1 + 0.05\times(108 - 50) = 14.0\ \text{nsec}$$
According to the manufacturer, the CPU has a mean setup time of 4.5 nsec and a standard deviation of 0.17 nsec; the buffer device is specified to have a mean propagation delay of 5.8 nsec and a standard deviation of 0.4 nsec. The ROM is a binned part, so use the maximum delay value (90 nsec) as the mean with zero variance. Add 4 nsec to the ROM delay to account for loading, making the mean delay 94 nsec with zero variance. Using Eq 4 and Eq 5, the joint distribution for the path is a normal distribution with

$$\mu_{\text{path}} = 14.0 + 4.5 + 5.8 + 94 = 118.3\ \text{nsec}$$

$$\sigma_{\text{path}} = \sqrt{0.3^2 + 0.17^2 + 0.4^2 + 0^2} \approx 0.53\ \text{nsec}$$
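The joint distribution above can be recomputed as follows. The sketch treats the CPU and buffer figures (0.17 and 0.4 nsec) as standard deviations, which is the interpretation that reproduces the 0.53-nsec path sigma; the stage labels are illustrative.

```python
import math

# (mean, sigma) in nsec for each stage of the ROM path.
stages = {
    "FPGA A(16), load-adjusted": (11.1 + 0.05 * (108 - 50), 0.3),
    "ROM (binned: max delay + 4 nsec loading)": (90.0 + 4.0, 0.0),
    "data buffer": (5.8, 0.4),
    "CPU setup": (4.5, 0.17),
}

mu_path = sum(m for m, _ in stages.values())                    # Eq 4
sigma_path = math.sqrt(sum(s ** 2 for _, s in stages.values())) # Eq 5
print(f"mu={mu_path:.1f} ns, sigma={sigma_path:.2f} ns")        # mu=118.3 ns, sigma=0.53 ns
```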
Because the distribution is considered to be a normal curve, the simplest way to find the probability that the path will fail is to use a standard table of area under the normal curve (Ref 3).
Most standard tables of area under the normal curve contain one of the following: the area bounded by µ-zσ and µ+zσ; the tail area remaining below µ-zσ and above µ+zσ (bilateral tail area); the area left of µ+zσ (values of the normal distribution function); or the area remaining to the right of µ+zσ for positive values of z (unilateral tail area), where z is the sigma metric value given by

$$z = \frac{R - \mu}{\sigma} \tag{7}$$
where R is the path's required delay. Because the total area under the normal curve equals 1, you can calculate the area remaining to the right of z from any of these tables.
For the ROM path, z=(121.2-118.3)/0.53=5.47. Because short-term data was used to calculate µ and σ for the path, the initial Six Sigma goal is z≥6. Although the goal isn't met, you still may be willing to accept the risks for this high-performance design. To better quantify the risk, you can approximate the long-term distribution for the path by adding 1.5σ to µ. Reevaluating Eq 7 yields

$$z = \frac{121.2 - (118.3 + 1.5\times 0.53)}{0.53} \approx 3.97$$
A standard table of area under the normal curve lists the area remaining to the right of 3.97σ as 3.606E-05 (Ref 3). The probability that a given ROM address path will fail is therefore 3.606E-05, or 1 in 27,732 (1/3.606E-05), or 36 dpm. Again, you didn't meet the Six Sigma goal of ≤3 dpm. However, you've quantified the risk. Depending on the expected manufacturing volume and cost of producing a faulty unit, the project may accept the risk for this high-performance system.
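The z values and the tail area can also be computed without a printed table, using Q(z) = erfc(z/√2)/2 for the area to the right of z. Because this sketch does not round z before computing the tail, its failure probability comes out slightly lower than the table value quoted above, though still about 36 dpm. `upper_tail` is a hypothetical helper name.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of z (unilateral tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

requirement_ns = 121.2
mu_ns, sigma_ns = 118.3, 0.53

z_short = (requirement_ns - mu_ns) / sigma_ns                    # Eq 7, short-term data
z_long = (requirement_ns - (mu_ns + 1.5 * sigma_ns)) / sigma_ns  # mean shifted by 1.5 sigma
p_fail = upper_tail(z_long)

print(round(z_short, 2), round(z_long, 2))  # 5.47 3.97
print(f"P(fail)={p_fail:.3e}, about {p_fail * 1e6:.0f} dpm, 1 in {1 / p_fail:,.0f}")
```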
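Another tool that can help the designer quantify risk is the system-level yield calculation. If you perform an RSS analysis for several independent paths in the design, compute the overall probability that at least one of the paths will fail (PSF) using

$$P_{SF} = 1 - \prod_{i=1}^{n}\left(1 - P_{F,i}\right) \tag{8}$$

where $P_{F,i}$ is the probability that path i fails.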
For this type of system-level calculation, it's important to realize that paths are not always independent. There is generally a high correlation between individual delay paths through a single IC (the variances of address to data delays through a ROM will be close to identical). When parallel paths all run through the same devices, if the slowest path passes, it is likely that all of the others will pass. Therefore, the probability that a system-level path will fail is approximately the probability that the slowest path will fail.
When using statistical methods, take care to understand the assumptions of statistical analysis and to know when to apply these methods. In the example, we calculated the probability that the slowest ROM path will fail. Ignoring the faster enable paths, the probability that the system-level ROM path will fail is simply the probability that any one of the 384 (16×3×8) permutations of the address paths will fail. If you apply Eq 8 directly, the probability is 1-(1-3.606E-05)^384≈0.0138. You'd expect about one in 73 of the boards to fail manufacturing test due to a ROM failure.
This result is incorrect. All of the paths that pass through a single ROM are strongly correlated, so Eq 8 does not apply to them. The probability that any of these paths will fail is approximately the probability that the slowest ROM address path will fail, or 3.606E-05. Because the system has three ROMs, the probability that the ROM function will fail is 1-(1-3.606E-05)^3=1.082E-04. You should expect only one in 9244 boards to fail as a result of a ROM-address-path failure.
Use Eq 8 to combine probabilities for separate paths such as the address and chip-select paths. If the probability that a ROM chip-select path will fail is 1.23E-06, the probability that the ROM function will fail is 1-{(1-1.23E-06)^3×(1-3.606E-05)^3}=1.12E-04, or one in 8937 boards.
Finally, the probability that all of the paths on the board will pass, called the system-level yield, is given by

$$Y = 1 - P_{SF} = \prod_{i=1}^{n}\left(1 - P_{F,i}\right) \tag{9}$$
For the ROM function, the yield with respect to timing is 1-1.12E-04, or 0.999888, meaning about 99.99% of the boards will operate properly. Of course, this doesn't account for other potentially critical paths in the design; further analysis of those paths will establish a good estimate of the system-level yield for the entire design.
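The system-level arithmetic can be reproduced with Eq 8 written as a product. This sketch uses the failure probabilities quoted in the text, the 384-path count for the naive calculation, and one slowest address path per ROM for the correlated case; `p_any_fails` is a hypothetical name, and `math.prod` requires Python 3.8 or later.

```python
import math

def p_any_fails(probabilities):
    """Eq 8: probability that at least one of several independent paths fails."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)

p_addr = 3.606e-5   # slowest ROM address path (table value from the text)
p_cs = 1.23e-6      # ROM chip-select path (value assumed in the text)

naive = p_any_fails([p_addr] * 384)                    # wrongly treats correlated paths as independent
rom_addr = p_any_fails([p_addr] * 3)                   # one slowest address path per ROM
rom_function = p_any_fails([p_cs] * 3 + [p_addr] * 3)  # address and chip-select paths combined
yield_rom = 1.0 - rom_function                         # Eq 9, restricted to the ROM function

print(f"naive={naive:.4f}, address only={rom_addr:.3e}, ROM function={rom_function:.3e}")
print(f"timing yield for the ROM function: {yield_rom:.6f}")
```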
When analyzing the entire design, use both statistical and worst-case methods. Worst-case methods provide a fast, simple approach to help identify critical timing concerns. While conservative, they tend to ensure high reliability of the final product. Statistical methods can help quantify risks, allowing you to make informed decisions that balance performance, cost, and manufacturability goals. Together, worst-case and statistical methods form a powerful design tool.
Author's biography
James J Vorgert is an electrical design engineer with the Corporate Venture Products Group of Texas Instruments, where he currently designs both digital and analog circuit boards. For four years, he worked as a CAE-design methodology specialist with TI's Defense Systems and Electronics Group. He has a BSEE from the University of Minnesota and enjoys photography, woodworking, and backpacking in his spare time.
Acknowledgment
This article originally appeared in the Texas Instruments Technical Journal and is reprinted with permission.
A worst-case circuit analysis
Fig A shows a typical system-level path in a microprocessor design. The path's starting point is the clock input of the memory-management unit (MMU). The signal path progresses from the clock to the MMU's A(0:16) and CS(0:3) outputs. From there, the signals travel to the address and control inputs of the ROM devices. The path asynchronously propagates through one of the ROMs and drives the inputs of a standard bus-interface device. The buffer interface in turn drives the microprocessor's data inputs (CPU).
Table 1: System example operating conditions
Average system throughput: greater than 4 MIPS
System clock frequency: 33 MHz
Processor software: ROM
Processor operation:
(1) Opcodes prefetched during execution of previous instruction;
(2) a typical instruction requires 4 internal clocks and makes 1 operand fetch;
(3) memory-access cycles begin on the rising edge of a system clock. The MMU is programmed to extend the cycle by up to eight clock periods, and the processor latches data inputs on the rising edge after the MMU signals that the cycle is ending.
Worst-case estimates conceal risk
Some vendors provide characterization data in statistical form: the mean (µ) and standard deviation (σ) or variance (σ²) at a number of important temperature and voltage combinations over the specified operating conditions. Other vendors provide a table of measurements for a sample of devices under various environmental conditions. The data may represent either long- or short-term sampling, so the designer must take care to understand what type of information is available. From this information, the designer can derive a distribution function for a device delay at a number of important temperatures and operating voltages (Fig 1).
Calculating the delay distribution
$$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \mu\right)^2 \tag{2}$$
Statistics don't always apply
The delay distribution for each group is a part of the normal process distribution, but delays do not have a normal distribution within a group (Fig 2). Because the distribution for a group of binned parts is not Gaussian, assume that the delay for that part equals the vendor's specified maximum delay and that the variance of the group is zero (ie, worst-case delay).
Consider output loading
Sample statistical analysis
Once you know the joint distribution for a path, the probability (PF) that the path delay will be greater than its requirement (ie, the path will fail) is given by the area under the distribution curve and to the right of the requirement (R) (Fig 3). You can calculate this area by evaluating
$$P_F = \int_{R}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\, dt \tag{6}$$