Hierarchical timing concepts

-October 25, 2013

With the shrinking of technology to deeper sub-micron levels, SoC design is getting more complex every day as more functionality is incorporated into the chips. As SoC designers navigate this uncharted territory and EDA tool vendors strive to match the pace of the VLSI technology drive, hierarchical design is becoming the norm for timing closure.

Hierarchical timing helps close designs that suffer from long run times or a large memory footprint, or that contain blocks whose design is yet to mature. A full-chip design can go through several phases of timing analysis. Initially, timing models of the blocks, with the necessary time budgeting, can be used in chip-top STA in place of the block gate-level netlists until timing-clean blocks are available. The steps involved include creating timing models (Quick/Extracted/Interface) of the blocks and using them in the different phases of timing analysis. This article discusses the concept of hierarchical timing closure and the various types of hierarchical timing paths.

Hierarchical timing closure

Hierarchical timing closure involves:

  • dividing a complete SoC into a number of smaller physical hierarchies (termed “partitions”)
  • budgeting I/O timing of the partitions and creating their timing models
  • carrying out physical implementation of the partitions (termed “block” or “partition”) and of the rest of the SoC (termed “top” or “chip-top”) separately and in parallel
  • running timing iterations from time to time with the partition timing models at SoC level, to refine the models and improve the accuracy and correlation of partition timing at the SoC level

Finally, timing on the partitions and on the rest of the SoC is closed independently. Before sign-off, stable partitions are stitched into the SoC and full-chip timing closure is ensured. Timing models such as the Quick Timing Model (QTM), Extracted Timing Model (ETM), and Interface Timing Model (ITM) are used to represent the timing information of a partition accurately at the various stages of hierarchical timing closure. Timing the interface paths is the biggest challenge in closing a hierarchical design, so the STA engineer handling them needs a proper understanding of these paths. This article attempts to throw light on such interactions between partition and chip-top in various scenarios.

Figure 1 Hierarchical partitioning of an SoC.

Essentially, there are only four types of interactions possible between partition and chip. These are:

  • CLOCK IN DATA IN
  • CLOCK IN DATA OUT
  • CLOCK OUT DATA IN
  • CLOCK OUT DATA OUT

It should also be noted that what is true of partition and chip interactions is also true for flop and partition interaction as they share the same relationship. A flop is to a partition what a partition is to a chip, and the analogy is evident in the timing equations as well.

Effective setup and hold time

Before getting started, it is important to understand the widely used terms effective setup time and effective hold time. Consider the simple case of a timing path with positive clock skew cks and no data delay. Let su and h be the setup and hold times of the capture flop, respectively.

Figure 2 Timing path with positive skew.

Figure 3 Waveforms for Figure 2; launch, capture waveform, and skewed capture waveform.

Figures 2 and 3 illustrate the meaning of the terms effective setup time and effective hold time, su' and h' respectively, in the simple case of a timing path with positive skew and zero data path delay.

$su' = -(cks - su) = su - cks$, the negative sign implying that the setup time is calculated after the actual capture edge; and $h' = h + cks$.

So for the case of positive clock skew, the effective hold time increases in magnitude by the clock skew and effective setup time decreases in magnitude by clock skew, which is well in line with our understanding of timing criticality behavior with respect to positive skew.
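The effect of skew can be checked with a few lines of Python. This is only an illustrative sketch: the function name `effective_setup_hold` and the sample values (in ns) are assumptions made for the example, and it takes the effective setup time to be su − cks, in line with the statement that positive skew reduces it.

```python
def effective_setup_hold(su, h, cks):
    """Effective setup/hold (su', h') for a capture clock skewed by cks.

    Assumes zero data-path delay, as in the simple case discussed above.
    """
    su_eff = su - cks   # positive skew relaxes the setup requirement
    h_eff = h + cks     # positive skew tightens the hold requirement
    return su_eff, h_eff


# Hypothetical values in ns: su = 0.25, h = 0.125, skew = 0.5
su_eff, h_eff = effective_setup_hold(su=0.25, h=0.125, cks=0.5)
print(f"su' = {su_eff} ns, h' = {h_eff} ns")
```

In this example the skew exceeds the setup time, so su' comes out negative, while h' grows by the full skew.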


Figure 4 CLK IN DATA IN at flop level inside partition.

Figure 5 CLK IN DATA IN at partition level.

CLOCK IN DATA IN is one of the cases that does not turn out to be very timing-critical, as both clock and data see large pad delays, which cancel each other out in the calculation of effective setup and hold times.

From Figure 4,

$$T + c_1 + c_2 = \text{ck2q} + d_1 + d_2 + t_{su}(\text{flop}) \tag{1}$$
$$\text{ck2q} + d_1 + d_2 = t_h(\text{flop}) + c_1 + c_2 \tag{2}$$

From Figure 5,

$$T + c_1 = \text{ck2q} + d_1 + t_{su}(\text{partition}) \tag{3}$$
$$\text{ck2q} + d_1 = t_h(\text{partition}) + c_1 \tag{4}$$

From equations 1 and 3, the REQUIRED setup time is

$$t_{su}(\text{partition}) = t_{su}(\text{flop}) - (c_2 - d_2) \tag{5}$$
$$t_{su}(\text{partition}) = t_{su}(\text{flop}) - (\text{clock-data skew inside partition})$$

From equations 2 and 4, the REQUIRED hold time is

$$t_h(\text{partition}) = t_h(\text{flop}) + (c_2 - d_2) \tag{6}$$
$$t_h(\text{partition}) = t_h(\text{flop}) + (\text{clock-data skew inside partition})$$

These effective setup and hold times at partition level, when derived from the partition's constituent flops, are often termed REQUIRED setup and hold times.
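The REQUIRED values can be sanity-checked numerically. The sketch below assumes the clock-data skew inside the partition is c2 − d2 (which follows from subtracting equation 3 from equation 1). All delay values are hypothetical numbers in ns, and "margin" here simply means the left side minus the right side of the corresponding equation. The margin should be identical whether it is computed at the flop or at the partition boundary.

```python
import math


def required_partition_setup_hold(tsu_flop, th_flop, c2, d2):
    """REQUIRED setup/hold at the partition pin from the flop's setup/hold."""
    skew = c2 - d2  # clock-data skew inside the partition
    return tsu_flop - skew, th_flop + skew


# Hypothetical delays in ns
T, ck2q = 2.0, 0.5
c1, c2 = 1.0, 0.75   # clock delay at top level / inside partition
d1, d2 = 1.5, 0.5    # data delay at top level / inside partition
tsu_flop, th_flop = 0.25, 0.125

tsu_part, th_part = required_partition_setup_hold(tsu_flop, th_flop, c2, d2)

# Setup margin: flop view (equation 1) vs. partition view (equation 3)
setup_flop_view = (T + c1 + c2) - (ck2q + d1 + d2 + tsu_flop)
setup_part_view = (T + c1) - (ck2q + d1 + tsu_part)

# Hold margin: flop view (equation 2) vs. partition view (equation 4)
hold_flop_view = (ck2q + d1 + d2) - (th_flop + c1 + c2)
hold_part_view = (ck2q + d1) - (th_part + c1)

assert math.isclose(setup_flop_view, setup_part_view)
assert math.isclose(hold_flop_view, hold_part_view)
print(tsu_part, th_part, setup_part_view, hold_part_view)
```

Because the internal clock and data delays are folded into the partition's setup and hold numbers, chip-top analysis never needs to look inside the partition to reach the same margins.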

As stated earlier, equations 5 and 6 can be rewritten for the chip and partition levels in the same way, since chip and partition share the same relationship as partition and flop. Rearranging them gives the following:

$$t_{su}(\text{partition}) = t_{su}(\text{chip}) + (\text{clock-data skew at top level}) \tag{7}$$
$$t_h(\text{partition}) = t_h(\text{chip}) - (\text{clock-data skew at top level}) \tag{8}$$

Equations 7 and 8 give the effective setup and hold times at partition level which, when derived from the chip top that the partition is part of, are often termed AVAILABLE setup and hold times. Thus available setup and hold times can be easily derived from the required setup and hold times and vice versa.
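The AVAILABLE values are simply the REQUIRED relation applied one level up and inverted. As a rough sketch (the function names and numbers are hypothetical, and the top-level clock-data skew is passed in as a single value), the round trip can be written as:

```python
def chip_from_partition(tsu_part, th_part, skew_top):
    """Equations 5 and 6 applied one level up (chip plays the partition role)."""
    return tsu_part - skew_top, th_part + skew_top


def available_partition_setup_hold(tsu_chip, th_chip, skew_top):
    """AVAILABLE setup/hold at the partition pin from chip-level setup/hold."""
    return tsu_chip + skew_top, th_chip - skew_top


# Hypothetical values in ns; top-level clock delay 1.0, data delay 1.5
skew_top = 1.0 - 1.5
tsu_avail, th_avail = available_partition_setup_hold(0.5, 0.25, skew_top)
print(tsu_avail, th_avail)

# Going back up to chip level recovers the chip-level numbers we started from
assert chip_from_partition(tsu_avail, th_avail, skew_top) == (0.5, 0.25)
```

Here the top-level data path is slower than the clock path, so the skew is negative: the setup time available to the partition shrinks and the available hold time grows.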


CLOCK IN DATA OUT is one of the most timing-critical scenarios faced by the STA engineer: both the data pad and clock pad delays must be accommodated, since a round-trip delay now comes into the picture.

Figure 6  CLK IN DATA OUT at flop level inside partition

Figure 7 CLK IN DATA OUT at partition level

From Figure 6,

$$T + c_2 - c_1 = d_1 + d_2 + su + \text{ck2q}_{\max}(\text{flop}) \tag{9}$$
$$c_2 - c_1 + h = d_1 + d_2 + \text{ck2q}_{\min}(\text{flop}) \tag{10}$$

From Figure 7,

$$T + c_2 = d_2 + su + \text{ck2q}_{\max}(\text{partition}) \tag{11}$$
$$c_2 + h = d_2 + \text{ck2q}_{\min}(\text{partition}) \tag{12}$$

From equations 9 and 11,

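$$\text{ck2q}_{\max}(\text{partition}) = \text{ck2q}_{\max}(\text{flop}) + c_1 + d_1 \tag{13}$$
$$\text{ck2q}_{\max}(\text{partition}) = \text{ck2q}_{\max}(\text{flop}) + (\text{clock-data round-trip delay inside partition}) \tag{14}$$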

From equations 10 and 12,

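$$\text{ck2q}_{\min}(\text{partition}) = \text{ck2q}_{\min}(\text{flop}) + c_1 + d_1 \tag{15}$$
$$\text{ck2q}_{\min}(\text{partition}) = \text{ck2q}_{\min}(\text{flop}) + (\text{clock-data round-trip delay inside partition}) \tag{16}$$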

Again, these are the REQUIRED ck2q max and min delays. The AVAILABLE ck2q max and min delays can be obtained by replacing partition with chip and flop with partition, and rearranging:

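$$\text{ck2q}_{\max}(\text{partition}) = \text{ck2q}_{\max}(\text{chip}) - (\text{clock-data round-trip delay at top level}) \tag{17}$$
$$\text{ck2q}_{\min}(\text{partition}) = \text{ck2q}_{\min}(\text{chip}) - (\text{clock-data round-trip delay at top level}) \tag{18}$$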
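The same bookkeeping can be sketched for the CLOCK IN DATA OUT case. The example below is illustrative only: the function names are made up, all delays are hypothetical values in ns, and `c_top` and `d_top` are assumed names for the top-level clock and data delays that make up the round-trip delay at chip level. As before, "margin" means the left side minus the right side of an equation.

```python
import math


def required_partition_ck2q(ck2q_max_flop, ck2q_min_flop, c1, d1):
    """REQUIRED ck2q max/min at the partition boundary (equations 13 and 15)."""
    round_trip = c1 + d1  # clock in to the flop plus data out to the pin
    return ck2q_max_flop + round_trip, ck2q_min_flop + round_trip


def available_partition_ck2q(ck2q_max_chip, ck2q_min_chip, c_top, d_top):
    """AVAILABLE ck2q max/min at the partition boundary from chip-level values."""
    round_trip = c_top + d_top
    return ck2q_max_chip - round_trip, ck2q_min_chip - round_trip


# Hypothetical delays in ns
T, su, h = 4.0, 0.25, 0.125
c1, d1 = 0.75, 0.5   # clock and data delay inside the partition
c2, d2 = 1.0, 1.0    # capture clock delay and data delay outside the partition
ck2q_max_flop, ck2q_min_flop = 0.5, 0.25

ck2q_max_part, ck2q_min_part = required_partition_ck2q(
    ck2q_max_flop, ck2q_min_flop, c1, d1)

# Setup margin: flop view (equation 9) vs. partition view (equation 11)
setup_flop_view = (T + c2 - c1) - (d1 + d2 + su + ck2q_max_flop)
setup_part_view = (T + c2) - (d2 + su + ck2q_max_part)
assert math.isclose(setup_flop_view, setup_part_view)

# Hold margin: flop view (equation 10) vs. partition view (equation 12)
hold_flop_view = (d1 + d2 + ck2q_min_flop) - (c2 - c1 + h)
hold_part_view = (d2 + ck2q_min_part) - (c2 + h)
assert math.isclose(hold_flop_view, hold_part_view)

print(ck2q_max_part, ck2q_min_part)
print(available_partition_ck2q(4.0, 3.0, c_top=1.0, d_top=1.0))
```

Unlike the CLOCK IN DATA IN case, the clock and data delays add rather than cancel, which is why this scenario is the harder one to close.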
