EDN Access

 

March 14, 1997


Growing your own IC clock tree

Jim Lipman, Technical Editor

Defining the clock-distribution network is one of the most difficult aspects of high-speed, system-on-a-chip design. Employing the right design methodology helps you "beat the clock."

The design of a good clock-distribution network, or "tree," for a high-performance chip requires good IC-engineering skills. You define your chip's specifications for speed, power, and other factors and then integrate the clock-tree design into your normal chip-design flow. Although you need to define the kind of tree you want up front, there are some EDA tools available to synthesize the tree and verify its performance before committing the design to silicon.

The art of designing clock-distribution networks for high-speed chips is far more complex than just meeting timing specifications. Achieving clock latency and skew goals is difficult enough when you have clock signals of 100 MHz or more traversing the chip, but there are other factors besides speed. Because the clock network is one of the most power-hungry nets on a chip, you need to design with power dissipation in mind, particularly for chips used in portable equipment. The dynamic-switching current flow that causes high power dissipation can also lead to interconnect reliability problems, so you have to consider clock-tree wire widths. As if all these considerations were not enough, there is also the potential for clock-to-signal-net coupling that may result in excessive on-chip noise.

Clock-tree basics

The major task of clock-tree design is developing the interconnect geometry that connects the clock to all the cells on the chip that use a clock. These cells consist of latches, flip-flops, and other logic elements that you need to synchronize with the system clock. For clock-tree design, your major concerns are minimizing clock skew and optimizing clock buffers to meet skew specifications and minimize clock-tree power dissipation.

Variations in the clock signal's arrival time at the clock inputs of various logic elements cause clock skew. The skew is a function of two parameters: the loading of logic being clocked and the RC delay of the clock-line interconnect (Figure 1). The primary job of clock-synthesis place-and-route tools is to vary routing paths and placement of the clocked cells and clock buffers to meet maximum skew specifications.
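To make the two skew contributions concrete, here is a minimal sketch that uses a first-order Elmore delay for a single wire segment driving a lumped load. The per-millimeter resistance and capacitance values, the sink names, and the function names are illustrative assumptions, not figures from any tool or process mentioned in this article.

```python
# First-order Elmore delay of a distributed RC wire driving a lumped load:
#   t = R_wire * (C_wire / 2 + C_load)
# All numeric values below are assumed for illustration only.

R_PER_MM = 50.0      # wire resistance, ohms per mm (assumed)
C_PER_MM = 0.2e-12   # wire capacitance, farads per mm (assumed)

def wire_delay(length_mm, load_farads):
    """Delay in seconds from a driver to one clocked element."""
    r_wire = R_PER_MM * length_mm
    c_wire = C_PER_MM * length_mm
    return r_wire * (c_wire / 2.0 + load_farads)

def skew(sinks):
    """Skew is the spread between the latest and earliest arrival."""
    delays = [wire_delay(length, load) for length, load in sinks.values()]
    return max(delays) - min(delays)

# Hypothetical sinks: name -> (wire length in mm, load capacitance in F)
sinks = {
    "ff_a": (2.0, 50e-15),   # reference case
    "ff_b": (2.0, 150e-15),  # same wire, heavier load
    "ff_c": (4.0, 50e-15),   # same load, longer wire
}

print(f"skew = {skew(sinks) * 1e12:.1f} ps")
```

With these assumed numbers, the heavier load adds 10 psec to the reference delay and the longer wire adds 65 psec, which is why synthesis tools adjust both cell placement and routing.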

You can choose from various clock-tree structures, including the H tree, clock grid, and balanced tree (Figure 2). You manually design the H tree, used mostly in custom layouts, and vary tree interconnect-segment widths to balance skew throughout the chip. The clock grid, also manually designed, is the simplest clock-distribution structure and has the advantage of being easy to design for low skew. However, it is area-inefficient and, even worse, power-hungry because of the large amount of clock interconnect it needs. Still, some chip vendors are using this clock structure; Digital Equipment Corp (Hudson, MA) uses a clock grid for its 300-MHz Alpha 21164 processor (Reference 1). You can implement the balanced tree, the most common clock-distribution network for high-performance chips, with the aid of clock-synthesis tools.

For a balanced tree without buffers, the clock lines' capacitance increases exponentially as you move from the leaf cell (clocked element) to the root of the tree (clock input). The extra capacitance results from the wider metal needed to carry current to the branching segments. The extra metal also results in additional chip area to accommodate the extra clock-line width. Adding buffers at the branching points of the tree significantly lowers clock-interconnect capacitance, because you can reduce clock-line width toward the root (Figure 3).
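The growth described above can be sketched with a deliberately simplified model: a binary tree in which every segment has the same length, and an unbuffered segment must be twice as wide as each of its two children. The function names and the 10-fF leaf-segment value are assumptions for illustration. The buffered case ignores the input capacitance of the buffers themselves.

```python
def segment_caps(levels, c_leaf_ff, buffered):
    """Capacitance (fF) of one segment at each level, leaf level first.

    Unbuffered: width, and so capacitance, doubles at every level toward
    the root. Buffered: every segment keeps the leaf width.
    """
    return [c_leaf_ff * (1 if buffered else 2 ** k) for k in range(levels)]

def total_wire_cap(levels, c_leaf_ff, buffered):
    """Total wire capacitance (fF) of a binary tree.

    The level that is k steps above the leaves holds
    2 ** (levels - 1 - k) segments.
    """
    caps = segment_caps(levels, c_leaf_ff, buffered)
    return sum(cap * 2 ** (levels - 1 - k) for k, cap in enumerate(caps))

print(segment_caps(4, 10, buffered=False))    # per-segment values, leaf to root
print(total_wire_cap(4, 10, buffered=False))  # unbuffered total
print(total_wire_cap(4, 10, buffered=True))   # buffered total (wire only)
```

In this model, a four-level unbuffered tree carries 320 fF of wire capacitance versus 150 fF for the buffered version, and the gap widens with every added level.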

Clock-tree design considerations

When designing a clock tree, you need to consider performance specifications that are timing-related and that affect other chip-design goals. Clock-tree timing specifications include clock latency, skew, and jitter. Nontiming specifications include power dissipation (static and dynamic), signal integrity (noise due to clock-to-signal coupling), and reliability (primarily due to electromigration effects in clock lines). Many clock-design issues affect multiple performance parameters; for example, adding clock buffers to balance clock lines and decrease skew may result in additional clock-tree power dissipation.
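The trade-off between buffering and power can be estimated with the standard CMOS dynamic-power relation, P = C x VDD^2 x f, which applies directly to a clock net because it charges and discharges once every cycle. The sketch below is a rough estimate under assumed values (1 nF of total clock capacitance, a 3.3V supply, a 100-MHz clock, and 200 pF of added buffer capacitance). It ignores static and short-circuit power.

```python
def clock_dynamic_power(c_total_farads, vdd_volts, freq_hz):
    """Dynamic power in watts for a net that toggles every clock cycle."""
    return c_total_farads * vdd_volts ** 2 * freq_hz

# Assumed, illustrative values.
base = clock_dynamic_power(1.0e-9, 3.3, 100e6)
# Adding buffers adds their input and internal capacitance to the tree.
with_buffers = clock_dynamic_power(1.2e-9, 3.3, 100e6)

print(f"base tree:    {base:.3f} W")
print(f"with buffers: {with_buffers:.3f} W")
```

Because power scales linearly with capacitance in this model, every buffer you add to tighten skew carries a proportional power cost.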

Clock-induced delay, or clock latency, lengthens hold times and stretches clock-to-data-out times. Although designing for a maximum clock latency is important, you also need to consider a minimum latency for multiclock chips to minimize skew across clocks. (Some clocks may have much less latency than do others.) Most high-performance chips use a PLL between the clock input and a reference point on the chip (one of the clocked elements) to minimize latency. You use the PLL to shift, not "create" time, so you can reduce latency to the reference point, but you cannot affect clock skew with the PLL.

Looking at a balanced-tree clock-distribution network in more detail (Figure 4), you see buffers, which may be several layers below the main clock input, driving the clocked elements. Each split of the tree interconnect, or branch, results in an additional clock-tree-buffering layer. As a clock-tree designer, you need to specify the clocked-element cluster size and number of clock-tree levels. You also need to match the drive of a clock driver to the elements it is driving (either additional clock-tree interconnect segments and buffers or the chip's clocked elements). Typically, you need no more than three or four clock-buffer layers to design a balanced tree using a good clock-synthesis tool. Figure 4 also shows clocked-element clustering (with a cluster size of two in this figure), with a clock buffer of adequate drive strength driving each cluster.
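As a rough illustration of how cluster size and buffer fan-out set the number of buffer layers, the following sketch counts the layers needed to reach a single root driver. The function name and the example cluster size of 50 and fan-out of 16 are assumptions. A real tool also weighs drive strength and wire load.

```python
def buffer_levels(n_elements, cluster_size, fanout):
    """Count buffer layers from the leaf buffers up to a single root buffer."""
    # One leaf buffer per cluster of clocked elements (ceiling division).
    count = -(-n_elements // cluster_size)
    levels = 1
    # Each additional layer drives up to `fanout` buffers of the layer below.
    while count > 1:
        count = -(-count // fanout)
        levels += 1
    return levels

# A tiny tree with a cluster size of two and two-way branching.
print(buffer_levels(16, cluster_size=2, fanout=2))
# A large chip with assumed cluster size 50 and buffer fan-out 16.
print(buffer_levels(150_000, cluster_size=50, fanout=16))
```

Both cases come out to four layers, which is consistent with the three-or-four-layer guideline above: larger clusters and higher fan-out keep the tree shallow even for very large element counts.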

Because the biggest problem you face in designing clock trees is skew minimization, you should be aware of how significant the problem has become for today's leading-edge chips. The factors that contribute to clock skew include loading mismatch at the clocked elements, mismatch in RC delay due to clock-line segment-width and -length variations, and process variations induced during chip fabrication.

Clock skew adds to cycle times, reducing the clock rate at which a chip can operate. Typically, skew should be 10% or less of a chip's clock cycle, meaning that for a 100-MHz clock, skew must be 1 nsec or less. High-performance µPs may require skew to be 5% of the clock cycle, or 250 psec at a 200-MHz clock rate.
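The budget arithmetic above is simple enough to script when you are comparing several clock rates. This small helper, whose name is an assumption, converts a clock frequency and an allowed percentage into a skew budget in picoseconds.

```python
def skew_budget_ps(clock_mhz, percent_of_cycle):
    """Maximum allowed skew in picoseconds."""
    period_ps = 1e6 / clock_mhz   # 1 MHz corresponds to a 1,000,000-ps period
    return period_ps * percent_of_cycle / 100

print(skew_budget_ps(100, 10))  # typical chip at 100 MHz
print(skew_budget_ps(200, 5))   # high-performance processor at 200 MHz
```

The two calls reproduce the 1-nsec (1000-psec) and 250-psec budgets quoted in the text.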

Data-dependent skew is another problem. How data flows throughout, as well as on and off, the chip determines the level of on-chip switching activity. Switching activity is particularly critical for output drivers sending signals off chip, because the drivers draw a lot of current. As output buffers switch, they cause fluctuations in the current flowing through the chip's power and ground buses. These current variations affect logic-cell performance, including clock buffers whose delay and drive strength can vary with shifts in VDD and ground. Thus, clock skew can change because of on-chip data activity. Improving the robustness of your power grid can help eliminate any data-dependent skew problems. Similarly, packages with "stiffer" power and ground metallization can also help. This stiffer metallization is why PGA and similar packages minimize data-dependent skew better than do less expensive QFPs.

Clock-design methodology

Many ASIC and chip companies have comprehensive clock-network-design strategies that they use on their customers' chips. IBM Microelectronics designs many chips that require leading-edge clock-tree designs, for which the company takes a two-pronged, clock-tree-design approach. IBM engineers first use a tool called Clock Pro to develop the clock-tree skeleton (Reference 2). Clock Pro takes designer-supplied constraints, such as maximum clock skew and latency; performance parameters of available clock-driver cells; and external chip information, such as worst-case temperature and voltage. With this information, Clock Pro outputs all clock-tree combinations, sorted by latency, that can meet timing constraints. To help balance the tree and reduce skew, the tool can also include single-pin terminator cells that come in four capacitance "sizes." In addition, Clock Pro varies terminator-cell capacitance and location for balance control (Figure 5).
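The general idea of enumerating candidate trees and sorting the survivors by latency can be sketched as follows. This is not Clock Pro's algorithm, which the article does not describe. The cell names, delays, per-level skew contributions, and the additive delay model are all hypothetical.

```python
from itertools import product

# Hypothetical driver cells: name -> (delay in ps, skew contribution in ps)
CELLS = {
    "buf_small": (300, 20),
    "buf_large": (200, 40),
}

def candidate_trees(max_latency_ps, max_skew_ps, max_depth):
    """List every per-level cell choice that meets both constraints.

    Each result is (latency_ps, skew_ps, cells_from_root_to_leaf), and the
    list is sorted by latency first.
    """
    results = []
    for depth in range(1, max_depth + 1):
        for combo in product(sorted(CELLS), repeat=depth):
            latency = sum(CELLS[cell][0] for cell in combo)
            skew = sum(CELLS[cell][1] for cell in combo)
            if latency <= max_latency_ps and skew <= max_skew_ps:
                results.append((latency, skew, combo))
    return sorted(results)

for latency, skew, combo in candidate_trees(500, 60, max_depth=2):
    print(latency, skew, " -> ".join(combo))
```

A designer could then pick from the sorted list by weighing depth, buffer count, and power, which mirrors the selection step described for IBM's customers.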

IBM uses Clock Pro on chips with multiple clock domains. IBM engineers see an average of 10 clock domains fed by one clock tree on a single chip. Multiple clock domains on a chip have become more prevalent with the increase in core-based chip designs, because each core may have its own clock. Multiple domains complicate skew control over the entire chip, and you may also need to synchronize some clocks to the master (system) clock, but with a "late" or "early" factor added. You can synchronize those clocks by inserting fixed delays into some portions of the clock tree.

IBM's chip customers select the desired clock tree, taking into consideration clock-tree depth, latency, total buffers added, and power dissipation. The customer then synthesizes the tree using an internal IBM synthesis tool. After synthesis and during placement and routing of the clock-tree elements, the customer optimizes the tree with another IBM tool to drive the clocked elements with minimum interconnect length, skew, and latency. Clock-tree targets for IBM engineers are skew of 5% or less of clock-cycle time, a tree that drives as many as 150,000 clocked elements on a chip, and clock frequencies of 300 MHz or higher. For final verification, a 3-D extraction tool based on a proprietary algorithm extracts the clock tree's RC parasitics. The customer then simulates the extracted database to verify that the tree meets timing specifications.

Motorola uses the Clock Generator tool along with Avanti or Cadence place-and-route tools. The tool combination produces a tree with minimum insertion delay, a minimum number of buffers, and maximum fan-out based on customer-input edge-rate specification. Typical skew is less than 300 psec. Most designs require a two-level clock tree; none needs more than four levels. Most of Motorola's customers' chips have four to 10 clock domains, although the company has designed chips with as many as 56 domains. Clock Generator estimates wire lengths using a process-technology file and a statistical wire-load model. Motorola plans to integrate clock-tree tools into its floorplanning tools to get better wire-length estimates.

After generation of the clock tree, the output from the place-and-route tool is flat, meaning that the design hierarchy is lost. Motorola has Merge DB, a tool that re-creates the hierarchical database, including the clock tree. A hierarchical database speeds design verification with extraction and timing tools.

Toshiba uses its Clock Tree Synthesis (CTS) tool with Cadence's Gate Ensemble or Cell3 place-and-route tools to do automatic clock-tree synthesis and placement for customers' chips. CTS clusters the clocked cells and matches interconnect-RC delay for balanced-tree routing, creating an output file annotated to the circuit file that the router uses (Figure 6). Toshiba typically achieves a clock skew of less than 100 psec when using CTS on chips with maximum clock frequencies of 100 to 125 MHz.
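Clustering of clocked cells can be illustrated with a very simple spatial-binning sketch, in which cells that fall in the same square region of the die share one clock buffer. This is only an assumed stand-in for illustration and is not Toshiba's CTS method. The cell names, coordinates, and bin size are hypothetical.

```python
def cluster_cells(cells, bin_size):
    """Group clocked cells by the square bin that contains them.

    cells: name -> (x, y) placement in arbitrary layout units.
    Returns: (bin_column, bin_row) -> list of cell names in that bin.
    """
    clusters = {}
    for name, (x, y) in cells.items():
        key = (int(x // bin_size), int(y // bin_size))
        clusters.setdefault(key, []).append(name)
    return clusters

# Hypothetical flip-flop placements.
cells = {
    "ff1": (10, 10),
    "ff2": (20, 30),
    "ff3": (150, 20),
    "ff4": (160, 170),
}

for region, members in cluster_cells(cells, bin_size=100).items():
    print(region, members)
```

Keeping each cluster physically compact keeps the final wire segments short and similar in length, which makes the RC matching step easier.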

Clock-tree design is an important part of chip design for these and other high-performance chip vendors and ASIC suppliers. Although each company uses a different combination of third-party and internal tools and design styles, all integrate clock-tree design into the overall chip-design flow. Results for today's leading-edge chips are satisfactory. But with 500-MHz clocks and 0.18-µm process technology on the horizon, semiconductor and EDA vendors will have to make advances in software and design techniques to successfully manage clock-distribution networks for tomorrow's chips.


References

  1. Desai, Madhav, Radenko Cvijeic, and James Jensen, "Sizing of clock distribution networks for high performance CPU chips," Design Automation Conference Proceedings, June 1996, pg 389.
  2. Rincon, Ann Marie, Michael Trick, and Thomas Guzowski, "A proven methodology for designing one-million-gate ASICs," Custom Integrated Circuit Conference Proceedings, May 1996, pg 45.
  3. Lipman, Jim, "EDA tools let you track and control CMOS power dissipation," EDN, Nov 23, 1995, pg 65.
  4. Tanizawa, Tetsu and Sigeki Kawahara, "Clock driven design method (CDDM) for deep sub-micron ASICs," ASIC Conference and Exposition, September 1995, pg 241.
  5. Pullela, Satyamurthy, Noel Menezes, and Lawrence Pillage, "Low power IC clock tree design," Custom Integrated Circuit Conference Proceedings, May 1995, pg 263.
  6. Wu, Qing, Massoud Pedram, and Xunwei Wu, "Clock-gating and its application to low power design of sequential circuits," Custom Integrated Circuit Conference Proceedings, May 1997.

Acknowledgments

Thanks to Gopi Ganapathy of Silicon Valley Research for his discussion and white paper on clock-tree synthesis and to Rob Mathews of Frequency Technology for his material on interconnect inductance effects.


Clock-distribution-design tools of the trade

Today's clock-network designers have some good EDA tools to help them design high-performance clock trees. You can divide these tools into physical-tree generation tools and post-generation clock-tree-analysis tools. Clock-tree-generation tools do tasks such as balancing clock loads through clocked-element clustering and placing and adjusting clock-line interconnect length. These tools often perform buffer sizing and buffer insertion into the clock tree as well.

Clock-tree-analysis tools include tools for tree-interconnect extraction and simulation. Tools in this group need to extract large nets in a few hours or less and provide enough accuracy to evaluate clock skew to within about 10%, or approximately 25 psec for high-performance chips.

Table A shows some EDA-vendor tools for synthesizing and evaluating clock trees. Features vary widely, even among tools developed to do the same task, so you must carefully evaluate your needs to determine which tool is best for your designs.

Table A--EDA-vendor tools for synthesis and evaluation of high-performance clock trees

| Company | Tool | Description | Comments | Price [1] |
|---|---|---|---|---|
| Avanti | Aquarius-XO | Placement and routing | Timing-driven engine | $170,000 |
| | CTS | Clock-tree synthesis | Option to Aquarius, supports multiple clocks | $50,000 |
| Cadence | CT-GEN | Clock-tree synthesis | Clusters target cells, balances clusters/buffer placement, minimizes skew/insertion delay, supports gated clocks | $30,000 |
| | PBO | Timing optimization | Resizes buffers and logic, supports multiple clocks | $65,000 |
| Cascade Design Automation | Epoch/clock-tree synthesis | Clock-tree synthesis | Part of physical-design tool suite, clusters target cells, supports multiple clocks, reports clock skew and delay | $165,000 [2] |
| Compass Design Automation | Clock Tree Compiler | Clock-tree synthesis | ChipPlanner and PathFinder option, supports multiple clocks, gate-array or cell-based chips | $30,000 [3] |
| Frequency Technology | VIPP [4] | Process calibration for extraction tools | Calibrated against actual process | $100,000 |
| Silicon Valley Research | Gards-Clktree | Replaces a high-fan-out net with a balanced tree of buffers | Used in Gards gate-array and embedded-array place-and-route tool, controls skew and insertion delay, supports multiple clocks | $12,000 [5] |
| | SonIC-Clktree | Replaces a high-fan-out net with a balanced tree of buffers | Used in SonIC structured, custom-cell-based layout tool, controls skew and insertion delay, supports multiple clocks | $12,000 [5] |
| | SC-CTS | Replaces a high-fan-out net with a balanced tree of buffers | Used in SC cell-based layout tool, controls skew and insertion delay, supports multiple clocks | $12,000 [5] |
| Simplex Solutions | Fire & Ice | Full-chip, 3-D extraction | Extracts parasitics, transistors, and power grid | $150,000 |
| | Thunder & Lightning | Signal- and power-compliance verification | Clock-skew analysis, electromigration analysis, works with Fire & Ice | $150,000 |

[1] Starting price, unless noted otherwise
[2] Price for Epoch tool suite
[3] In addition to ChipPlanner and PathFinder prices
[4] Available as a service
[5] Clock-generation-tool price only; layout tools at additional cost

Companies offering clock-tree tools and design services

When you contact any of the following manufacturers directly, please let them know you read about their products on EDN's website. Note: All Web addresses start with http:// unless otherwise noted.

Avanti
Sunnyvale, CA
(408) 738-8881
fax (408) 738-8882
www.avanticorp.com

Cadence
San Jose, CA
(408) 943-1234
fax (408) 943-0513
www.cadence.com

Cascade Design Automation
Bellevue, WA
(206) 643-0200
fax (206) 649-7600
www.cdac.com

Compass Design Automation
San Jose, CA
(408) 433-4880
fax (408) 434-7820
www.compass-da.com

Frequency Technology
Los Altos, CA
(415) 917-5800
fax (415) 917-5817
www.frequency.com

IBM Microelectronics
Essex Junction, VT
(800) 426-2468
fax (415) 855-4121
www.chips.ibm.com

Motorola
Austin, TX
(800) 845-6686
fax (512) 891-2943
www.mot.com/sps/hpesd

Silicon Valley Research
San Jose, CA
(408) 361-0333
fax (408) 361-0330

Simplex Solutions
San Jose, CA
(408) 432-8260
fax (408) 432-8262
www.simplex.com

Toshiba America
San Jose, CA
(800) 879-4963
fax (408) 456-8910
www.toshiba.com/taec

Looking Ahead

Inductance effects start to appear as clock-edge times and interconnect resistance decrease, both of which occur over time with shrinking chip technology and higher clock rates. Clock trees often have wide traces at their roots and may also have long segments, making the trees more susceptible to inductance problems than are other chip nets. Careful layout, including placing power and ground lines next to, above, or below clock trees to act as shields, can help reduce the possibility of inductance-induced clock problems. Most designers and clock-tree-parasitic extraction/evaluation tools available today deal only with RC parasitics. Designers do not now consider parasitic inductance a big problem, but this view is starting to change. Some companies, such as IBM Microelectronics, have already come across inductance problems in trunk clock lines wider than 30 µm, resulting in excessive voltage drops and delay. To help identify these problems, IBM has added inductance analysis to the routing tool it uses for clock trees.
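One rough way to see why wide, fast-switching clock trunks are the first nets to show inductance effects is to compare a wire's resistance per millimeter with its inductive reactance at the frequency where most of the edge energy sits. The sketch below uses the common rule of thumb that this frequency is about 0.35 divided by the rise time. The sheet resistance, the fixed 1 nH/mm inductance, and the function names are assumptions. A real analysis needs extracted inductance, which also varies with wire width and the current return path.

```python
import math

SHEET_RES_OHM_PER_SQ = 0.05  # metal sheet resistance (assumed)
L_NH_PER_MM = 1.0            # wire inductance, treated as width-independent (assumed)

def resistance_per_mm(width_um):
    """Wire resistance in ohms per mm: sheet resistance times squares per mm."""
    squares_per_mm = 1000.0 / width_um
    return SHEET_RES_OHM_PER_SQ * squares_per_mm

def reactance_per_mm(rise_time_s):
    """Inductive reactance in ohms per mm at f = 0.35 / rise time."""
    f_significant = 0.35 / rise_time_s
    return 2 * math.pi * f_significant * L_NH_PER_MM * 1e-9

def inductance_matters(width_um, rise_time_s):
    """True when reactance is at least as large as resistance."""
    return reactance_per_mm(rise_time_s) >= resistance_per_mm(width_um)

print(inductance_matters(30.0, 200e-12))  # wide trunk, fast edge
print(inductance_matters(1.0, 200e-12))   # narrow wire, fast edge
print(inductance_matters(30.0, 2e-9))     # wide trunk, slow edge
```

Under these assumptions, only the wide trunk with the fast edge crosses the threshold, which matches the observation that both lower resistance and shorter edge times push a net toward inductive behavior.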


You can reach Jim Lipman at (510) 606-1370, fax (510) 606-1563, ednlipman@mcimail.com.




Copyright © 1995 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.