March 14, 1997

Growing your own IC clock tree

Jim Lipman, Technical Editor

Defining the clock-distribution network is one of the most difficult aspects of high-speed, system-on-a-chip design. Employing the right design methodology helps you "beat the clock."

The design of a good clock-distribution network, or "tree," for a high-performance chip requires good IC-engineering skills. You define your chip's specifications for speed, power, and other factors and then integrate the clock-tree design into your normal chip-design flow. Although you need to define the kind of tree you want up front, EDA tools are available to synthesize the tree and verify its performance before you commit the design to silicon.

The art of designing clock-distribution networks for high-speed chips is far more complex than just meeting timing specifications. Achieving clock latency and skew goals is difficult enough when you have clock signals of 100 MHz or more traversing the chip, but there are other factors besides speed. Because the clock network is one of the most power-hungry nets on a chip, you need to design with power dissipation in mind, particularly for chips used in portable equipment. The dynamic-switching current flow that causes high power dissipation can also lead to interconnect reliability problems, so you have to consider clock-tree wire widths. As if all these considerations were not enough, there is also the potential for clock-to-signal-net coupling that may result in excessive on-chip noise.

Clock-tree basics

The major task of clock-tree design is developing the interconnect geometry that connects the clock to all the cells on the chip that use a clock. These cells consist of latches, flip-flops, and other logic elements that you need to synchronize with the system clock. For clock-tree design, your major concerns are minimizing clock skew and optimizing clock buffers to meet skew specifications and minimize clock-tree power dissipation.
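To get a feel for why the clock network is so power-hungry, the following minimal Python sketch applies the standard first-order estimate of dynamic power for a net that swings rail to rail once per cycle, P = C x VDD^2 x f. The function name and the example numbers (500 pF of total clock capacitance, a 3.3V supply) are illustrative assumptions, not figures from any chip discussed here, and the estimate ignores short-circuit and static currents.

```python
def clock_dynamic_power(c_total_farads, vdd_volts, freq_hz):
    """First-order dynamic power of a clock net, in watts.

    Assumes the whole capacitance is charged and discharged once per
    clock cycle with a full rail-to-rail swing, so the energy drawn per
    cycle is C * VDD^2. Short-circuit and leakage currents are ignored.
    """
    return c_total_farads * vdd_volts ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    c_total = 500e-12   # wire + buffer + clock-pin capacitance, in farads
    vdd = 3.3           # supply voltage, in volts
    freq = 100e6        # clock frequency, in hertz
    power = clock_dynamic_power(c_total, vdd, freq)
    print(f"Estimated clock-tree dynamic power: {power:.3f} W")
```

Because the estimate is linear in capacitance, every buffer or wire segment you add to balance the tree shows up directly in the power budget.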
Clock-tree design considerations

When designing a clock tree, you need to consider both timing-related performance specifications and those that affect other chip-design goals. Clock-tree timing specifications include clock latency, skew, and jitter. Nontiming specifications include power dissipation (static and dynamic), signal integrity (noise due to clock-to-signal coupling), and reliability (primarily due to electromigration effects in clock lines). Many clock-design issues affect multiple performance parameters; for example, adding clock buffers to balance clock lines and decrease skew may result in additional clock-tree power dissipation.

Clock-induced delay, or clock latency, lengthens hold times and stretches clock-to-data-out times. Although designing for a maximum clock latency is important, you also need to consider a minimum latency for multiclock chips to minimize skew across clocks. (Some clocks may have much less latency than do others.) Most high-performance chips use a PLL between the clock input and a reference point on the chip (one of the clocked elements) to minimize latency. You use the PLL to shift, not "create," time, so you can reduce latency to the reference point, but you cannot affect clock skew with the PLL.
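The two main timing figures above can be illustrated with a short Python sketch. It assumes you already have the clock arrival time at each clocked element, for example from a timing report; the function name, the element names, and the arrival values are hypothetical. Latency is taken here as the worst-case insertion delay and skew as the spread between the latest and earliest arrivals.

```python
def latency_and_skew(arrivals_ps):
    """Return (latency, skew) in picoseconds for one clock tree.

    arrivals_ps maps each clocked element to the delay, in picoseconds,
    from the clock input to that element's clock pin.
    """
    times = list(arrivals_ps.values())
    latency = max(times)            # worst-case insertion delay
    skew = max(times) - min(times)  # latest arrival minus earliest
    return latency, skew


if __name__ == "__main__":
    # Hypothetical arrival times for three clocked elements.
    arrivals = {"ff_a": 1200, "ff_b": 1350, "latch_c": 1100}
    latency, skew = latency_and_skew(arrivals)
    print(f"latency = {latency} ps, skew = {skew} ps")
```

Adding the same delay to every arrival changes the latency but leaves the skew untouched, which is the sense in which a PLL can shift latency without affecting skew.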
Because the biggest problem you face in designing clock trees is skew minimization, you should be aware of how significant the problem has become for today's leading-edge chips. The factors that contribute to clock skew include loading mismatch at the clocked elements, mismatch in RC delay due to clock-line segment-width and -length variations, and process variations induced during chip fabrication. Clock skew adds to cycle times, reducing the clock rate at which a chip can operate. Typically, skew should be 10% or less of a chip's clock cycle, meaning that for a 100-MHz clock, skew must be 1 nsec or less. High-performance µPs may require skew to be 5% of the clock cycle, or 250 psec at a 200-MHz clock rate.

Data-dependent skew is another problem. How data flows throughout, as well as on and off, the chip determines the level of on-chip switching activity. Switching activity is particularly critical for output drivers sending signals off chip, because the drivers draw a lot of current. As output buffers switch, they cause fluctuations in the current flowing through the chip's power and ground buses. These current variations affect logic-cell performance, including clock buffers whose delay and drive strength can vary with shifts in VDD and ground. Thus, clock skew can change because of on-chip data activity. Improving the robustness of your power grid can help eliminate any data-dependent skew problems. Similarly, packages with "stiffer" power and ground metallization can also help. This stiffer metallization is why PGA and similar packages minimize data-dependent skew better than do less expensive QFPs.

Clock-design methodology
IBM uses Clock Pro on chips with multiple clock domains. IBM engineers see an average of 10 clock domains fed by one clock tree on a single chip. Multiple clock domains on a chip have become more prevalent with the increase in core-based chip designs, because each core may have its own clock. Besides adding complexity to skew control over the entire chip, multiple domains may require you to synchronize some clocks to the master (system) clock, but with a "late" or "early" factor added. You can synchronize those clocks by inserting fixed delays into some portions of the clock tree.

IBM's chip customers select the desired clock tree, taking into consideration clock-tree depth, latency, total buffers added, and power dissipation. The customer then synthesizes the tree using an internal IBM synthesis tool. After synthesis and during placement and routing of the clock-tree elements, the customer optimizes the tree with another IBM tool to drive the clocked elements with minimum interconnect length, skew, and latency. Clock-tree targets for IBM engineers are skew of 5% or less of clock-cycle time, a tree that drives as many as 150,000 clocked elements on a chip, and clock frequencies of 300 MHz or higher. For final verification, a 3-D extraction tool based on a proprietary algorithm extracts the clock tree's RC parasitics. The customer then simulates the extracted database to verify that the tree meets timing specifications.

Motorola uses the Clock Generator tool along with Avant! or Cadence place-and-route tools. The tool combination produces a tree with minimum insertion delay, a minimum number of buffers, and maximum fan-out based on a customer-input edge-rate specification. Typical skew is less than 300 psec. Most designs require a two-level clock tree; none needs more than four levels. Most of Motorola's customers' chips have four to 10 clock domains, although the company has designed chips with as many as 56 domains.
Clock Generator estimates wire lengths using a process-technology file and a statistical wire-load model. Motorola plans to integrate clock-tree tools into its floorplanning tools to get better wire-length estimates. After generation of the clock tree, the output from the place-and-route tool is flat, meaning that the design hierarchy is lost. Motorola has Merge DB, a tool that re-creates the hierarchical database, including the clock tree. A hierarchical database speeds design verification with extraction and timing tools.
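The skew targets quoted in this article are all stated as a percentage of the clock cycle, so converting them to absolute numbers is simple arithmetic. The Python sketch below does that conversion; the function name is an assumption made for illustration, and the frequencies and percentages are the ones cited in the text (10% at 100 MHz, 5% at 200 MHz, and IBM's 5% target at 300 MHz).

```python
def skew_budget_ps(freq_mhz, percent_of_cycle):
    """Allowed clock skew in picoseconds.

    freq_mhz is the clock frequency in megahertz; percent_of_cycle is
    the skew allowance as a percentage of one clock period.
    """
    period_ps = 1e6 / freq_mhz  # one period in picoseconds
    return period_ps * percent_of_cycle / 100


if __name__ == "__main__":
    # (frequency in MHz, skew allowance in percent) pairs from the text.
    for freq_mhz, percent in [(100, 10), (200, 5), (300, 5)]:
        budget = skew_budget_ps(freq_mhz, percent)
        print(f"{freq_mhz} MHz at {percent}%: {budget:.0f} ps")
```

The first two cases reproduce the 1-nsec and 250-psec figures given earlier; the 300-MHz case works out to roughly 167 psec, which shows how quickly the absolute budget shrinks as clock rates climb.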
Clock-tree design is an important part of chip design for these and other high-performance chip vendors and ASIC suppliers. Although each company uses a different combination of third-party and internal tools and design styles, all integrate clock-tree design into the overall chip-design flow. Results for today's leading-edge chips are satisfactory. But with 500-MHz clocks and 0.18-µm process technology on the horizon, semiconductor and EDA vendors will have to make advances in software and design techniques to successfully manage clock-distribution networks for tomorrow's chips.
Acknowledgments Thanks to Gopi Ganapathy of Silicon Valley Research for his discussion and white paper on clock-tree synthesis and to Rob Mathews of Frequency Technology for his material on interconnect inductance effects.
Copyright © 1995 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.