Design quality and its impact on design closure
Steps to ensure quality early in the design can speed closure and avoid failed silicon.
Piyush Sancheti, Atrenta Inc; Sanjay Churiwala, Atrenta Inc; and Rob Knoth, Magma Design Automation -- EDN, July 15, 2010
The cost of SOC (system-on-chip) design continues
to skyrocket, market windows continue
to shrink, and design complexity continues to
grow exponentially. These challenges are only
a few of those that SOC designers face. In an
effort to prevent major disasters, designers must
ensure that the SOCs achieve design closure, which includes
meeting certain key objectives, such as performance, power,
and area. Design-closure objectives are often in conflict with
each other, however. Designers must constantly trade off one
for the other to ensure that the design stays within the end-user
application’s requirements.
A typical SOC design starts with an
RTL (register-transfer-level) description
to capture user intent and a set of
design constraints to drive implementation.
The design team first verifies the
RTL for correctness of functional intent
through simulation and formal verification.
The design then goes through
a series of implementation steps, including
synthesis and placement and
routing, which eventually result in a
GDSII (Graphic Design System 2) layout
for silicon manufacturing. The quality
of incoming design and the associated
constraints have a large impact on
the designers’ ability to reach closure.
However, you can ease this process by
using a series of design-quality measures
starting at RTL and continuing
throughout implementation. This article
focuses on quality measures at five stages
of an integrated RTL-to-GDSII
implementation flow (Figure 1):
presynthesis-RTL quality;
postsynthesis, postscan-netlist quality;
post-timing-netlist quality;
postplacement-netlist quality; and
postroute-netlist quality. You
can expand the concept to other stages
of implementation or adapt it to other
flows.
Presynthesis-RTL quality
SOC designs that don’t start well
will usually fail to reach closure. Quality
measures at the RTL stage of design go a long way toward
determining successful design closure and working silicon.
Once you synthesize the design, you are to a large extent
freezing the design intent, and you have limited flexibility for
correcting design-quality issues inherent in RTL.
Modern SOC designs typically cater to multiple end
markets to amortize the high cost of design. The same design
can have multiple variations and live on for multiple
generations through updates and upgrades. This scenario is
prevalent in consumer electronics and automotive chips,
in which manufacturers accomplish 80% or more of the design through reuse. Future generations
may reuse the RTL you
create for the current design, so
that RTL could have a longer
shelf life than the design itself.
You must also consider commercial
third-party IP (intellectual
property), such as processors,
digital-signal-processing blocks,
and bus fabrics, and interface IP,
including Ethernet, USB (Universal
Serial Bus), and PCI (Peripheral
Component Interconnect). The SOC team typically
receives this IP in RTL.
For these reasons, you must ensure the quality of the RTL
and constraints going into synthesis. Design teams often focus
on functional correctness using simulation and formal
verification, but spending some effort on implementation
feasibility and the overall quality of RTL can go a long way
toward accelerating design closure. Design teams can achieve
such quality measures through a set of analyses on the RTL
and design constraints.
Structural and connectivity integrity
RTL linting can weed out syntax and semantic issues and
ensure compliance with coding standards. RTL designers
should, however, address more serious structural and connectivity
issues at this early stage. If left undetected, they may
later lead to more serious design-closure issues. Examples of
these issues include excessive levels of logic between flip-flops
(Figure 2), combinational loops, unintentional latches, blocking assignments in sequential
blocks, variables or nonconstants
in the terminating condition of
a loop, missing asynchronous resets
from a sensitivity list, multiply-
driven nets without a tristate,
undriven nets and ports, and mismatching
between the left- and
the right-hand sides of an assignment.
Although you may be able
to detect and fix some or all of
these issues during synthesis or
later stages of implementation, it is more efficient to fix them
before putting any effort into implementation.
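Two of these checks, combinational loops and excessive levels of logic between flip-flops, reduce to simple graph analysis. The following Python is a minimal sketch under stated assumptions: the toy netlist, the cell names, and the `MAX_LEVELS` limit are hypothetical and do not reflect the input format or rules of any particular lint tool.

```python
from collections import defaultdict

# Hypothetical toy netlist: each cell lists the cells it drives.
# Flip-flops are path endpoints, so they are recorded separately.
FLOPS = {"ff_a", "ff_b"}
FANOUT = {
    "ff_a": ["g1"],
    "g1": ["g2"],
    "g2": ["g3"],
    "g3": ["ff_b"],
    "g4": ["g5"],  # g4 -> g5 -> g4 forms a combinational loop
    "g5": ["g4"],
}
MAX_LEVELS = 2  # assumed project limit on logic levels between flip-flops


def find_combinational_loops(fanout, flops):
    """Return the gates that sit on a cycle that does not pass through a flop."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    on_loop = set()

    def visit(node, stack):
        color[node] = GRAY
        stack.append(node)
        for nxt in fanout.get(node, []):
            if nxt in flops:
                continue  # a flop breaks the combinational path
            if color[nxt] == GRAY:
                # Back edge: everything from nxt to the top of the stack is a loop.
                on_loop.update(stack[stack.index(nxt):])
            elif color[nxt] == WHITE:
                visit(nxt, stack)
        stack.pop()
        color[node] = BLACK

    for node in fanout:
        if node not in flops and color[node] == WHITE:
            visit(node, [])
    return on_loop


def max_logic_depth(fanout, flops, start):
    """Count gates on the longest combinational path leaving flop `start`."""
    def depth(node, seen):
        best = 0
        for nxt in fanout.get(node, []):
            if nxt in flops or nxt in seen:
                continue  # stop at flops and never walk around a loop
            best = max(best, 1 + depth(nxt, seen | {nxt}))
        return best

    return depth(start, frozenset())


loops = find_combinational_loops(FANOUT, FLOPS)
levels = max_logic_depth(FANOUT, FLOPS, "ff_a")
print("gates on combinational loops:", sorted(loops))
print("logic levels after ff_a:", levels, "limit:", MAX_LEVELS)
```

In this toy design, the path from `ff_a` to `ff_b` has three levels of logic, which exceeds the assumed limit, and `g4` and `g5` form a loop. A production check would work on the elaborated RTL rather than on a hand-written dictionary.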
Clocks and resets
A typical SOC contains heterogeneous IP from different
sources. As a result, the number of asynchronous clock domains
on a chip has increased dramatically. It is possible for one chip to
have 20 or more clock domains. You must ensure that clocks and
resets are properly designed. When data signals cross between
asynchronous clock domains, you must synchronize them to prevent
metastability (Figure 3). Clock synchronizers can range from
multiple-flip-flop synchronizers to more exotic schemes, such as
FIFO (first-in/first-out) buffers with handshaking. It is important
to prevent data loss and the reconvergence of synchronized
signals to ensure reliable behavior. You must also synchronize the
deassertion of resets, even asynchronous resets, with the clock
domain.
You should ensure not only that synchronizers are in place
on crossings but also that you’ve correctly implemented the protocol. For example, a FIFO should have no overflow or underflow,
and you must implement proper sequencing between
requests and acknowledgments in a handshaking scheme.
Functional simulation may not detect clock-domain-crossing
bugs unless verification engineers create dedicated testbench
scenarios for each crossing, a daunting task for designs that
have thousands of such crossings. You must employ structural-
analysis and formal-verification techniques to exhaustively
analyze and verify clock-domain crossings.
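The structural part of this analysis can be sketched as follows. The example assumes a deliberately simple model in which each flip-flop records its clock, the flip-flop that drives its data input, and whether combinational logic sits on that input. It accepts only a two-flip-flop synchronizer; the `Flop` class and all names are hypothetical. It does not verify protocols, data loss, or reconvergence, which need formal techniques.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flop:
    name: str
    clock: str
    source: Optional[str]  # name of the flop driving the D input, if any
    direct: bool = True    # True if no combinational logic sits on the D input


def unsynchronized_crossings(flops):
    """Return (source, destination) pairs that cross clock domains
    without a two-flip-flop synchronizer at the destination."""
    by_name = {f.name: f for f in flops}
    violations = []
    for dst in flops:
        src = by_name.get(dst.source) if dst.source else None
        if src is None or src.clock == dst.clock:
            continue  # not a clock-domain crossing
        # The first stage must feed a second flop in the same domain directly.
        has_second_stage = any(
            f.source == dst.name and f.clock == dst.clock and f.direct
            for f in flops
        )
        if not (dst.direct and has_second_stage):
            violations.append((src.name, dst.name))
    return violations


design = [
    Flop("tx", "clk_a", None),
    Flop("sync1", "clk_b", "tx"),            # first synchronizer stage
    Flop("sync2", "clk_b", "sync1"),         # second synchronizer stage
    Flop("rx_bad", "clk_b", "tx", direct=False),  # crossing through logic, no synchronizer
]
print(unsynchronized_crossings(design))
```

Only the crossing from `tx` to `rx_bad` is reported, because `sync1` and `sync2` form a two-stage synchronizer.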
Power reduction
Power has come to the forefront of design-closure concerns
for a variety of reasons, including battery life, cooling
costs, reliability, and energy efficiency. Studies show that
more than 80% of a design’s power is determined
by the time the design enters synthesis. For that reason, you
must address power management early in the design flow,
using architectural techniques, such as multiple voltage domains,
power domains, and dynamic-voltage-frequency scaling,
and RTL techniques, such as clock and data gating. Designers
must start with an estimate of the power consumption
of the design and selectively apply these techniques based on
power targets for the design.
Voltage and power domains add new challenges for design
closure. In voltage domains, it is important to insert level
shifters wherever signals cross from one voltage domain to
another. Similarly, you must place isolation cells in power
domains that can be shut off when not in use to ensure that
unpowered outputs are not floating. These floating signals
could introduce a functional error or a high-leakage path to
ground. You must also ensure that the enable logic for isolation
cells is in an always-on domain. Some designers insert
level shifters and isolation logic at RTL, and others capture
the power intent in CPF (Common Power Format) or UPF
(Unified Power Format) for automatic insertion by downstream
implementation tools. In either case, designers must
ensure that the design has level shifters and isolation-logic
cells at each such crossing.
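The rules in this paragraph can be written down as a small checker. This is a sketch only: the `Domain` and `Crossing` classes, the domain names, and the voltages are assumptions for illustration, and real flows would derive this information from the netlist and the CPF or UPF description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float
    switchable: bool  # True if the domain can be shut off


@dataclass(frozen=True)
class Crossing:
    signal: str
    src: Domain
    dst: Domain
    has_level_shifter: bool = False
    has_isolation: bool = False
    iso_enable_domain: Optional[Domain] = None


def check_crossing(c):
    """Return a list of rule violations for one domain crossing."""
    issues = []
    if c.src.voltage != c.dst.voltage and not c.has_level_shifter:
        issues.append("missing level shifter")
    if c.src.switchable:
        if not c.has_isolation:
            issues.append("missing isolation cell")
        elif c.iso_enable_domain is None or c.iso_enable_domain.switchable:
            issues.append("isolation enable not in always-on domain")
    return issues


AON = Domain("aon", 1.0, switchable=False)
CPU = Domain("cpu", 0.8, switchable=True)

bare = Crossing("irq", CPU, AON)
bad_enable = Crossing("irq", CPU, AON, True, True, iso_enable_domain=CPU)
good = Crossing("irq", CPU, AON, True, True, iso_enable_domain=AON)

for crossing in (bare, bad_enable, good):
    print(crossing.signal, check_crossing(crossing))
```

The same check applies whether you insert the cells at RTL or a downstream tool inserts them from the power intent.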
With judicious use, clock gating can be an effective power-reduction
technique. Most synthesis tools can automatically
insert gating based on clock enables in the RTL. However,
not all clock gates save power, especially when the registers,
such as flip-flops, are almost always enabled or when
a clock gate controls only a few registers. In such cases, the additional gating logic can consume more power than the
power you save by gating the clock. Excessive clock gating
can lead to timing-closure issues and contribute to routing
congestion. You should instead selectively apply clock gating
in places in which it has the most impact on power.
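A rough way to reason about this trade-off is to compare the clock power a gate saves with the power the gating cell itself consumes. The model and all numbers below are assumptions chosen for illustration, not measured data: saved power scales with the number of gated registers and the fraction of time the enable is inactive.

```python
def net_gating_saving_uw(num_regs, clk_power_per_reg_uw, enable_duty, gate_overhead_uw):
    """Estimate the net power saving of one clock gate, in microwatts.

    num_regs: registers behind the gate
    clk_power_per_reg_uw: clock power per register when ungated (assumed)
    enable_duty: fraction of cycles in which the enable is active (0.0 to 1.0)
    gate_overhead_uw: power consumed by the gating cell itself (assumed)
    """
    saved = num_regs * clk_power_per_reg_uw * (1.0 - enable_duty)
    return saved - gate_overhead_uw


# A 32-bit register bank that is enabled 10% of the time: gating pays off.
wide_idle = net_gating_saving_uw(32, 1.0, 0.10, 3.0)

# Two registers that are enabled 95% of the time: the gate costs more than it saves.
narrow_busy = net_gating_saving_uw(2, 1.0, 0.95, 3.0)

print("wide, mostly idle bank saves power:", wide_idle > 0)
print("narrow, mostly busy bank saves power:", narrow_busy > 0)
```

A real power estimate would also account for the clock-tree capacitance removed by gating and for the timing and congestion cost that the paragraph above describes.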
RTL analysis for clock gating can help in a number of ways.
At RTL, you can identify global clock-gating signals, which
can gate clocks for an entire design or for large register banks.
A review at RTL can also analyze the explicit clock
enables that RTL designers define, prioritize them by their power-saving
potential, and help remove those that save little or no
power. Power-management designers can also identify new or
implicit clock-gating opportunities that RTL designers may
have overlooked. In addition, power-management specialists
can generate directives for synthesis to intelligently gate
clocks.
Various clock-gating opportunities are available to RTL
designers (Figure 4). Power designers can do similar analysis
to identify data-gating opportunities, in which a cloud of
combinational logic drives an enabled register. Gating the
combinational logic using the same enable that you apply to
the terminal register eliminates wasted power resulting from toggles in the combinational logic when the register is disabled.
For example, an N-bit multiplier, with the input data
bits arriving at different times, is a candidate for data gating.
The multiplier continues to multiply even though the
results remain unused until all the bits of both data inputs
have arrived. Data gating can be an effective technique for
the datapath-intensive designs that are common in digital-signal
processing.
Design for test
Designs must have high test coverage, both for stuck-at- and
at-speed-fault modes, especially in consumer electronics,
which must quickly reach silicon volume with few defects.
Traditionally, design teams stitch scan chains during synthesis
or later stages and then estimate test coverage
using ATPG (automatic-test-pattern-generation) tools.
However, most testability issues are detectable and correctable
at RTL, which helps ensure that the design eventually meets
test-coverage goals.
For example, the key to high stuck-at-fault coverage is to
make sure that the design is fully controllable and observable
in scan mode. However, high stuck-at-fault coverage in RTL
encounters many barriers, including nonscannable flip-flops
whose inputs are unobservable and whose outputs are uncontrollable.
Designs that internally generate control signals,
such as clock or asynchronous set/clear, are the most common
causes of this situation. Nontransparent latches are another major issue in that their inputs are unobservable and their
outputs are uncontrollable. Large memories and analog- and
mixed-signal blocks similarly suffer from inputs that are unobservable
and outputs that are uncontrollable. The enable
pins of tristates are unobservable. Combinational feedback
loops also restrict testability, and test-mode values in capture
mode may restrict controllability.
Despite efforts from RTL designers, some parts of the design
may still not be observable and controllable and may require
the insertion of additional test points. Test-coverage analysis
at RTL can help detect where to place additional test points
and their resulting impact on test coverage. For example, in
one design, adding 12 test points increased the test coverage
from less than 94% to more than 98% (Figure 5). It is easier
to add test points in RTL when you fully comprehend the design’s
intent than in the later stages of implementation.
In deep-submicron designs—those at the 90-nm and
smaller nodes—designers worry about transition faults that
can occur at normal clock speed. Stuck-at-fault testing,
which typically uses a slow test clock, does not detect transition
faults. Designers must perform at-speed testing in which
the test clocks are multiplexed with the system clocks. This step adds a level
of complexity for timing closure. At-speed testing also introduces
functional closure challenges, such as those that occur
when asynchronous clock domains share the same test clock,
which could affect the at-speed test coverage. It is therefore
critical to estimate at-speed test coverage at RTL and fix potential
functional and timing-closure issues.
DFT (design for test) poses a unique challenge for IP reuse.
IP that meets test-coverage goals in a previous design could
fall short in the current design. For example, if some inputs of
the IP are tied to constants in the current design, parts of the
IP may become uncontrollable. This issue could affect the
test coverage of the SOC. Hence, you must perform test-coverage
analysis at both the block/IP level and the SOC level.
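To see why tied-off inputs reduce controllability, consider constant propagation through a few gates. The sketch below assumes a hypothetical three-gate IP fragment and a dictionary-based netlist; any net that resolves to a constant can no longer be driven to both values and therefore becomes uncontrollable in this integration.

```python
def propagate_constants(gates, tied):
    """Return every net forced to a constant value.

    gates: {output_net: (op, [input_nets])} with op in AND, OR, NOT
    tied:  {net: 0 or 1} for inputs tied off at SOC integration
    """
    const = dict(tied)
    changed = True
    while changed:
        changed = False
        for out, (op, ins) in gates.items():
            if out in const:
                continue
            vals = [const.get(i) for i in ins]  # None means "not constant"
            new = None
            if op == "AND":
                if 0 in vals:
                    new = 0
                elif all(v == 1 for v in vals):
                    new = 1
            elif op == "OR":
                if 1 in vals:
                    new = 1
                elif all(v == 0 for v in vals):
                    new = 0
            elif op == "NOT":
                if vals[0] is not None:
                    new = 1 - vals[0]
            if new is not None:
                const[out] = new
                changed = True
    return const


# Hypothetical IP fragment; `mode` is tied to 0 in the current SOC.
GATES = {
    "n1": ("AND", ["mode", "data_a"]),
    "n2": ("OR", ["n1", "data_b"]),
    "n3": ("NOT", ["mode"]),
}
stuck = propagate_constants(GATES, {"mode": 0})
print(sorted(stuck.items()))
```

Here `n1` and `n3` become constant, so faults that need the opposite value on those nets cannot be tested, even though the same IP may have met its coverage goal when `mode` was a free input.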
Design constraints
Design constraints are a critical part of the design intent.
They capture the designer’s requirements on the performance,
power, and area from implementation. The quality of
constraints is just as critical as the quality of RTL in synthesis.
At this early stage, designers usually manually define constraints
for clock frequencies, input and output delays, modes
of operation, and exceptions—false and multicycle paths. Because this step is the starting point for implementation,
the completeness and correctness of constraints are critical
in meeting design closure.
You might catch some constraint issues during synthesis
if you carefully examine the synthesis transcript. You might
find, for example, missing-clock or -mode constraints when
the design is using a multiplexed clock (Figure 6). Other
constraint issues may occur when input and output delays reference
an incorrect clock, multiple mode constraints tie the
same node to conflicting constant values, or timing exceptions
are missed on asynchronous clock-domain crossings.
However, synthesis or static-timing analysis may not catch
more serious issues in design constraints. These issues typically
involve constraints that are simply assertions to the
synthesis and static-timing analysis; they thus remain undetected.
You may not catch such issues until final chip integration
or, worse yet, in silicon.
For example, a generated clock may not derive from the
declared source clock, or the
waveform for a generated clock
may differ from the implied waveform,
depending on the presence
and location of inverters
in clock-divider circuits.
Other examples include missing
clock latency or uncertainty,
missing delay constraints on primary inputs or outputs,
block-level constraints that are more relaxed than the
chip-level requirements, incorrect or insufficient timing budget
along a snaking path, or an incorrect multiplier for multicycle
paths.
Analysis of design constraints in RTL design can help
avoid these issues. Designers can exploit this analysis to generate
the clock, input, and output constraints in a correct-by-construction
way, thereby eliminating many overlooked bugs.
For instance, you can use clock-domain-crossing analysis and
knowledge of asynchronous control signals to generate timing
exceptions. At RTL, you can address inconsistencies between
block- and chip-level constraints by comparing the sets
of constraints in the context of the complete RTL design.
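One such comparison, checking that no block is constrained more loosely than the chip requires, can be sketched in a few lines. The clock names and periods are hypothetical, and the example assumes that you have already extracted clock periods from the block-level and chip-level constraint files into dictionaries.

```python
def relaxed_block_clocks(block_periods_ns, chip_periods_ns):
    """Return clocks whose block-level period is longer (more relaxed)
    than the chip-level period for the same clock name."""
    return sorted(
        clk
        for clk, period in block_periods_ns.items()
        if clk in chip_periods_ns and period > chip_periods_ns[clk]
    )


def unmatched_block_clocks(block_periods_ns, chip_periods_ns):
    """Return block-level clocks that have no chip-level counterpart."""
    return sorted(set(block_periods_ns) - set(chip_periods_ns))


chip = {"core_clk": 2.0, "bus_clk": 5.0}
block = {"core_clk": 2.5, "bus_clk": 5.0, "dbg_clk": 10.0}

print("too relaxed at block level:", relaxed_block_clocks(block, chip))
print("not constrained at chip level:", unmatched_block_clocks(block, chip))
```

In this example the block was synthesized for a 2.5-nsec `core_clk` period although the chip requires 2 nsec, which is the kind of mismatch that otherwise surfaces only at chip integration.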
Postscan-netlist quality
At the postsynthesis stage, the design has gone through
logic synthesis, resource sharing, Boolean optimization, and
scan-chain insertion. Assuming that the RTL and constraints going into synthesis are of high quality, the resulting netlist should be in
good shape. However, a lot has changed in the design, and
constraints may also have changed. You can take some measures
to ensure the quality of the postsynthesis, postscan netlist
and constraints.
At this stage, the design may contain power and voltage
domains to manage power consumption. You now must perform
an exhaustive verification of these domains to ensure
proper insertion of level shifters and isolation logic. Even
if you inserted these level shifters and isolation logic before
synthesis or scan-chain insertion, the design may still need
updates. In one such example, an isolation cell is missing in
the scan path because DFT designers inserted the scan logic
after the implementation of power domains (Figure 7). Adding
the isolation cell in the scan path fixes a potential functional
bug or a high-leakage path to ground.
Such power-domain bugs also commonly occur when designers
forget the required “don’t-touch” constraints in the
synthesis script. This omission can cause buffer optimization
during synthesis, which removes level shifters or isolation-logic
cells.
SOC designs are now so large and complex that implementation
may need to be hierarchical. Designers typically
perform synthesis at the block level, and chip integration occurs
at the netlist level. In such scenarios, you must merge
block-level-synthesis constraints into chip-level constraints.
Manually merging constraints can be error-prone and could
lead to an incorrect set of chip-level constraints for later timing
closure. You could at this stage apply constraint consistency
and correctness checks. You can automatically merge
constraints by using block-level constraints and the full-chip
netlist to prevent bugs.
If constraints have changed at this stage, it is important to
establish equivalence with the original constraints for synthesis.
Just as you can perform logic-equivalence checking
for design stages from RTL to netlist, you can now establish
equivalence between RTL and netlist constraints. Designers
can ensure the integrity of design constraints and eventual
design closure by adopting constraints equivalence in their
flow.
If the implementation flow is hierarchical, you might want
to stitch together chip-level test logic at the netlist level. In
such scenarios, ensure that the global test clocks and test-mode signals propagate to individual blocks. You should perform
quality checks to ensure that required values propagate
to pins on subblocks when you specify a driving condition at
a primary input or an internal node of the chip. Similarly, designers
can benefit from connectivity checks to ensure that
a path exists between two user-specified nodes in the design
with an optional sensitization condition (Figure 8). For example,
imagine that Pin A in Block 1 does not connect to a
primary output and that Pin B in Block 2 is observable. By
establishing a path between Pin A and Pin B, you can now
ensure that Pin A is observable.
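At its simplest, such a connectivity check is a reachability search over the netlist. The sketch below uses a breadth-first search on a hypothetical fanout map; the pin and net names are invented for illustration, and the optional sensitization condition is deliberately left out.

```python
from collections import deque


def path_exists(fanout, src, dst):
    """Return True if a structural path leads from node src to node dst."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in fanout.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical chip-level connectivity: Pin A of Block 1 reaches Pin B of Block 2.
CHIP_FANOUT = {
    "block1/A": ["top/n7"],
    "top/n7": ["top/buf3"],
    "top/buf3": ["block2/B"],
    "block1/C": ["top/n9"],  # dead end: never reaches an observable pin
}

print(path_exists(CHIP_FANOUT, "block1/A", "block2/B"))
print(path_exists(CHIP_FANOUT, "block1/C", "block2/B"))
```

A structural path is necessary but not sufficient for observability. A real check would also confirm that the path can be sensitized under the specified condition.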
Post-timing-netlist quality
At the post-timing stage, you must ensure that the design
meets timing requirements and start analyzing timing violations
from static-timing analysis. This stage is another critical
one. Timing closure can be a challenge if the design is
overconstrained or incorrectly constrained. Other causes of problems could be structural defects, such as combinational
loops, excessive levels of logic, or unregistered outputs from
blocks or IP, all of which you should have detected at earlier
stages of design.
Timing exceptions fall into two broad categories: false
paths and multicycle paths. False paths between two registers
are those you cannot sensitize in the design or are otherwise
irrelevant for timing closure. Multicycle paths, on the other
hand, are possible but take multiple clock cycles to complete.
Unless you identify the false and multicycle paths in the design
constraints, static-timing-analysis tools assume that all
paths are possible and single-cycle.
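The effect of a multicycle exception on a setup check can be illustrated with simple arithmetic. The function below is a simplified model that ignores clock skew, latency, and uncertainty, and the period, delay, and setup values are assumptions for illustration.

```python
def setup_slack_ns(clock_period_ns, path_delay_ns, setup_time_ns, cycles=1):
    """Simplified setup slack: time available minus time required.

    cycles is the number of clock cycles the path is allowed to take;
    the default of 1 matches the single-cycle assumption of the tools.
    """
    return cycles * clock_period_ns - path_delay_ns - setup_time_ns


# A 3.5-nsec path on a 2-nsec clock with a 0.1-nsec setup time.
single_cycle = setup_slack_ns(2.0, 3.5, 0.1)
two_cycle = setup_slack_ns(2.0, 3.5, 0.1, cycles=2)

print("meets timing as single-cycle path:", single_cycle >= 0)
print("meets timing as two-cycle path:", two_cycle >= 0)
```

The same path fails when the tool treats it as single-cycle and passes once it is declared a two-cycle path, which is why the exception must be both present and correct.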
An incorrect timing exception can lead to a critical timing
failure in silicon. On the other hand, every timing exception
that remains unidentified represents an unnecessary and
wasteful timing-closure burden. It is therefore a fine balancing
act to find just the right timing exceptions. You must
at least formally verify all timing exceptions in use to ensure
their validity.
Another step that could accelerate timing closure is to
look for additional timing exceptions, especially in those
paths that are violating timing. You should formally verify
every such path as a possible false- or multicycle-path candidate;
if it is false or multicycle, you should add it to the list
of timing exceptions for static-timing analysis. Consider the
results of timing analysis on two timing-critical blocks from a
multimedia design (Table 1). The timing results dramatically
improve when you identify additional timing exceptions
from paths that initially failed to meet timing. The impact
on gate count and, hence, area is minimal.
Postplacement-netlist analysis
By the postplacement-analysis stage, the design has entered
physical implementation and has undergone physical
synthesis, placement, and clock-tree synthesis. You should
repeat quality measures on the now fully placed netlist. You
now have a more accurate assessment of the power, area, timing,
and test coverage, and you can compare this assessment
with the estimates from RTL to identify blocks that
may have deviated.
You can at this stage perform additional netlist-quality
checks for issues such as floating pins or nets; clock, select, enable,
or reset pins that tie to constants; unused or disabled cells;
undriven or multiply-driven nets in the netlist; overloaded
cells; underloaded cells, wasting area and power; pins connected
to specific nets, such as tristate, clocks, and resets;
scan-chain nets with more than the maximum number of
elements; and high-leakage or snake paths. You should also
check that pins connecting to the same net are of the same
connectivity class.
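Several of these checks amount to counting drivers and loads on each net. The following sketch assumes a hypothetical netlist that is already reduced to two dictionaries, one mapping nets to their driving pins and one mapping nets to their load pins. It does not model tristate drivers, so a legitimately shared tristate bus would be reported as multiply driven.

```python
def classify_nets(drivers, loads):
    """Group nets by simple connectivity problems.

    drivers: {net: [driving pins]}
    loads:   {net: [load pins]}
    """
    report = {"undriven": [], "multiply_driven": [], "no_load": []}
    for net in sorted(set(drivers) | set(loads)):
        n_drivers = len(drivers.get(net, []))
        n_loads = len(loads.get(net, []))
        if n_drivers == 0:
            report["undriven"].append(net)
        elif n_drivers > 1:
            report["multiply_driven"].append(net)
        if n_drivers > 0 and n_loads == 0:
            report["no_load"].append(net)  # driven, but nothing uses it
    return report


DRIVERS = {
    "n1": ["u1/Y"],
    "n2": ["u2/Y", "u3/Y"],  # two drivers on one net
    "n4": ["u5/Y"],          # driven but unused
}
LOADS = {
    "n1": ["u4/A"],
    "n2": ["u4/B"],
    "n3": ["u6/A"],          # load with no driver
}

print(classify_nets(DRIVERS, LOADS))
```

The category names, including `no_load`, are choices made for this example rather than terms from a specific tool.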
Before clock-tree synthesis and before a clock network exists,
you should specify, in the design constraints, the values
the tools should assume for clock latency and clock slew rate.
However, assuming that you’ve inserted the clock tree by this
stage of the design, it is time to compute and apply the delays
and slew. At this time, you must also update and verify design
constraints for two critical areas. First, replace assumed transition times on individual flip-flops
with transition times only on the primary
inputs to the design. Second, set
clock delays to propagated, rather than
to a user-defined network latency on
clocks.
Postroute-netlist analysis
Postroute-netlist analysis represents
the home stretch for design implementation,
yet design teams spend a lot of
time and effort at this stage to close
timing, signal integrity, manufacturability,
power integrity, and a host of
physical effects. Assuming that you’ve
followed the quality measures in earlier stages, the design and
constraints should be fairly high quality, and you should focus
on these physical effects. In addition, you should place
significant effort in layout and physical verification, tackling
process variability and other manufacturing issues. This stage
also encompasses final sign-off on power, timing, testability,
and die size; hence, it is best to repeat the quality measures
from earlier steps as part of the final sign-off.
In short, the quality of a design and its associated constraints
have a large impact on design closure. You can, however,
take a series of quality measures to improve the chances
of design closure. It is also important that you take most
of these measures at the early stages of design, especially at
RTL, at which point you best comprehend the user’s intent.
The later in the implementation flow that you address design
quality, the less impact it is likely to have on design
closure. If you get design goals and quality objectives right
from the start, it is only a matter of staying the course during
implementation.
























