Getting clock-domain crossings right: some notes from the real world
The more aggressively designers pursue energy efficiency, the more problems they seem to create for verification (see our recent feature on the subject). A recent seminar put on by Atrenta examined the issues that can arise from even one of the simplest measures: using multiple clock domains.
Admittedly, Atrenta has a vested interest in the seminar: they are promoting a formal verification tool that inspects clock crossings. But the sponsors kindly confined product discussion to a module at the end of the event, and let their own area expert, product director Shaker Sarwary, and a real customer, Broadcom principal design engineer Surya Hotha, discuss the issues.
Hotha’s brief presentation nicely underlined the importance of clock-domain crossings (CDCs) to a design manager. He said, "We have found CDCs to be a major cause of respins. They are very hard to isolate systematically, or even to detect by manual inspection. For example, a recent design had 5 million gates and more than 30 clock domains. At that scale manual inspection becomes impractical."
Sarwary talked in more detail about the problem with CDCs. As a generalization, the problem occurs when a signal flows from one clock domain into another, and you cannot guarantee the phase relationship between the two clocks. If you don’t know the phase relationship you cannot guarantee that the arriving signal will satisfy the set-up and hold requirements of the first register it encounters in the other domain. This causes two quite different sorts of problems, according to Sarwary.
First and most obviously, it raises the potential for metastability. If you have a timing violation on a particular clock cycle, the data at the register input could be in between logic levels when the clock edge hits the register. The result could be a zero, a one, or a damped oscillation that could settle to either value, but usually, Sarwary said, to zero. Only SPICE could predict, for a given net loading, driver, clock edge, and exact timing, what the register will actually do. So in short, you may or may not end up with the same thing latched into the register that you applied to the input, on that one particular cycle where the two clocks were in just the wrong relationship.
This is true even though the delay between the last register in one domain and the first register in the next domain is less than the period of either clock. So neither static timing analysis nor functional simulation is going to catch the problem. And the exact relationship between clocks that permits metastability might only occur once every few billion cycles, on the average, so the problem could also escape silicon characterization, wafer sort, and final test. This creates the worst-case situation of a design error that first appears as a field reliability problem.
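To put a rough number on how rare such an event can be, a commonly quoted first-order model (not something presented at the seminar) estimates the mean time between synchronizer failures as exp(t_resolve / tau) / (t_window * f_clk * f_data). The sketch below simply evaluates that expression. The function name and every numeric value are illustrative assumptions, not measured figures for any real process or design.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order estimate of mean time between failures, in seconds.

    t_resolve: time allowed for a metastable output to settle (s)
    tau:       resolution time constant of the flip-flop (s), assumed
    t_window:  width of the vulnerable setup/hold window (s), assumed
    f_clk:     receiving clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical numbers, chosen only to show the shape of the curve.
for t_resolve in (0.0, 1e-9, 2e-9):
    mtbf = synchronizer_mtbf(t_resolve, tau=50e-12, t_window=100e-12,
                             f_clk=100e6, f_data=10e6)
    print(f"t_resolve = {t_resolve:.1e} s -> MTBF ~ {mtbf:.3e} s")
```

The exponential term is the point of the sketch: each extra bit of settling time makes failures dramatically rarer, which is also why they can slip past characterization and test.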
Sarwary described a number of strategies to attack the problem. You can use two flipflops in series, both running on the clock of the receiving domain. This prevents the metastability from being propagated beyond the initial flop, but it does not guarantee that either flipflop has the correct value in it. To ensure that the second flipflop ends up with the correct data, you have to stretch the data pulse coming into the block so that it lasts longer than one receiving-block clock cycle.
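The pulse-stretching requirement can be illustrated with a small idealized timing model. This is only a sketch: it ignores set-up and hold windows and metastability entirely, and just asks whether any rising edge of the receiving clock lands inside the incoming pulse. Times are integers in picoseconds, and the function name and numbers are hypothetical.

```python
def pulse_is_sampled(pulse_start, pulse_width, rx_period, rx_phase=0):
    """Return True if a receiving-clock edge falls inside the pulse.

    Receiving-clock edges are assumed at rx_phase + k * rx_period.
    The pulse is high on the interval [pulse_start, pulse_start + pulse_width).
    All times are integers (picoseconds) to avoid rounding surprises.
    """
    offset = pulse_start - rx_phase
    k = -(-offset // rx_period)              # ceiling division
    first_edge = rx_phase + k * rx_period    # first edge at or after pulse_start
    return first_edge < pulse_start + pulse_width

RX_PERIOD = 10_000  # assumed 10 ns receiving clock

# A 4 ns pulse may or may not be seen, depending on where it lands.
print(pulse_is_sampled(1_000, 4_000, RX_PERIOD))   # False: falls between edges
print(pulse_is_sampled(8_000, 4_000, RX_PERIOD))   # True: straddles an edge

# A pulse stretched beyond one receiving period is seen at every start time.
print(all(pulse_is_sampled(start, 12_000, RX_PERIOD)
          for start in range(0, 2 * RX_PERIOD, 500)))  # True
```

In real silicon the pulse needs margin beyond one period to cover the set-up and hold window, which is why the stretched pulse here is modeled as longer than the receiving period rather than exactly equal to it.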
But that doesn’t solve the problem if we are talking about a parallel set of signals rather than an isolated signal. Unless the parallel word is Gray-coded correctly (and Sarwary demonstrated one really elegant way to screw up Gray Code), the uncertainty in when the receiving register will settle to the correct value for each bit could cause wrong data to be received. Sarwary went on to describe increasingly ambitious schemes for resolving the problem, including handshaking interfaces across the clock domain boundary and use of FIFOs in conjunction with these techniques.
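The reason Gray coding helps with a parallel word can be shown in a few lines. The sketch below uses the standard reflected binary Gray code (the helper names are my own, not from the seminar) and counts how many bits change between consecutive values. With plain binary, several bits can change at once, so a receiver that samples mid-transition can see a value that was never sent. With Gray code only one bit changes, so the receiver sees either the old value or the new one.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to reflected binary Gray code."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert binary_to_gray by folding in successively shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Binary 7 -> 8 (0111 -> 1000) flips four bits at once.
print(bits_changed(7, 8))                                   # 4
# The Gray-coded equivalents differ in a single bit.
print(bits_changed(binary_to_gray(7), binary_to_gray(8)))   # 1

# Every step of a 4-bit counter, including the wrap from 15 to 0,
# changes exactly one Gray-coded bit.
print(all(bits_changed(binary_to_gray(i), binary_to_gray((i + 1) % 16)) == 1
          for i in range(16)))                              # True
```

Note that the single-bit property only holds for consecutive counts. A pointer that jumps by more than one, or a Gray-coded value that passes through combinational logic before the crossing, loses the guarantee.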
The second problem Sarwary covered is more insidious. Because most synthesis, power-optimization, and test-insertion tools assume that the circuit is entirely synchronous, sometimes these tools will create structures that permit transients to exist between clock pulses. Some optimizing synthesis tools will create structures that have race conditions, for example, as long as the races are resolved before the next clock edge. But if one of these structures lies between the last register in one clock domain and the first register in the next domain, it is possible for these transients to be latched on the other side of the CDC, creating an error. In addition, some optimization tools will carefully remove anti-metastability provisions that the designer has put in, since they do not appear to contribute to the function of the circuit. For this reason, Sarwary said and Hotha agreed, it is necessary to inspect the design for CDC problems not only at RTL, but at the post-tinkering netlist level as well.
All in all, the message was that CDC errors are potential chip-killers, are easy to create—even automatically courtesy of well-meaning design tools—and are marvelously hard to detect. Whether your methodology uses formal tools to inspect CDCs, or whether you have in place a review process that requires specific structures on each CDC, clearly you have to do something about the issue.