NEC architects around mixed-signal roadblocks
Two papers presented by NEC Electronics at the International Solid State Circuits Conference this week illustrate the extent to which mixed-signal designers are turning to architectural innovation to go where scaling can no longer take them. An all-digital phase-locked loop (PLL) breaks out of the envelope to reach a new power-vs.-noise operating point. And a 16Gbit/s transceiver design in 90nm CMOS demonstrates that architectural changes can overcome fundamental timing limits in decision-feedback equalizers (DFEs).
The all-digital PLL attacks the twin problems of power consumption and noise in mobile RF devices. NEC senior principal researcher Tadashi Maeda explained that in low-power wireless interfaces nearly three-quarters of the power goes into the RF section, and nearly half of that goes into the PLL. The frustrating part, Maeda said, is that most of these air interfaces are designed for intermittent operation at low duty cycles to save energy—but the long lock times of analog PLLs make it impossible to shut down the PLL during the idle periods.
One solution is a digital PLL that performs phase detection in the digital domain and uses a digital pulse-frequency modulation stream to control the oscillator. This design can be made to lock quickly, so you can shut it down when the transceiver is idle, but it burns so much power relative to an analog PLL that there is no net gain. Worse, quantization noise in the digital control signal carries the PLL outside the noise budget for common air interfaces such as Bluetooth or WiMAX.
Maeda said that NEC researchers developed two circuits to address these problems. The first is a two-stage phase detector. The first stage does a coarse delay measurement, and then clock-gates itself off to save energy. The second stage, running at much lower power, works like a vernier scale to make high-resolution delay measurements based on the output of the coarse stage. The overall circuit significantly reduces average power.
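The coarse-then-vernier idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not NEC's circuit: the function names, the 100ps coarse step and the 55ps/50ps delay-line values are all assumptions chosen for the example. The vernier stage gets its 5ps resolution from the difference between two slightly mismatched delays, just as a vernier scale does.

```python
def coarse_stage(delay_ps, coarse_step_ps):
    """Count whole coarse steps; a real circuit could then gate this stage off."""
    count = delay_ps // coarse_step_ps
    residue_ps = delay_ps - count * coarse_step_ps
    return count, residue_ps


def vernier_stage(residue_ps, slow_ps, fast_ps):
    """Count stages until the late edge (fast line) catches the early edge (slow line).

    Each stage closes the gap by (slow_ps - fast_ps), which sets the resolution.
    """
    resolution_ps = slow_ps - fast_ps
    lead_ps = residue_ps
    stages = 0
    while lead_ps > 0:
        lead_ps -= resolution_ps
        stages += 1
    return stages


def two_stage_measure(delay_ps, coarse_step_ps=100, slow_ps=55, fast_ps=50):
    """Combine both stages into one delay estimate (all values hypothetical, in ps)."""
    coarse_count, residue_ps = coarse_stage(delay_ps, coarse_step_ps)
    fine_count = vernier_stage(residue_ps, slow_ps, fast_ps)
    estimate_ps = coarse_count * coarse_step_ps + fine_count * (slow_ps - fast_ps)
    return coarse_count, fine_count, estimate_ps


# A 437ps delay: 4 coarse steps, then 8 vernier stages resolve the 37ps residue.
result = two_stage_measure(437)
```

In this sketch the estimate is always within one vernier step (5ps here) of the true delay, while the fine stage only ever has to cover one coarse step.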
Now for the noise problem. The NEC team developed a dithering circuit that modulates the pulse train from the phase detector, suppressing frequency variations that cause phase noise in the oscillator. The overall result, Maeda said, is a PLL with half the area of an analog design, a third of the average power, and a more than adequate -105dBc/Hz noise level, 15dB better than a comparable analog design.
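The article does not describe how NEC's dithering circuit works internally. As a generic illustration of why dithering helps with quantization, the sketch below toggles between two adjacent integer control codes so that their average lands on a fractional value the quantizer could not otherwise represent. The accumulator scheme and every name here are assumptions for the example, not the design from the paper.

```python
def dither_codes(integer_part, frac_num, frac_den, n):
    """Emit n integer codes whose average approaches integer_part + frac_num/frac_den.

    A simple overflow accumulator decides when to emit the next-higher code.
    """
    codes = []
    acc = 0
    for _ in range(n):
        acc += frac_num
        if acc >= frac_den:
            acc -= frac_den
            codes.append(integer_part + 1)
        else:
            codes.append(integer_part)
    return codes


# Target control value 10 + 3/8, which no single integer code can express.
codes = dither_codes(10, 3, 8, 8)
average = sum(codes) / len(codes)
```

Over one full period of eight codes, three of them are 11 and five are 10, so the average is exactly 10.375.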
NEC’s second paper attacked another challenge: a 16Gbit/s transceiver. Once again, the design team chose architectural innovation rather than circuit optimization to reach its goals.
In this case, the work focused on one component of the transceiver design, the DFE. The underlying issue the DFE addresses is inter-symbol interference. If the pulse response of the interconnect in a high-speed serial link is longer than the time between symbols—which it usually is—a pulse from the transmitter smears across subsequent symbol intervals by the time it gets to the receiver. That makes samples taken in the subsequent intervals ambiguous—am I seeing a one, or the sum of the tails of a bunch of preceding ones?
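A small numeric sketch makes the ambiguity concrete. The three-sample pulse response and the bit pattern below are made-up values for illustration, not measurements from any real link; the point is only that a zero following two ones can land above a naive decision threshold.

```python
def channel_output(bits, pulse_response):
    """Sum the smeared tails of every transmitted pulse at each sample instant."""
    out = []
    for n in range(len(bits)):
        total = 0.0
        for k, h in enumerate(pulse_response):
            if n - k >= 0:
                total += h * bits[n - k]
        out.append(total)
    return out


# Hypothetical channel: main cursor 1.0, then tails of 0.5 and 0.25.
pulse = [1.0, 0.5, 0.25]
bits = [1, 1, 0, 1, 0, 0]

rx = channel_output(bits, pulse)

# Naive receiver: compare each raw sample against a fixed 0.5 threshold.
naive = [1 if s > 0.5 else 0 for s in rx]
```

The third sample comes out at 0.75 even though a zero was sent, because it is the sum of the tails of the two preceding ones, so the naive receiver reads it as a one.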
The most common solution today is a DFE. In principle, once you decide that you have received a pulse, you can subtract weighted values from the next few samples before you capture them, in effect backing out the tail of the current pulse from future measurements. It’s a digital-minded approximation to applying the inverse of the channel transfer function to the received waveform, but it works—up to a point.
That point, explained NEC Device Platforms Research Labs assistant manager Kouichi Yamaguchi, occurs at about 10Gb/s, when the sample interval is only 100ps. The problem is a critical timing path within the DFE. At 10Gb/s or above you really need to subtract out quantities from more than just the next sample: you need multiple taps. So the DFE looks something like this. The data comes in, passes through a series of analog coefficient-multiplier/summing junctions, one for each tap, and then enters a clocked comparator called a slicer. The slicer decides whether the sample is a one or a zero, and passes its digital decision on. A feedback path, the first tap, carries the digital value back into the inverting and multiplying input of one of the summing junctions, so that a weighted fraction of the digital decision gets subtracted from the input signal. If the DFE has multiple taps, the decision output goes through a series of latches, one for each tap, each latch driving another feedback loop that goes back to another summing junction on the input side. So the decaying tail of the pulse gets subtracted out of each succeeding sample.
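The structure just described can be modeled behaviorally in a few lines. This is a minimal sketch under stated assumptions, not a circuit model: the received samples correspond to a hypothetical channel with a main cursor of 1.0 and tails of 0.5 and 0.25 carrying the bits 1, 1, 0, 1, 0, 0, and the tap weights are simply set equal to those tails.

```python
def dfe(samples, taps, threshold=0.5):
    """Behavioral multi-tap DFE: subtract weighted past decisions, then slice."""
    decisions = []
    history = [0] * len(taps)  # history[0] is the most recent decision (first tap)
    for s in samples:
        # Summing junctions: remove the predicted tails of earlier pulses.
        corrected = s - sum(w * d for w, d in zip(taps, history))
        # Slicer: decide one or zero.
        d = 1 if corrected > threshold else 0
        decisions.append(d)
        # Latch chain: shift the new decision in, drop the oldest.
        history = [d] + history[:-1]
    return decisions


# Hypothetical received samples (bits 1,1,0,1,0,0 through tails 0.5 and 0.25).
rx = [1.0, 1.5, 0.75, 1.25, 0.5, 0.25]

recovered = dfe(rx, taps=[0.5, 0.25])
```

With the tails backed out, the third sample drops from 0.75 to 0.0 before slicing and the original bit pattern is recovered. Note that each decision is needed before the very next sample can be corrected, which is the loop the following paragraph examines.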
The critical timing path, Yamaguchi said, is the first tap. The set-up time and propagation delay of the slicer, plus the flight time of the feedback loop and the delay of the summer, must together be less than one sample interval. But the low signal amplitude means the slicer must have high gain, and that means it is relatively slow. You can use speculation to eliminate this critical path, but for a variety of reasons that ends up making the second tap the critical path. The result, Yamaguchi said, is that transceiver papers at the ISSCC and VLSI Symposium in recent years have clustered around 10Gb/s, with little upward trend.
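The timing constraint is simple arithmetic, sketched below. The sample interval follows directly from the bit rate (100ps at 10Gb/s, 62.5ps at 16Gb/s); the slicer, wire and summer delays are invented numbers used only to show how a loop that closes at one rate fails at the other.

```python
def unit_interval_ps(bit_rate_gbps):
    """One sample interval in picoseconds for a given bit rate in Gb/s."""
    return 1000.0 / bit_rate_gbps


def first_tap_slack_ps(bit_rate_gbps, slicer_ps, wire_ps, summer_ps):
    """Positive slack means the first-tap feedback loop closes in time."""
    loop_delay_ps = slicer_ps + wire_ps + summer_ps
    return unit_interval_ps(bit_rate_gbps) - loop_delay_ps


# Hypothetical delays, not figures from the paper.
HYPOTHETICAL_DELAYS = dict(slicer_ps=55.0, wire_ps=10.0, summer_ps=20.0)

slack_10g = first_tap_slack_ps(10.0, **HYPOTHETICAL_DELAYS)  # 100 - 85
slack_16g = first_tap_slack_ps(16.0, **HYPOTHETICAL_DELAYS)  # 62.5 - 85
```

With these assumed delays the loop has 15ps to spare at 10Gb/s but misses by 22.5ps at 16Gb/s.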
The solution NEC adopted was to design a four-tap DFE in which the first stage is feedforward rather than feedback. In other words, the first stage simply subtracts a weighted, delayed copy of the original signal from the incoming waveform without bothering to make a decision. Then subsequent stages have feedback paths around the slicer as usual. The result is to eliminate the critical timing path altogether. The approach seems to be catching on, since at least two other papers in the same session this week described clock-data-recovery circuits using feed-forward techniques.
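A behavioral sketch of that arrangement follows. It is an illustration under assumptions, not NEC's design: the channel (main cursor 1.0, tails 0.5 and 0.375), the feedforward weight and the two feedback weights are hypothetical, and only two feedback taps are modeled for brevity. The first correction uses the previous raw sample, so no decision is needed within one sample interval; the earliest decision fed back is two symbols old.

```python
def feedforward_first_tap_dfe(samples, ff_weight, fb_taps, threshold=0.5):
    """First tap is feedforward on the raw signal; later taps use past decisions."""
    decisions = []
    prev_sample = 0.0
    last = 0                     # decision d[n-1], deliberately not fed back
    older = [0] * len(fb_taps)   # decisions d[n-2], d[n-3], ...
    for x in samples:
        # Feedforward stage: subtract a weighted, delayed copy of the input.
        y = x - ff_weight * prev_sample
        # Feedback stages: subtract residual tails using older decisions.
        z = y - sum(w * d for w, d in zip(fb_taps, older))
        d = 1 if z > threshold else 0
        decisions.append(d)
        older = [last] + older[:-1]
        last = d
        prev_sample = x
    return decisions


# Hypothetical samples: bits 1,1,0,1,0,0 through tails 0.5 and 0.375.
rx = [1.0, 1.5, 0.875, 1.375, 0.5, 0.375]

# After the feedforward stage the residual tails are 0.125 and -0.1875.
recovered = feedforward_first_tap_dfe(rx, ff_weight=0.5, fb_taps=[0.125, -0.1875])
```

In this model the bit pattern is recovered even though the most recent decision never enters a feedback path, which is what relaxes the one-interval loop.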
NEC also designed their DFE as a time-multiplexed, multi-channel device. The design uses four equalizers running in parallel on different phases of a quarter-rate sampling clock, further reducing timing constraints and power consumption. In addition there is a separate edge-equalizer unit with an architecture essentially the same as the other DFEs.
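The quarter-rate idea can be sketched as dealing samples out to four lanes and merging the results back in order. This toy model leaves out the equalization itself and any decision sharing between lanes, which a real interleaved DFE would need; the helper names are hypothetical.

```python
def split_into_phases(samples, n_phases=4):
    """Deal samples round-robin to n_phases lanes, one per clock phase."""
    return [samples[p::n_phases] for p in range(n_phases)]


def merge_phases(lanes):
    """Interleave lane outputs back into the original sample order."""
    merged = []
    longest = max(len(lane) for lane in lanes)
    for i in range(longest):
        for lane in lanes:
            if i < len(lane):
                merged.append(lane[i])
    return merged


def lane_period_ps(bit_rate_gbps, n_phases=4):
    """Time each lane has per sample when running at 1/n_phases of the bit rate."""
    return n_phases * 1000.0 / bit_rate_gbps


samples = list(range(10))
lanes = split_into_phases(samples)
restored = merge_phases(lanes)
period_16g = lane_period_ps(16.0)
```

At 16Gb/s each of the four lanes sees a new sample every 250ps rather than every 62.5ps, which is where the relaxed timing comes from.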
Yamaguchi said that in 90nm CMOS the design handles 16Gb/s operation with an eye opening of 0.32UI at a 10^-12 bit error rate, more than twice the speed needed for PCI Express gen3. Power consumption is 69mW, comparable with 10Gb/s equalizers, and should drop to around 10mW by 32nm. It is possible, Yamaguchi said, that scaling to advanced geometries would also improve the speed of the design, although the fact that critical paths of the DFE are analog rather than digital makes scaling inherently hard to predict.
What we can predict safely is that as performance pressure increases on mixed-signal circuits, we will see a lot more rethinking of architectures and a lot less reliance on scaling and circuit optimizations to meet design goals.















