Part 1: OC-48, OC-192, and beyond
Analysts say that the high-speed communications market is poised to produce revenues in the billions. Upper management says it knows where the market's sweet spot is. Marketing says it needs the product yesterday. Your job is to make it all come together.
By Nicholas Cravotta, Technical Editor -- EDN, November 9, 2000
Networks are all about more bandwidth, and 2.5-Gbps OC-48 and 10-Gbps OC-192 promise to deliver it. However, many of the important standards are still moving. Few off-the-shelf parts are available. Longer distances mean tighter jitter specs, which mean more expensive components and more difficult design. Compound the problem with the fact that network boxes are getting neither bigger nor a larger share of the power budget, yet they have to process more data. And only a handful of people really understand how to do it all. It's called the "bleeding edge" for a reason.
Partitioning functionality for processing high-speed connections is not yet clearly defined. It's possible to show the list of required functions, but how and when these functions manifest themselves is still up for grabs (Figure 1). For example, instead of buying a stand-alone transceiver, you can buy an optical interface or a framer with a built-in transceiver. However, you really need only one transceiver, and the option you choose determines the placement of the CDR (clock-and-data-recovery) unit and the CMU (clock-multiplication unit).
So many choices exist partly because of the homesteading mentality of the bleeding edge. Once a company delivers a part, its marketing department begins to "peek over the fence" at other vendors, starts coveting the functions that they supply, and tries to convince its own engineers to integrate those functions in a next-generation part. Thus begins the rush to claim more and more of the system pie. Eventually, two parts, such as an optical module and a framer, begin to fight over the same function territory—in this case, the transceiver.
The trade-offs are not always obvious or easy to make. Does it make sense to put receiving and transmitting on the same chip? Putting the CDR unit and CMU on the same die means integrating more than one VCO, which can be difficult considering that receiving and transmitting can be off from each other by a few parts per million, resulting in VCO crosstalk. For optical modules, however, one chip is better than two, given space restrictions. Additionally, if the SERDES (serializer/deserializer) function is in a separate IC, it requires many interface pins. A single chip saves a tremendous amount of space on traces alone.
Cleanly partitioning functions is key to avoiding overlap. However, what is clean for one application may be a major obstacle for others. For example, marrying the transceiver with the framer means less expensive ICs. However, you then have to run a 2.5- or 10-Gbps signal on the board. Pairing the transceiver with the optical module gives you a module that you can drop into a system with a lower speed, parallel interface to the framer. The module also controls jitter more tightly. Which is less costly? That depends on how you want to deal with the 2.5- or 10-Gbps signal. Again, it's a matter of trade-offs, and no single best answer now exists.
Selling "solutions"
Several of the larger players, such as AMCC, PMC-Sierra, and Vitesse, have employed the acquisition method of R&D: Let someone else develop a winning product and then buy the company and integrate the product into your line. Start-ups, however, can't afford to use this approach. Many start-ups are still years away from significant revenue, so they must either form solid partnerships with other companies or get another company to buy them. Note that vendors no longer sell ICs; they sell "solutions." The problem with the term "solution" is that it presupposes that the vendor knows exactly the problem that you are trying to solve.
The reality is that no single company has a complete solution. Security, for example, is still not a part of most system block diagrams. The general-purpose-network-processor market has reached critical mass and begun to fragment into myriad coprocessors that focus on a particular function. Some companies are even beginning to offer more than one network-processor family for different market segments. But the problem is still too complex for one vendor to completely solve it.
Relationships within the OC-192 market will play an important role in the success of various products because one company alone cannot solve the entire problem. Additionally, it's not enough to design a piece of silicon and throw it over the wall to an engineer. These chips share a complex workload, and they have to work intimately and well with one another. Additionally, every component in an OC-48/OC-192 design presents engineering challenges. Working with framers might be easier than working with fabrics, but interfacing all the pieces together is not trivial. Even components such as memory present significant challenges because they involve high speed and high throughput rates.
An important question to ask vendors that claim to offer the "whole solution" is: "How can I differentiate my product if I use your solution?" At today's high speeds, complex services such as Layer 7 URL routing don't make sense, so there's little room for differentiation. A vendor might give you a nebulous answer, such as telling you that you can differentiate yourself through software. Perhaps such an answer is really a warning sign that you've got a lot of work ahead of you to pull together the vendor's "solution" into something that actually works.
Software significance
Many vendors at the bleeding edge are struggling simply to get working silicon. Providing software beyond drivers doesn't happen for the first or second generation of parts, which means that this task falls to you. However, if the architecture is too unique, real development tools don't exist, either. The trade-off is between using an old architecture with tools that might not scale to higher speeds and using a new architecture that has room to grow but no tools.
Software has become a more significant part of designs than ever before. And if you have to invest heavily in software, you're going to want to be able to use that software in future generations of products. If so, then make sure the chips you've selected have some legs and a decent (and realistic) road map into the future. Also, evaluate the health of the vendor. A start-up may go out of business and leave you chipless, or a major player may buy the start-up, forcing you to redesign your product according to how the new owner decides to deploy the IP (intellectual property) of the chip. (That is, the intelligence of the chip may be the same, but the interfaces may all change.) Given the rush to buy pieces of the OC-48/OC-192 puzzle, you need to consider this situation as a possible reality.
Climbing the OC speed ladder with as little redesign as possible requires forward thinking. Your first design will require a significant software investment. Thus, software compatibility among IC family members on a road map is critical. Additionally, ICs must be different to support different speeds, but they should keep a similar profile with the same software interfaces.
Software can also play an important role in bridging the gap between generations of products and generations of ICs. The carriers want new products every 12 months, but new chips tend to appear only every 24 months. Use software to make the incremental steps to the next generation while you wait for new hardware to hit.
Is analog really dead?
Getting various chips to work together is probably your hardest design challenge. There are many interfaces to consider in an OC-48/OC-192 design, including memory, framer, optical, and fabric. It is also critical that you determine the various interfaces that each vendor supports before you begin specifying parts. For example, you might find the perfect framer that connects to your transceiver of choice, but it interfaces only to network processors that don't interface to the switch fabric you have in mind.
You should also make sure that the interfaces between chips are guaranteed, whether the chips come from one company, from different vendors, or both. This step prevents vendors from pointing fingers at each other and leaving you stuck with two parts that don't work with each other. Seeing such chips working together on a reference design is probably a good sign. If a vendor doesn't include partner chips on a board, be wary and ask why.
Running high-speed interfaces across a pc board—duplex OC-192 is 32 lines running at 622 Mbps—creates some interesting design challenges, to say the least. Such lines are difficult to terminate and control. You simply can't run 18-in. traces; they must be short and precise. Also, several of these streams may relate to the same port: data in, data out, interface to buffer memory, hand off to packet processing/traffic management, backplane transfers, and so on.
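As a rough check on that line count, the sketch below divides the OC-192 line rate among lanes running at 622.08 Mbps (the OC-12 rate, which the text rounds to 622 Mbps). The function names are illustrative, and the even striping of each direction across equal-rate lanes is a simplifying assumption rather than a description of any particular interface standard.

```python
OC1_KBPS = 51_840  # SONET OC-1 line rate, in kbps (51.84 Mbps)

def line_rate_kbps(oc_level):
    """Line rate of an OC-n signal: n times the OC-1 rate."""
    return oc_level * OC1_KBPS

def lanes_per_direction(oc_level, lane_kbps):
    """Lanes needed to carry an OC-n signal one way (ceiling division)."""
    return -(-line_rate_kbps(oc_level) // lane_kbps)

LANE_KBPS = 622_080  # assumed per-lane rate: 622.08 Mbps

one_way = lanes_per_direction(192, LANE_KBPS)
print("OC-192 lanes per direction:", one_way)      # 16
print("OC-192 lanes, duplex:      ", 2 * one_way)  # 32
```

The same arithmetic gives four lanes per direction for OC-48, which is why the jump to OC-192 multiplies the routing problem rather than merely adding to it.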
High-frequency traces also lead to EMI (electromagnetic interference). If you're working with a 10-Gbps serial stream, the opening in the metal box shielding the transceiver must be very small because the resulting EMI wavelength will be very small. This constraint makes building the connector a challenge. Additionally, SONET (synchronous optical network) has strict specifications (see sidebar "SONET: ubiquitous and agnostic"). A 10-Gbps board might require eight layers with lots of shielding. With these speeds, you're no longer playing in the digital domain. Termination, decoupling, and grounding, as well as interference between signals or crosstalk, all become critical.
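To put numbers on "very small," the sketch below computes the free-space wavelength (speed of light divided by frequency) at a few frequencies of interest for a 10-Gbps NRZ stream, whose alternating-bit pattern has a 5-GHz fundamental. The one-twentieth-of-a-wavelength aperture limit is a common shielding rule of thumb that is assumed here for illustration; it does not come from this article or from any SONET specification.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, m/s

def wavelength_mm(freq_hz):
    """Free-space wavelength in millimeters for a given frequency."""
    return C_M_PER_S / freq_hz * 1000.0

def max_aperture_mm(freq_hz, fraction=20):
    """Largest opening under an assumed wavelength/fraction rule of thumb."""
    return wavelength_mm(freq_hz) / fraction

frequencies = [
    ("5 GHz (fundamental of a 10-Gbps 1010 pattern)", 5e9),
    ("10 GHz", 10e9),
    ("15 GHz (third harmonic of 5 GHz)", 15e9),
]
for label, freq in frequencies:
    print(f"{label}: wavelength {wavelength_mm(freq):.1f} mm, "
          f"aperture limit about {max_aperture_mm(freq):.1f} mm")
```

Under these assumptions the allowable opening shrinks to a millimeter or two, which gives a sense of why the connector becomes a mechanical problem as well as an electrical one.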
There are also physical considerations. For example, you might have a huge number of ports on one side of the box. You then face the challenge of moving all this data across the box to the backplane. And the problems continue across the backplane; noise and inductance become substantial when you've got those 32 high-speed signals blasting across a backplane (see sidebar "Killer serial"). Additionally, vendors tend to manufacture high-performance ICs using processes other than CMOS (see sidebar "CMOS versus the exotics").
Laying out many high-speed signals and maintaining signal integrity is one of the more difficult challenges of OC-48 or OC-192 design. Many chip houses claim to have expertise laying down high-frequency traces, and they'll even give you a reference design to get you started, but you should consider the value of the reference design before you commit to the part. Reference designs are but a single application of the chips you're considering, and, chances are, you need completely different ingress/egress rates as well as completely different interfaces. The reference design, then, is of limited value to you (see sidebar "Beyond the reference design").
Toward this end, many vendors supply encrypted models of their chips so that you can simulate crosstalk and noise with your own margins of error. Modeling also helps you determine the "sweet spot" among power, cost, and signal integrity. The art of design is deciding where to beat your margins and where to slack them so that you achieve an overall system margin at the power and cost point you determine for your product (see sidebar "Fudge factors"). A reference design doesn't come close to achieving this goal.
You also have a spectrum of choices for your interfaces. Is it better to run wide and slow (with more traces and at a lower frequency) and take on all the problems associated with high pin counts and thick, parallel buses, or should you run fast and narrow and give up signal integrity and use more power? Is there a happy medium? Also, consider what part of your overall system design you can leverage into the next-generation design. For example, you might be able to design a larger switch that uses the same line cards. Your bus of choice should support both ends of your overall target design, or you will make your own products obsolete.
On the optical side of OC-192, you have the option of reducing the frequency problem by demultiplexing the 10-Gbps stream into four 2.5-Gbps OC-48 streams over the fiber and then regenerating the 10-Gbps signal at the destination. Why might you choose this option? Single-mode fiber carries only one mode of light. This characteristic reduces signal interference, but the spectrum of light must be fine, and the alignment must be tight. Multimode fiber, on the other hand, can carry different modes of light, but, more important, it uses less expensive lasers that are easier to install. Breaking the stream into four lines helps reduce interference between lines, and it can increase the distance you can send a signal with integrity.
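The split-and-regenerate idea can be illustrated with a toy model that stripes a byte stream round-robin across four lanes and then interleaves the lanes back together. This is only a sketch of the concept: the function names are hypothetical, and real optical links add framing, alignment, and skew compensation that this example ignores.

```python
def demux(stream, lanes=4):
    """Stripe a byte stream round-robin across the given number of lanes."""
    return [stream[i::lanes] for i in range(lanes)]

def remux(parts):
    """Interleave the lanes back into the original byte order."""
    lanes = len(parts)
    out = bytearray(sum(len(p) for p in parts))
    for i, part in enumerate(parts):
        out[i::lanes] = part  # lane i supplies bytes i, i+lanes, i+2*lanes, ...
    return bytes(out)

data = bytes(range(10))
lanes = demux(data)
print([list(lane) for lane in lanes])
print(remux(lanes) == data)
```

Each lane carries a quarter of the bytes, so each runs at a quarter of the aggregate rate; the price is that the receiver must realign four streams that may arrive with different delays.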
SONET: ubiquitous and agnostic
SONET (synchronous optical network) plays an important role in OC-48. New OC-48 and OC-192 designs tend to use it because it is the established protocol for traffic at this rate at the core. At the WAN (wide-area-network) and MAN (metropolitan-area-network) levels, however, there is a major convergence of protocols (read "protocol shouldn't matter"), so a WAN or MAN SONET pipe must be able to handle ATM (asynchronous-transfer-mode) cells, IP (Internet Protocol) packets, or even TDM (time-division-multiplexing) voice.
SONET, however, is a physical-layer protocol originally designed to efficiently carry voice traffic, so it is sometimes inefficient when transmitting data. (For example, you may need to split IP packets across several SONET frames with don't-care bits filling in unused bytes.) SONET is a synchronous format that talks in the optical domain, which requires synchronicity, but it can map asynchronous traffic into the synchronous domain. Traffic of any type can run across SONET because SONET is "protocol-agnostic." As a result, processing that traffic becomes more difficult, given that any SONET frame could carry data of any format and only part of that data. This property of SONET is important because a carrier may be unable to fill out an entire OC-192 stream itself, and it can choose to lease the unused bandwidth to a variety of traffic.
Framing in SONET is well-defined, meaning that a framer can have many features, such as ATM over SONET or IP over SONET. So everything (ATM, IP, and voice) goes on top of SONET. SONET's overhead gives you communication channels for monitoring the system and sending maintenance information in-band; that is, over the data channel.
OC-48 is an interesting technology because it sits on the cusp between the electrical and optical worlds and bridges the two. SONET scales down nicely from OC-48. The optical world, on the other hand, only scales up from OC-48. (Dense-wavelength-division multiplexing doesn't support rates lower than OC-48.) Thus, optical networks still need access to the smaller pipes, and SONET OC-48 serves this role well.
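The rates quoted for OC-48 and OC-192 follow directly from SONET's frame structure: the basic STS-1 frame is 9 rows by 90 columns of bytes, sent 8000 times per second, and the first three columns carry transport overhead. The sketch below works through that arithmetic. The function names are illustrative, and the "envelope" figure deliberately stops at transport overhead; path overhead and fixed stuff reduce the usable payload further.

```python
ROWS = 9             # rows per STS-1 frame
COLUMNS = 90         # byte columns per STS-1 frame
TOH_COLUMNS = 3      # columns of transport (section plus line) overhead
FRAMES_PER_S = 8000  # one frame every 125 microseconds

def line_rate_bps(oc_level):
    """Gross line rate of an OC-n signal, in bits per second."""
    return oc_level * ROWS * COLUMNS * 8 * FRAMES_PER_S

def envelope_rate_bps(oc_level):
    """Rate left after transport overhead (before path overhead)."""
    return oc_level * ROWS * (COLUMNS - TOH_COLUMNS) * 8 * FRAMES_PER_S

for n in (1, 12, 48, 192):
    print(f"OC-{n}: line {line_rate_bps(n) / 1e6:.2f} Mbps, "
          f"envelope {envelope_rate_bps(n) / 1e6:.2f} Mbps")
```

The "2.5-Gbps" and "10-Gbps" labels are therefore roundings of 2488.32 and 9953.28 Mbps, and a few percent of each is spent on the in-band monitoring and maintenance channels described above.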
Killer serial
Backplane technology, for its part, is moving from parallel buses having 64 to 128 lines to fast serial links with two to four lines. Parallel buses, such as PCI, don't scale up well, and multiple line cards often share them. Sharing a bus also reduces throughput efficiency because of contention (two nodes trying to use the bus at the same time) and overhead (handshaking to establish a channel between nodes). Some companies are considering fiber for high-speed serial, trying to get away from the parallel buses of copper.
"Going serial" marks the movement toward point-to-point communication across the backplane. Besides providing the ability to handle more data, going serial reduces connector size and thus the board space that each interface requires. Additionally, serial links increase the reliability of data transfer and the distance you can send the data, especially if you use fiber. You can also run different clocks over different lines instead of having to synchronize them.
With parallel links, you have the problem of many toggling lines as well as skew if the lines have different lengths. The clock over a high-speed serial link is actually embedded in the signal itself, using a SERDES (serializer/deserializer) device. Now you're just sending a signal and recovering the clock, and skew is no longer an issue. Of course, as your data needs grow, you can run several serial links in parallel to increase data rates, starting the whole cycle all over again.
Even if your backplane can support OC-48, you may want to consider provisioning the backplane with OC-192 in mind. Later retrofitting the backplane may be unfeasible. High-speed serial is also under consideration for moving data across the board itself.
CMOS versus the exotics
It's not worth adding fuel to the politically charged fires between CMOS and "exotic" processes, such as GaAs (gallium arsenide) or SiGe (silicon germanium). CMOS is the Holy Grail of processes, according to vendors that sell CMOS parts. They claim that, although you might get twice the performance today from the exotics, the benefits stop there. CMOS continues to offer improvements through smaller process geometries and other technological advances. You can achieve the two-times better performance benefit in CMOS by employing a parallel architecture. CMOS' most touted advantage is that you can integrate functions that would be expensive to lay out in GaAs onto CMOS functions and thus reduce chip count.
For high-performance systems, however, you should look past these arguments and see what those "exotic" processes have to offer. In many cases, they provide the only path to low power and high performance. CMOS might wiggle at OC-192 rates, but the question is whether it can hold to the jitter spec. And, although parallel architectures can increase performance, parallelism at some point becomes the problem rather than the solution. CMOS vendors also tout lower power. With CMOS, however, the more often the signal switches, the more power the design consumes, and, at high frequencies, CMOS has a power-consumption graph that looks more like a dc drain than "zero" voltage. For high-voltage applications, such as laser drivers, GaAs goes where CMOS simply can't. And prices may drop as the exotic processes ramp to volume in other markets. For example, wireless is the main driver for SiGe, and OC-192 products may gain cost and performance benefits from advances made to push the wireless market.
So should you buy CMOS or go exotic? It is probably worth your while to step around the hype and determine which one can give you the performance you need at the price you can afford to absorb. At some point, CMOS will take the market over from the exotics, as it always does, but the question to ask is: "When does it make sense to make the switch?"
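The point about switching frequency can be made concrete with the familiar first-order estimate of CMOS switching power: activity factor times switched capacitance times supply voltage squared times clock frequency. The sketch below applies it with made-up values; the capacitance and activity figures are illustrative assumptions, not data for any real part, and the model ignores leakage and short-circuit current.

```python
def dynamic_power_w(activity, cap_farads, vdd_volts, freq_hz):
    """First-order CMOS switching power: a * C * V^2 * f, in watts."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

ACTIVITY = 0.5      # assumed fraction of capacitance switched per cycle
SWITCHED_C = 1e-9   # assumed total switchable capacitance: 1 nF
VDD = 1.8           # supply voltage, volts

for freq_mhz in (155, 622, 2488):
    watts = dynamic_power_w(ACTIVITY, SWITCHED_C, VDD, freq_mhz * 1e6)
    print(f"{freq_mhz} MHz: about {watts:.2f} W")
```

Power in this model rises linearly with frequency, so logic that sips power at bus speeds behaves like a steady load at line rate; lowering the supply voltage helps quadratically, which is part of the appeal of finer processes.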
Fudge factors
Testing your system is difficult. To some degree, board design requires the tools of chip design. You may get only a model of a chip with which to design and test a board. One reason for this limitation is that, at these speeds, accessing signals is difficult, if you can even get a probe to them, given packaging types such as BGA. Another reason is that the chip may still be under design, so a model is all that exists. Additionally, products you design for the future don't exactly match today's usage patterns, so testing a design, never mind determining that you've nailed the right features, becomes a matter of educated guesswork.
Repeatability is an important factor in testing because without it you don't know which set of results is right. At these high speeds, measurements such as peak-to-peak and rms are important and tough to make, and the spec isn't very wide. Therefore, 5 or 10% can make quite a difference. Also, little data is available about how a system should handle certain types of traffic or corner cases. Additionally, incorrectly hooking up test-and-measurement equipment to a system can seriously affect results. A lot depends on how you do the impedance matching of components. Most people make test measurements on the optical side rather than the electrical side simply because it is easier. Also, when you measure jitter from an optical point of view, you get a whole physical-layer perspective. Measuring electrically, you see jitter only for the subsystem under test.
Integration helps testing in the sense that you can treat the IC like a black box. As long as the chip works to spec, you have less to worry about. If you discover problems, however, it's harder to look inside the chip to figure out what's happening. Integration also reduces the number of pins on devices running high-speed interfaces across a board, driving down power consumption.
The advantage of using high-performance parts is that they let you slack off somewhat elsewhere in your design. From a system perspective, you have to meet certain specs, such as power and jitter. You have the advantage, however, of deciding what parts of your system will exceed their fair share of slacking. For example, it might make sense to spend a little more time and money in one part of your system, thereby spending less time and money on the rest of the design. Of course, the challenge is picking the best place to put your effort (your value added), so that you can leverage it in future designs. For example, it does little good to spend lots of time optimizing interfaces between two functions if you know your vendor of choice is going to integrate the two functions onto one chip in the next generation.
Reducing time to market can be a gamble. You may want to form a partnership with a vendor to gain earlier access to design specifications. You'll have to start your design based on only a description of the part's functions and then work with changing IC models as the vendor locks in the design. If the vendor meets its schedule, you score big because your board will be ready to drop the chip onto. The risk is that if the vendor is late or the chip doesn't work as expected, you'll quickly lose your time-to-market advantage.
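To see why 5 or 10% matters when the spec is narrow, the sketch below reduces a set of timing-error samples to peak-to-peak and rms jitter and then asks whether a reading still clears a limit once measurement uncertainty is included. The sample values, the limit, and the function names are all hypothetical; real jitter measurement also involves filtering and test patterns that this example leaves out.

```python
import math

def peak_to_peak(samples):
    """Spread between the largest and smallest timing error."""
    return max(samples) - min(samples)

def rms_jitter(samples):
    """Root-mean-square deviation of the timing errors from their mean."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

def verdict(measured, limit, uncertainty=0.10):
    """Classify a reading, allowing for fractional measurement uncertainty."""
    low = measured * (1 - uncertainty)
    high = measured * (1 + uncertainty)
    if high <= limit:
        return "pass"
    if low > limit:
        return "fail"
    return "inconclusive"

errors_ps = [-2.0, -1.0, 0.0, 1.0, 2.0]  # made-up timing errors, picoseconds
print("peak-to-peak:", peak_to_peak(errors_ps), "ps")
print("rms:", round(rms_jitter(errors_ps), 3), "ps")
print(verdict(9.5, limit=10.0))  # within 10% of the limit: inconclusive
```

A reading that sits within the measurement uncertainty of the limit tells you nothing either way, which is why repeatability and careful setup matter as much as the instrument itself.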
The obvious and the not so obvious
The following issues may seem obvious, and thinking about them may perhaps seem unnecessary. At the bleeding edge, however, you get only one crack at getting it right. The same rule applies to the vendors selling products to you. And none of you can make money from empty promises.
Market issues
Find out whether the vendor has optimized the chip for LANs, WANs (wide-area networks), MANs (metropolitan-area networks), or whatever network is currently hot. If a vendor claims to have a chip that covers all these applications, it had better be talking about different chips targeting each market. Each market has different price, power, and performance pressures. You'll want the chip set optimized for your space. For example, the WAN and MAN markets circuit-switch traffic, whereas the LAN market packet-switches it.
Real-world issues
Specs can lie as effectively as statistics, especially if you measure the specs under ideal conditions. Many vendors claim to support OC-48/OC-192 rates. Some vendors, however, have measured performance figures while running their devices at looser specs than those used in the real world. Other sneaky tricks include running reference boards on an empty backplane that has no other blades to interfere with shorter signals or using an evaluation board with shorter test beds than those you have to use. Power-consumption figures often look better when the chip is sitting by itself without driving memory or other external loads.
Sticky gluelessness
On one level, with a glueless interface, you need no FPGA or CPLD shim between two ICs. However, issues beyond the electrical interface at the flow-control level may require significant software work to enable two application-programming interfaces to talk with each other. Given the challenge of writing software, you should consider this translation challenge when you determine just how glueless the interface between two ICs really is.
Process contradiction
Much of the common wisdom relating to ICs does not apply at the bleeding edge. For example, you might think that ICs designed in a 0.18-µm process necessarily consume less power than those designed in a 0.25-µm process. At this level of complexity, however, design methodology can play a more significant role than process technology for power consumption, thus differentiating between mediocre and superior IC design. Additionally, if a vendor is making a chip in a 0.18-µm process, shrinking the process does not yield great performance and power improvement. The 0.25-µm process provides an easier road map toward improvement. Finally, the type of embedded memory you use affects power consumption. For example, four-cell SRAM is less expensive than six-cell SRAM, but the four-cell memory requires a resistor, so the six-cell actually uses less power.
Statistics
Statistics play a large role in determining a network's health, identifying problems before they get out of hand and enabling new features. Make sure the chips you select collect the right statistics and give you sufficient real-time access to them; otherwise, it will be difficult to implement functions that relieve congestion and contention, such as back pressure or virtual queues.
FPGAs
FPGAs can serve an important role in bleeding-edge designs. They are useful for implementing glue logic between interfaces. They also provide an intermediary step between designs and ASICs. At least one IC vendor has released a multiple-FPGA version of its product to serve the market until the device moves to an ASIC in the next generation.
Power approach
Handling power consumption, and the resulting heat, is a major consideration for a high-frequency design. You can meet NEBS (Network Equipment Building Systems) compliance specs by designing your system according to traditional power ideals and then splitting it across several racks. You can also approach power design by first attempting to design the best system you can and then tackling the problem of figuring out how to cool it down.
Power supplies
With all the process technologies in use, you might have to support different power supplies depending on the various chips you specify (1.8, 2.5, 3, or 5V, and so on). Multiple voltage regulators consume board space as well as chunks of your power budget.
The letter "c"
Be wary of the fast-and-loose use of terms. For example, people sometimes confuse OC-48/OC-192 and OC-48c/OC-192c when the terms are "conveniently" mentioned without the "c." The "c" stands for "concatenated," often described as "clear channel," which means that the OC-48c is a single 2.5-Gbps channel: it is not several OC-12s sharing the same pipe. The distinction is important because clear channels are harder to process than mixed channels, because mixed-channel processing is more easily distributed. For example, you can separate an OC-48 line that is really four OC-12 lines into four separate processing cores, each handling one stream. Thus, the length of time each core gets to process the stream increases by a factor of four. In clear-channel streams, the processing cores must process the stream as it comes through, and parallelism becomes a complication, not a simplification. For example, each core would handle a different aspect of processing, with the additional overhead of handing off the packet to the next core. The problem only grows with OC-192c.
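The factor-of-four argument is easy to quantify as a per-packet time budget: the time a packet occupies the line is its length in bits divided by the line rate. The sketch below compares a clear-channel OC-48c with one of four OC-12 tributaries, using a 40-byte packet as an assumed worst case and ignoring SONET overhead; both simplifications, and the function name, are illustrative choices rather than anything specified in the text.

```python
OC1_BPS = 51_840_000  # SONET OC-1 line rate, bits per second

def packet_time_ns(packet_bytes, oc_level):
    """Time one packet occupies an OC-n line, ignoring SONET overhead."""
    return packet_bytes * 8 / (oc_level * OC1_BPS) * 1e9

MIN_PACKET_BYTES = 40  # assumed small-packet worst case

clear_channel = packet_time_ns(MIN_PACKET_BYTES, 48)   # one core, OC-48c
per_tributary = packet_time_ns(MIN_PACKET_BYTES, 12)   # one core per OC-12
oc192c = packet_time_ns(MIN_PACKET_BYTES, 192)

print(f"OC-48c:  {clear_channel:.1f} ns per packet")
print(f"OC-12:   {per_tributary:.1f} ns per packet")
print(f"OC-192c: {oc192c:.1f} ns per packet")
```

A core serving one OC-12 tributary has four times as long to handle each back-to-back small packet as a core facing the full OC-48c stream, and at OC-192c the budget shrinks by another factor of four.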
Beyond the reference design
When you hear that a company has a reference design, find out just what part of the design the reference covers. For example, even though the company may offer "a complete solution" (from transceiver through switch fabric), the reference design may cover only the switch fabric. Several companies have gone on acquisition sprees, buying all the pieces of the puzzle, so that they can tout their "complete solutions." However, just because a company changes a part's logo, it doesn't mean that the part works any better or more easily with other parts with the same logo. You need a tightly coupled interface, one that avoids timing discrepancies and skew. It may take a generation or two for the parts to integrate the necessary interfaces to make them actually work together without requiring external logic in an ASIC or FPGA. If this integration hasn't taken place, you have to take care of it yourself.
However, if a company is truly bringing together several parts, your overall system design becomes that much easier. Provided that the company stays within the specs it claims to meet, you could find yourself with a serious time-to-market advantage, and completing the project will require fewer engineering resources.
Reference designs let the vendors tell you how "simple" it is to use their products. However, check whether the design was ever actually built. You might be able to simulate a paper design (debugging), but you won't see it moving traffic (verification). At worst, reference designs offer you a subsystem that you would not put into the kind of system you want to build. At best, the reference design might show you how to efficiently route interfaces, decouple the clock and power supply, and reduce EMI.
Reference designs also come in handy in keeping the chip vendor honest. You can run the chip through its paces (and possibly a few tasks it was never "intended" to perform) to see its true performance. You can also determine its real power consumption under specific conditions (that is, the ones you care about) with real-world external loads active.
Don't be afraid to push beyond the reference design. You need to find out whether you have to use different board materials or an excessive number of layers to put everything on the board that the vendor didn't include. You might also need to use the vendor's products in different ways than the reference design shows. For example, you might want different low-level drivers to change the way you use a framer. You'll also want to know whether the company has an ASIC bias and wants to push you into building specific versions of its chips just for you. And before you buy, make sure you understand just what kind of marriage you're getting into with the vendor. You may end up having to include the vendor in the intimate levels of your board design just to get things to work.
Author info
You can reach Technical Editor Nicholas Cravotta at 1-510-558-8906, fax 1-510-558-8914, e-mail ednnick@pacbell.net.


















