
Designers of deep-submicron ASICs face formidable challenges as they attempt to cram millions of gates onto a single silicon die and then run the resulting system on a chip at high clock speeds without chip burnout. At deep-submicron geometries of 0.5 µm and below, problems result from the congested die, small-device physics, and, almost always, designers' need or desire for chips to run faster and use less power.
Because of the challenges that occur at the deep-submicron level, ASIC designers must work closely with ASIC vendors and electronic-design-automation (EDA)-tool companies. ASIC vendors offer macrofunction and Spice-model libraries; EDA-tool vendors offer floor-planning, synthesis, and timing-analysis tools. These design aids help ensure the same level of success for 0.5-µm and smaller ASICs as for 1-µm and larger ASICs. Not using these design-planning and -modeling aids can add time and cost to deep-submicron designs.
Table 1 lists representative deep-submicron ASICs whose vendors offer libraries of large macrofunctions. Four of the vendors employ geometries that have less-than-0.5-µm gate lengths. Because of the small size of these devices, ASIC vendors must use 3V and lower voltage cores to prevent oxide breakdown.
Although there is no official threshold for what constitutes a "deep-submicron ASIC," the term generally refers to a CMOS device whose minimum gate length (L), the distance between the source and the drain of the FET, is 0.5 µm or smaller. Manufacturers measure "effective" gate length (L-effective) as the equivalent gate length in the circuit and "drawn" gate length, usually slightly more than L-effective, as the length of the gate on the drafting drawing.
At the deep-submicron level, interconnect delays dominate over gate delays. The intrinsic gate delay (the time the gate spends switching) for 0.35-µm designs is around 100 psec, whereas the potential estimated delay for a 2-mm interconnect wire can be as high as 600 psec. This fact not only alters ASIC designs, but also affects the EDA tools for these designs. Logic-synthesis tools for 1-µm and larger designs synthesize gates and connect them with wire. Logic-synthesis tools for 0.5-µm and smaller designs synthesize the interconnect and then hang some gates onto it so that the wires correctly perform the logic function.
Normally, you think of wires as perfect conductors having negligible RLC. However, resistance is proportional to the wire's length divided by the wire's width. The shrinking geometries of deep-submicron designs mean thinner metal interconnect wire and thus higher resistance per unit length of the wire. The smaller wire footprint also means lower capacitance per length of wire.
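The width dependence can be illustrated with the usual sheet-resistance formula, R = R_sheet x (length / width). The following is a minimal sketch, not vendor data: the sheet-resistance value and the three wire widths are assumed purely for illustration, and metal thickness is held constant even though it typically shrinks too.

```python
def wire_resistance(length_um, width_um, sheet_res_ohm_per_sq):
    """Resistance of a rectangular wire: R = R_sheet * (length / width)."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    return sheet_res_ohm_per_sq * length_um / width_um


# Assumed, illustrative sheet resistance in ohms per square (not a vendor figure).
SHEET_RES = 0.07
WIRE_LENGTH_UM = 2000.0  # a 2-mm wire, the length used in the article's delay example

for width_um in (1.0, 0.5, 0.35):
    r = wire_resistance(WIRE_LENGTH_UM, width_um, SHEET_RES)
    print(f"width {width_um:4.2f} um -> {r:6.1f} ohms for a 2-mm wire")
```

Halving the width doubles the resistance of the same 2-mm run, which is why wire resistance stops being negligible as geometries shrink.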
The traditional way to model interconnect capacitance is to use the lumped-capacitance model, which ignores the wire's resistance. This model assumes that all points on the interconnect wire charge at the same rate and that all gates tied to the interconnect have their inputs change at the same time. The alternative to the lumped-capacitance model is to use an approach that treats the interconnect as a tree of distributed resistors and capacitors, with separate delays calculated for the end of each branch of the tree (Fig 1). This approach can provide Spice models that predict how the actual chip will perform to within 10%, far better than the lumped-capacitance model can.
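One common first-order way to compute per-branch delays on an RC tree is the Elmore delay, which the article does not name; it is used here only as a stand-in for whatever model a given vendor's tools apply. The sketch below compares it with a lumped-capacitance estimate. The `Node` class, the tree topology, and all component values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    r_ohm: float    # resistance of the segment from the parent to this node
    c_farad: float  # capacitance lumped at this node (wire plus gate input)
    children: list = field(default_factory=list)


def downstream_cap(node):
    """Total capacitance at this node and everything below it."""
    return node.c_farad + sum(downstream_cap(ch) for ch in node.children)


def elmore_delays(root):
    """Elmore delay to every node: sum of R_segment * C_downstream along the path."""
    delays = {}

    def visit(node, delay_at_parent):
        d = delay_at_parent + node.r_ohm * downstream_cap(node)
        delays[node.name] = d
        for ch in node.children:
            visit(ch, d)

    visit(root, 0.0)
    return delays


def lumped_delay(root):
    """Lumped model: driver resistance times total capacitance, same for all sinks."""
    return root.r_ohm * downstream_cap(root)


# Hypothetical net: a 100-ohm driver feeding a short branch and a long branch.
near = Node("near_gate", r_ohm=50.0, c_farad=50e-15)
far = Node("far_gate", r_ohm=400.0, c_farad=50e-15)
driver = Node("driver", r_ohm=100.0, c_farad=0.0, children=[near, far])

delays = elmore_delays(driver)
print(f"lumped estimate : {lumped_delay(driver) * 1e12:.1f} psec for every sink")
for sink in ("near_gate", "far_gate"):
    print(f"{sink:<16}: {delays[sink] * 1e12:.1f} psec")
```

With these assumed values the lumped model reports the same delay for both sinks, whereas the tree model separates the near and far branches, which is the effect the distributed approach is meant to capture.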
At the deep-submicron level, a logic gate's time delay is proportional to the rise and fall times of the signal that drives the logic gate. The slope of the input signal significantly impacts the switching speed of the gate. A gate's time delay is effectively a nonlinear function of the slope of the rise and fall times and the output-load capacitance. Therefore, EDA-synthesis tools require a nonlinear delay calculator. One method of providing this calculation is to use a look-up table. The look-up table uses the rise time and a gate's output capacitance to calculate an accurate delay (Fig 2). Other methods use piecewise linear approximations or polynomial equations.
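A look-up-table delay calculator of this kind can be sketched as a two-dimensional table indexed by input slew and output load, with bilinear interpolation between the characterized points. The table values, axis points, and function names below are invented for illustration and do not come from any vendor library.

```python
from bisect import bisect_right

# Hypothetical characterization data for one cell (not from a real library).
SLEW_NS = [0.1, 0.5, 1.0]     # input rise/fall time, nsec
LOAD_PF = [0.01, 0.05, 0.10]  # output-load capacitance, pF
DELAY_NS = [                  # DELAY_NS[slew index][load index], nsec
    [0.10, 0.18, 0.30],
    [0.14, 0.24, 0.38],
    [0.20, 0.32, 0.50],
]


def _bracket(axis, x):
    """Return the lower index of the bracketing interval and the fraction within it.

    The index is clamped to the table edges, so points outside the table are
    linearly extrapolated from the nearest interval.
    """
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def gate_delay(slew_ns, load_pf):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i, ts = _bracket(SLEW_NS, slew_ns)
    j, tl = _bracket(LOAD_PF, load_pf)
    low = DELAY_NS[i][j] + (DELAY_NS[i][j + 1] - DELAY_NS[i][j]) * tl
    high = DELAY_NS[i + 1][j] + (DELAY_NS[i + 1][j + 1] - DELAY_NS[i + 1][j]) * tl
    return low + (high - low) * ts


print(f"{gate_delay(0.3, 0.03):.3f} nsec")
```

A denser table tracks the nonlinear curve more closely at the cost of more Spice characterization runs per cell.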
IBM's CMOS 5S ASIC, which uses a 0.36-µm technology, offers 1.6 million usable gates on a single die. The device builds upon and is 20 to 25% faster than the company's 0.5-µm CMOS 5L. To achieve the high number of usable gates, the CMOS 5S uses as many as six metal wiring layers: five for global wiring and one as a local interconnect level inside each gate. The many layers of metal reduce the interconnect length and, consequently, its path delay. This approach lets you integrate megafunctions, such as µPs and communication cores, on the chip. Even with the small dice, the product's packaging still includes as many as 736 I/O pads.
IBM also supplies an ASIC tool kit for developing deep-submicron devices. Key features of the tool kit include: Einstimer, a static timing tool that uses delay models created with IBM's Delay Calculation Language; Test Structure Verification, a tool that enables designers to achieve test patterns with greater than 99% coverage; and SimuN, a tool to aid ASIC designers in the analysis of I/O characteristics, including noise and simultaneous-switching effects.
Toshiba fabricates its TC200 series of gate arrays, embedded arrays, and standard cells using a 0.4-µm drawn gate length. The TC200 family, which includes the TC200G with 700,000 usable gates, uses 62-µm, inner-lead, tape-automated bonding. The family uses EDA tools from Cadence Design Systems. These tools include Cadence's Verilog-XL, which Toshiba uses as an ASIC sign-off simulator providing more accurate delay models. Each delay model is a nonlinear function of the slew rate and output loading of a cell. Building the models requires eight times as many Spice simulations as previous-generation models required.
In concert with Toshiba's plan for an open EDA-tool strategy, the company has selected Compass Design Automation's ChipPlanner as a floor planner. Floor planners are necessary at the deep-submicron level because they let designers know where each gate or block will reside on the chip and, thus, how long each interconnect wire will be. Floor planners achieve this task by hierarchically determining where all the blocks on the chip should reside and then estimating the wire lengths required to connect each node. Some floor planners can predict the actual metal-interconnect length to within 10%.
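The wire-length estimation step can be illustrated with the half-perimeter wire-length (HPWL) metric, a widely used estimator. The article does not say which estimator ChipPlanner or any other floor planner uses, so treat this as a sketch of the idea only; the block names, coordinates, and nets are hypothetical.

```python
def hpwl(points):
    """Half-perimeter wire length: width plus height of the pins' bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def estimate_net_lengths(placement, nets):
    """Estimate each net's wire length from the centers of the blocks it connects."""
    return {
        net: hpwl([placement[block] for block in blocks])
        for net, blocks in nets.items()
    }


# Hypothetical floor plan: block centers in mm.
placement = {"cpu": (1.0, 1.0), "ram": (4.0, 1.5), "io": (2.5, 5.0)}
# Hypothetical nets and the blocks each one connects.
nets = {"addr_bus": ["cpu", "ram"], "irq": ["cpu", "ram", "io"]}

lengths = estimate_net_lengths(placement, nets)
for net, length_mm in lengths.items():
    print(f"{net}: about {length_mm:.1f} mm")
```

Estimates like these can feed an interconnect-delay model before detailed routing exists, which is the reason the floor plan comes first at these geometries.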
One of the densest deep-submicron ASICs, NEC's CMOS-9 family, boasts a 2-million-gate CMOS gate array using a 0.35-µm drawn gate length (0.27-µm L-effective). The family handles clock speeds as fast as 155 MHz. NEC has used the technology internally for over six months, proving the product's process. The 3.3V gate arrays use two- and three-layer metal interconnects. The CMOS-9 includes base arrays ranging from 190,000 to 2 million gates. The company plans to offer a library version characterized at 2.5V.
In addition, the CMOS-9 series handles the standard I/O and macrofunctions. High-speed I/O functions include support for the Peripheral Component Interconnect (PCI) bus, including 66-MHz PCI; interfacing to 5V logic without damaging the 3.3V ASIC using protected 5V I/O; and a family of standard and slew-rate-controlled low-voltage-TTL I/O levels. In addition, the company's OpenCAD Design System supports the gate-array family. OpenCAD combines both proprietary and popular third-party CAE tools, such as those from Cadence and Synopsys. NEC uses its own proprietary floor planner.
Furthermore, CMOS-9 supports high-speed system design with such features as a digital PLL to minimize chip-to-chip clock skew and clock-tree synthesis to minimize on-chip clock skew. NEC also offers RAM as a soft macro.
LOOKING AHEAD
However, changing a fabrication facility to achieve lower geometries is a financial decision. Putting a new fabrication facility in place costs $500 million to $1 billion. Many vendors would rather reap the benefits from existing facilities before investing in new ones. Nevertheless, both AT&T and Hitachi plan to have facilities for 0.35-µm processes in operation by year-end, and IBM has approved a facility for fabricating 0.25-µm, higher density arrays operating at 2.5V. These low-voltage ASICs are increasingly interfacing to low-level interfaces, such as Gunning transceiver logic and low-voltage TTL. Packaging is also an issue for these high-density, low-voltage gate arrays. Ball-grid arrays are the most popular, but ceramic pin-grid arrays, PQFPs, ceramic flat packages, tape-automated bonding, and PLCCs, all with myriad pin counts, are also vying for position. The choice will depend on cost and thermal design.
In power, as in everything else in deep-submicron design, the alternative to accurate power estimation is worst-case design. Worst-case power estimation forces you to make architectural and logic changes to minimize power consumption. Such changes include adding built-in power-down circuitry to switch off all or part of the chip and designing logic using dedicated buses to minimize signal transitions (glitches), even when a design is synchronous. These changes typically cause the circuitry to be bigger, slower, and more complicated, however.
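The gap between worst-case and more realistic power estimates can be sketched with the µW/MHz/gate power factor that vendors quote. The function below is a rough first-order estimate, not a vendor method: the activity factor (the fraction of gates switching each clock cycle) is an assumption, and the gate count, power factor, and clock rate are illustrative inputs chosen to sit in the range of the figures in Table 1.

```python
def dynamic_power_watts(gates, uw_per_mhz_per_gate, clock_mhz, activity=1.0):
    """First-order dynamic power: gates * power factor * clock * activity.

    activity=1.0 is the worst case in which every gate switches on every cycle.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    microwatts = gates * uw_per_mhz_per_gate * clock_mhz * activity
    return microwatts * 1e-6


# Illustrative inputs, not a specific product.
GATES = 500_000
POWER_FACTOR = 1.2  # microwatts per MHz per gate
CLOCK_MHZ = 100.0

worst_case = dynamic_power_watts(GATES, POWER_FACTOR, CLOCK_MHZ)
# Assumed 15% switching activity; real activity depends on the design and workload.
estimated = dynamic_power_watts(GATES, POWER_FACTOR, CLOCK_MHZ, activity=0.15)

print(f"worst case: {worst_case:.1f} W")
print(f"with 15% activity: {estimated:.1f} W")
```

Designing the package and power grid for the worst-case figure when the realistic figure is several times lower is the kind of overdesign that accurate power estimation aims to avoid.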
The densest deep-submicron ASIC is a recently announced cell-based CMOS ASIC from Hitachi America and VLSI Technology Inc. The two companies collaborated to develop a 5-million-gate standard-cell array. The chip employs a 0.35-µm L-effective (0.4-µm drawn) gate length, and one gate measures approximately 800 mils per side. The chip employs two to five layers of metalization to minimize interconnect delays, features a 1.4-µm routing metal pitch, and operates from 2.2 to 3.6V.
Both Hitachi and VLSI plan to act as sources for the chip, which has a PLL that can multiply external frequencies to more than 250 MHz for internal operation. The gates boast a less-than-1-µW/MHz/gate power factor. The device comes in PQFPs and plastic and tape ball-grid-array packages. The tape ball-grid array has as many as 672 pins, which the companies plan to increase to more than 1000 pins.
Other companies are also moving toward higher densities and smaller processes. But putting these processes into place involves changing existing facilities or putting new ones into place, which is a costly proposition. Vendors also must carefully weigh packaging for these devices (see box, "Looking ahead"). These issues will affect how small processes can get and how dense chips can become.

| Vendor | Location | Phone |
|---|---|---|
| AT&T Microelectronics | Allentown, PA | (800) 372-2447 |
| Cadence Design Systems | San Jose, CA | (408) 943-1234 |
| Compass Design Automation | San Jose, CA | (408) 433-4880 |
| Fujitsu Microelectronics | San Jose, CA | (408) 922-9000 |
| Hitachi America Inc | Brisbane, CA | (800) 285-1601 |
| IBM Microelectronics | Essex Junction, VT | (800) 769-2742 |
| LSI Logic Corp | Milpitas, CA | (408) 433-8000 |
| Mitsubishi Electronic Device Group | Sunnyvale, CA | (408) 730-5900 |
| NEC Electronics Inc | Mountain View, CA | (415) 965-6000 |
| Synopsys Inc | Mountain View, CA | (415) 962-5000 |
| Toshiba America Electronics Components Inc | Irvine, CA | (714) 455-2000 |
| VLSI Technology Inc | San Jose, CA | (408) 434-3000 |

Table 1: Representative deep-submicron ASICs

| Vendor | Part No. | Gate length (µm) | Gate density | Gate delay | Power dissipation (µW/MHz/gate) | Price | Features |
|---|---|---|---|---|---|---|---|
| AT&T Microelectronics | HL400C | 0.5 drawn | More than 500,000 | 90 psec, unloaded typical | 0.8 at F/O=1 | N/A | Three-layer metal interconnects; 600-MHz max toggle frequency; 40-mA buffer drive; less-than-4-mil pad pitch; 700-kbit, compilable SRAM; operates from 2.7 to 3V; 3/5V I/O ability; TTL/CMOS, PCI, positive ECL, Gunning transceiver logic/transceiver logic, and high-speed transceiver logic I/O |
| Fujitsu | CG51/CE51 series | 0.5 drawn | 34,000 to 754,000 | 210 psec at F/O of 2, L=1 mm | 1.2 | $27.80 (1000) (214,000 gates in 208-pin PQFP) | Triple-layer metal, 600-MHz max toggle frequency, supports 3.3 and 5V I/Os, RAM compiler handles single/dual/triple-port RAMs, PLL and clock network for skew control, JTAG boundary scan, 24-mA drive capability, 496 I/O pads |
| Hitachi America Ltd | HG72G/E | 0.5 drawn | 39,000 to 500,000 | 200 psec typical | 1.2 | From $70,000 (NRE charge) | System clock speed greater than 100 MHz, ball-grid array with 672 I/O pins, on-chip PLL, maximum memory of 256 kbits |
| IBM Microelectronics | CMOS 5S | 0.36 L-effective | 1.6 million | 115 psec at F/O of 2 and L=0.5 mm, 180 psec at F/O of 2 and L=2 mm | N/A | N/A | Six-layer metal, five for global wiring and one for local interconnect inside each gate; 20 to 25% faster than IBM's 0.5-µm CMOS 5L series; maximum of 736 pads |
| LSI Logic | 500K series | 0.5 drawn and 0.38 L-effective | 1.5 million usable gates | 85 psec, unloaded, typical | 1 | N/A | Two-, three-, and four-layer metal interconnection; gate- and embedded-array and cell-based architectures; PLL synchronizes internal architecture to within 300 psec of external IC reference; as much as 2 Mbits of cell-based embedded RAM |
| Mitsubishi Electronic Device Group | Ultra Performance series | 0.5 drawn | 700,000 | 145 psec at F/O of 2 | 1.3 | N/A | Four speed/power options; 2kx20-bit embedded memory cells; low-voltage CMOS, low-voltage TTL, positive ECL, and PCI I/O buffers; 3- and 5V-tolerant I/O buffers; PLL and low-clock-skew distribution; gate arrays and standard cells |
| NEC Electronics Inc | CMOS-9 | 0.35 drawn | 2 million | 119 psec at F/O of 2 and L=0.5 mm | 0.9 | $336 (10,000) (318,000 gates) | Two- and three-layer metal, clock speeds as high as 155 MHz; maximum of 1200 I/O pads using tape-automated-bonding technology; GTL, HSTL, PCI bus interface; positive-ECL and slew-rate control I/O; digital PLL and tree synthesis to minimize skew; JTAG and built-in self-test; 3 and 5V I/O |
| Toshiba America Electronics Components Inc | TC200 series of gate arrays and standard cells | 0.4 drawn | 700,000 | 190 psec typical | 1.27 for gate arrays and 0.77 for standard cells | $85 (20,000) (200,000 gates in 240-pin flatpack) | 62-µm, inner-lead, tape-automated bonding; available with Verilog-XL sign-off |
| VLSI/Hitachi | CMOS cell-based gates | 0.4 drawn, 0.35 L-effective | As many as 5 million raw gates | 15 psec, loaded | Less than 1 | N/A | Two- to five-layer metal interconnects; stacked vias between metal layers; operates from 2.2 to 3.6V; cell-based design; employs a 1.4-µm routing metal pitch; effective PLL for multiplying frequencies to 250 MHz; 672-pin, tape ball-grid arrays |