March 13, 1998

WHAT'S HOT IN THE DESIGN COMMUNITY

Innovative µP core uses MIPS IV architecture

With its new VR5400 µP-core family, NEC exploits MIPS licensees' freedom to take the basic instruction-set architecture and redesign a core to deliver the desired price/performance point. Although Sandcraft (Sunnyvale, CA), a developer of "chipless"-µP intellectual property, codeveloped the VR5400 with NEC, NEC has exclusive rights to the architecture. Starting with the MIPS IV instruction set and the R5000's cache and MMU, the two companies designed a symmetrical, dual-issue, superscalar architecture.

Most ALU operations are single-cycle, so the pipelines typically do not interlock and stall. However, when the instruction-issue unit pairs a single-cycle instruction with a longer instruction, a pipeline stall may occur. But if the next pair of instructions is also a single-cycle/long-instruction combination, the VR5400 dynamically swaps the instructions and avoids interlocking. Each pipeline has a local bypass that lets results skip the write-back stage and feed directly back into the pipeline, preventing pipeline locking. The VR5400 also supports global bypassing, which lets the two pipelines exchange results to help minimize the effects of data dependencies.

Unlike the superscalar R5000, which has separate integer and floating-point pipelines, the VR5400's pipelines can each handle either integer or floating-point instructions. The core can therefore execute any combination of the two: integer-integer, integer-floating, and floating-floating. To enable each pipeline to handle both integer and floating-point operations, NEC split floating-point operations: The mantissa goes through the integer portion of the pipe, and the exponent goes through a separate 12-bit ALU. The VR5400 complies with the IEEE-754 floating-point format.
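As a rough illustration of handling the mantissa and the exponent on separate paths, the Python sketch below unpacks IEEE-754 double-precision values into their fields and multiplies two numbers by multiplying the significands as integers and adding the exponents as small integers. It is not NEC's datapath: the function names are hypothetical, only normal (nonzero, finite) operands are handled, and the result is truncated rather than rounded.

```python
import struct

BIAS = 1023          # IEEE-754 double-precision exponent bias
FRACTION_BITS = 52   # stored fraction width for a double

def split_double(x):
    """Return (sign, biased_exponent, significand) for a normal double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> FRACTION_BITS) & 0x7FF           # 11-bit field
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    significand = fraction | (1 << FRACTION_BITS)        # restore implicit 1
    return sign, exponent, significand

def join_double(sign, exponent, significand):
    """Pack the fields back into a Python float."""
    fraction = significand & ((1 << FRACTION_BITS) - 1)
    bits = (sign << 63) | (exponent << FRACTION_BITS) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def multiply(a, b):
    """Multiply two normal doubles using separate integer paths."""
    sa, ea, ma = split_double(a)
    sb, eb, mb = split_double(b)
    sign = sa ^ sb
    # Exponent path: a small integer add. The raw sum of two 11-bit
    # fields can need 12 bits before the bias is removed.
    exponent = ea + eb - BIAS
    # Mantissa path: a plain integer multiply (up to 106 bits here).
    product = ma * mb
    # Normalize back to a 53-bit significand, truncating low bits.
    if product >> 105:
        product >>= 53
        exponent += 1
    else:
        product >>= 52
    return join_double(sign, exponent, product)

print(multiply(1.5, 2.0))
print(multiply(-3.0, 0.5))
```

The point of the sketch is only that the two halves of a floating-point multiply are ordinary integer operations of very different widths, which is what makes sharing the wide integer datapath plausible.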
Within the two pipelines, the VR5400 has two unified integer/floating-point units, a nonblocking load/store unit, a 32×32-bit integer/floating-point multiply unit with a 64-bit accumulator, a vector unit that supports 8×8-byte SIMD (single-instruction, multiple-data) operation, and a branch unit. Each integer/floating-point unit contains a 64-bit barrel shifter that can perform single-cycle left or right rotation of 32 or 64 bits. This feature is useful for data alignment in graphics and printer applications. NEC also adds a set of rotation instructions to support the barrel shifting.

The load/store unit allows pipeline flow to continue when no data dependencies exist. Through NEC's addition of a data-prefetch instruction, the load/store unit can prefetch data to fill the cache without affecting the pipeline. (That is, no pipeline locking occurs.) Also, to minimize data dependencies, the VR5400 can dynamically swap instructions among pipelines so that consecutive instructions can flow down the same path.

The multiply unit can perform continuous 32×32-bit, single-cycle multiplies and multiply-accumulates without stalling the pipeline; however, a 64×64-bit multiply may stall the pipeline. The vector unit uses the 64-bit floating-point registers, so the device cannot simultaneously perform floating-point and vector operations; in most applications, this constraint is not an issue. You have to flush the pipeline when switching between floating-point and vector operations, but you need not perform a time-consuming context switch. To reduce code size and increase performance, the VR5400 adds register-based multiply instructions; this addition allows the CPU to write the multiply result to a register file instead of to special internal registers, as the standard R5000 does.

The VR5400's cache structure, similar to the R5000's, comprises a 32-kbyte instruction cache and a 32-kbyte data cache.
Both are two-way set-associative and use a least-recently-used replacement algorithm. The caches use a 32-byte line. You can lock the caches on a per-line basis; this approach benefits time-critical instruction loops, interrupt handlers, and data. For example, you may use line locking to lock critical data structures, such as the program stack and global variables, as information passes between subroutines. Although a normal load from cache takes only one cycle, filling a cache line takes eight cycles, even with zero-wait-state memory. The data cache also supports write-back and write-through cache protocols. The bus protocol supports four outstanding reads for concurrent instruction- and data-cache refills and supports split reads with write-back or store operations. The VR5400 collects consecutive uncached word-write operations to form a single block-write operation; this approach is the same as data merging, except that the collection approach requires you to align the data on a line boundary. The VR5400 can also perform burst reads as large as the cache line.

The VR5400's MMU supports address translation, facilitates exception processing, and manages operating modes. It is compatible with the R5000's MMU. It supports 36-bit physical and 64-bit virtual addresses with variable page sizes from 4 kbytes to 16 Mbytes. Its translation-look-aside buffer has 48 dual entries, which map 96 pages.

NEC offers a companion $35 (10,000) chip set that supports PCI, memory control, a DMA controller, an interrupt controller, a timer, and serial and parallel ports. NEC also offers a $1000 evaluation board for the VR5464. Hewlett-Packard (Colorado Springs, CO) offers logic-analyzer support using the N-Wire debugging port of the VR5400. N-Wire provides access to internal states and offers execution control. You can use the logic analyzer to set multiple hardware breakpoints on instruction and data addresses or on data values.
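The cache and TLB figures quoted above imply some simple geometry, which the Python sketch below works out. The function names are hypothetical, and the address split assumes plain modulo indexing of a set-associative cache, which the article does not spell out for the VR5400.

```python
def cache_geometry(cache_bytes, ways, line_bytes):
    """Derive line count, set count, and address-field widths."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 for powers of 2
        "index_bits": sets.bit_length() - 1,
    }

def split_address(addr, geometry):
    """Split an address into (set index, byte offset within the line)."""
    offset = addr & ((1 << geometry["offset_bits"]) - 1)
    index = (addr >> geometry["offset_bits"]) & ((1 << geometry["index_bits"]) - 1)
    return index, offset

def tlb_reach_bytes(dual_entries, page_bytes):
    """Each dual entry maps two pages, so reach is 2 * entries * page size."""
    return dual_entries * 2 * page_bytes

# Figures from the article: 32 kbytes, two-way, 32-byte lines.
geometry = cache_geometry(32 * 1024, ways=2, line_bytes=32)
print(geometry)
print(split_address(0x1234, geometry))

# 48 dual entries map 96 pages; reach depends on the page size chosen.
print(tlb_reach_bytes(48, 4 * 1024) // 1024, "kbytes at 4-kbyte pages")
print(tlb_reach_bytes(48, 16 * 1024 * 1024) // (1024 * 1024), "Mbytes at 16-Mbyte pages")
```

Under these assumptions each cache holds 1024 lines in 512 sets, so a locked line removes one of only two candidate locations in its set.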
Cygnus (San Jose, CA, www.cygnus.com), Green Hills (Santa Barbara, CA, www.ghs.com), and Apogee (Campbell, CA, www.apogee.com) provide compiler support for the VR5400. Wind River (Alameda, CA, www.windriver.com) and Integrated Systems (Sunnyvale, CA, www.isi.com) offer RTOS support.

The first two devices in the VR5400 family are the VR5432 and VR5464. The VR5432 has a 32-bit system interface, runs at 167 MHz, comes in a 208-pin PQFP, and costs $45 (10,000). The VR5464 has a 64-bit system interface and comes in a 272-pin advanced-BGA (ABGA) package, a multilayer technology that allows more complex connectivity between the die and the pins. The VR5464 comes in 200- and 250-MHz versions and costs $70 and $95, respectively. Both devices operate at 2.5V internally with 3.3V I/O. At 250 MHz, the VR5464 consumes 4.5W and has SPECint95 and SPECfp95 ratings of 10 and 5, respectively. NEC also offers the VR5400 as a core for integration into a custom product. --by Markus Levy

NEC Electronics Inc, Santa Clara, CA. 1-800-366-9782, www.nec.com.

Make way for the graphics Goliath

Intel's long-awaited Intel740 graphics chip is now available for sampling. The company developed the chip with Real3D (Orlando, FL), a former division of Lockheed Martin. The Intel740 implements the full 533-Mbyte/sec peak bandwidth AGP 2x bus, including sideband signaling. Deep request buffering, direct main-memory execution, and the ability to simultaneously access local frame-buffer memory all reduce the AGP latency effects of multiple system operations contending for core-logic attention. The Intel740 combines 2-D, 3-D, and video support along with an integrated 220-MHz RAMDAC in one chip and targets the high end of the $1500 to $2500 desktop-PC market. Intel claims that the chip can sustain 425,000- to 500,000-triangle/sec and 45 million- to 50 million-pixel/sec performance. Thus, the company contends that the device outperforms first-generation, high-end, 3-D-only accelerators.
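The 533-Mbyte/sec peak figure for AGP 2x follows from the bus parameters, as the short Python calculation below shows. The bus width (32 bits) and nominal clock (about 66.67 MHz) come from the AGP interface definition rather than from this article, and the helper name is hypothetical.

```python
def peak_bandwidth_mbytes_per_sec(bus_bits, clock_mhz, transfers_per_clock):
    """Peak rate = bytes per transfer * clock rate * transfers per clock."""
    return (bus_bits / 8) * clock_mhz * transfers_per_clock

# AGP: 32-bit data path, nominal 66.67-MHz clock (assumed values).
agp_1x = peak_bandwidth_mbytes_per_sec(32, 66.67, 1)
agp_2x = peak_bandwidth_mbytes_per_sec(32, 66.67, 2)

print(round(agp_1x), "Mbytes/sec peak at 1x")
print(round(agp_2x), "Mbytes/sec peak at 2x")
```

These are peak numbers; sustained throughput depends on request buffering and on contention for main memory, which is why the chip's latency-hiding features matter.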
Per-pixel interpolation enhances quality, and the 64-bit, 3-D parallel-processing engine boosts performance. Local frame-buffer-memory options include 100-MHz synchronous DRAM or synchronous graphics RAM in 2- to 8-Mbyte densities, although the Intel740 does not support storing texture-map data in the frame buffer. Other key pieces of the company's platform include the 440LX AGP chip set and the Pentium II processor, which handles 3-D, front-end-geometry operations. One surprising feature, given the requirement to store texture maps only in main memory, is the small, 256-byte on-chip texture cache. Intel claims that this size is adequate because of the fast AGP interface and a tiled addressing mode, which exploits pixel spatial locality.

Brooktree/Rockwell's (Santa Clara, CA) Bt829 and Bt869 complete the video-in and -out capability, Hauppauge Computer Works (Hauppauge, NY) supplies a TV-tuner chip, and C-Cube (Milpitas, CA) and Zoran (Santa Clara, CA) handle hardware digital-versatile-disk (DVD) support. Intel's benchmarking indicates that under "average" loading conditions, a 266-MHz Pentium II processor, along with Zoran's software DVD decoder, can smoothly render 24-frame/sec video at 24-bit color and 720×480-pixel resolution.

The Intel740 comes in a 468-bump BGA, and Intel builds the 3.3 million-transistor chip on its mainstream 0.35-µm process. For graphics add-in cards, Real3D developed an AGP-to-PCI adapter chip that also supports texture-map storage in additional local memory. Intel-supplied Direct3D and OpenGL drivers let you use the Intel740 in Windows 95 and Windows NT 4.0 environments. For Windows 95, Intel developed the AGP support that Microsoft bundles in OSR 2.1. Intel includes NT 4.0 AGP software with the graphics drivers.

Although you cannot ignore Intel's numerous abilities to influence the Intel740's fortunes, the success of this chip is not guaranteed.
Workstation users and game players may still be willing to pay more for the slight performance edge of a dedicated 3-D accelerator with on-chip rendering. (Speed-seeking software developers often write directly to the hardware instead of to an application-programming interface.) The Intel740 costs $34.75 (10,000)--probably too expensive for the fast-growing market for PCs costing less than $1000, and business PC applications for 3-D graphics have not yet emerged. Intel's upcoming Katmai processor with its multimedia-extension-enhanced floating-point performance also threatens to further reduce graphics-chip-set prices and required features. At this year's Intel Developers Forum, Senior Vice President Albert Yu alluded to upcoming low-end P6-generation CPUs with integrated graphics controllers. --by Brian Dipert

Intel Corp, Folsom, CA. 1-916-356-8080, fax 1-916-356-6227, www.intel.com.

Keep your LCD bright with boost/inverter bias supply

LCD screens require a relatively high-voltage bias supply, which your system must usually derive from a low-voltage source. Maxim's MAX686 boost dc/dc converter can deliver as much as +27.5V or -27.5V (at 10 mA) from a 2.7 to 5.5V supply. The 16-pin QSOP device uses a current-limited, PFM-control technique, switching as fast as 300 kHz, to achieve efficiencies as high as 93%. You need not add an external switch because the IC includes an internal N-channel MOSFET. To set the output voltage, you control a 6-bit, built-in DAC using up/down control lines. Typical quiescent current is 65 µA, and current consumption in shutdown mode is 1.5 µA for this $2.95 (1000) IC. --by Bill Schweber

Maxim Integrated Products, Sunnyvale, CA. 1-408-737-7600, fax 1-408-737-7194, www.maxim-ic.com.

New 2.5V logic family debuts with 16-bit devices

The 2.5V VCX logic family is the latest addition to Pericom Semiconductor's line of SiliconInterface products, which includes the FCT3, LPT, LCX, and ALVCH logic families.
The VCX family supports low-voltage µPs and synchronous DRAMs that are moving to 2.5V power supplies. Pericom's VCX ICs feature a patented 3.6V I/O tolerance for use in mixed 1.8 to 3.3V systems; 5V-tolerant parts are also available. Devices in the VCX family also have a three-state balanced output drive of ±24 mA and a static power consumption of 20 µA. The typical propagation delay is less than 2.5 nsec, and the VCX family also includes edge-rate control to reduce ground bounce and ring-back. All VCX products are available in 48-pin TSSOPs. The VCX16244A/16245A costs $2.05 (1000), and the VCX16373A/16374A costs $2.15 (1000). --by Stephen Kempainen

Pericom Semiconductor, San Jose, CA. 1-408-435-0800, fax 1-408-435-1100, www.pericom.com.

Lossless compression core hits 100 Mbytes/sec

Some computer architects have long viewed lossless data-compression technologies as the answer to incessantly growing demand for more storage and higher transfer rates. Unfortunately, except for modems and tape drives, the use of data compression has been limited to software-based schemes that maximize disk capacity. Theoretically, however, you could embed a compression IC with the controller in a disk drive and--without help from an operating system--increase storage capacity and data rate by a factor of 2, 4, or possibly 8. Taking the technology a step further, a data-compression engine that operates between a µP and main memory could offer similar advantages in effective memory capacity and data rate. BTG USA claims to have developed a compression core that can fit in these roles as well as in network switches, set-top boxes, and other media-rich applications.

Several factors--starting with throughput and latency--have conspired to limit the use of hardware-based data compression.
However, BTG's X-Match processor, which Simon Jones, PhD, developed at Loughborough University (Loughborough, UK, www.lboro.ac.uk), delivers a 100-Mbyte/sec stream through the compression engine and a 140-Mbyte/sec stream through the decompression engine. Moreover, 4-byte words pass through the compression processor with a latency of two µP clock cycles.

Leading disk-drive companies have also claimed that hardware compression in a disk drive requires operating-system support for variable-sized drives. Theoretically, however, drive designers could embed the compression technology using conservative compression ratios and hide the technology from the host CPU. In fact, compression could allow drive vendors to offer relatively larger disk drives and higher data rates without stressing the read channel and head/media interface that typically limit drive designs.

The high data rates of the X-Match algorithm result from a dictionary-style compression engine that simultaneously operates on 4 bytes. The wide words reduce the number of matches in a data stream but result in a compression ratio of 33-to-3 bits when a match occurs. To further boost efficiency, the engine can also achieve compression when partial matches occur and can dynamically reorder its dictionary based on real-time statistics, thereby maximizing compression ratios.

BTG currently offers an X-Match Evaluation Kit that includes an FPGA implementing the compression processor. The kit also includes some sample applications and technical support. For designers who want to integrate the X-Match engine into an OEM product or to design an X-Match IC for resale, BTG licenses the VHDL source code starting at $35,000 and negotiates the actual price and royalties on an application-by-application basis. --by Maury Wright

BTG USA Inc, Gulph Mills, PA. 1-610-278-1660, www.btg-et.com.
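To make the idea of a dictionary coder that works on 4-byte words and reorders its dictionary more concrete, here is a toy Python sketch. It is not the X-Match algorithm: the token format, the 16-entry dictionary size, and the simple move-to-front reordering are assumptions chosen for illustration, and partial matches are omitted entirely.

```python
WORD_BYTES = 4  # the coder consumes input one 4-byte word at a time

def compress(data, dict_size=16):
    """Encode 4-byte words as ('hit', index) or ('miss', word) tokens."""
    if len(data) % WORD_BYTES:
        raise ValueError("input length must be a multiple of 4 bytes")
    dictionary = []   # most recently used word sits at position 0
    tokens = []
    for i in range(0, len(data), WORD_BYTES):
        word = data[i:i + WORD_BYTES]
        if word in dictionary:
            index = dictionary.index(word)
            tokens.append(("hit", index))      # short token: index only
            dictionary.pop(index)
        else:
            tokens.append(("miss", word))      # long token: literal word
            if len(dictionary) == dict_size:
                dictionary.pop()               # evict least recently used
        dictionary.insert(0, word)             # move-to-front reordering
    return tokens

def decompress(tokens, dict_size=16):
    """Rebuild the data by mirroring the compressor's dictionary updates."""
    dictionary = []
    out = bytearray()
    for kind, value in tokens:
        if kind == "hit":
            word = dictionary.pop(value)
        else:
            word = value
            if len(dictionary) == dict_size:
                dictionary.pop()
        dictionary.insert(0, word)
        out += word
    return bytes(out)

sample = b"ABCDABCDWXYZABCD"
tokens = compress(sample)
print(tokens)
print(decompress(tokens) == sample)
```

Because the decompressor applies the same dictionary updates in the same order, no dictionary needs to be transmitted; a hardware engine of this general kind can search all dictionary entries in parallel, which a software loop like this one only imitates sequentially.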
MMX single-board computer measures only 5.75×8 in.

Aaeon Technology's single-board computer targets designers seeking the features of a full-blown PC in a small package for embedded applications. The PCM-5894 supports a range of processors, including Intel's (Santa Clara, CA) Pentium CPUs up to the 233-MHz P55C with multimedia-extension (MMX) instructions, AMD's (Sunnyvale, CA) K5/K6, and Cyrix's (Richardson, TX) M1/M2. Two onboard 72-pin SIMM sockets house as much as 128 Mbytes of system memory, and a video accelerator interfaces to CRTs or flat-panel displays, including 36-bit thin-film-transistor LCDs. M-Systems' (Newark, CA) DiskOnChip flash disk provides as much as 72 Mbytes of read/write storage and system-boot-up capability in a 32-pin DIP package. Communications circuitry includes three RS-232C ports and one RS-232C/422/485 port (four 16C550 UARTs), two USB connectors, and 100BaseT Ethernet. A floppy-disk and enhanced-IDE controller, a multimode parallel port, and keyboard/mouse interfaces round out the board's I/O capability. The PCM-5894 includes both a PC/104 connector for 16-bit bus expansion and a PCI-slot connector. The system requires 5V at 10A, depending on the CPU, and 12V. The PCM-5894 single-board computer costs $450 (one to nine). --by Warren Webb

Aaeon Technology Inc, Hazlet, NJ. 1-732-203-9300, fax 1-732-203-9311, www.aaeon.com.

GaAs devices hurdle some obstacles
Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.