EDN Access

 

December 4, 1997


EDN Hands-on project:
Unveiling the hidden secrets of PC-bus architectures

Markus Levy, Technical Editor

Theoretically, the new PC architectures and buses provide a system with lots of bandwidth to transfer data and feed the CPU. To explore the reality behind the theory, a group of PC experts and I set out to test these architectures. Find out what we discovered about Slot 1, AGP, PCI, USB, and more.

PC architecture is reaching a new frontier, and Intel--with its band of fab workers in bunny suits--is leading the move. The Pentium Pro initiated the first and most significant architectural change with its separate Level 2 cache and main-memory bus structure. Intel was simultaneously working on the Accelerated Graphics Port (AGP) that would potentially promote higher performance graphics while freeing PCI bandwidth for emerging applications, such as digital video disk (DVD) and videoconferencing. Theoretically, these architectural changes offer huge benefits to a PC's performance. But theory and reality are often at opposite ends of the spectrum.

The bottom line is that most PC architectural changes come about to satisfy the voracious appetite of the CPU for code and data. Although the increasing size of Level 1 caches solves part of the data-throughput problem, faster buses are critical to keeping the beast fed. This area is where AGP and the split-bus architecture come into play, and 75-, 83-, and 100-MHz main-memory buses become critical. You also have to consider the peripheral buses, such as PCI, USB, and IEEE 1394, all of which have an effect on overall system performance. So, for this hands-on project, I set out to determine whether these architectural changes and faster buses add practical value or are merely a way for Intel and other vendors to market their PC products (Figure 1).

The goal of this project is neither to pit one company's CPU, graphics card, or disk drive against another's, nor to provide complete benchmark information about the products. The primary goal is to analyze some of the PC architectural choices and provide an understanding of AGP vs PCI graphics and Socket 7 vs Slot 1 bus structures. I also got a feel for how some PCI and USB peripherals affect system performance.

To begin this project, I collected PC components from many companies (see box "For more information..."). I then assembled a huge group of PC experts (see box "Men at Work").

Initial roadblocks

Although I started planning for this project almost a year ago, the real work didn't begin until August. I flew to San Diego, home of Anchor Chips, where President and CEO Ron Sartore and his crew worked with me for a week as we assembled our state-of-the-art computer systems. Initially, we had an Intel system based on Intel's new 440LX chip set and an Acer Labs reference design based on Acer's Aladdin 4+ chip set. For the operating system, we used Microsoft's OSR2.1, code-named "Detroit"; this OS is essentially an enhanced version of Windows 95 and supports AGP and USB.

For the most part, the Acer Laboratories Inc (ALI) PC platform functioned flawlessly, even when running the system bus at 75 MHz. The ALI PC platform supports USB, but the system we used has an eight-pin header instead of a real USB connector. We could find no commercially available connector to attach to this board and had to build a makeshift connector to get USB support (which later proved to generate USB CRC errors). The lack of a connector was somewhat surprising. It was also unfortunate because the ALI PC platform had an open-host-controller-interface USB controller, which we were hoping to compare with Intel's universal-host-controller-interface controller. We also discovered that the layout of ALI's board was somewhat problematic. Unfortunately, three of the four PCI connectors were too close to the CPU socket. With the CPU fan installed, the board would accommodate only PCI cards shorter than 7 in. This limitation ruled out using many of our test cards, including HP's E2920 PCI-bus analyzer and Anchor Chips' CO-MEM PCI-interface demonstration card.

Intel's Atlanta board also came with its share of problems. After the Anchor Chips crew and I spent three days attempting to get this board running without crashing every 5 minutes, we learned from Intel that we had received a preproduction version of the board. Intel immediately swapped boards with us, and the new board has worked flawlessly.

Software installation

To begin the software installation, we booted DOS from a floppy disk and then created four equally sized partitions on our Western Digital Caviar 5.1-Gbyte hard drive. We used one of the partitions to store all the data on the CD-ROMs that we used throughout the project. This approach proved quicker than using the CD-ROM drive for installing applications, drivers, and operating systems.

From the DOS prompt, we performed a basic OSR2 installation on the C: partition; by "basic," I mean we used a standard VGA driver and no DMA support for the hard disk, for example. Next, we used the DOS FDISK command to switch primary partitions. You can also use PowerQuest's Partition Magic product to create, switch, and copy partitions. After booting from DOS again, we installed OSR2 on the new primary partition. From here, we loaded Windows and used Explorer to copy all the files from the original C: partition to a subdirectory on the storage partition. We set the options in Explorer to allow us to view hidden files, because we had to copy those to that subdirectory. The benefit of all this work was that we could erase our C: partition and reinstall OSR2 simply by copying all the files in the subdirectory to the erased partition. This approach let us avoid reinstalling the operating system whenever we ran a benchmark and wanted to clean up the partition. We had to repeat these steps and install OSR2 after we moved the drive to another PC platform.

The next step was to install the USB supplement, which turns OSR2 into OSR2.1 and enables USB and AGP support. I obtained this supplement from Intel. I also went to Microsoft's Web site and downloaded its DirectX 5 driver, which is necessary to run some 3-D applications and benchmarks. I ran most of the benchmarks in this project with the DMA mode switched on for the hard drive. You can switch to this mode by going into system properties in the control panel and clicking on device manager, then disk drives, and then DMA. Note that Windows 95 lacks this capability.

Analyzing USB performance

After we got most of the PC platforms and peripherals functioning, we shipped ourselves and the PC equipment to Hewlett-Packard's facility in Colorado Springs, CO. On Aug 25, our group of PC experts began looking at the performance of USB peripherals. We used FuturePlus' USB analyzer with HP's 16500C logic analyzer. The USB analyzer includes a preprocessor that translates data sequences into meaningful information packets. For example, when you plug in a USB device and the enumeration process, in which the host determines the features of the installed USB device, begins, the preprocessor generates a "get-descriptor" header containing the neatly organized configuration data. From that point on, the preprocessor indicates the start of each USB frame and sends an acknowledgment when the frame completes. The USB analyzer also offers some advanced-triggering capabilities that allowed us to trigger data capture when a data pattern appeared in the USB data stream or when a USB transfer type occurred.

The first device we checked out was a loop-back test tool from Intel. This tool, a 930Hx bus-powered hub with an embedded function, allows you to perform loop-back testing on bulk endpoints (addressable sources or sinks of data). We used it to ensure that the USB ports on the Intel and Acer boards were functioning properly and to generate USB-bandwidth consumption. The tool has three bulk-endpoint pairs. The endpoint address includes an endpoint number and a direction. One pair has a packet size of 64 bytes, the maximum packet size for a bulk endpoint. The packet size of an endpoint determines how many bytes of data that endpoint can transmit or receive in a USB transfer. The other two pairs of bulk endpoints have 8-byte packets.

Unfortunately, we discovered that the tool is not an ideal bandwidth consumer, because its firmware set up the bulk endpoints in in/out pairs. When the host sends data to the device via an out endpoint, the 930Hx copies the data from the receiving FIFO buffer to RAM and then from RAM to the corresponding in endpoint's transmitting FIFO buffer. The loop-back application running on the host would send pattern data to the out endpoint and then immediately read and verify the same data from the in endpoint. Transferring data from the 930Hx's RAM to the transmitting FIFO buffer is time-consuming; therefore, the 930Hx could provide no data at the host-requested rate. When the host requests data from a USB device having no data available, the device sends a not-acknowledge (NAK) packet to the host. The host then reschedules the request, sometimes immediately. The host request, followed by a device NAK, continues until the device has data available, and the device then transmits the data. The Intel loop-back tool demonstrated a lot of device-NAK activity.

The 930Hx operates in double-buffer mode, so it immediately services the first two bulk ins and outs, but the device generates several NAK signals on the third transfer before sending the data. Anchor Chips' software engineer Mark McCoy speculated that this delay occurred because the device loops back the data that it sends out. Looping back requires the device to read the entire in FIFO buffer and write all those bytes to the out FIFO buffer.

McCoy also discovered another limitation of the tool: It supports a maximum transfer size of only 512 bytes, which translates to eight data packets for a 64-byte maximum-packet-size endpoint. (Windows allows USB transfers as large as 64 kbytes.) Once the 512-byte transfer completes, the host application and device driver must send a new transfer request to the driver. The USB analyzer indicates that this transfer request has a turnaround time of 3 to 4 msec, or three or four USB frames.

To simulate a typical output device that creates burst bulk outputs, the group used Anchor Chips' AN2131Q EZ-USB device. This device contains a USB transceiver, a serial-interface engine, an enhanced 8051 core, endpoint buffer memories, and 8 kbytes of code and data RAM. The EZ-USB device uses the USB in a novel way for downloading the device's operating firmware. (Check out photos and a detailed description of the EZ-USB's "renumeration" on EDN's Web site.) The EZ-USB performed bulk-out transfers in 512-byte chunks as fast as the host could submit them. This capability resulted in a sustained transfer rate of about 164 kbytes/sec. Each 512-byte transfer could easily complete in the time of one USB frame. The three- or four-USB-frame delay caused the 164-kbyte transfer rates vs the theoretical 1-Mbyte transfer rates that would happen without those frame delays.
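The arithmetic behind these figures can be sketched as follows. This is a minimal model rather than the group's measurement method: it assumes one 512-byte transfer completes per submission cycle, that a full-speed USB frame lasts 1 msec, and that 1 kbyte means 1000 bytes. The function name is hypothetical.

```python
FRAME_SECONDS = 0.001  # a full-speed USB frame lasts 1 msec


def bulk_rate_kbytes_per_sec(transfer_bytes, cycle_frames):
    """Sustained rate when one transfer completes every `cycle_frames` frames."""
    return transfer_bytes / (cycle_frames * FRAME_SECONDS) / 1000.0


# One 512-byte transfer every three or four frames brackets the measured rate.
fast = bulk_rate_kbytes_per_sec(512, 3)  # about 170.7 kbytes/sec
slow = bulk_rate_kbytes_per_sec(512, 4)  # about 128 kbytes/sec
packets = 512 // 64                      # 64-byte packets per 512-byte transfer

print(f"{slow:.1f} to {fast:.1f} kbytes/sec, {packets} packets per transfer")
```

The measured 164 kbytes/sec falls between the two bounds, which is consistent with a submission cycle slightly longer than three frames.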

The next device we checked out was the Kodak DVC-300 digital video camera with a USB interface. The USB transactions seemed to be in good working order. The camera was transmitting uncompressed data at about 480 and 240 kbytes/sec in high- and low-resolution modes, respectively. After we plugged in both the camera and the EZ-USB device, the EZ-USB began transmitting data at 210 kbytes/sec. McCoy noticed that the host controller was performing more efficient scheduling of new bulk-transfer requests, allowing the combination of the two devices to generate 700-kbyte/sec USB traffic.

USB bulk throughput isn't limited to 210 kbytes/sec, though. For a real application, you could get better bulk throughput by performing larger bulk transfers or even performing transfers of any size asynchronously. For asynchronous transfers, the host queues multiple transfers, so that one transfer is ready when another one completes. This approach avoids the three- or four-frame latency.
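A small simulation illustrates why queuing helps. It is only a sketch under stated assumptions: each 512-byte transfer occupies one 1-msec frame, and a completed request needs three further frames before the host software has its replacement ready. The function and parameter names are hypothetical and do not correspond to any Windows USB driver interface.

```python
def simulate_bulk_out(queue_depth, turnaround_frames=3,
                      total_frames=1200, transfer_bytes=512):
    """Return the bytes moved during `total_frames` USB frames.

    Each queue slot holds one request. After a slot's transfer uses a
    frame, the slot stays empty for `turnaround_frames` frames while the
    host prepares the next request.
    """
    ready_at = [0] * queue_depth  # frame at which each slot has a request ready
    moved = 0
    for frame in range(total_frames):
        for slot, ready_frame in enumerate(ready_at):
            if ready_frame <= frame:
                moved += transfer_bytes
                ready_at[slot] = frame + 1 + turnaround_frames
                break  # at most one transfer per frame in this model
    return moved


for depth in (1, 2, 4):
    kbytes_per_sec = simulate_bulk_out(depth) / 1.2 / 1000  # 1200 frames = 1.2 sec
    print(f"queue depth {depth}: {kbytes_per_sec:.0f} kbytes/sec")
```

In this model, a queue depth of four keeps a request ready in every frame, so the turnaround latency no longer limits throughput. The real ceiling also depends on how many packets the host controller schedules per frame, which the sketch does not model.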

After removing one of the USB devices from Intel's AL440LX, we installed HP's E2920 PCI exerciser and analyzer only to find that the PCI bus was transferring about 12 Mbytes/sec in burst lengths of two and four double words, even when the USB device wasn't transferring data! There should have been almost no PCI activity. These small bursts resulted in a "PCI utilization"--the number of clock cycles that the PCI bus is busy divided by the number of PCI clocks--of almost 50%. After using FuturePlus' USB analyzer, MindShare (Richardson, TX) Vice President Don Anderson determined that the USB host controller caused these bursts by performing many main-memory accesses. Interestingly, when we plugged another USB device into the second port while the camera was still plugged in, the erroneous PCI traffic practically disappeared. We swapped in other USB devices and verified that the camera wasn't the problem. But we still haven't figured out what caused the wasted PCI traffic, although the group speculated that it may have been Microsoft's OSR2.1 and its interaction with Intel's host USB controller. Furthermore, we verified that this problem disappeared with the ALI PC platform.
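The utilization definition, and the reason short bursts are so costly, can be worked through with the numbers above. This sketch assumes a nominal 33-MHz, 32-bit PCI bus and rounds "almost 50%" to 0.50; the helper names are hypothetical.

```python
PCI_CLOCK_HZ = 33_000_000  # nominal 33-MHz PCI clock
PCI_BYTES_PER_CLOCK = 4    # a 32-bit bus moves one double word per data clock


def pci_utilization(busy_clocks, total_clocks):
    """Fraction of PCI clocks during which the bus is busy."""
    return busy_clocks / total_clocks


def bytes_per_busy_clock(throughput_bytes_per_sec, utilization):
    """Average payload moved per busy clock at a given utilization."""
    busy_clocks_per_sec = utilization * PCI_CLOCK_HZ
    return throughput_bytes_per_sec / busy_clocks_per_sec


per_clock = bytes_per_busy_clock(12_000_000, 0.50)
efficiency = per_clock / PCI_BYTES_PER_CLOCK
print(f"{per_clock:.2f} bytes per busy clock, {efficiency:.0%} of the 4-byte peak")
```

Under these assumptions, the bus moves well under 1 byte per busy clock, so most of the busy time goes to per-transaction overhead rather than to data.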

We also worked with the Altec Lansing USB-connected speakers. Although we got them to play audio, any system activity, such as moving the mouse, caused audio breakup. We later determined that the speakers were not to blame. Apparently, the software sound emulation in Detroit is inadequate; the speakers worked well using Windows 98, code-named "Memphis."

Is AGP really better than PCI?

There's no doubt that AGP is better than PCI for 3-D graphics applications with large textures, but the proof may not be so obvious--yet. The AGP is a dedicated graphics bus based on Revision 2.1 of the 66-MHz PCI specification. AGP provides the graphics chip with direct access to textures stored in system main memory. AGP yields 528-Mbyte/sec peak bandwidth by transferring data on both the rising and falling edges of the 66-MHz clock. A full-blown AGP implementation incorporates sideband signals, which enable the graphics chip to pipeline and queue memory requests and allow the graphics chip to issue new addresses and requests while transferring data from previous requests. The AGP specification does not require a graphics chip to implement these sideband signals. Alternatively, the graphics chip can implement double-rate and 66-MHz, or "frame-mode," PCI. So far, ATI Technologies' 3D RagePro graphics chip is the only sideband-enabled AGP device. The other available AGP cards implement double-rate or frame-mode PCI. This situation, along with an absence of high-textured 3-D applications and benchmarks, makes it difficult to comprehend the practical value of AGP.

In an effort to probe the differences between AGP- and PCI-based graphics, the group devised several experiments. In the first and simplest experiment, we used Intel's AL440LX and Ziff-Davis' (Medford, MA) 3-D WinBench '97 with large-texture scene (which I call "3DWB-LT") to indicate AGP's capabilities. (You can download this free benchmark from the Ziff-Davis Web site at www.zdbop.com.)

We ran the 3DWB-LT benchmark with graphics cards from ATI, Nvidia, Number Nine, and Intergraph. Each vendor, except Intergraph, supplied us an AGP and a PCI card that used the same graphics chip. Intergraph provided only a PCI card. Intel's AL440LX allowed us to run the Pentium II at 200, 233, 266, and 300 MHz. For most of the graphics cards, we ran the 3DWB-LT at each frequency so that we could determine the benchmark's dependency on the host processor.

We checked the theoretical maximum benchmark performance using a null software driver that drops the graphics rendering of the benchmark into the bit bucket. In other words, this driver allows only the host CPU to perform the geometry calculations. Intel's AL440LX with a 300-MHz processor achieved a whopping 44.7 frames/sec on the 3DWB-LT using the null driver. Nvidia's Riva 128 AGP card, outperforming all the other cards, hit 37.5 frames/sec on this benchmark--indicating that the host CPU's floating-point engine was not the bottleneck. Despite this measurement, we discovered that performance for most of the graphics cards scaled linearly with the CPU's frequency. I speculate that this linear scaling reflects the time that the host CPU spends executing the software driver. You can find results of the test here. In summary, if you plot this data and perform a linear extrapolation down to 0 MHz, you'll find that roughly two-thirds of the benchmark's performance relates to the CPU.

We also tested the RagePro AGP card from ATI. At 300 MHz, with 2 and 4 Mbytes of onboard graphics memory, the card yielded 21.8 and 23.4 frames/sec, respectively. These figures differ by only about 7%, indicating that ATI's implementation was less susceptible to memory-size differences and depended more on the AGP interface. Using ATI's PCI card, the benchmark results for 2 and 4 Mbytes yielded 0.37 and 1.96 frames/sec, respectively. After comparing the AGP and PCI results, our immediate impression was that AGP must be fantastic: Who would ever want to use a PCI graphics card again? But when we repeated the benchmark for Nvidia's AGP and PCI cards, both with 4 Mbytes of local memory, we got results of 37.5 and 34.5 frames/sec, respectively. Additional local memory may further increase the graphics performance, because it would allow more of the textures to be stored locally. However, more important, a well-designed graphics engine yields good performance, whether it's on AGP or PCI. Nvidia's Riva chip uses a deeply pipelined architecture and a 12-kbyte texture cache that helps offload bus traffic, and the core runs at 100 MHz. Its good DMA capability also helps it run well on either AGP or PCI.
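Putting the reported frame rates side by side makes the contrast concrete. The sketch below uses only the 300-MHz 3DWB-LT figures quoted above; the dictionary layout and the helper name are illustrative choices.

```python
# Frames/sec reported in the text for 3DWB-LT at 300 MHz.
results = {
    "ATI RagePro, 2 Mbytes": {"agp": 21.8, "pci": 0.37},
    "ATI RagePro, 4 Mbytes": {"agp": 23.4, "pci": 1.96},
    "Nvidia Riva 128, 4 Mbytes": {"agp": 37.5, "pci": 34.5},
}


def agp_over_pci(entry):
    """Ratio of the AGP frame rate to the PCI frame rate for one card."""
    return entry["agp"] / entry["pci"]


ratios = {card: agp_over_pci(entry) for card, entry in results.items()}
for card, ratio in ratios.items():
    print(f"{card}: AGP is {ratio:.1f}x PCI")

# Gain on ATI's AGP card from doubling the onboard memory.
ati_memory_gain = (results["ATI RagePro, 4 Mbytes"]["agp"]
                   / results["ATI RagePro, 2 Mbytes"]["agp"] - 1)
print(f"ATI AGP, 4 vs 2 Mbytes: {ati_memory_gain:.1%}")
```

The ATI cards show AGP ahead by a factor of roughly 12 to 59, whereas the Nvidia cards differ by less than 10%, which is the basis for the point about graphics-engine design.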

After running this benchmark with all the graphics cards, we discovered that the test doesn't really stress the advantages of AGP; the test uses only 5.1 Mbytes of textures. ATI asserts that when using a 4-Mbyte card, only 1.76 Mbytes of textures reside in system memory, and, with a frame rate of 34 frames/sec, the graphics controller consumes 60 Mbytes/sec of bus bandwidth. This figure equals 45% of PCI bandwidth, 23% of PCI-66 bandwidth, and only 11% of AGP 2X bandwidth. A better AGP benchmark, although currently unavailable, would be one that changed textures every frame. This benchmark would thrash a graphics chip's cache and depend much more on using AGP's direct-execution mode.
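These percentages follow from each bus's peak bandwidth. The sketch below assumes nominal 33- and 66-MHz clocks, 32-bit data paths, and two transfers per clock for double-rate (2X) AGP; the function name is hypothetical.

```python
def peak_mbytes_per_sec(clock_mhz, bus_bytes, transfers_per_clock):
    """Peak bandwidth = clock rate x bus width x transfers per clock."""
    return clock_mhz * bus_bytes * transfers_per_clock


buses = {
    "PCI": peak_mbytes_per_sec(33, 4, 1),
    "PCI-66": peak_mbytes_per_sec(66, 4, 1),
    "AGP 2X": peak_mbytes_per_sec(66, 4, 2),
}

TEXTURE_TRAFFIC = 60  # Mbytes/sec, the figure ATI cites

# Share of each bus's peak that the texture traffic would consume.
shares = {name: round(100 * TEXTURE_TRAFFIC / peak) for name, peak in buses.items()}
for name, share in shares.items():
    print(f"{name}: {buses[name]} Mbytes/sec peak, {share}% consumed")
```

Peak figures overstate what a bus sustains in practice, so the real share of usable bandwidth would be higher on all three.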

Moving graphics to AGP undoubtedly frees some PCI bandwidth. There's also no doubt that some PCI utilization-intensive applications benefit from the extra bandwidth. For example, what if you were playing a PC game with heavy-duty 3-D graphics or working on a spreadsheet while transferring a file via Ethernet, while teleconferencing, or while playing a DVD movie (all made even more practical with split-screen technology)? Where do you get the "extra" bus bandwidth?

To check out the effects of extra bandwidth consumption, we used HP's E2920 PCI exerciser and Anchor Chips' CO-MEM demonstration board to load down the PCI bus. The E2920 includes a stress test with an option to vary bandwidth consumption from 0 to 95%. Using Intel's AL440LX with the CPU running at 300 MHz, we ran the 3DWB-LT eight times, progressively increasing the PCI-bandwidth consumption (Figure 2). As you would expect, AGP graphics are impervious to any amount of PCI-bus traffic. However, when using PCI-based graphics, the benchmark performance degrades 15 to 30% when the E2920 uses 27 to 45%, respectively, of the PCI bandwidth. The results are even more drastic as PCI utilization exceeds 70%: Performance drops by more than a factor of three.

As a practical test, I measured the PCI bandwidth consumption of a variety of standard PC products. These products include Sigma Designs' RealMagic Hollywood DVD card, Digital Semiconductor's 21143 Ethernet controller, Western Digital's Caviar UltraDMA hard drive, Iomega's portable Jaz SCSI drive and PCI Ultra SCSI card, and Altec Lansing's ACS500 USB interface speakers.

We repeated a similar experiment with the Business Applications Performance Corp's (BAPCO, Santa Clara, CA, www.bapco.com) business-application benchmark. Running the benchmark suite with AGP- and PCI-based graphics, we noticed no appreciable performance difference as we varied the PCI-bandwidth utilization from 0 to 50%. At 50% utilization, performance between the systems with AGP and PCI varied 1 to 12%, depending on the application the benchmark was running. However, with 90% utilization, performance varied 9 to 34%, depending on the application. Unfortunately, because of time restrictions, we chose no PCI-utilization points between 50 and 90%. Intermediate points would have allowed us to determine the inflection point at which PCI utilization starts to affect the benchmark results by impeding the PCI-based graphics performance.

Socket 7 vs Slot 1 is one of the biggest debates in the PC industry. Socket 7 is an open specification; Slot 1 is Intel-proprietary. In Socket 7 systems, the L2 cache and main memory share a bus; Slot 1 systems have a dual-bus architecture. In Socket 7 systems, the CPU accesses the L2 at main-memory speeds; in Slot 1 systems, the CPU accesses the L2 at one-half the core frequency. Socket 7 proponents AMD and Cyrix assert that Socket 7 still has several years of life. Yet, both companies are developing new bus structures. Intel, on the other hand, has stopped all Socket 7-marketing efforts. Furthermore, the company has gone beyond even pushing Slot 1 designs and has started promoting the proprietary Slot 2 architecture.

A political and theoretical discussion regarding Socket 7, Slot 1, and other main-memory and L2-bus implementations is beyond the scope of this article. However, our group of PC experts used a test that demonstrated some obvious benefits of Slot 1. Again using HP's E2920 PCI exerciser, Thomas Dippon, an application specialist for PCI-test tools at HP, set up a script to instruct the PCI exerciser to perform burst reads into main memory using the memory-read-multiple PCI command. (You can view this script, as well as instructions on using the E2920 here.) For this part of the testing, we obtained an Intel AN430TX PC platform, a Socket 7 implementation that works with processors, such as Intel's 233-MHz Pentium with multimedia extensions. To compare apples with apples, we decreased the Pentium II's speed from 300 to 233 MHz on Intel's AL440LX. Both PC platforms come with 512 kbytes of L2 cache, and both have a 33-MHz PCI bus and a 66-MHz system bus. Both PC platforms also used the same SDRAMs, disk drives, and PCI graphics cards and monitors.

During this test, we ran the BAPCO benchmarks three to five times on each PC platform. Each time we ran the test, we changed the throughput of PCI to SDRAM reads by changing the read burst length. Longer bursts produced higher PCI throughput and, therefore, higher bandwidth consumption on the main-memory bus. For example, when we set up the E2920 to perform continuous one-double-word bursts, PCI throughput was 6.6 Mbytes/sec; burst lengths of eight double words yielded a throughput of 40 Mbytes/sec, and so forth.
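The two quoted data points show why longer bursts raise throughput. The sketch below converts each measurement into the average number of PCI clocks per burst, assuming a nominal 33-MHz clock and 1 Mbyte = 1,000,000 bytes. The function name is hypothetical, and the fixed per-burst overhead it implies is an inference from these two figures, not something the group measured directly.

```python
PCI_CLOCK_HZ = 33_000_000  # nominal 33-MHz PCI clock
BYTES_PER_DWORD = 4


def clocks_per_burst(burst_dwords, mbytes_per_sec):
    """Average PCI clocks spent per burst at a measured throughput."""
    bytes_per_burst = burst_dwords * BYTES_PER_DWORD
    bursts_per_sec = mbytes_per_sec * 1_000_000 / bytes_per_burst
    return PCI_CLOCK_HZ / bursts_per_sec


single = clocks_per_burst(1, 6.6)  # about 20 clocks per one-double-word burst
eight = clocks_per_burst(8, 40.0)  # about 26.4 clocks per eight-double-word burst

# Clocks left over after one data clock per double word.
print(f"overhead: {single - 1:.1f} and {eight - 8:.1f} clocks per burst")
```

Both cases leave roughly 18 to 19 clocks of overhead per burst, so under these assumptions an eight-double-word burst spreads about the same cost over eight times the data.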

The results of the test unquestionably favored Slot 1. On Intel's AL440LX PC platform, results varied only 2 to 7% from those we obtained without E2920-induced reads, even when the E2920 PCI exerciser created 60-Mbyte/sec throughput. This throughput is only 11% of the main memory's throughput of 533 Mbytes/sec. But, at the same throughput, benchmark results on the AN430TX PC platform were 80 to 85% lower than on the unmodified test. Even when we used the E2920 to generate only 6.6 Mbytes/sec of main-memory reads, the BAPCO results were down 22 to 37%.

Our testing did not consider all the factors that could affect the performance of this benchmark. For example, the 430TX and 440LX may have different snoop mechanisms and chip-set buffer sizes. You should also consider the algorithms that a chip set uses for managing main-memory pages and bank switching, as well as processor capabilities, such as Pentium II's out-of-order instruction execution. So, although this test provides no conclusive evidence that a Slot 1 implementation is always significantly better than a Socket 7 implementation, the results do show drastic performance differences.

In general, we concentrated on the new and emerging bus enhancements, such as AGP, Slot 1, and USB. Other system functions are not static, however. For example, although the PCI-bus-interface architecture of DMA combines with a FIFO buffer to satisfy many applications' needs, a new architecture based upon caching SRAM may unleash additional performance gains. This new architecture, as Anchor Chips' AN3041Q device demonstrates, minimizes the number of steps necessary to transfer data into PCI local processors and DSPs. Additionally, cost-sensitive applications can eliminate local memory by caching their memory needs across the PCI bus to main memory.

From a system perspective, the larger memories that this architecture supports allow the local processor to exploit PCI's bursting capability, resulting in the movement of lots of data with minimal PCI-bus overhead. As you might guess, bursting impacts other system components. To test this effect, our group loaded down the PCI bus using the AN3041Q. In this setup, some motherboards could support sustained transfer rates of 50 Mbytes/sec, whereas others topped out at 25 Mbytes/sec. This variance in transfer rates gives designers a challenge to maintain a balanced system.


Other sources

The best all-around reference for PC architecture is the MindShare PC System Architecture Series, published by Addison Wesley (Reading, MA). You can check out these books at www.mindshare.com or www.aw.com/devpress.

A good place to obtain practical information about PC architecture and performance analysis is Tom's Hardware Page at www.sysdoc.pair.com. This Web site also offers lots of links to other good sites.

Acknowledgments

In addition to the "Men at Work" team, I'd like to acknowledge George Alfs and Michael Greene of Intel, Eric Lundgren and Rick Osborne of ATI, Mike Blaskovich of Digital, Stuart McClaren of Nvidia, Donald MacDonald and Jeanne Cotter of HP, Maurizio de Julio of Anchor Chips, J Taylor, and Maury Wright of EDN.


At a glance
  • Although Anchor Chips' AN2131Q EZ-USB device transferred data at around 1 Mbyte/sec, the USB controller incurred a three- to four-frame turnaround delay, thus limiting transfers to 164 kbytes/sec.

  • AGP frees PCI bandwidth and is impervious to any PCI-bus-mastering activity that may simultaneously occur.

  • Running standard business-application benchmarks while using HP's PCI-bus exerciser to generate PCI-to-main memory reads shows the advantages of Slot 1.

The basic hardware PC platforms

Throughout the project, the "Men at Work" used a variety of peripherals, including graphics cards, digital-video-disk (DVD) players, and USB speakers. However, we used the same basic hardware configurations for all PC platforms.

PC platforms: Acer Labs M1531/43 Pentium chip-set demo board, Intel AL440LX for Pentium II, Intel AN430TX for Pentium.

Monitor: NEC 21-in. Multisync E1100 (one fantastic monitor!).

Memory: 64 Mbytes of Hitachi, LG Semicon, or Samsung SDRAM (for this project, all worked equally well).

Disk drive: 5.1-Gbyte Western Digital Caviar 35100.

DVD: Hitachi GD-2000, which also handles CD-ROMs.

The $84,180 Men-at-Work PC-analysis lab

Analysis of PCs requires a logic analyzer that provides visibility into all system buses. Hewlett-Packard and FuturePlus configured a system with sufficient channels, memory depth, and bus-specific preprocessors to meet the measurement needs of Pentium II's front-side bus, the AGP2X, the DIMM interface, PCI, and USB. This system allowed us to make time-correlated measurements of transactions occurring simultaneously on the critical buses of the 300-MHz Pentium II-based PC under test.

HP's E2920 PCI series of computer-verification tools was invaluable for this project. The series includes the 32-bit, 33-MHz E2925A PCI exerciser and analyzer card, the E2970 PCI-analyzer user interface, the E2971A PCI-exerciser user interface, the E2972A performance analyzer, and the E2974 subsystem tests. We used the E2974 for most of our analysis to consume PCI bandwidth; this test allowed the E2925A to write to its own target. This ability allowed the device to emulate the bus load without accessing any other system components. The E2972A allowed an in-depth performance analysis of sampled PCI transactions. You can use it to display charts and reports for PCI-utilization, throughput, command-usage, burst-length, and latency functions. A full-blown system costs around $27,000.

HP supplied its $9500 16500C logic-analysis system with $15,765 16555D logic-analyzer modules and a $12,035 16534A oscilloscope module. The 2M-sample 16555D is a 110-MHz, state-analysis or 500-MHz, timing-analysis card. The 16534A is a 2G-sample/sec, 32k-sample card with a 500-MHz single-shot bandwidth. We added the $4995 16505A prototype analyzer, which gave us a large-screen, X-windows display that allowed us to see our data in multiple domains simultaneously and to quickly pinpoint problems with global markers correlated across the various domains.

FuturePlus added the preprocessors to make the physical and electrical connection between our HP test system and the target PC platform. We used FuturePlus' $1000 FSAGP32TE Accelerated Graphics Port (AGP) probe and interposer card to connect to the AGP. We removed the AGP graphics card, installed the AGP probe, connected HP's $345 E5346A high-density termination adapters, and then reinstalled the AGP card. After loading the probe's configuration software into the logic analyzer, we began our analysis. It was easy to see the transition to sideband mode when using ATI's graphics card during the Ziff-Davis 3-D WinBench with large-texture benchmark.

To connect to the USB, we plugged the USB devices into FuturePlus' $2500 FSUSB USB analyzer. We then connected the FSUSB to the PC platform. After loading FuturePlus' configuration file and transaction disassembler, we monitored the USB transactions between the USB device and the host computer. The FSUSB operates at 12 or 1.5 Mbps and supports dynamic speed changing. It supports isochronous and other types of data transfers and detects bad packet identifiers (PIDs), invalid PIDs, bit-stuffing errors, and CRC errors on all packet types. It also detects start-of-frame tokens sent at low speed and provides test points to measure the power and signal fidelity of the USB.

To perform cross-domain analysis, we simply set one of the logic analyzers in the HP 16500C to monitor the AGP and the other to monitor the USB. It was easy to create a custom measurement system, cross-domain trigger between buses, and view data from multiple buses simultaneously in the same display. Similarly, we connected the $9000 HP E2466C Pentium II preprocessor to the host processor so we could monitor the transactions between the Pentium II and the various system buses. We also used HP's $2040 B4600A system-performance-analysis tool to display the distribution of transaction types on USB and the processor front-side bus during video data transfers.
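As a cross-check, the prices quoted in this sidebar can be tallied against the $84,180 in its title. The sketch assumes one of each item and takes the "around $27,000" figure for the full E2920 system at face value; the dictionary keys are descriptive labels, not catalog names.

```python
# Prices as quoted in the sidebar, in dollars; one of each item is assumed.
lab_prices = {
    "HP E2920 PCI series, full system": 27_000,
    "HP 16500C logic-analysis system": 9_500,
    "HP 16555D logic-analyzer modules": 15_765,
    "HP 16534A oscilloscope module": 12_035,
    "HP 16505A prototype analyzer": 4_995,
    "FuturePlus FSAGP32TE AGP probe": 1_000,
    "HP E5346A termination adapters": 345,
    "FuturePlus FSUSB USB analyzer": 2_500,
    "HP E2466C Pentium II preprocessor": 9_000,
    "HP B4600A performance-analysis tool": 2_040,
}

total = sum(lab_prices.values())
print(f"${total:,}")
```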

For more information...
For information on products such as those discussed in this article, circle the appropriate numbers on the Information Retrieval Service card or use EDN's Express Request service. When you contact any of the following manufacturers directly, please let them know you read about their products in EDN.
Acer Labs Inc
San Jose, CA
1-408-467-7456
www.acerlabs.com
Adaptec Inc
Milpitas, CA
1-800-959-7274
www.adaptec.com
Altec Lansing
Milford, PA
1-717-296-2818
www.altecmm.com
AMD
Austin, TX
1-800-222-9323
www.amd.com
Anchor Chips Inc
San Diego, CA
1-619-676-6815
www.anchorchips.com
ATI Technologies Inc
Thornhill, ON, Canada
1-905-882-2600
www.atitech.com
Cyrix Corp
Richardson, TX
1-214-994-8388
www.cyrix.com
Diamond Multimedia
San Jose, CA
1-408-325-7000
www.diamondmm.com
Digital Semiconductor
Maynard, MA
1-508-568-6868
www.digital.com/info/semiconductor
Eastman Kodak Co
Rochester, NY
1-800-235-6325
www.kodak.com
FuturePlus Systems Corp
Bedford, NH
1-603-471-2734
www.futureplus.com
Hewlett-Packard
Colorado Springs, CO
1-800-452-4844
www.hp.com
Hitachi America Ltd
Brisbane, CA
1-415-589-8300
www.hitachi.com
Intel Literature Center
Mount Prospect, IL
1-800-548-4725
www.intel.com
Intergraph Computer Systems
Huntsville, AL
1-800-692-8069
www.intergraph.com
Iomega Corp
Roy, UT
1-801-778-1000
www.iomega.com
LG Semicon
San Jose, CA
1-408-432-5000
Microsoft Corp
Redmond, WA
1-206-882-8080
www.microsoft.com
NEC Technologies Inc
Itasca, IL
1-800-632-4662
www.nec.com
Number Nine
Lexington, MA
1-617-674-0009
www.nine.com
Nvidia Corp
Sunnyvale, CA
1-408-720-6100
www.nvidia.com
PC Power and Cooling
Carlsbad, CA
1-760-931-5700
PowerQuest Corp
Orem, UT
1-801-437-8900
www.powerquest.com
Quantum Corp
Milpitas, CA
1-408-291-2492
www.quantum.com
S3 Inc
Santa Clara, CA
1-408-980-5400
www.s3.com
Samsung Semiconductor Inc
San Jose, CA
1-408-954-6972
www.sec.samsung.com
Sigma Designs Inc
Fremont, CA
1-510-770-0100
www.sigmadesigns.com
Western Digital Corp
Irvine, CA
1-714-932-4900
www.wdc.com
   
Men at Work
This project would have been impossible without help from the "Men at Work" team:
The high-performance work crew includes (standing left to right) Niel Smith of Acer Labs, Bill Furch of FuturePlus, Mark McCoy of Anchor Chips, Don Anderson of MindShare, Markus Levy of EDN, Thomas Dippon of Hewlett-Packard and (seated left to right) Ron Sartore of Anchor Chips and Chuck Small of Hewlett-Packard. Will Morris was not present but was critical to the project.
  • Don Anderson, vice president of MindShare (Richardson, TX), trains engineers on critical elements of PC-system design. He is the co-author of many books about various aspects of PC architectures, including Pentium, PCI, USB, and the new FireWire System Architecture. Anderson has been involved in digital-system design for more than 20 years, including eight years at Schlumberger and five years at Compaq Computer.

  • Thomas Dippon is an application specialist for the PCI-test tools that he developed with Hewlett-Packard in Germany. He joined HP in 1991. Dippon holds an MS in computer science from the University of Stuttgart.

  • Bill Furch is vice president of marketing for FuturePlus Systems, which designs and manufactures bus analyzers that use HP logic analyzers as analysis-execution engines. He is a 22-year veteran of HP and holds a BSEE from the University of Denver.

  • Tim Harvey is a system engineer at Anchor Chips, specializing in PCI- and embedded-system architecture. He came to Anchor with extensive design experience in PC-, satellite-, TV-, and communication-product development. As principal system engineer at TWAV Inc, he designed and implemented a fully integrated, PC-based, telephone-video-transmission system. Harvey holds a BSEE from the University of Illinois at Champaign--Urbana.

  • Mark McCoy is a software engineer at Anchor Chips, specializing in Windows driver model and Windows NT device-driver development. He also has extensive knowledge of USB. At Anchor, McCoy is responsible for USB-software development and implementation for both device drivers and applications. Before joining Anchor, he worked at Compaq Computer. McCoy holds a BS in computer science from the University of Houston. His knowledge of Windows and its peculiarities was invaluable for this project.

  • Will Morris, an employee of Intel since 1984, is a senior technical marketing engineer in Intel's Architecture Labs. One of his roles is defining the product requirements of boards, such as the AL440LX and AN430TX in this project. He has also designed hardware and ASICs for Intel's Multibus II and ProShare products.

  • Ron Sartore, president and CEO of Anchor Chips, heads the company's product development and engineering. A leading expert in PC-system design, Sartore has designed award-winning computers and industry-standard semiconductors. At Cheetah International Inc, a company he founded before starting Anchor Chips, his designs won accolades from several publications. A veteran of other EDN hands-on projects, Sartore also has an extensive background in semiconductors, having worked at Inmos and Texas Instruments. He holds a BSEE from Purdue University (West Lafayette, IN).

  • Chuck Small is a 31-year employee of HP. He has held various positions in R&D and marketing and was part of the team that designed HP's first logic analyzer. Currently, he is product manager for Intel-processor support. Small received his BSEE and BS in computer science from the University of Colorado--Boulder.

  • Niel Smith is manager of application engineering for Acer Laboratories Inc of North America. He has 20 years of R&D, design, and application experience in core logic, high-speed analog-signal processing, electromechanical servo design, RF/microwave design, wideband video-signal processing, and PC I/O and disk interfacing. Smith was also director of worldwide applications for Appian Technology and has spent the last 10 years in PC-specific semiconductor engineering. He holds a BA in general studies from Washington State University and a BSEE from San Jose State University.


Markus Levy, Technical Editor

You can reach Markus Levy at 1-916-939-1642, fax 1-916-939-1650, markus.levy@worldnet.att.net.




Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.