Happy birthday PCI-SIG… and PCIe power management debug
It helps, of course, that the PCI-SIG is supported by some of the biggest names in the industry to drive adoption. Still, the last 20 years have been busy, with committees churning out specifications that continue to shape many of our designs. From the very beginning with the 32-bit PCI specification to today’s 3rd Generation PCI Express specification, the PCI-SIG has stayed relevant and is looking forward with the upcoming 4th Generation PCI Express specification.
As PCIe technology has changed from parallel to serial, from bits to lanes, and from megahertz to Gbps, the challenges of debugging and verifying systems have also changed. Verifying setup and hold times of a 64-bit data bus has been replaced with jitter analysis and eye diagrams. Verifying proper retry operations has given way to verifying correct power management operations.
In fact, in my work with PCIe designers around the world, power management has been one of the most complex issues to debug in both PCIe Gen 1 and Gen 2 systems, and it continues to be a challenge in PCIe Gen 3 systems.
To successfully debug power management issues, you need to supplement your scope with a protocol analyzer that can decode all layers of the link and can quickly synchronize to the link as power-saving states are exited. Power management transitions pass quickly. A tool that is slow to respond to these transitions would fail to capture events on the bus right after normal operating mode (L0) is re-established, missing potential problems. Let’s take a look at one such scenario, but first, a quick refresher on the various PCI Express power states:
L0 - Active normal operating mode state
L0s - Energy saving “standby” state with fast recovery back to L0
L1 - Lower power “standby” state but with a longer recovery than L0s
L2 - Auxiliary-powered deep-energy-saving state
L3 - Link Off state
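To make the refresher concrete, here is a minimal Python sketch that models the five link states and picks out the moments a link returns to L0, which are the moments an analyzer has to re-lock. The `trace` format (timestamp in nanoseconds plus state) and the function name `exits_to_l0` are assumptions made for illustration, not the export format of any particular analyzer. The exit-mechanism notes summarize the general behavior of the PCIe link state machine and should be checked against the specification revision you are working with.

```python
from enum import Enum


class LinkState(Enum):
    L0 = "L0"
    L0S = "L0s"
    L1 = "L1"
    L2 = "L2"
    L3 = "L3"


# How each low-power state typically gets back to L0 (summary, not normative).
EXIT_MECHANISM = {
    LinkState.L0S: "FTS Ordered-Sets",
    LinkState.L1: "Recovery (TS1/TS2 exchange)",
    LinkState.L2: "wake event, then full link training",
    LinkState.L3: "power restored, then full link training",
}


def exits_to_l0(trace):
    """Return (timestamp_ns, previous_state) for every transition into L0.

    `trace` is a hypothetical chronological list of (timestamp_ns, LinkState).
    """
    exits = []
    for (_, prev), (t, cur) in zip(trace, trace[1:]):
        if cur is LinkState.L0 and prev is not LinkState.L0:
            exits.append((t, prev))
    return exits


# Made-up trace for illustration only.
trace = [
    (0, LinkState.L0),
    (1000, LinkState.L0S),
    (1500, LinkState.L0),
    (9000, LinkState.L1),
    (15000, LinkState.L0),
]

for t, prev in exits_to_l0(trace):
    print(f"{t} ns: {prev.value} -> L0 via {EXIT_MECHANISM[prev]}")
```

Each timestamp reported here marks a point where the first packets after the exit are at risk of being missed by a slow-locking tool.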
Now consider a scenario where a TLP (Transaction Layer Packet) Memory Write targets an incorrect address, causing bad system behavior. This type of problem often appears during hardware/software integration, but only when power management is enabled. The reason is that power state changes stress both the transmitter and the receiver and can cause link quality issues that show up as logic errors. As the link exits L0s, FTS (Fast Training Sequence) Ordered-Sets are transmitted, and the receiver uses them to re-synchronize to the transmitter. The number of FTS Ordered-Sets to transmit is determined during link training and is kept as small as possible to enable a fast transition from L0s back to L0. Here’s a training sequence that sets the number of FTS Ordered-Sets at 31:
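As a rough illustration of where that count lives on the wire, the following Python sketch pulls the N_FTS field out of a TS1/TS2 Ordered-Set and estimates how long the FTS burst lasts. It assumes the 8b/10b layout used at 2.5 and 5 GT/s (COM in symbol 0, N_FTS in symbol 3, and the TS identifier in the trailing symbols); the 128b/130b layout used at 8 GT/s differs. The example bytes are made up, the function names are hypothetical, and plain byte values cannot distinguish control characters from data characters the way a real decoder must.

```python
COM = 0xBC     # K28.5, shown as its 8-bit value before 8b/10b encoding
TS1_ID = 0x4A  # D10.2
TS2_ID = 0x45  # D5.2


def parse_n_fts(ts):
    """Return N_FTS from a 16-symbol TS1/TS2 Ordered-Set (8b/10b rates only)."""
    if len(ts) != 16:
        raise ValueError("a TS1/TS2 Ordered-Set is 16 symbols long")
    if ts[0] != COM:
        raise ValueError("symbol 0 is not COM")
    if ts[15] not in (TS1_ID, TS2_ID):
        raise ValueError("trailing symbol is not a TS1/TS2 identifier")
    return ts[3]  # symbol 3 carries N_FTS


def fts_burst_time_ns(n_fts, rate_gtps):
    """Estimate the FTS burst duration at an 8b/10b rate.

    Each FTS Ordered-Set is 4 symbols (COM + 3 FTS) of 10 bits each.
    Any extra symbols sent around the burst are ignored, so treat the
    result as a lower bound.
    """
    bits = n_fts * 4 * 10
    return bits / rate_gtps  # GT/s is bits per nanosecond per lane


# Illustrative TS1: link 0, lane 0, N_FTS = 31, 2.5 GT/s advertised.
ts1 = bytes([COM, 0x00, 0x00, 0x1F, 0x02, 0x00] + [TS1_ID] * 10)

n_fts = parse_n_fts(ts1)
print(n_fts, fts_burst_time_ns(n_fts, 2.5))
```

Under these assumptions, 31 FTS Ordered-Sets at 2.5 GT/s occupy roughly half a microsecond, which is the window in which an analyzer has to regain lock.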
If the analyzer were unable to lock to the data as the link exited from electrical idle, any packets transmitted at the start of normal operation would not be acquired. In this case, the TLP Memory Write packet would not be acquired and it would be impossible to verify that the address of the packet was incorrect.
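Once the packet is captured, the address check itself is simple. The Python sketch below, offered under stated assumptions rather than as any tool's actual method, extracts the target address from a Memory Write TLP header and tests it against an expected window. It assumes the header bytes are available in transmission order and follow the standard 3DW (32-bit address) or 4DW (64-bit address) layout. The header bytes, the window base and size, and the function names are all invented for illustration.

```python
import struct


def parse_mwr_address(header):
    """Return the target address of a Memory Write TLP header."""
    fmt = header[0] >> 5      # Fmt field: 0b010 = 3DW with data, 0b011 = 4DW with data
    typ = header[0] & 0x1F    # Type field: 0b00000 = memory request
    if typ != 0 or fmt not in (0b010, 0b011):
        raise ValueError("not a Memory Write TLP")
    if fmt == 0b010:
        if len(header) < 12:
            raise ValueError("3DW header needs 12 bytes")
        (dw2,) = struct.unpack(">I", header[8:12])
        return dw2 & 0xFFFFFFFC           # low two bits are not address bits
    if len(header) < 16:
        raise ValueError("4DW header needs 16 bytes")
    hi, lo = struct.unpack(">II", header[8:16])
    return (hi << 32) | (lo & 0xFFFFFFFC)


def in_window(addr, base, size):
    """True if addr falls inside [base, base + size)."""
    return base <= addr < base + size


# Made-up 3DW Memory Write header: length 1 DW, first-DW byte enables 0xF.
header = bytes([0x40, 0x00, 0x00, 0x01,
                0x01, 0x00, 0x00, 0x0F,
                0xFE, 0xDC, 0x00, 0x10])

# Hypothetical window the driver was expected to write into.
EXPECTED_BASE, EXPECTED_SIZE = 0xF0000000, 0x00100000

addr = parse_mwr_address(header)
if not in_window(addr, EXPECTED_BASE, EXPECTED_SIZE):
    print(f"Memory Write to unexpected address 0x{addr:08X}")
```

A check like this only helps if the first TLP after the L0s exit is in the capture, which is why fast lock matters.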
ASPM (Active State Power Management) continues to be an area where many debug issues show up. But there are others, even 20 years into the PCI-SIG’s history. What are your most pressing PCIe debug challenges?
Mike Juliana has been with Tektronix since 1988 in a variety of engineering and management roles. He has a bachelor's degree in Electrical Engineering from Brigham Young University and a master's degree in Computer Engineering from Oregon State University. His 10+ years of design experience (embedded systems, FPGAs, and ASICs) helps him in his current role of helping hardware and software engineers understand how logic analyzers can speed up their debug and verification tasks.