The PCI Express (PCIE) Specification provides a set of advanced error reporting and error logging features to determine the type of error that occurs in a PCIE link. For more information, see PCI Express Base Specification, Revision 1.1 (http://www.pcisig.com/specifications/pciexpress/). The error reporting and logging features provide specific information about the type of error that occurred, such as receiver errors (e.g., 8b/10b errors), Data Link Layer Packet (DLLP) Cyclic Redundancy Check (CRC) errors, Transaction Layer Packet (TLP) CRC errors, and Malformed TLP errors.
In the case of uncorrectable errors, such as Malformed TLP, the header information associated with the error is logged. This information can help diagnostic software pinpoint the cause of the failure. For correctable errors, such as 8b/10b errors and CRC errors, no header information is logged. Most link failures arise from a link that becomes marginal over time or from connections that become loose. These issues manifest themselves as 8b/10b errors and, if those go undetected, as TLP/DLLP CRC errors. In this case, the PCIE Specification does not provide any mechanism to identify the lane where the failure occurred. If a lane fails completely, the link will enter the Recovery state of the Link Training and Status State Machine (LTSSM) and eventually eliminate the faulty lane. A marginal lane, however, cannot be identified. Thus, no corrective action can be taken and the link can only repeatedly attempt Recovery. If the recovery mechanism can correct the fault, the link keeps operating. If the recovery mechanism cannot correct the fault, the link will eventually go down, even if it successfully renegotiates in the meantime.
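The logging behavior described above can be illustrated with a minimal sketch. It is not taken from the PCIE Specification or from any driver; the names `Severity`, `ERROR_SEVERITY`, `LogEntry`, and `log_error` are hypothetical, and the severity assigned to each error type simply follows the grouping given in the preceding paragraph. The sketch shows that header information is retained only for uncorrectable errors and that no log entry carries a lane identifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


# Hypothetical table: error types named in the text, with the severity
# the text assigns to them. This is not a complete list of PCIE errors.
ERROR_SEVERITY = {
    "receiver_error": Severity.CORRECTABLE,   # e.g., 8b/10b errors
    "dllp_crc_error": Severity.CORRECTABLE,
    "tlp_crc_error": Severity.CORRECTABLE,
    "malformed_tlp": Severity.UNCORRECTABLE,
}


@dataclass
class LogEntry:
    error_type: str
    severity: Severity
    header: Optional[bytes]  # kept only for uncorrectable errors
    # Note: there is deliberately no lane field, mirroring the gap
    # described in the text.


def log_error(error_type: str, tlp_header: Optional[bytes] = None) -> LogEntry:
    """Build a log entry, dropping the header for correctable errors."""
    severity = ERROR_SEVERITY[error_type]
    header = tlp_header if severity is Severity.UNCORRECTABLE else None
    return LogEntry(error_type, severity, header)
```

For example, `log_error("malformed_tlp", header_bytes)` keeps the header, while `log_error("receiver_error", header_bytes)` discards it, so diagnostic software working from such a log could tell that a correctable error occurred but not which lane produced it.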
FIG. 1 illustrates one embodiment of a PCIE topology. Computer system 100 includes a root complex (RC) 120 and multiple endpoints 150 and 160. Root complex 120 denotes the root of an input/output (I/O) hierarchy that connects the microprocessor 110 and memory 130 subsystem to the I/O. As illustrated, root complex 120 supports one or more PCIE ports, where each port defines a separate hierarchy domain. Each hierarchy domain may be composed of a single endpoint, such as endpoint 150 or endpoint 160, or a sub-hierarchy containing a switch with multiple endpoints. An endpoint refers to a type of I/O device that can request or complete a transaction.
As used herein, the term upstream refers to a direction up the hierarchy of connection, e.g., towards root complex 120. Conversely, the term downstream, as used herein, refers to a direction down the hierarchy of connection, e.g., away from root complex 120.
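The upstream and downstream directions can be illustrated with a small tree model of the topology of FIG. 1. This is only a sketch under stated assumptions: the `Node` class and the helpers `upstream_path` and `downstream_nodes` are hypothetical names introduced for illustration, and the node names merely echo the reference numerals used above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False avoids recursive comparisons through parent/children links.
@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def attach(self, child: "Node") -> "Node":
        """Connect a child below this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def upstream_path(node: Node) -> List[str]:
    """Names visited when moving upstream, i.e., towards the root."""
    path = []
    current = node.parent
    while current is not None:
        path.append(current.name)
        current = current.parent
    return path


def downstream_nodes(node: Node) -> List[str]:
    """Names of all nodes downstream of this node, depth first."""
    names = []
    for child in node.children:
        names.append(child.name)
        names.extend(downstream_nodes(child))
    return names


# Topology of FIG. 1: one root complex with two endpoints.
root = Node("root_complex_120")
ep150 = root.attach(Node("endpoint_150"))
ep160 = root.attach(Node("endpoint_160"))
```

In this model, `upstream_path(ep150)` leads to the root complex, and `downstream_nodes(root)` lists both endpoints. A sub-hierarchy with a switch could be modeled the same way by attaching a switch node to the root and attaching endpoints to the switch.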