1. Field of the Invention
The present invention relates to digital communication systems and, more specifically, to a system that communicates error information between an endpoint and a root complex.
2. Description of the Prior Art
Digital systems employ several different standard bus types to facilitate communication between different entities. One type of digital system includes a central root complex that communicates with one or more endpoints via a bus system. One such bus system, PCI Express, is an industry-standard input/output (IO) bus used to connect IO devices to a processor complex via a root complex bridge.
Many communication systems append a cyclic redundancy check (CRC) to transmitted data so that the receiver can detect whether the data was altered in transit. CRCs are popular because they are simple to implement in binary hardware, are easy to analyze mathematically, and are particularly good at detecting common errors caused by noise in transmission channels, such as burst errors: an n-bit CRC detects every burst error that spans n or fewer bits.
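To illustrate how a CRC detects altered data, the following is a minimal sketch of the widely used bitwise CRC-32 (generator polynomial 0x04C11DB7, processed here in its reflected form 0xEDB88320, with an initial value and final XOR of 0xFFFFFFFF). The function names crc32_bitwise and is_intact are hypothetical, and the sketch is not claimed to be bit-exact with the CRCs defined by the PCI Express specification.

```python
def crc32_bitwise(data: bytes) -> int:
    """Compute the common reflected CRC-32 one bit at a time."""
    crc = 0xFFFFFFFF                      # initial register value
    for byte in data:
        crc ^= byte                       # fold the next byte into the register
        for _ in range(8):
            # Shift right; if a 1 falls out, subtract (XOR) the reflected polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion


def is_intact(data: bytes, expected_crc: int) -> bool:
    """Receiver-side check: recompute the CRC and compare with the one sent."""
    return crc32_bitwise(data) == expected_crc


if __name__ == "__main__":
    frame = b"example transaction payload"
    sent_crc = crc32_bitwise(frame)       # transmitter computes and appends the CRC

    corrupted = bytearray(frame)
    corrupted[3] ^= 0x10                  # simulate a single-bit error from channel noise

    print(is_intact(frame, sent_crc))              # True
    print(is_intact(bytes(corrupted), sent_crc))   # False
```

Because the CRC is linear over GF(2), any error pattern whose polynomial is not a multiple of the generator changes the check value; single-bit errors and bursts of 32 or fewer bits always fall in that class.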
IO transactions from a root complex to an IO adapter may contain CRC errors or may be malformed, poisoned, timed out, aborted, unexpected, or even unsupported. When a PCI Express endpoint detects an error, it logs the PCI transaction details in one or more PCI-standard advanced error reporting (AER) registers, but it sends only a generic error message to the root complex.
This error message specifies only the severity of the error (whether it is fatal or nonfatal); it does not correlate the error message with the specific transaction that had the error. A fatal error may require platform firmware to reset the root complex. A PCI Express link may also go down and fail to retrain, in which case neither root complex hardware nor host software can access the AER registers in the adapter attached to that link; it is therefore impossible to determine which transaction had an uncorrectable error and what the error was. This serious limitation reduces system availability and increases the warranty costs that host and adapter vendors incur in isolating the source of these errors. The current error-reporting mechanism also prevents the root complex from automatically retrying transactions that fail with certain uncorrectable errors without software intervention. This limitation likewise reduces availability and increases warranty expense.
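The prior-art reporting path described above can be modeled in a few lines to show where information is lost. The following is a minimal sketch under simplifying assumptions: the class names (Endpoint, Transaction, ErrorMessage, LinkDownError) and the tag field are hypothetical, a Python list stands in for the AER registers, and the severity of each error is passed in directly rather than derived from any register setting. It is not a model of real PCI Express hardware.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Severity(Enum):
    NONFATAL = "nonfatal"
    FATAL = "fatal"


class ErrorType(Enum):
    # Error classes listed in the text above.
    CRC = "CRC error"
    MALFORMED = "malformed"
    POISONED = "poisoned"
    TIMEOUT = "timed out"
    ABORTED = "aborted"
    UNEXPECTED = "unexpected"
    UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class Transaction:
    tag: int                              # hypothetical transaction identifier
    address: int
    error: Optional[ErrorType] = None     # None means the transaction succeeded


@dataclass(frozen=True)
class ErrorMessage:
    """Generic message sent to the root complex: it carries severity only."""
    severity: Severity


class LinkDownError(Exception):
    """Raised when the link has not retrained and the adapter is unreachable."""


@dataclass
class Endpoint:
    # Stand-in for the AER registers: full details stay inside the adapter.
    aer_log: List[Tuple[int, int, ErrorType]] = field(default_factory=list)
    link_up: bool = True

    def receive(self, txn: Transaction, severity: Severity) -> Optional[ErrorMessage]:
        if txn.error is None:
            return None
        self.aer_log.append((txn.tag, txn.address, txn.error))  # logged locally
        return ErrorMessage(severity)     # no tag, address, or error type is sent

    def read_aer(self) -> List[Tuple[int, int, ErrorType]]:
        # Host software can read the log only over a working link.
        if not self.link_up:
            raise LinkDownError("link did not retrain; AER registers unreachable")
        return list(self.aer_log)


if __name__ == "__main__":
    ep = Endpoint()
    m1 = ep.receive(Transaction(7, 0x1000, ErrorType.POISONED), Severity.NONFATAL)
    m2 = ep.receive(Transaction(9, 0x2000, ErrorType.TIMEOUT), Severity.NONFATAL)
    print(m1 == m2)  # True: the root complex cannot tell the two failures apart
```

In this sketch, two unrelated failures produce identical messages at the root complex, and once link_up is False the only record of which transaction failed becomes unreadable.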
Therefore, there is a need for a system that transmits, from an endpoint to a root complex, error information that identifies both the error and the transaction that was executing when the error occurred.