The present disclosure relates to a host system that uses extended error handling to restore a reduced capacity PCIe link between the host system and an I/O expansion drawer to full capacity in a non-disruptive manner.
Peripheral Component Interconnect Express (PCIe) is a serial expansion bus standard for connecting a computer to peripheral devices. The PCIe standard is well known as a PC backplane interface standard and has gained popularity as a high-speed cabling interface between a host system and I/O expansion drawers that increase, or “expand,” the number of PCIe slots of the host system. The high-speed cabling interface between the host system and an I/O expansion drawer is referred to as a PCIe “link.” A PCIe link may utilize multiple physical cables that each provide a portion of the PCIe link's “lanes.”
When one of the physical cables becomes non-functional, such as from being disconnected or severed, the host system typically reconfigures the PCIe link to a reduced capacity, such as from 16 lanes to eight lanes, and utilizes the remaining functioning cables to communicate with the I/O expansion drawer. To restore the reduced capacity PCIe link to full capacity, the cable must be replaced or reconnected and the link must be retrained. Retraining a PCIe link, however, impacts the PCIe slots connected by the link: communications with those slots are typically lost during retraining. Workloads using those slots must therefore be ended before the link is retrained, or the workloads may fail ungracefully.
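By way of illustration only, the lane reduction described above can be sketched as a toy model. The sketch assumes a link carried by two cables of eight lanes each and assumes the link settles at the widest of a fixed set of widths that fits the lanes on the functioning cables. The names `Cable`, `Link`, `negotiated_width`, and `retrain` are hypothetical, and the model does not represent the actual PCIe link training procedure.

```python
from dataclasses import dataclass

# Link widths the toy model allows (an assumption made for illustration).
SUPPORTED_WIDTHS = (16, 8, 4, 2, 1)


@dataclass
class Cable:
    """Hypothetical cable carrying a fixed share of the link's lanes."""
    lanes: int
    functional: bool = True


def negotiated_width(cables):
    """Return the widest supported width that fits the functional lanes."""
    available = sum(c.lanes for c in cables if c.functional)
    for width in SUPPORTED_WIDTHS:
        if width <= available:
            return width
    return 0  # no functional lanes: the link is down


class Link:
    """Hypothetical link whose width changes only when it is retrained."""

    def __init__(self, cables):
        self.cables = cables
        self.width = negotiated_width(cables)

    def retrain(self):
        # In this model, retraining simply re-evaluates the usable lanes.
        self.width = negotiated_width(self.cables)
        return self.width


cables = [Cable(lanes=8), Cable(lanes=8)]
link = Link(cables)
full_width = link.width               # both cables functional

cables[1].functional = False          # one cable disconnected or severed
reduced_width = link.retrain()        # link reconfigured to reduced capacity

cables[1].functional = True           # cable reconnected
width_before_retrain = link.width     # still reduced until the link is retrained
restored_width = link.retrain()       # full capacity restored by retraining
```

In the sketch, reconnecting the cable alone leaves the link at its reduced width; only the subsequent call to `retrain` restores full width, which corresponds to the retraining step that conventionally interrupts communication with the attached PCIe slots.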