Peripheral Component Interconnect Express (PCIe) is a high-performance system bus used on computing and communication platforms. The PCIe bus is widely used to interconnect a central processing unit (CPU) with peripheral devices and functions as a core service channel in computing and storage devices. Multiple types of peripheral devices may interconnect with a CPU through a PCIe bus, such as a network interface card device or a solid state disk (SSD). Such devices are called PCIe endpoint devices in this document.
A PCIe bus is widely used as a bus interface of a server or a storage system. During normal running of a system, a PCIe endpoint device may need to be added or removed without powering off the system because of online capacity expansion and maintenance demands, that is, a hot swap demand. Current PCIe hot swap follows this operation process: an operator initiates a hot swap request by pressing a button; after learning of the hot swap event, a hot swap controller notifies all drivers that may access the PCIe endpoint device to stop accessing it, and releases the resources of the PCIe endpoint device on which the hot swap is to be performed; the PCIe endpoint device is then powered off, and the operator removes it.
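The notified hot swap process described above can be pictured with the following minimal sketch. It is an illustration only, not an implementation of any real hot swap controller: the class names `Driver`, `Endpoint`, and `HotSwapController` and the method `request_removal` are hypothetical and are introduced solely to show the order of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    """Hypothetical driver that may access the endpoint device."""
    name: str
    accessing: bool = True

    def stop_access(self) -> None:
        self.accessing = False


@dataclass
class Endpoint:
    """Hypothetical PCIe endpoint device (for example, an SSD)."""
    name: str
    powered: bool = True
    resources_allocated: bool = True


@dataclass
class HotSwapController:
    """Hypothetical controller that runs the notified removal sequence."""
    drivers: list
    log: list = field(default_factory=list)

    def request_removal(self, device: Endpoint) -> bool:
        # Step 1: the operator's button press arrives as a removal request.
        self.log.append("request")
        # Step 2: notify every driver that may access the device to stop.
        for driver in self.drivers:
            driver.stop_access()
        self.log.append("drivers_stopped")
        # Step 3: release the resources held for the device.
        device.resources_allocated = False
        self.log.append("resources_released")
        # Step 4: power off only if no driver is still accessing the device.
        if any(d.accessing for d in self.drivers):
            return False
        device.powered = False
        self.log.append("powered_off")
        return True  # the operator may now physically remove the device


drivers = [Driver("storage"), Driver("monitor")]
device = Endpoint("ssd0")
controller = HotSwapController(drivers)
removed = controller.request_removal(device)
```

In this sketch the device is powered off only after every driver has stopped accessing it, which is the property that the advance notification is meant to guarantee.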
The current hot swap of a PCIe endpoint device requires advance notification to ensure normal running of the system. However, in recent years, the PCIe bus has gradually developed from intra-system interconnection to inter-system interconnection, and applications such as external cables have increased. Cables are easily disconnected by accident, so a PCIe endpoint device may go offline abnormally without advance notification. In addition, it is increasingly common for a user to connect an SSD directly to a system, and out of habit the user may insert or remove the SSD without advance notification. When a PCIe endpoint device goes offline abnormally and suddenly in this way, any read and write instructions that the CPU has already initiated to the device remain pending; when instructions from the CPU for accessing the PCIe endpoint device accumulate to a certain degree, the CPU regards the entire system as abnormal and reports a machine check exception (MCE). As a result, the entire system is reset.
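The failure mode described above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not a description of real CPU behavior: the names `Cpu`, `MachineCheckException`, and `run` are hypothetical, and the numeric `pending_limit` is an assumption made for illustration, since the text says only that instructions accumulate "to a certain degree".

```python
class MachineCheckException(Exception):
    """Stands in for the MCE reported by the CPU; not a hardware interface."""


class Cpu:
    def __init__(self, pending_limit: int):
        # pending_limit is an illustrative assumption; no figure is given
        # in the description above.
        self.pending_limit = pending_limit
        self.pending = 0

    def issue(self, device_online: bool) -> None:
        if device_online:
            return  # the device completes the request; nothing accumulates
        self.pending += 1  # the device is gone, so no completion arrives
        if self.pending >= self.pending_limit:
            raise MachineCheckException("too many pending instructions")


def run(requests: int, offline_from: int, pending_limit: int) -> str:
    """Issue requests; the device vanishes at request index offline_from."""
    cpu = Cpu(pending_limit)
    try:
        for i in range(requests):
            cpu.issue(device_online=i < offline_from)
    except MachineCheckException:
        return "system reset"
    return "ok"
```

With a limit of 4, `run(10, 10, 4)` returns `"ok"` because the device never goes offline, whereas `run(10, 3, 4)` returns `"system reset"` because the fourth instruction issued after the unannounced removal reaches the limit.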