As PCI Express (PCIe) bus hierarchies become more distributed, the time between when one device (the initiator) posts a memory write packet and when that packet arrives at another device (the target) keeps growing. In this environment, detecting that a posted memory write packet has been lost or dropped because of an error also takes longer.
Typically, if a packet is lost between two devices, an error is reported to the root complex of the PCIe hierarchy. However, because devices are increasingly distributed, that error may be generated only long after the initiator device posts the packet. Once the initiator device posts the memory write packet to the PCIe hierarchy, it assumes that the packet will reach the target device without error, since PCIe provides no end-to-end acknowledgement for posted writes.
The problem with this scenario is that the initiator device might start a subsequent irreversible action that depends on the successful completion of the posted packet. Because the posted packet is never acknowledged, the irreversible action might occur even if the packet is not successfully received. One example is a peripheral device that receives data from outside the PCIe hierarchy and writes the data to a non-volatile memory located across the PCIe hierarchy. After the peripheral device initiates the posted memory write of the data to the non-volatile memory, it might signal back to the source of the data that the data has been written, when in reality the data is still in flight across the PCIe hierarchy or has been lost altogether. In the latter case, the error that is eventually detected does not allow the root complex to easily identify the initiator device of the posted packet.
The PCIe specification does not define a way of handling issues such as these, so such errors are generally handled by firmware running on the root complex of the PCIe hierarchy. One method used to handle such problems is as follows. When a subsequent irreversible action depends on the successful completion of a posted packet, the initiator waits for an intervention from the root complex before starting the irreversible action.
For example, after the initiator posts a packet and before it starts the subsequent irreversible action, the initiator generates an interrupt to the root complex and then stops processing until the root complex has processed the interrupt. This method assumes (but does not guarantee) that, during the time the root complex takes to process the interrupt, any error caused by the posted packet will be reported to the root complex and will stop the root complex from signaling the initiator device to start the subsequent action.
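The weakness of the interrupt-and-wait method is a race between two latencies: how long a write error takes to reach the root complex, and how long the root complex takes to service the interrupt. The following is a minimal timeline sketch of that race, not an implementation of any real firmware. The function name `interrupt_handshake`, its parameters, and the abstract time units are all hypothetical modeling choices.

```python
import heapq


def interrupt_handshake(write_ok, error_latency, irq_latency):
    """Hypothetical timeline model of the interrupt-and-wait method.

    Time 0: the initiator posts the write, raises an interrupt, and stalls.
    If the write fails, its error reaches the root complex at error_latency.
    The root complex services the interrupt at irq_latency and releases the
    initiator only if no error has been reported by then.
    Returns True if the initiator is allowed to start the irreversible action.
    """
    events = []  # (time, tie_break, kind); an error wins a tie with the interrupt
    if not write_ok:
        heapq.heappush(events, (error_latency, 0, "error"))
    heapq.heappush(events, (irq_latency, 1, "irq"))

    error_seen = False
    while events:
        _, _, kind = heapq.heappop(events)
        if kind == "error":
            error_seen = True
        else:  # "irq": the root complex decides whether to release the initiator
            return not error_seen
    return False  # unreachable: the interrupt event is always queued
```

With a failed write whose error arrives at time 3 and an interrupt serviced at time 5, `interrupt_handshake(False, 3, 5)` blocks the action. If the error instead arrives at time 8, `interrupt_handshake(False, 8, 5)` releases the initiator even though the write was lost, which is exactly the case the method cannot rule out.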
Another method of reducing the effects of this problem is for the initiator device, after posting the memory write packet and before starting the irreversible action, to issue a memory read packet to the same memory space. Because the default PCIe ordering rules do not allow a read request to pass an earlier posted write on the same path, the posted write is guaranteed to have completed by the time the read completes, and if the posted write generates an error, the root complex can stop the initiator device from starting the subsequent irreversible action. But again, there is no guarantee that the root complex can stop the initiator device before the read completes.
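The read-after-write method relies only on in-order delivery: the read request cannot overtake the posted write, so the read completion cannot return until the write has either landed or failed. The following is a minimal sketch under simplifying assumptions, not a model of real PCIe hardware. The link is a single in-order queue, a lost packet is reported to the root complex instantly, and the names `Packet`, `deliver_until_read`, and `write_then_flush` and the `drop` fault-injection flag are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    kind: str           # "MWr" = posted memory write, "MRd" = memory read request
    addr: int
    data: int = 0
    drop: bool = False  # hypothetical fault injection: packet is lost in transit


@dataclass
class Target:
    memory: dict = field(default_factory=dict)


@dataclass
class RootComplex:
    error_log: list = field(default_factory=list)


def deliver_until_read(link: deque, target: Target, rc: RootComplex):
    """Process packets strictly in order (the read cannot pass earlier posted
    writes) and return the data carried by the first read completion."""
    while link:
        pkt = link.popleft()
        if pkt.drop:
            # Simplification: the loss is reported to the root complex immediately.
            rc.error_log.append(("lost", pkt.kind, pkt.addr))
            continue
        if pkt.kind == "MWr":
            target.memory[pkt.addr] = pkt.data  # posted: nothing goes back
        elif pkt.kind == "MRd":
            return target.memory.get(pkt.addr)  # completion returns to initiator
    return None


def write_then_flush(addr, data, drop_write=False):
    """Post a write, follow it with a read to the same address, and wait for
    the read completion before any (hypothetical) irreversible action."""
    link, target, rc = deque(), Target(), RootComplex()
    link.append(Packet("MWr", addr, data, drop=drop_write))
    link.append(Packet("MRd", addr))
    value = deliver_until_read(link, target, rc)
    # By this point the write has either landed or its error has been logged.
    return value, list(rc.error_log)
```

Here `write_then_flush(0x1000, 42)` returns `(42, [])`, while `write_then_flush(0x1000, 42, drop_write=True)` returns `(None, [("lost", "MWr", 4096)])`. The sketch makes error reporting instantaneous; in a real hierarchy the error report can lag the read completion, so the root complex may still be too late to stop the initiator.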
The disadvantage of these traditional methods, used alone or in combination, is that they require intervention from the root complex before the subsequent irreversible action is initiated. Depending on the performance of the root complex, this intervention might be significantly delayed, which tends to degrade overall system performance.
What is needed, therefore, is a system that overcomes problems such as those described above, at least in part.