1. Technical Field
This disclosure generally relates to resetting an integrated circuit, and more specifically relates to resetting a PCI host bridge.
2. Background Art
A PCI host bridge (PHB) is an integrated circuit or chipset that provides an interface between a CPU bus, such as a PowerPC CPU bus, and a PCI bus, such as a PCI Express (PCIe) bus. PCI host bridges are very common in modern computer systems.
In the existing PowerPC architecture, when a fatal error occurs in a PHB, the PHB is reset, which clears all the registers including the configuration registers. The firmware must then reconfigure all the configuration registers, even when the configuration is the same as before the fatal error. The reset process may include time spent waiting to make sure all PowerPC bus operations are finished before resetting the PHB. The time spent waiting for all PowerPC bus operations to finish plus the time to reset the PHB can be in the range of six to eight seconds.
High availability computer systems allow recovery from a fatal error in a graceful way that is mostly transparent to the user. Many high availability computer systems include virtual machines. A high availability computer system may have a failover time threshold, where if a virtual machine does not respond within the specified failover threshold, the high availability computer system initiates failover of the virtual machine. For example, a high availability system may have a failover threshold of 12 seconds, which means if a logical partition is unresponsive for 12 seconds, the logical partition is moved to a different virtual machine. Suppose a fatal error in a PHB takes six to eight seconds for the PHB to recover, the virtual operating system in a virtual machine takes an additional three to five seconds to finish recovering the adapter and start Ethernet traffic, and the client logical partition then takes a second or two to reestablish the TCP/IP connection. The total time delay caused by the fatal error in the PHB can then exceed the failover threshold for the virtual machine. This will cause a failover of the virtual machine when the virtual machine has not failed, but simply needs time to finish recovering from a PHB fatal error.
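The timing argument above can be checked with a short calculation. The following is only an illustrative sketch: the variable and function names are hypothetical, and the ranges are the example figures quoted in the preceding paragraph, with "a second or two" taken as one to two seconds.

```python
# Illustrative sketch only; names are hypothetical and the values are the
# example figures from the text, expressed as (best case, worst case) seconds.
PHB_RESET_S = (6, 8)         # wait for bus operations to finish, then reset the PHB
ADAPTER_RECOVERY_S = (3, 5)  # virtual OS recovers the adapter and restarts Ethernet traffic
TCP_REESTABLISH_S = (1, 2)   # client logical partition reestablishes the TCP/IP connection
FAILOVER_THRESHOLD_S = 12    # example failover threshold


def total_delay_range(*phases):
    """Sum the best-case and worst-case durations of sequential recovery phases."""
    best = sum(low for low, _ in phases)
    worst = sum(high for _, high in phases)
    return best, worst


best, worst = total_delay_range(PHB_RESET_S, ADAPTER_RECOVERY_S, TCP_REESTABLISH_S)

# A failover of a healthy virtual machine becomes possible whenever the
# worst-case recovery time exceeds the failover threshold.
spurious_failover_possible = worst > FAILOVER_THRESHOLD_S

print(f"total recovery delay: {best} to {worst} seconds")
print(f"can exceed {FAILOVER_THRESHOLD_S} second threshold: {spurious_failover_possible}")
```

With these example figures the total delay ranges from 10 to 15 seconds, so the upper part of the range exceeds the 12 second threshold even though the virtual machine itself has not failed.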
One possible solution would be to make the failover threshold higher. So instead of 12 seconds, the failover threshold could be set to 20 seconds. If the failover threshold were set to 20 seconds, one could be sure that any fatal error in a PHB would be recovered from within the 20-second time period, which would prevent a failover from occurring due to the wait associated with a PHB fatal error. This solution is not desirable because it creates additional delays in performing failover when failover is needed. By changing the failover threshold from 12 to 20 seconds, each time a failover is needed, there is an additional eight seconds of delay before the failover occurs. This additional time is not acceptable in a high availability system because the delay is perceptible to end users.