Currently, large transaction processing systems are distributed systems formed from a plurality of clusters interconnected by a network called an external switching fabric 36. FIG. 1A shows such a plurality of clusters 10, 12, 14, 16, which is also called a constellation 18. Clusters 10–16 are so designated because they comprise nodes interconnected by an internal switching fabric 20, 22, 24, 26. Each node 28a–c, 30a–c, 32a–c of a cluster is a server unit that includes processing units and controllers that enable a processing unit to access stable (non-volatile) storage. Stable storage is storage that survives a failure of the processing unit. A processing unit includes a central processor and memory, each independent of the other processing units, such that a failure of one processing unit does not cause a failure of another processing unit. The memory of each processing unit is dedicated to the central processor of that processing unit. That is, the memory is not shared with other processing units.
FIG. 1B is a block diagram of an exemplary cluster 10, 12, 14, 16 with three processing unit pairs 40a–b, 42a–b, 44a–b, each processing unit being connected to a pair of interprocessor buses 46, 48 that constitute the internal switching fabric 20 of the cluster, though other numbers of processing units and other types and numbers of buses are possible. Each processing unit pair 40a–b, 42a–b, 44a–b is connected to a corresponding controller pair 50a–b, 52a–b, 54a–b to have redundant access to the device controlled by the controller. Controller pair 50a–b is connected to storage volumes 56a–b and controller pair 54a–b is connected to storage volumes 58a–b, while controller pair 52a–b is connected to a communications unit 60. Storage volumes 56a–b, 58a–b are not required to be physically present in the cluster. In some embodiments, the storage volumes are part of a large storage system to which each of the processing units has access.
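The redundant access described for FIG. 1B can be illustrated with a minimal sketch. The class and function names (`Component`, `DevicePath`, `build_cluster`) are hypothetical and do not come from the disclosure, and the sketch assumes that either processing unit of a pair can use either controller of the corresponding controller pair, which the description implies but does not state in detail.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A processing unit or a controller (hypothetical model)."""
    name: str
    alive: bool = True


@dataclass
class DevicePath:
    """A processing unit pair wired to a controller pair that fronts one device."""
    device: str
    units: tuple        # the processing unit pair, e.g. 40a and 40b
    controllers: tuple  # the controller pair, e.g. 50a and 50b

    def reachable(self) -> bool:
        # Assumption: the device stays reachable while at least one unit
        # and at least one controller of each pair is still operating.
        return (any(u.alive for u in self.units)
                and any(c.alive for c in self.controllers))


def build_cluster() -> dict:
    """Build the exemplary three-pair cluster using the FIG. 1B numerals."""
    wiring = [
        (("40a", "40b"), ("50a", "50b"), "storage volumes 56"),
        (("42a", "42b"), ("52a", "52b"), "communications unit 60"),
        (("44a", "44b"), ("54a", "54b"), "storage volumes 58"),
    ]
    return {
        device: DevicePath(
            device,
            tuple(Component(u) for u in units),
            tuple(Component(c) for c in ctrls),
        )
        for units, ctrls, device in wiring
    }
```

In this sketch, marking processing unit 42a as failed leaves communications unit 60 reachable through 42b, while a failure of both units of the pair makes the device unreachable.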
A cluster 10, 12, 14, 16, as shown in FIG. 1, can become inoperable or inaccessible for a period of time. Inaccessibility can occur when software operating on the cluster must undergo a major overhaul or when a communication failure in the external switching fabric 36 isolates a cluster from the other clusters in the constellation 18. When such a failure occurs, it is desirable to make the cluster failure relatively unobservable to the users of the system. One solution is to have a spare system that acts as a hot standby to the failed cluster. However, this is exceedingly expensive if the inoperable or inaccessible cluster is large.
Thus, there is a need for an improved system and method for handling cluster inoperability or inaccessibility without a hot standby. Often, if a locking scheduler is used to schedule transactions, the locks held by the transactions in the workload can be an impediment to handling cluster inoperability or inaccessibility. The locks are used to preserve the consistency of the database. If a failure occurs and it is not known which locks were set, continuing any processing that affects the database can jeopardize that consistency.
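The hazard posed by lost lock knowledge can be shown with a small sketch. It assumes a simple exclusive-lock table held only in the memory of the failed cluster; the `LockTable` class, the record key, and the transaction identifiers are hypothetical and are not taken from the disclosure.

```python
class LockTable:
    """Exclusive record locks keyed by record, owned by a transaction id."""

    def __init__(self):
        self._owners = {}  # record key -> id of the transaction holding the lock

    def acquire(self, key, txn):
        """Grant the lock unless a different transaction already holds it."""
        holder = self._owners.get(key)
        if holder is not None and holder != txn:
            return False
        self._owners[key] = txn
        return True

    def release_all(self, txn):
        """Release every lock held by txn, as at commit or abort."""
        for key in [k for k, t in self._owners.items() if t == txn]:
            del self._owners[key]


# Before the failure: T1 holds the lock, so T2 is correctly refused.
primary = LockTable()
granted_t1 = primary.acquire("account-7", "T1")
granted_t2 = primary.acquire("account-7", "T2")

# After the failure: a takeover that starts with an empty table has no record
# of T1's lock, so it grants the same record to T2 while T1 is unresolved.
takeover = LockTable()
granted_t2_after_failure = takeover.acquire("account-7", "T2")
```

Here `granted_t2` is false while `granted_t2_after_failure` is true: the second grant lets T2 read or overwrite data that T1 had protected, which is the kind of inconsistency the locks were meant to prevent.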