The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for providing efficient distributed cache consistency.
A cluster system, also referred to as a cluster multiprocessor system (CMP) or simply as a “cluster,” is a set of networked data processing systems (or “nodes”) with hardware and software shared among those data processing systems, typically (but not necessarily) configured to provide highly available and highly scalable application services. Cluster systems are frequently implemented to achieve high availability as an alternative to fault tolerance for mission-critical applications such as data centers, aircraft control, and the like. Fault tolerant data processing systems rely on specialized hardware to detect hardware faults and to switch to a redundant hardware component, regardless of whether the component is a processor, memory board, hard disk drive, adapter, power supply, etc. While providing seamless cutover and uninterrupted performance, fault tolerant systems are expensive due to the requirement of redundant hardware, and fail to address software errors, a more common source of data processing system failure.
High availability can be achieved in a cluster implemented with standard hardware through the use of software that permits resources to be shared system wide. When a node, component, or application fails, the software quickly establishes an alternative path to the desired resource. The brief interruption required to reestablish availability of the desired resource is acceptable in many situations. The hardware costs are significantly less than fault tolerant systems, and backup facilities may be utilized during normal operation.
The nodes of a cluster share resources of the cluster, including files, data structures, storage devices, and the like. As such, the various nodes of a cluster may each attempt to read and write data from/to these shared resources. Hence, mechanisms for controlling the reads and writes so as to ensure the veracity of the shared resources are usually implemented.