Field of the Invention
The present invention relates to an apparatus and method for maintaining data integrity.
Description of the Related Art
In a “highly available” commercial computing environment, hardware and software technologies are typically combined to provide quick recovery of critical programs from hardware and software failures. The environment is designed to eliminate single points of failure.
For example, a typical highly available environment may comprise a number of loosely coupled computers sharing hardware resources such as disk drives. Critical programs are capable of running on any of a set of the computers. A hardware or software failure which leads to unavailability of a critical program can be remedied by moving the critical program to another computer, thereby restoring its availability.
Typical highly available environments have historically been managed by software known as high availability (HA) software. HA software typically provides management of the hardware and software components by monitoring the components and taking responsibility for moving resources in response to failures.
An example of such a highly available environment (100) is shown in FIG. 1A and comprises a first computer (105) having first HA software (117) and a first instance of a highly available software component (110), wherein the software component (110) is operable to access a shared resource (e.g. a shared disk (120) comprising data).
The highly available environment (100) also comprises a second computer (115) having second HA software (119) which is operable to communicate with the first HA software (117).
In an example, the first instance (110) of the first computer (105) accesses the shared disk (120). The first HA software (117) has “ownership” of the shared disk (120).
In the event of a failure of the first computer (105), the second HA software (119) is operable to detect the failure (e.g. in response to the second HA software (119) no longer being able to communicate with the first HA software (117)).
With reference to FIG. 1B, in response, the second HA software (119) is operable to terminate the first instance (110), move ownership of the shared disk (120) to itself and subsequently start a second instance of the software component (125) on the second computer (115). The second computer (115) is subsequently operable to “take over” responsibility from the first computer (105) and the second instance (125) running on the second computer (115) is able to access the shared disk (120).
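By way of illustration only, the failover sequence described above can be sketched as follows. This is a minimal in-memory model and not an implementation of any particular HA software: the class names (SharedDisk, Computer), the function names, the heartbeat timestamp and the five second timeout are all assumptions introduced for the example. Loss of communication is modelled as a heartbeat that has not been refreshed within the timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedDisk:
    # Name of the computer whose HA software currently owns the disk.
    owner: Optional[str] = None


@dataclass
class Computer:
    name: str
    instance_running: bool = False
    # Time at which the peer last heard from this computer (hypothetical).
    last_heartbeat: float = 0.0


def peer_has_failed(peer: Computer, now: float, timeout: float) -> bool:
    """Treat a peer as failed once its heartbeat is older than the timeout."""
    return now - peer.last_heartbeat > timeout


def fail_over(failed: Computer, standby: Computer, disk: SharedDisk) -> None:
    """Terminate the old instance, move disk ownership, start a new instance."""
    failed.instance_running = False   # terminate the first instance
    disk.owner = standby.name         # move ownership of the shared disk
    standby.instance_running = True   # only now start the second instance


def monitor_once(peer: Computer, local: Computer, disk: SharedDisk,
                 now: float, timeout: float = 5.0) -> bool:
    """One monitoring pass by the local HA software; True if it took over."""
    if disk.owner == peer.name and peer_has_failed(peer, now, timeout):
        fail_over(peer, local, disk)
        return True
    return False
```

In this sketch the second instance is started only after the first instance has been terminated and ownership of the disk has moved, which is the ordering that keeps a single instance active at any time.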
The environment (100) of FIGS. 1A and 1B provides a guaranteed single activation of the software component—that is, it is not possible for two instances of the software component to start at the same time on different computers. If two instances of the software component were to start at the same time on different computers, this may cause errors such as corruption of data on the shared disk (120).
Although the environment (100) described above provides for high availability of a critical program and a guaranteed single activation of the critical program, special hardware (e.g. the shared disk (120), which has to be specifically configured such that it can be accessed by multiple computers) and special software (e.g. the HA software) are required.
With more modern techniques, it is possible to achieve high availability and a guaranteed single activation without the requirement of special hardware and/or software.
A representation of such an environment (200) is shown in FIG. 2, wherein the environment (200) comprises two instances of the same software component. In more detail, the environment (200) comprises a third computer (205) having a third instance of a highly available software component (210) wherein the third instance (210) is operable to access a shared resource e.g. a shared disk (220) comprising data. The environment (200) also comprises a fourth computer (215) having a fourth instance of the highly available software component (225) wherein the fourth instance (225) is also operable to access the shared disk (220).
In the examples of FIGS. 1A and 1B, the software component does not have to assume responsibility for guaranteed single activation because the environment (100) comprises HA software. As the environment (200) of FIG. 2 does not comprise HA software, the software component needs to be capable of ensuring that the data on the shared disk is not corrupted by uncoordinated access from both of the computers (205, 215) at once.
If each instance of the software component consists of a single process, and if the data on the shared disk is contained in only a small number of files, it is sufficient to use file locking of the data files on the shared disk to ensure that data integrity is maintained.
For example, exclusive file locking can be used to ensure that only one running instance of the software component at a time is reading or writing the data files. In a more complex approach, “range-locking” of areas of the data files can be used to ensure multiple instances of the software component do not corrupt the data files by uncoordinated accesses.
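By way of illustration only, the two locking approaches described above can be sketched on a POSIX system using Python's standard fcntl module. The helper names (try_exclusive_lock, try_range_lock, demo) and the temporary file standing in for a data file are assumptions introduced for the example. Two separate opens of the same file stand in for two instances of the software component; whether such locks are honoured between different computers depends on the file system used for the shared disk, which this sketch does not address.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(fd: int) -> bool:
    """Try to take an exclusive whole-file lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False  # another open of the file already holds the lock


def try_range_lock(fd: int, start: int, length: int) -> bool:
    """Try to lock bytes [start, start + length) exclusively, without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:
        return False  # an overlapping range is locked by another process


def demo():
    with tempfile.NamedTemporaryFile() as data_file:
        data_file.write(b"\0" * 1024)
        data_file.flush()
        # Each open() stands in for one instance of the software component.
        fd_a = os.open(data_file.name, os.O_RDWR)
        fd_b = os.open(data_file.name, os.O_RDWR)
        try:
            first = try_exclusive_lock(fd_a)    # first instance gets the lock
            second = try_exclusive_lock(fd_b)   # second instance is refused
            fcntl.flock(fd_a, fcntl.LOCK_UN)    # first instance releases
            third = try_exclusive_lock(fd_b)    # second instance now succeeds
            fcntl.flock(fd_b, fcntl.LOCK_UN)
            ranged = try_range_lock(fd_a, 0, 512)  # lock the first 512 bytes only
        finally:
            os.close(fd_a)
            os.close(fd_b)
        return first, second, third, ranged
```

The range locks taken by lockf are held per process, so a conflict between overlapping ranges would only be observed between separate processes; the sketch therefore shows an uncontested range lock and demonstrates contention only for the whole-file lock.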
Further improvements are required when a software component increases in complexity.