1. Field of the Invention
The present invention relates to a method for managing access to shared resources in a computer system, a computer system for executing the method, and a computer program product containing code portions to execute the method.
2. Description of Prior Art
Record and Replay is a software-based state replication solution designed to support recording and subsequent replay of the execution of applications running on multiprocessor systems for fault-tolerance. Multiple instances of the application are almost simultaneously executed in separate virtualized environments called containers. Different containers may run on different computer systems. Containers facilitate state replication between the application instances by resolving the resource conflicts and providing a uniform view of the underlying operating system across all clones. The virtualization layer that creates the container abstraction actively monitors the primary instance of the application and synchronizes its state with that of the clones, named secondary or backup instances, by transferring the necessary information to enforce identical state among them. For details, see Philippe Bergheaud, Dinesh Subhraveti, and Marc Vertes, “Fault Tolerance in Multiprocessor Systems Via Application Cloning”, 27th International Conference on Distributed Computing Systems, Toronto, 2007, incorporated herein by reference.
In the record and replay technology, the execution of an application program in one of the secondary instances follows the execution of the same application program in the primary instance in loose lockstep. The instances are kept in lockstep by running the same program code instructions in the primary and secondary instances. At any point in the program flow that allows non-deterministic program code execution, the primary instance records the execution choices and the secondary instance replays the recorded decisions rather than pursuing a non-deterministic execution.
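By way of illustration only, the record and replay principle described above may be sketched as follows. The names Recorder, Replayer, and application are hypothetical and do not appear in the cited prior art; a random choice stands in for any non-deterministic point in the program flow.

```python
import random


class Recorder:
    """Primary instance: makes real non-deterministic choices and logs them."""

    def __init__(self):
        self._rng = random.Random()
        self.log = []

    def choose(self, options):
        choice = self._rng.choice(options)
        self.log.append(choice)  # record the execution choice
        return choice


class Replayer:
    """Secondary instance: replays the logged choices instead of choosing."""

    def __init__(self, log):
        self._pending = iter(log)

    def choose(self, options):
        choice = next(self._pending)
        if choice not in options:
            raise RuntimeError("replay diverged from recording")
        return choice


def application(source):
    """Same program code on both instances; only the choice source differs."""
    state = 0
    for _ in range(5):
        state = state * 10 + source.choose([1, 2, 3])
    return state


primary = Recorder()
primary_state = application(primary)
secondary_state = application(Replayer(primary.log))
```

In this sketch both instances execute the same application function, so the secondary instance arrives at the same state as the primary instance whenever it replays the complete log.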
Concurrent accesses to shared memory are a source of non-determinism in program execution on operating systems that run multiple processes or multiple threads of execution in parallel. Processes or threads of execution are instances of sequential code being executed in user space, which is not part of the operating system kernel. Processes or threads reference memory locations by specifying virtual addresses. Depending on the architecture of the operating system, different processes or threads may share parts of the addressable memory. A process is multi-threaded when it contains multiple threads that share a common address space. A process is single-threaded when it has a single thread.
Assume that each of two processes needs exclusive access to two shared memory locations at almost the same time. The first process may request access and acquire a first lock to the first memory location. The second process may request access and acquire a second lock to the second memory location. Each process then requests the lock held by the other. In this scenario, the first process is waiting for the second process to release the second lock to the second memory location and the second process is waiting for the first process to release the first lock to the first memory location. This is a memory race that results in a deadlock situation, where two processes are blocking one another from continuing execution.
In a computer operating system, the physical memory is divided into pages, for example, 4 kilobytes each. The access to these memory pages may be shared among multiple processes. To eliminate the non-deterministic memory races described above, access to shared memory pages must be serialized. A possible serialization mechanism allows access to a shared memory page only to a process that has an exclusive reservation for this memory page. All processes that do not have a reservation for this memory page will fail when trying to access it. To allow access by other processes, the first process must release the reservation of the memory page after a finite amount of time. By recording a sequence of memory page reservation events and page release events on the primary system and replaying the recorded sequence of events on the backup systems in the same order as on the primary system, memory access races will be resolved in the same way on all systems.
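The serialization mechanism described above may be sketched, purely as an illustration, as follows. The class PageReservations, the function replay, and the event tuple layout are assumptions made for this example and are not taken from the prior art.

```python
class PageReservations:
    """Grants each page to at most one process at a time and logs the events."""

    def __init__(self):
        self.owner = {}   # page number -> process id holding the reservation
        self.events = []  # ordered (kind, process id, page number) tuples

    def reserve(self, pid, page):
        if page in self.owner:
            return False  # page is already reserved; the access fails
        self.owner[page] = pid
        self.events.append(("reserve", pid, page))
        return True

    def release(self, pid, page):
        if self.owner.get(page) != pid:
            raise ValueError("process does not hold this reservation")
        del self.owner[page]
        self.events.append(("release", pid, page))


def replay(events):
    """Backup system: apply the recorded events in the recorded order."""
    backup = PageReservations()
    for kind, pid, page in events:
        if kind == "reserve":
            if not backup.reserve(pid, page):
                raise RuntimeError("replay diverged from recording")
        else:
            backup.release(pid, page)
    return backup


primary = PageReservations()
first_attempt = primary.reserve(pid=1, page=7)   # granted to process 1
second_attempt = primary.reserve(pid=2, page=7)  # fails while process 1 holds the page
primary.release(pid=1, page=7)
retry = primary.reserve(pid=2, page=7)           # granted after the release
backup = replay(primary.events)
```

Only granted reservations and releases are logged in this sketch, so the backup system reproduces the order in which the race for page 7 was resolved on the primary system.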
The implementation of the record and replay technology uses the memory page fault mechanism of the virtual memory implementation of the operating system. Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory, while in fact the corresponding real memory may be physically fragmented and may even overflow onto disk storage. A memory page table is a memory location that describes the mapping of the virtual addresses of multiple memory pages to their real addresses and is allocated for each new process. The memory page table consists of multiple page table entries. The entries may contain information indicating whether the memory page is available in memory or not. When a process tries to access a memory page that is currently not available, a page fault exception is thrown. Page faults typically occur when a memory page is swapped out from memory to a hard disk. In this case, the page fault handler will read the requested memory page from the hard disk to the memory before granting access to the memory page. When all memory pages are marked as not available in the memory page table, the page fault handler will be called whenever a process tries to access a memory page.
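As a simplified model of the page fault mechanism described above, the following sketch marks every page table entry as not available, so that each access invokes a handler. The names PageTable, PageFault, and access are hypothetical, and re-marking the page as not available after each access is an assumption made so that every subsequent access traps again.

```python
PAGE_SIZE = 4096  # 4 kilobytes, matching the example page size above


class PageFault(Exception):
    def __init__(self, page):
        super().__init__(f"page {page} not available")
        self.page = page


class PageTable:
    def __init__(self, num_pages):
        # One entry per virtual page; every page starts as not available.
        self.available = [False] * num_pages

    def translate(self, virtual_address):
        page = virtual_address // PAGE_SIZE
        if not self.available[page]:
            raise PageFault(page)
        return page


def access(table, virtual_address, handled_faults):
    """Access an address, running a toy fault handler when the page is missing."""
    try:
        return table.translate(virtual_address)
    except PageFault as fault:
        handled_faults.append(fault.page)    # the fault handler is invoked here
        table.available[fault.page] = True   # e.g. page read back from disk
        page = table.translate(virtual_address)
        table.available[fault.page] = False  # assumption: re-mark so the next access traps
        return page


table = PageTable(num_pages=4)
faults = []
for address in (0, 4095, 4096, 12288):
    access(table, address, faults)
```

Because no page is ever left marked as available, the handler observes every access, including the two consecutive accesses to page 0.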
US patent application 2008/0109812 A1, incorporated herein by reference, discloses a method for managing access to shared resources in a multi-processor environment in which the processors work in physical parallelism. The access management is particularly useful for controlling accesses to such resources, for example shared memory, in order to stabilize or optimize the functioning of a process within a multi-task application using such a parallel environment.
The prior art record and replay technology imposes a strict sequence of accesses to memory locations in the secondary instance and lets the secondary instance replay the memory access events in the same way as the primary instance. It assumes that each program code instruction executed in the secondary instance references the resources it needs to access in a deterministic order. This assumption is true for all operating system architectures that support only program code instructions that reference at most one resource.
Some computer architectures support program code instructions that reference more than one memory location directly. In addition, such a specific program code instruction may access the multiple memory locations in a non-deterministic order. An example of such a computer architecture is IBM® System z®. The computer architecture of System z does not impose an exact sequence on the accesses of memory locations. This gives the hardware developer more freedom in optimizing the implementation of the specific program code instruction.
If a program code instruction accesses multiple memory locations on the secondary instance in a second order which may be different from a first order on the primary instance, the replay on the backup instance will fail using the methods described in the prior art.
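The failure mode described above may be illustrated with the following sketch, which models a strictly ordered replay check. The function strict_replay and the page numbers are hypothetical; the sketch illustrates the problem only and is not the method of the present invention.

```python
def strict_replay(recorded, observed):
    """Return the index at which a strictly ordered replay diverges, or None."""
    for position, (expected, actual) in enumerate(zip(recorded, observed)):
        if expected != actual:
            return position
    if len(recorded) != len(observed):
        return min(len(recorded), len(observed))
    return None


# A single instruction references pages 3 and 5.
primary_order = [3, 5]    # order of accesses on the primary instance
secondary_order = [5, 3]  # the hardware chose a different order on the secondary

divergence = strict_replay(primary_order, secondary_order)
same_pages = set(primary_order) == set(secondary_order)
```

Although the instruction touches the same set of pages on both instances, the strictly ordered check reports a divergence at the first access.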
It is thus an objective of the present invention to provide a method and a system for managing the access to resources shared among multiple processes within a computer system. Multiple instances of an application are almost simultaneously executed on multiple processors for fault tolerance. The present invention should support the recording of reservation events by the primary processor and the subsequent replay of the reservation events by a secondary processor for the access to the shared resources in the same order as recorded by the primary processor, where one program code instruction may request access to a set of shared resources in a non-deterministic order.