This invention relates to the field of file access in computer systems, and in particular to mechanisms for controlling the ownership of a file that is accessed by multiple users.
Multi-user or multiprocessing computer systems generally provide mechanisms for controlling and coordinating access from multiple hosts or users to any given file. If a first user or process is accessing a file and has write permission to that file, it is important to ensure that no other users or processes have write permission at the same time, or they may write changes alternately, destroying one another's previously written alterations, resulting in file corruption or incorrect data. In widely distributed systems, this problem can become quite difficult to control.
Many operating systems, such as UNIX, do not include file locking primitives, so a mechanism for creating and controlling file locks is necessary. This need is answered by a number of locking mechanisms currently in use, such as server processes through which clients communicate in order to coordinate locking of files.
Another approach used by current systems is to create a special lock file when a file is accessed; the existence of the lock file indicates to other users or processes that the file is in use, and that it may not be modified until the lock file is deleted, which occurs only when the user or process is finished with modifications and has closed the file. (In this application, references to user actions may be taken to refer likewise to process actions, and vice versa.) Thus, in such systems the lock file is automatically created by a predetermined process when a process first accesses a file, and is likewise automatically removed when the process access is complete.
In this type of system, other processes waiting for access to the file must wait first for the lock (i.e. the lock file) to be removed by the locking process, and until then will refrain from accessing the file (or data). This scheme has the advantage of being simple to implement, since it relies upon the inherent capabilities of the underlying distributed file system, i.e. it does not require the creation and maintenance of dedicated servers, which can be difficult to keep running. (Such a scheme is detailed in most UNIX programming books, e.g. Advanced UNIX Programming by Marc J. Rochkind (Prentice Hall 1985), which is incorporated herein by reference.)
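By way of illustration only, the lock-file scheme described above might be sketched as follows. The function names (acquire_lock, release_lock), the polling interval, and the optional max_wait parameter are hypothetical and are not taken from any particular system; the sketch also assumes that the underlying file system honors atomic exclusive file creation.

```python
import os
import time


def acquire_lock(lock_path, poll_interval=0.1, max_wait=None):
    """Try to create the lock file; wait while another process holds it.

    Returns True once the lock file has been created by this process, or
    False if max_wait (a hypothetical safety limit, in seconds) elapses first.
    With max_wait=None the caller waits indefinitely.
    """
    start = time.monotonic()
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process can succeed in creating the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # The lock file exists: the data file is in use by another process.
            if max_wait is not None and time.monotonic() - start >= max_wait:
                return False
            time.sleep(poll_interval)
            continue
        # Record the holder's process ID in the lock file (informational only).
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return True


def release_lock(lock_path):
    """Remove the lock file once access to the data file is complete."""
    os.remove(lock_path)
```

A process would call acquire_lock before modifying the data file and release_lock after closing it. With max_wait left at None, a waiting process simply polls until the lock file disappears.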
However, a problem with such a scheme is that a process may "die" unexpectedly while holding a lock file, in which case other processes attempting to access the locked file will wait indefinitely, because the triggering event for allowing them access to the file--the removal of the lock file--will not occur.
A conventional solution for this problem is to provide a time-out period, i.e. a predefined period of time constituting the longest amount of time that any process might be expected to require access to a given file, e.g. a required data file. If another process attempts to access the file and determines that there is a lock file, it determines the length of time that the lock has been in existence; if this is longer than the time-out period, then the lock is considered "orphaned," and the newly accessing process removes it and proceeds to access the data file.
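The conventional time-out approach might be sketched as follows, again by way of illustration only. The names lock_is_orphaned and acquire_with_timeout, the 300-second default, and the use of the lock file's modification time as its age are all assumptions made for this sketch. The sketch does not guard against two waiting processes racing to remove the same stale lock.

```python
import os
import time

TIMEOUT_SECONDS = 300.0  # hypothetical time-out period, chosen for illustration


def lock_is_orphaned(lock_path, timeout=TIMEOUT_SECONDS, now=None):
    """Return True if the lock file has existed longer than the time-out."""
    try:
        # Assumption: the modification time approximates the creation time,
        # since the lock file is written once and never touched again.
        created = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return False  # no lock file, hence nothing to treat as orphaned
    if now is None:
        now = time.time()
    return (now - created) > timeout


def acquire_with_timeout(lock_path, timeout=TIMEOUT_SECONDS, poll_interval=0.1):
    """Acquire the lock, removing an existing lock older than the time-out."""
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if lock_is_orphaned(lock_path, timeout):
                try:
                    os.remove(lock_path)  # treat the lock as orphaned
                except FileNotFoundError:
                    pass  # another process removed it first
            else:
                time.sleep(poll_interval)  # lock is still presumed valid
```

In this sketch a waiting process cannot proceed until the lock's age exceeds the time-out, regardless of whether the lock holder died early in that period.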
In a widely distributed environment having data files that are shared by a large number of processes, orphaned locks can be quite common, resulting from network problems, system crashes, etc. A problem with the use of a time-out scheme in such a setting is that it is difficult to decide upon a useful time-out period. The period must be set high enough so that the maximum reasonably expected access period for the data files will not be truncated by the time-out mechanism. This means that processes attempting to access a data file will need to wait this maximum period every time a currently accessing process dies, which leads to numerous long wait periods in a system with a high number of processes that may die, resulting in poor system performance.
However, if the time-out period is defined as being very short, then currently running processes may have their perfectly valid locks removed while they are still accessing the file, resulting in corrupted data.
Hence, a system is needed that resolves this difficult trade-off in choosing time-out periods, both providing lock protection for processes that are currently accessing data files and minimizing delays to processes attempting to access those data files when a currently accessing process dies.