Computer systems store information in files. Sometimes, a file may need to be accessed, for example by reading it to duplicate it for archival or other purposes. The process of reading the file takes a finite amount of time. During this time, if other computer processes are actively updating the file, it is possible for a process to write to the file while the file is being read. This can result in inconsistent data in the copy.
For example, after the first half of the file is read, assume another process writes data both to the first half of the file, which has already been read, and to the second half of the file, which has not yet been read. While copying continues, the new data written into the second half of the file will be read and copied, but the new data written into the first half of the file will not be read and copied. As a result, the copy of the file will have inconsistent data.
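The scenario above can be illustrated with a minimal Python sketch. It models the file as an in-memory list and replaces the concurrent writer with an update applied between the two reads. The function name `copy_with_midway_write` and the sample values are hypothetical and serve only to illustrate the problem.

```python
def copy_with_midway_write(source, new_values):
    """Copy `source` in two halves, applying `new_values` between the halves."""
    half = len(source) // 2
    copy = list(source[:half])            # first half is read before the write
    for index, value in new_values.items():
        source[index] = value             # stand-in for another process writing
    copy.extend(source[half:])            # second half is read after the write
    return copy


before = ["a0", "a1", "a2", "a3"]         # file contents before the write
source = list(before)

# The writer updates one item in each half while the copy is in progress.
copy = copy_with_midway_write(source, {0: "b0", 3: "b3"})

print(copy)    # mixes old first-half data with new second-half data
print(source)  # file contents after the write
```

The resulting copy matches neither the file as it was before the write nor the file as it is after the write, which is the inconsistency described above.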
One possible solution to this problem is to shut down or suspend the processes that may write new information into the file during the period of time that the file is being copied. However, this possible solution may be unacceptable, because the interruption of these processes may inconvenience the users of the processes.
If the processes that update the file do not write to the file frequently, a different potential solution may be employed. This potential solution detects the presence of inconsistent data in the file being read so that the copying process can be restarted. The file to be read may be arranged as blocks of data, with each block having its own unique block number. A storage area in the file to be read stores the highest current block number in use. If a new block is to be written to the file, the highest current block number is retrieved from storage and incremented. This incremented block number is then written back to the file as the new highest current block number in use, and the new block is written in association with this new block number. For example, the new block number may be written into a specific location within the block.
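The block-numbering scheme just described can be sketched as follows. This is a minimal in-memory model, not an on-disk format: the names `Block`, `BlockFile`, and `write_new_block` are hypothetical, and storing the number as a field of the block stands in for writing it into a specific location within the block.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    number: int   # block number stored with the block itself
    data: bytes


@dataclass
class BlockFile:
    # Stand-in for the storage area holding the highest block number in use.
    highest_block_number: int = 0
    blocks: list = field(default_factory=list)

    def write_new_block(self, data):
        """Write a new block under the next block number and return that number."""
        new_number = self.highest_block_number + 1   # retrieve and increment
        self.highest_block_number = new_number       # write back the new highest
        self.blocks.append(Block(new_number, data))  # write the numbered block
        return new_number


block_file = BlockFile()
first = block_file.write_new_block(b"first")
second = block_file.write_new_block(b"second")
print(first, second, block_file.highest_block_number)
```

Because every write increases the stored highest block number, a reader that records that number before copying can later recognize any block written after it started.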
When the file is to be copied, the process that will read the file retrieves the current highest block number from storage in the file. The blocks are then read from the file and written into the copy. During or after the copying process, the block number of each block read is compared to the highest current block number retrieved before the copying process began. If any block read has a number that exceeds the highest current block number retrieved before the copying process began, a new block has been written to the file after the copying process began, and the new block was read and written into the copy of the file, resulting in inconsistent data in the copy of the file. The copying process is therefore restarted to avoid the problem of inconsistent data in the copy. This process may be repeated as many times as necessary until a complete copy is made without any inconsistent data.
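The copy-and-restart procedure can be sketched in Python under several assumptions. The file is modeled as a dictionary holding the highest block number and a list of `(number, data)` pairs, concurrent writes are simulated by a hypothetical `on_block_read` callback, and the `max_attempts` limit is an addition for safety that the procedure above does not specify. The comparison is performed after each pass, which is one of the two options mentioned above.

```python
def append_block(file, data):
    """Writer side: increment the highest block number, then store the block."""
    file["highest"] += 1
    file["blocks"].append((file["highest"], data))


def copy_with_restart(file, on_block_read=None, max_attempts=5):
    """Copy all blocks, restarting when a block newer than the start is seen."""
    for attempt in range(1, max_attempts + 1):
        highest_at_start = file["highest"]   # retrieved before copying begins
        copy = []
        index = 0
        while index < len(file["blocks"]):
            copy.append(file["blocks"][index])
            index += 1
            if on_block_read is not None:
                on_block_read(file)          # simulated concurrent writer
        # A block number above the recorded highest means a write occurred
        # after copying began, so this pass is discarded and copying restarts.
        if all(number <= highest_at_start for number, _ in copy):
            return copy, attempt
    raise RuntimeError("no consistent copy within max_attempts")


file = {"highest": 0, "blocks": []}
append_block(file, "a")
append_block(file, "b")

pending = ["c"]                              # one write arrives during the copy

def writer(f):
    if pending:
        append_block(f, pending.pop())

copy, attempts = copy_with_restart(file, on_block_read=writer)
print(attempts, [number for number, _ in copy])
```

In this run the first pass reads a block numbered above the recorded highest block number and is discarded, and the second pass completes with no intervening write. A writer that kept appending blocks on every pass would force repeated restarts until the attempt limit is reached.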
While this solution prevents the problem of reading inconsistent data from a file, it causes other problems. First, re-copying the file may impact the performance of other processes that share the resources used to re-copy the file. Re-copying impacts the performance of the storage device from which the file is read, the storage device onto which the copy of the file is written, and the processor copying the file. If the processes writing to the file write more than occasionally, the number of times copying is restarted can be substantial, significantly impacting the performance of the various storage devices and the processor. In addition, the copying process may be somewhat time sensitive. In such a case, it may be desirable to complete the copying process as soon as possible after it begins.
As an example of a somewhat time-sensitive copying process, a user may desire backup copies of ten files. It may be desirable to ensure that all of the files are copied as near in time to one another as possible. If some of the files are often written by other processes, it may be quite difficult to obtain copies of all ten files made at the same, or approximately the same, time if the second potential solution is employed.
A system and apparatus are therefore needed that can allow a file that is being written to by other processes to be accessed without containing inconsistent data, while reducing the number of times the file must be re-accessed.