1. The Field of the Invention
The present invention relates to systems and methods for administration of archival storage and, more specifically, to systems and methods designed to allow for the long-term administration of information sent to backup or archival storage during multiple storage sessions.
2. The Prior State of the Art
Many advances have been made in computer hardware and software, but some general principles have remained constant. Although the cost of memory and data storage has continued to decrease, and the storage capacity of devices of a given size has continued to increase, there continues to be a difference in the cost of storing data depending on several factors, such as the medium used to store the data and the accessibility of the data. For example, it is generally more expensive to store a data word in cache memory than in system RAM. System RAM, in turn, is more expensive per storage word than magnetic disk storage. Magnetic disk storage is more expensive per storage word than archival storage. Thus, there continues to be motivation to move unused or less frequently used data to less expensive storage.
Another focus of the computer industry in recent years has been the development of reliable, fault-tolerant computer systems. These computer systems are designed to achieve a high degree of reliability and are often used in situations where computer failure would be costly. For example, certain computer systems in the banking and other industries support operations that must be available twenty-four hours a day, seven days a week. In these fault-tolerant computers, a technique called journaling is often used to record various intermediate steps in certain transactions in order to create a record that allows rapid recovery from failure should one occur. These journaling systems are often adapted to allow a determination of the exact state of a particular operation when a failure occurred and resumption of the operation from that specific point. In addition, many systems that utilize journaling are also designed to recover from failure in a manner that preserves as much data and information as possible.
Systems where journaling or logging is used for various purposes tend to create files that grow in size. If all data sent to a journal or a log file were maintained in perpetuity, the size of the journal or log file would grow without bound. In order to limit the size of journal or log files, some systems allocate a fixed maximum size to the journal or log file. When the journal or log file reaches its maximum size, any new data placed in the file overwrites the oldest data in the file. Such an approach maintains a rolling buffer of a fixed size that extends a certain time into the past. If sufficient space is allocated, many systems can store enough data to achieve the purpose of the journal or log file.
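The rolling buffer described above can be sketched as follows. This is a minimal illustration only, assuming an in-memory log of whole records; the class name RollingLog and the capacity of three records are hypothetical and are not taken from any particular journaling system.

```python
from collections import deque


class RollingLog:
    """Fixed-capacity log (hypothetical): once full, each new record
    displaces the oldest record still held."""

    def __init__(self, capacity):
        # A deque with maxlen discards from the opposite end when full.
        self._records = deque(maxlen=capacity)

    def append(self, record):
        self._records.append(record)

    def records(self):
        # Oldest surviving record first.
        return list(self._records)


log = RollingLog(capacity=3)
for n in range(1, 6):
    log.append(f"entry-{n}")

print(log.records())  # ['entry-3', 'entry-4', 'entry-5']
```

The first two entries are lost once the fourth and fifth arrive, which is the limitation that motivates sending older data to archival storage.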
In some situations, however, it is desirable to maintain a complete log covering a period longer than the space allocated to the journal or log file can hold. In such situations, it is possible to take the oldest data in the journal or log file and send the data to archival storage. Such an approach reduces the amount of local storage space that is needed to store journal or log data while simultaneously allowing a more complete history to be maintained. Such an approach also lowers the cost of maintaining journal or log files. This type of approach necessarily leads to a situation where pieces of the journal or log file are sent to archival storage at different times. For example, during one archive storage session a certain percentage or portion of the log file is transferred to archival storage. After a certain period of time, another archive storage session may be initiated where additional data from the journal or log file is sent to archival storage.
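As a rough sketch of how such an approach scatters one log across several archive storage sessions, consider the following. The class ArchivingLog, its capacity and batch parameters, and the use of an in-memory list to stand in for archival media are all assumptions made for illustration.

```python
class ArchivingLog:
    """Hypothetical log that moves its oldest records to an archive
    session whenever local space is exhausted."""

    def __init__(self, capacity, batch):
        self.capacity = capacity  # maximum records kept locally
        self.batch = batch        # records moved per archive session
        self.local = []           # records still in local storage
        self.sessions = []        # stand-in for archival media

    def append(self, record):
        if len(self.local) >= self.capacity:
            # Start a new archive storage session with the oldest records.
            self.sessions.append(self.local[:self.batch])
            del self.local[:self.batch]
        self.local.append(record)


alog = ArchivingLog(capacity=4, batch=2)
for n in range(1, 10):
    alog.append(n)

print(alog.sessions)  # [[1, 2], [3, 4], [5, 6]]
print(alog.local)     # [7, 8, 9]
```

Nine records end up spread over three separate sessions plus local storage, even though they all belong to the same log.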
If it ever becomes necessary to reconstruct the file, the various sessions must be loaded, one after the other in proper sequence, and the information stored in each session must be retrieved. Such a process is generally highly time consuming since each session of interest must be located, the media positioned to read the appropriate information from the session, and then the next session must be located. The process is repeated until all appropriate information has been retrieved from the various archive storage sessions and assembled in the proper order. The time to accomplish this task is often lengthy because archival storage tends to have a much slower access time than local storage. It would, therefore, be desirable to develop a system which could eliminate the lengthy retrieval time while still allowing data from a file to be stored in multiple archival sessions at different times.
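The reconstruction procedure described above might look like the following sketch. The session layout (a dictionary with "seq" and "records" keys) and the function name reconstruct are hypothetical. The counter simply tallies how many separate sessions must be located and read, which is the cost the paragraph identifies.

```python
def reconstruct(sessions, local_records):
    """Rebuild a file from archive sessions plus data still held locally.

    Each session is assumed to be a dict with a sequence number ("seq")
    and the records written during that session ("records").
    """
    rebuilt = []
    session_loads = 0
    # Sessions must be processed in the order they were written.
    for session in sorted(sessions, key=lambda s: s["seq"]):
        session_loads += 1  # one locate-and-read per session on slow media
        rebuilt.extend(session["records"])
    rebuilt.extend(local_records)
    return rebuilt, session_loads


# Sessions may be found on the media in any order.
archived = [
    {"seq": 2, "records": ["c", "d"]},
    {"seq": 1, "records": ["a", "b"]},
    {"seq": 3, "records": ["e"]},
]
rebuilt, session_loads = reconstruct(archived, ["f", "g"])

print(rebuilt)        # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(session_loads)  # 3
```

The number of slow media accesses grows with the number of sessions, regardless of how little data each session holds.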
The problem described above is not limited to journaling or logging systems. Normal operations of a system present similar challenges. For example, some operations utilize large databases. The information in such a database is often highly critical to continued operations of an entity, and it is important to protect the data in the database from loss. To guard against loss, backup or archive copies of the database are often made and preserved in a manner which will allow important information to be recovered should a failure occur. Due to the size of the database, however, making such a backup can be a rather lengthy process. In order to reduce the time it takes to make a backup, many systems start with an initial or full backup and then simply make differential or incremental backups which send only the information changed since the last backup to backup or archival storage. Such practices create a situation where different portions of the same file are sent to backup or archival storage in different sessions at different times. If it becomes necessary to restore the database, the initial backup must be restored first, followed by each differential or incremental backup that has occurred afterward, each in its proper order. Only by following this procedure can the database be restored to its proper state. Depending upon the number of differential backups and the access time of the backup or archival media, such an operation can take a significant amount of time. These situations would also benefit from a mechanism that reduces the overall time necessary to restore the database to its proper state.
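The dependence on restore order can be illustrated with a small sketch. It assumes, purely for illustration, that the database is a key-value mapping, that each incremental backup is a mapping of changed keys to new values, and that a value of None marks a deleted key. The function name restore is hypothetical.

```python
def restore(full_backup, incrementals):
    """Apply incremental backups, oldest first, on top of a full backup."""
    state = dict(full_backup)  # start from the initial (full) backup
    for changes in incrementals:
        for key, value in changes.items():
            if value is None:
                state.pop(key, None)  # None marks a deletion (assumption)
            else:
                state[key] = value
    return state


full = {"a": 1, "b": 2}
increments = [
    {"b": 20, "c": 3},     # first incremental session
    {"a": None, "c": 30},  # second incremental session
]

restored = restore(full, increments)
wrong_order = restore(full, list(reversed(increments)))

print(restored)     # {'b': 20, 'c': 30}
print(wrong_order)  # {'b': 20, 'c': 3}
```

Applying the same sessions in the wrong order yields a different and incorrect state, which is why each session must be located and read in sequence.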
Many other situations exist where various portions of various files are sent to backup or archival storage over a period of time in different archive storage sessions. All these situations may benefit from a system or method which reduces the time necessary to restore information to its proper state if a failure should occur.