The invention relates in general to a method for calculating a recovery time of an application system in a computer system in order to tune the computer system dynamically so that an agreed recovery time can be secured.
The invention further relates to a corresponding computing system, a computer program and a computer program product.
The use of and dependency on data in today's society is rapidly expanding. Now more than ever, businesses continuously rely on data in order to operate. Businesses and their customers demand that the data be available and accurate. Those data may originate from different areas. The main part of a company's data is stored within a so-called database management system, which serves to store and manage large amounts of data.
Over time in a typical computer environment, large amounts of data are written to and retrieved from storage devices connected to the computer. As more data are exchanged with the storage devices, it becomes increasingly difficult for the data owner to reproduce these data if the storage devices fail. Internal influences can lead to a breakdown of data carriers or of processors. A software error, mostly caused by bad design, can also occur.
The consequences of data loss can be fatal for a business, resulting in economic damage. Therefore, with regard to data storage, it is common practice to generate a copy of said data which can be restored on demand.
One of the most important aspects of a database management system is therefore the protection of an organization's data from logical errors, disasters and other failures by storing backup and archive copies of data on offline storage. The term backup generally describes the step of copying data within a computer system to a storage medium as well as the copy itself. Performing regular backups alone is no guaranteed protection against data loss, since internal influences on the backup system may render the backup invalid.
The execution of such a backup can tie up a large number of resources for a long period of time because of the large amount of data to be stored. Nevertheless, it is very important to execute backups regularly, so that several versions of backups are available in case of a restore. The so-called recovery of data provides the database with complete functionality after restoring, so that all data of the database are available without restrictions.
One way of protecting data is by backing up the data to backup media, e.g., tapes or disks. Such backup is typically performed manually or automatically at preset intervals using backup software. The backup media are then stored away in a safe location. Various conventional mechanisms for protecting and recovering data are available for businesses.
Backup systems vary in the levels of protection they provide, the amount of time required to restore the backed up data and the difficulty of integrating them with the businesses' other systems and applications.
Generally, the success of these mechanisms is measured in terms of “data availability”, i.e., how quickly a system, a database, or a file can be restored after a failure or corruption of data. In the following, any system which can be the object of a backup process or of a restoring process will be subsumed under the term “application system”.
There are mainly two types of backup procedures and systems available.
One type of backup can be referred to as an “offline” backup. In an offline backup, an application system that is being backed up has to be quiesced and cannot be used during the backup process since it is “offline” for users. Moreover, users may be unable to access the files during a full system backup. Accordingly, the cost of performing such backups is greater in terms of user productivity and/or system resources.
FIG. 1 illustrates such an offline backup. At a point in time t1 an application system DB is shut down or set offline. For a period of time tb1 the actual data objects are then moved from the application system DB to a backup storage TSM, as indicated by reference number 1. After completion of this backup process the application system DB is set online or started up again at a point in time t2. From then on, logs of the application activity, so-called redologs 2, are written by the application system DB and saved to the backup storage system TSM until the next backup process starts or the application goes offline at a point in time t3.
Another type of backup can be referred to as an “online” backup, which is illustrated in FIG. 2. In an online backup, an application system DB that is being backed up is placed in a different mode at a point in time t1, called “online backup mode”, and stays in this mode during the backup process, namely within a period of time tb11. The mechanics of this online backup mode or hot backup mode are proprietary to a specific application. What all application systems regarded within the scope of the present invention have in common is the creation of more detailed log information describing application activity relevant to the data repository, in order to enable later restoration of this repository during recovery. This additional log information is made persistent within the redologs of the application system. Typically, a backup process performed during the period of time tb11 performs a full system backup every time the files are backed up, as indicated by reference number 1. A full system backup ensures that a complete and consistent set of data objects on the application system DB is copied to a secondary or redundant storage, namely a backup storage system TSM. In case of an online backup, all redologs produced during the online backup mode tb11 need to be saved by the backup storage system TSM, as indicated by reference number 2. After completion of this backup process the application system DB is set to normal operation mode at a point in time t2. Since there may be open transactions within the application system DB at the end of the backup process at the point in time t2, the latest redologs 3 need to be saved in addition in a time period tb12, once all transactions which were open during tb11 are closed. This means that a complete and consistent set of application data does not exist within the backup repository before the point in time t3, when these “delayed” redologs are saved. A complete backup therefore takes a period of time tb1 corresponding to the sum tb11+tb12.
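The timing relation of the online backup described above can be illustrated by the following minimal Python sketch. The class name, the field names and the use of seconds as a unit are assumptions made for illustration only, as is the reading that tb12 spans the interval from t2 to t3; none of them is taken from the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnlineBackupTimeline:
    """Points in time of one online backup (hypothetical model, in seconds)."""

    t1: float  # application system DB enters online backup mode
    t2: float  # data objects are copied; normal operation mode resumes
    t3: float  # "delayed" redologs of transactions open during tb11 are saved

    def __post_init__(self) -> None:
        # The three points in time must be ordered.
        if not (self.t1 <= self.t2 <= self.t3):
            raise ValueError("expected t1 <= t2 <= t3")

    @property
    def tb11(self) -> float:
        """Period spent in online backup mode."""
        return self.t2 - self.t1

    @property
    def tb12(self) -> float:
        """Period needed to save the delayed redologs (assumed t2 to t3)."""
        return self.t3 - self.t2

    @property
    def tb1(self) -> float:
        """Duration of the complete backup: tb1 = tb11 + tb12."""
        return self.tb11 + self.tb12

    def is_consistent_at(self, t: float) -> bool:
        """A complete, consistent backup set does not exist before t3."""
        return t >= self.t3


timeline = OnlineBackupTimeline(t1=0.0, t2=3600.0, t3=4200.0)
print(timeline.tb1, timeline.is_consistent_at(3600.0))
```

In this sketch the backup set is reported as inconsistent at t2, although the copy of the data objects has already finished, because the delayed redologs are still outstanding.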
An indicated period of time tb2 shows that afterwards logs of the application activity, so-called redologs 4, are written by the application system DB and saved to the backup storage system TSM until the next backup process starts or the application system DB goes offline at a point in time t4.
A backup process can be established to back up data on a regular or periodic basis (e.g., daily, nightly, weekly, etc.).
However, as present business applications run virtually around the clock with little tolerance for any downtime, the time frame or window for backing up data is small, if it exists at all. Recovering data often requires the database application to restore and recover logs of data. Generally, a log file is a list of actions that have occurred, kept for the purpose of analysis at a later time, for diagnostic or measurement purposes. It is possible to maintain a temporary log of data transactions since the last save of data. When a user saves data to the database, the temporary log is wiped out. Normally, log files only contain forward information, thereby limiting the use and effectiveness of the log files in restoring information. Within the context of restoring and within the following description, log files will be referred to as redologs. By definition, restoration is to a point in the past. The fact that redologs can only move information forward through time implies that they must be used in conjunction with some other form of data restoration, such as restoring an offline full backup, in order to achieve a restoration to a point in the past. Restoration proceeds by overwriting the data with stored copies and by undoing the changes to the redologs. With such a procedure it is very difficult to predict a specific time frame or window in which a recovery can be done.
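The combination of a full backup with forward-only redologs can be illustrated by the following minimal Python sketch. It models the data as a simple key/value mapping and each redolog entry as a timestamped write; the names RedoRecord and restore_to_point_in_time, as well as this data model, are hypothetical and serve only to show why a restoration to a point in the past needs an older full backup as its starting point.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RedoRecord:
    """One forward-only change (hypothetical redolog entry)."""

    timestamp: int  # point in time at which the change was made
    key: str        # data object that was changed
    value: str      # new content of the data object


def restore_to_point_in_time(
    full_backup: Dict[str, str],
    backup_time: int,
    redologs: List[RedoRecord],
    target_time: int,
) -> Dict[str, str]:
    """Restore the full backup, then roll forward up to target_time."""
    if target_time < backup_time:
        # Redologs only move forward, so an older backup would be required.
        raise ValueError("target_time lies before the full backup")

    # Step 1: overwrite the data with the stored copy.
    state = dict(full_backup)

    # Step 2: re-apply the logged changes in chronological order.
    for record in sorted(redologs, key=lambda r: r.timestamp):
        if backup_time < record.timestamp <= target_time:
            state[record.key] = record.value
    return state


logs = [
    RedoRecord(12, "a", "2"),
    RedoRecord(15, "b", "3"),
    RedoRecord(20, "a", "4"),
]
print(restore_to_point_in_time({"a": "1"}, 10, logs, 15))
```

The number of redolog entries between the backup and the target point in time is not known in advance in this model, which is one reason why the duration of such a procedure is hard to predict.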
Restoring data means replacing the data of the so-called production computer with the data of the backup stored on a backup storage system. It is therefore very important that the backup is performed correctly, because otherwise wrong data are brought in during restoring. In the worst case, the database is unusable after the restoring has finished.
There are different ways to perform a backup.
A user executes a regular backup while making optimised use of his resources. The backup of a database, for example, is executed according to the following steps. The data of the database are first copied. Depending on the procedure used, changes are saved during or after the backup.
Once a recovery has been decided upon, the point in time at which the recovery has to be executed must be determined. All data which have been deposited in the database up to this point in time have to be restored. After restoring, the so-called recovery can be started, so that the complete functionality of the database is re-established.
The recovery time frame cannot be estimated exactly. Providers of a recovery service cannot guarantee predefined recovery times, because the time frame can hardly be predicted. Within the scope of the present invention the term “recovery time” covers the whole period of time necessary to restore backed up data and corresponding redologs as well as to recover those restored data with the associated redologs.
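The two components of the term “recovery time” as defined above can be illustrated by the following minimal Python sketch. It assumes, purely for illustration, that restoring and recovering each proceed at a constant, known throughput; the function name, the parameters and the linear model are hypothetical and are not presented as the method of the invention.

```python
def estimate_recovery_time(
    backup_bytes: float,
    redolog_bytes: float,
    restore_rate: float,
    apply_rate: float,
) -> float:
    """Return an illustrative recovery time in seconds.

    backup_bytes  -- size of the backed up data to be restored
    redolog_bytes -- size of the corresponding redologs
    restore_rate  -- assumed restore throughput in bytes per second
    apply_rate    -- assumed rate at which redologs are applied, in bytes per second
    """
    if restore_rate <= 0 or apply_rate <= 0:
        raise ValueError("rates must be positive")

    # Component 1: restore the backed up data and the corresponding redologs.
    restore_seconds = (backup_bytes + redolog_bytes) / restore_rate

    # Component 2: recover the restored data with the associated redologs.
    recover_seconds = redolog_bytes / apply_rate

    return restore_seconds + recover_seconds


print(estimate_recovery_time(1000.0, 200.0, 100.0, 50.0))
```

In practice neither rate is constant, and the redolog volume depends on the application activity since the last backup, which is consistent with the observation above that the time frame can hardly be predicted.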