A database administrator (DBA) administers and manages the health of the database environment that runs the business-critical applications of the enterprise. This task comprises ensuring the continued availability of the database objects that make up the applications, and ensuring that the databases are well tuned to deliver the performance expected of the business applications. For example, a database administrator is responsible for data backup in order to perform data recovery in the case of a system failure. Customers define the maximum time they can tolerate before the system is restored after a system failure. In many cases, the amount of time to recovery depends upon the technology used and the frequency of data backup.
From an application data availability perspective, the DBA's challenge is to deliver the quality of service (QoS) for application data availability demanded by the business application in the face of changes in the number of database objects, the size of the objects, and the volatility of the objects. In addition, the DBA should maintain the required QoS while dealing with changes to the hardware/software configurations, changes in the application workload, and potential changes to the QoS of the business application itself. Specifically, for each application's database and file objects, the DBA needs to use optimal technologies to perform the backup and recovery, determine the optimal backup frequency to conserve computing resources, and use the optimal backup and recovery strategy to deliver the required QoS.
Application data recovery is therefore a very skill-intensive requirement, resulting in increased total cost of ownership for an enterprise. This increased cost is due to several factors including non-optimal use of system resources. For example, DBAs tend to implement overcompensated strategies to avoid devising complex optimal backup schedules. Application data recovery can require manual monitoring and rescheduling of events as changes occur in the application objects, application workload, hardware, and software infrastructure. These complexities lead to many human errors in executing backup/recovery strategies that compromise the integrity of application data and fail to deliver the desired QoS.
A DBA typically determines the frequency of backup for the system based on worst case scenarios and the business's requirement for tolerable or acceptable downtime during recovery. Database data is not lost in the case of a failure because all updates to the database data are written to a log. To restore the system to the point of failure, the data is restored from the last backup and the restoration process rolls forward the changes recorded in the logs since the last backup, up to the point of failure.
Through this process, the database reads and applies all the incremental changes in the logs and the data is restored to the point of failure. If the backup is performed every seven days, the DBA most likely assumes that the worst case scenario point of failure occurs on the seventh day, before backup occurs. In this situation, the recovery time is the longest.
To meet a contracted quality of service (QoS) based on the customer's tolerance for downtime during recovery, the DBA may guarantee that the outage during which restoration occurs is less than the downtime allowed by the customer. Consequently, the time to restore the data from the last backup and roll forward incremental changes from the log should be less than the downtime allowed by the customer.
To determine an optimum backup approach and schedule, the DBA should analyze many aspects of the database and its environment, comprising the amount of data that may need to be restored, the machine on which the database operates, the operating system, the database type and version, etc. Given the amount of data, the DBA should determine if it is even possible to restore data in a worst case scenario and meet the QoS guarantees. Overall, the DBA should have a clear understanding of the operating environment, hardware, software, and capabilities. While this approach may yield an optimum backup approach and schedule, it is labor intensive and applies only to the initial state. All of these factors may change over time, necessitating a continuous refinement in the optimum backup approach and schedule.
Currently, a DBA determines the backup schedule manually. The DBA determines the amount of data to be backed up and how long the restore process may take. The DBA may, for example, determine that a backup may comprise 100 GB of data and the database is IBM DB2 with parallel recovery.
The DBA determines that the restore from backup may take, for example, 5 minutes. The DBA then calculates the time required for roll forward. If the backup is performed every Monday, then the worst case scenario is a point of failure on the following Sunday. The more changes that have been made to the application's data, the longer it tends to take to restore the application's data. In this example, roll forward may take 15 minutes. The total time required to restore the application would then be 20 minutes: 5 minutes for restore from backup and 15 minutes to perform roll forward. The customer may have contracted for a QoS guarantee of 10 minutes for a downtime limit, which this weekly schedule does not meet. To ensure that QoS guarantees are met, the easiest option for the DBA would be to increase the frequency of backups, perhaps as often as daily. While this would ensure that the QoS guarantee is met, it is most likely not the most efficient use of resources.
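The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the weekly schedule is treated as a 7-day interval, and roll forward time is assumed to grow linearly with the number of days of log accumulated since the last backup, which real workloads need not follow.

```python
def worst_case_recovery_minutes(restore_minutes, rollforward_ref_minutes,
                                ref_interval_days, backup_interval_days):
    """Worst-case outage: restore from backup plus roll forward of a full interval of log.

    ASSUMPTION: roll forward time scales linearly with the days of log to replay,
    calibrated by one reference measurement (rollforward_ref_minutes at
    ref_interval_days).
    """
    rollforward = rollforward_ref_minutes * backup_interval_days / ref_interval_days
    return restore_minutes + rollforward


def max_backup_interval_days(restore_minutes, rollforward_ref_minutes,
                             ref_interval_days, downtime_limit_minutes):
    """Largest whole number of days between backups whose worst-case recovery
    stays strictly below the downtime limit. Returns 0 if even a daily backup
    cannot meet the limit under the linear assumption."""
    if rollforward_ref_minutes <= 0:
        raise ValueError("rollforward_ref_minutes must be positive")
    days = 0
    while worst_case_recovery_minutes(restore_minutes, rollforward_ref_minutes,
                                      ref_interval_days, days + 1) < downtime_limit_minutes:
        days += 1
    return days


if __name__ == "__main__":
    # Figures from the example: 5 min restore, 15 min roll forward after a week, 10 min limit.
    print(worst_case_recovery_minutes(5, 15, 7, 7))  # 20.0
    print(max_backup_interval_days(5, 15, 7, 10))    # 2
```

Under these assumptions, a backup every two days would already keep the worst-case recovery under the 10 minute limit, which illustrates why a daily backup is not necessarily the most efficient choice.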
A number of database and third party software vendors provide backup and recovery solutions at the database level, and some claim to offer data recovery at the application level as well. Almost all the vendors have backup and recovery offerings, provide assistance in generating the jobs with the relevant object names and syntax required to execute the backup and recovery functions, and supply management tools that track the backups generated.
Complicating the issue of data recovery is the specification of application data availability. Business applications depend on data. Application data availability is key for continuous operations of the business. There needs to be a specification of application data availability at the application level, i.e., for all types of data involved in a business application. Furthermore, the specification should be in terms of business semantics at an application level (i.e., having a higher level of abstraction) rather than at the traditional individual data object level (which does not factor in the impact on overall application availability, particularly when the application comprises multiple data objects).
The challenge is to define a set of business level metrics for application availability that is then translated into domain specific business metrics. These business level metrics eventually drive the underlying allowable hardware and software information technology (IT) infrastructure to deliver the required business level objectives. Performance is an example of a domain other than availability.
Specifically, from an availability domain perspective, an application's data (both databases and files) should in turn meet certain business objectives for availability and recovery of the application. Once such business-semantic specifications are defined, an enterprise or a service provider (xSP) has a consistent method of specifying its requirements for availability to deliver the required QoS, independent of a specific underlying infrastructure.
The conventional approach for application availability is missing a holistic view of all data stores (databases and files) of an application for data recovery that may span multiple eclectic systems. In addition, the ability to specify application data recovery requirements in a declarative fashion using business objectives/semantics does not currently exist. Furthermore, there currently is no mechanism for a systematic approach to map business objectives into an allowable set of technologies.
For an optimum backup approach and schedule, the QoS should be viewed as comprising the following parts: time to detection, time to decision, and execution of process. The conventional approach only addresses the time required to execute the restoration process. What is needed is a system that may, within the QoS limit, detect the failure and determine an optimum restoration plan in addition to executing the restoration process.
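The three-part view of the QoS described above can be sketched as a simple budget check. The class and field names below are hypothetical, and the detection and decision times are illustrative placeholders rather than figures from the text; the sketch only shows that a restoration process that fits the limit on its own can still exceed it once detection and decision are counted.

```python
from dataclasses import dataclass


@dataclass
class OutageEstimate:
    """Hypothetical breakdown of an outage into the three parts named above."""
    detection_minutes: float   # time to detect the failure
    decision_minutes: float    # time to decide on a restoration plan
    execution_minutes: float   # time to execute restore and roll forward

    def total_minutes(self) -> float:
        return self.detection_minutes + self.decision_minutes + self.execution_minutes

    def meets_qos(self, downtime_limit_minutes: float) -> bool:
        # The whole outage, not just the execution, must stay under the limit.
        return self.total_minutes() < downtime_limit_minutes


if __name__ == "__main__":
    # ASSUMPTION: illustrative numbers only.
    execution_only = OutageEstimate(0.0, 0.0, 9.0)
    full_outage = OutageEstimate(1.0, 0.5, 9.0)
    print(execution_only.meets_qos(10))  # True
    print(full_outage.meets_qos(10))     # False
```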
The conventional approach for data recovery systems lacks a mechanism to translate business objectives for application data availability into an optimal backup and recovery strategy that is devised and executed to meet the desired QoS. In addition, these data recovery systems lack a mechanism for determining the optimal technologies to use for backup and recovery tasks. No mechanism is currently available to develop optimal schedules for backup. Further, no mechanism exists to determine optimal recovery strategies.
Furthermore, the conventional approach for data recovery systems lacks a mechanism to adapt and refine all of the above in environments that have dynamically changing application workloads, business objectives, and hardware/software infrastructure technologies. Thus, there is a need for a data recovery system and method that automatically and dynamically optimize backup resources. The need for such a system and method has heretofore remained unsatisfied.