Many organizations, such as large businesses and governmental entities, have extremely large databases of information that must be readily available for rapid access and modification. These databases can contain terabytes of data and require large data storage systems containing multiple disk drives or arrays of disk drives organized into a single large logical memory system. Accordingly, the demand for uninterrupted access to data generates a need for software and hardware that can adequately store data and protect it from events such as system failures, viruses, and power outages.
Further, in specific industries these needs are exacerbated. For example, in health service industries, a wide variety of data must be stored in databases. Given the potential for severe consequences, data associated with patients' health and care must be maintained at the highest integrity. Moreover, governmental regulations associated with health care, such as the Health Insurance Portability and Accountability Act (HIPAA), impose significant administrative burdens on the handling of health care data. In addition, due to confidentiality concerns, much of this data is often encrypted, which adds to the complexity of managing the databases.
In view of these potential problems, it is important that data be reliably protected. This is typically accomplished through the use of software that backs up the data. As the amount of data continues to increase, backing up the data becomes more complex. Large amounts of data cannot currently be transferred quickly from one volume to a backup volume, and taking a volume of data offline for backup purposes is an unattractive option. Sophisticated strategies are therefore required to maximize the availability of the data.
Commercial utility programs are available for performing backup operations, often running on a backup server which communicates with database servers via a network. Although this architecture does remove a considerable amount of load from the main database servers, the dedicated backup server must still process and transfer the large volumes of data.
To improve performance, many backup and recovery applications (BURAs) utilize an image-based approach rather than a file-based approach. With a file-based backup process, individual files are simply selected and copied. However, to maximize disk storage, files are written to whichever sectors of the disk or partition happen to be free at the time. This can result in the data of each file being spread across the physical medium of the disk in a non-contiguous manner. Accordingly, reading and writing such files requires non-sequential disk access operations, which increases the time required. Since large databases can have tens or hundreds of thousands of files, a file-based backup greatly magnifies the time penalty associated with these non-sequential operations.
In contrast, an image-based backup allows the data to be written sequentially as the goal is to replicate the entire partition or drive. Accordingly, this approach provides a significant improvement in terms of the time required to create an entire backup image.
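The difference between the two access patterns can be illustrated with a toy model. The following is a minimal sketch only, assuming a disk represented as numbered blocks and a hypothetical mapping of file names to the blocks each file occupies; the function names and the example layout are illustrative assumptions and are not taken from any particular BURA.

```python
from typing import Dict, List


def count_seeks(block_order: List[int]) -> int:
    """Count accesses that do not directly follow the previously read block."""
    seeks = 0
    previous = None
    for block in block_order:
        # The first access and every non-adjacent access cost one seek.
        if previous is None or block != previous + 1:
            seeks += 1
        previous = block
    return seeks


def file_based_order(layout: Dict[str, List[int]]) -> List[int]:
    """Blocks in the order visited when files are copied one at a time."""
    order: List[int] = []
    for name in sorted(layout):
        order.extend(layout[name])
    return order


def image_based_order(partition_blocks: int) -> List[int]:
    """Blocks in the order visited when the whole partition is copied."""
    return list(range(partition_blocks))


# Hypothetical fragmented layout: each file occupies non-contiguous blocks.
layout = {"a.db": [0, 5, 2], "b.db": [1, 6, 3], "c.db": [4, 7]}

file_seeks = count_seeks(file_based_order(layout))
image_seeks = count_seeks(image_based_order(8))
print(f"file-based seeks: {file_seeks}, image-based seeks: {image_seeks}")
```

In this model the image-based pass needs a single initial seek regardless of how the files are fragmented, whereas the file-based pass incurs a seek at nearly every block. Real devices are more complicated (caching, read-ahead, and solid-state media all change the cost of a seek), so the counts indicate only the general trend.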
Nevertheless, given the significant amount of time necessary to perform a backup of these large databases, it is desirable to minimize the number of times a full backup is performed. As can be appreciated, a portion of the information involved in a backup is sometimes not stored successfully. Unfortunately, conventional utilities often require a complete rerun of the backup process for the entire host or group of hosts associated with the corrupted data. As discussed above, the size of the databases means that backups consume substantial system resources, and having to repeat a full backup places significant strain on the system.
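The cost of a complete rerun can be made concrete with simple arithmetic. The following is a minimal sketch, assuming a hypothetical group of hosts with made-up backup sizes; the function name, host names, and sizes are illustrative assumptions only and do not describe any existing utility.

```python
from typing import Dict, Set


def bytes_to_recopy(sizes: Dict[str, int], failed: Set[str], full_rerun: bool) -> int:
    """Return how much data must be copied again after a partial failure.

    sizes      -- hypothetical backup size per host, in gigabytes
    failed     -- hosts whose data was not stored successfully
    full_rerun -- True if the whole group must be backed up again
    """
    if full_rerun:
        # A complete rerun recopies every host in the group.
        return sum(sizes.values())
    # Otherwise only the unsuccessfully stored portion is recopied.
    return sum(sizes[name] for name in failed)


# Hypothetical group of three hosts in which one backup failed.
sizes_gb = {"host-a": 500, "host-b": 300, "host-c": 200}
failed_hosts = {"host-b"}

full = bytes_to_recopy(sizes_gb, failed_hosts, full_rerun=True)
partial = bytes_to_recopy(sizes_gb, failed_hosts, full_rerun=False)
print(f"full rerun: {full} GB, failed portion only: {partial} GB")
```

With these assumed figures, a complete rerun copies the data of all three hosts even though only one host's data was affected.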
What is therefore needed is a backup and recovery application that minimizes the amount of time needed to maintain accurate backup copies of large databases. What is further needed is a BURA capable of minimizing the amount of data copied when correcting an unsuccessful backup operation.