In many computer installations, a secondary storage system maintains backups of datasets as they are modified on a primary storage system. These secondary storage systems generally use hard drives or other random access storage to keep the backed-up datasets accessible and readily available. Tape and other sequentially accessed storage may also be used as part of the secondary storage system to maintain compatibility with existing tape backups and systems. These tape systems suffer from slower access times but are often included in a backup strategy for legacy and compatibility reasons.
Secondary storage systems from Network Appliance of Santa Clara, Calif. provide disk-to-disk (DTD) type backups using their NearStore® products in conjunction with other solutions. In particular, the NearStore® products provide a storage solution for businesses that want to simplify and automate their backups and restores using the higher speed storage offered by disk-to-disk technologies. High speed hard drives used in the NearStore® solutions allow reliable DTD backups to be made despite ever larger storage requirements and ever shorter time frames in which to complete them. Secondary storage based on disk technologies is not only faster and more efficient but also more reliable, as the backups can be verified and checked almost immediately or even in real time.
To accommodate enterprises using tape storage, Network Appliance further provides a NearStore® Virtual Tape Library (VTL) that operates and appears like a tape library system to a backup software application but provides the superior speed and reliability of disk technologies. Because the VTL emulates a variety of tape library solutions, large enterprise information technology (IT) departments can transition from slower and less reliable tape backup solutions to faster and more reliable DTD technologies. Secondary storage using VTL solutions allows insertion of the more desirable DTD solutions into enterprises implementing an array of heterogeneous tape library backup solutions and operating environments.
Backup strategies are also designed to increase the efficiency of performing backup and restore functions. In a typical strategy, a full backup of a filesystem or volume is made at certain time intervals, followed by incremental backups that take place between the full backups. Full backups are performed less frequently than incremental backups as they generally take a longer period of time to complete. The full backup typically copies all files in the filesystem or data in a data volume to the secondary storage, thus creating a baseline for subsequent incremental backups. Snapshot-type backups are capable of flushing buffers and cache storage in the operating system, filesystem or application to ensure the data captured during the full backup is as complete and up-to-date as possible.
Components performing incremental backups monitor the data changes in these filesystems and volumes and back up only the data changed after the full backup. In some cases, incremental backups occurring at the filesystem level may copy complete files that have been modified after a full backup. More efficient incremental backup solutions back up only those blocks of data in a filesystem or volume that were modified after the full backup. In enterprises with large amounts of storage, an incremental backup operating at the block level can save a great deal of time, as only the fraction of the data consisting of the changed blocks in the filesystem or volume needs to be moved from the primary storage to the secondary storage systems.
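The block-level approach described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that changed blocks are found by comparing per-block SHA-256 digests of the current data against digests of the full-backup baseline; the text above does not say how changed blocks are identified, and real systems may track writes instead. The block size and the names `block_digests`, `changed_blocks` and `apply_incremental` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes (an assumption)


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[start:start + block_size]).digest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(baseline: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> block contents for blocks that differ from the baseline."""
    old = block_digests(baseline, block_size)
    changed = {}
    for index, start in enumerate(range(0, len(current), block_size)):
        block = current[start:start + block_size]
        # A block is "changed" if it is new or its digest no longer matches.
        if index >= len(old) or hashlib.sha256(block).digest() != old[index]:
            changed[index] = block
    return changed


def apply_incremental(baseline: bytes, changed: dict[int, bytes],
                      new_length: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the current image from the full backup plus the changed blocks."""
    # Truncate or zero-pad the baseline to the new length, then overlay changes.
    image = bytearray(baseline[:new_length].ljust(new_length, b"\0"))
    for index, block in changed.items():
        start = index * block_size
        image[start:start + len(block)] = block
    return bytes(image)
```

In this sketch only the dictionary returned by `changed_blocks` would travel from primary to secondary storage, which is the source of the time savings noted above; a restore combines the full-backup baseline with that dictionary through `apply_incremental`.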
Unfortunately, the wide range of application servers and operating environments found in these enterprises does not store or use data in a uniform manner. For example, multiple application servers may all store application data locally on a primary storage system using multiple different proprietary data formats. Identifying changed blocks of data for incremental backups cannot be performed unless an application has a detailed understanding of each of these proprietary data formats. Operating environments used in the enterprise may also complicate matters and often include Unix, Linux, Apple OS and Windows. Applications may store data in these operating environments using filesystems, blocks, streaming data, database applications and many other combinations thereof.
Indeed, some application servers may also include custom programmed backup clients that run on the application server and back up the data from the primary storage system to the secondary storage system. These backup clients, also referred to as software agents or “plug-ins”, must not only store but also retrieve the proprietary datasets from secondary storage on demand. For example, the conventional “plug-in” may be responsible for obtaining changed blocks from a dataset used by an application during a backup and moving the data to the secondary server. During a restore function, the “plug-in” obtains the data stored in the proprietary format from the secondary storage and ensures it is properly restored to primary storage in a format appropriate to the application.
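The two responsibilities of such a “plug-in” can be sketched as follows. This is an illustrative assumption rather than any vendor's actual interface: the class names `BackupPlugin` and `KeyValueAppPlugin`, the use of a plain dictionary as the secondary storage, and the toy `key=value` line format standing in for a proprietary data format are all hypothetical.

```python
from abc import ABC, abstractmethod


class BackupPlugin(ABC):
    """Hypothetical contract: every plug-in must implement both directions."""

    @abstractmethod
    def backup(self, secondary: dict[str, bytes]) -> None:
        """Copy the application's dataset from primary to secondary storage."""

    @abstractmethod
    def restore(self, secondary: dict[str, bytes]) -> None:
        """Rebuild the application's dataset on primary storage."""


class KeyValueAppPlugin(BackupPlugin):
    """Toy application whose stand-in 'proprietary format' is key=value lines."""

    def __init__(self, name: str, records: dict[str, str]) -> None:
        self.name = name
        self.records = records  # stands in for data on primary storage

    def backup(self, secondary: dict[str, bytes]) -> None:
        # Serialize in the application's own format before moving the data.
        payload = "\n".join(f"{k}={v}" for k, v in sorted(self.records.items()))
        secondary[self.name] = payload.encode("utf-8")

    def restore(self, secondary: dict[str, bytes]) -> None:
        # Only this plug-in knows how to parse the format it wrote.
        payload = secondary[self.name].decode("utf-8")
        self.records = dict(line.split("=", 1) for line in payload.splitlines())
```

Because the format knowledge lives inside each plug-in, every new application in this sketch would need its own subclass with both methods written and tested.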
These custom programmed backup clients may process the corresponding proprietary datasets, but their monolithic design does not scale well in a complex enterprise environment. First, it takes a great deal of time to develop “plug-ins” for new applications, as each “plug-in” must be capable of performing reliable backup and restore functions. This requires custom coding and extensive quality assurance. Further, running multiple “plug-ins” may lead to contention for the same system resources and possible deadlock, as individual backup clients may have overlapping functional requirements and cannot coordinate an efficient use of the system resources. For example, normal fluctuations in system resources, network bandwidth, available memory and processing power can introduce race conditions during a backup or restore, causing the agent or “plug-in” performing the operation to fail. This in turn often requires restarting the backup or restore procedure, as the dataset in the underlying application may have changed or been corrupted.
Increasing the reliability and scalability of data archiving provides a crucial edge to companies providing secondary storage systems. IT departments require that new secondary storage systems integrate easily within their heterogeneous operating and application environments with minimal development and integration risks. Development of customized software should also be possible at little or no cost and with few or no licensing requirements. To accommodate fast growing demand for secondary storage systems, the backup solutions should also be scalable and use resources efficiently. For example, the backup operations themselves should not cause a server to become overloaded or go down, even during a peak period of archiving data.