1. Field of the Invention
The present invention relates generally to data management. More particularly, the present invention relates to data storage.
2. Background Art
The capacity to recover or restore data after a data loss event is a crucial aspect of data management. Data restoration capacity is typically related to the frequency and efficiency with which data is backed up. The frequency of data backups may be particularly important where data is often added to, removed from, or otherwise modified within a database, for example. Under those circumstances, the frequency with which data backups are performed may determine the extent to which a prior state can be restored after a data loss event. Thus, failure to perform regular data backups spaced by appropriate time intervals may result in substantial or even catastrophic irretrievable losses in the wake of a natural disaster or system failure.
Backup efficiency may take at least two forms: timing efficiency, relating to the time required to perform a data backup, and storage efficiency, relating to the manner in which data files are distributed over storage media. Timing efficiency, particularly where large amounts of data are routinely backed up, may become a limiting factor in determining the frequency with which data backups can be performed. As a result, timing inefficiency in the data backup process may compromise data restoration capacity.
Even where data backups are performed routinely and in a timely manner, however, the storage efficiency of those backups may influence the effectiveness with which data can be restored after a data loss event. For example, where data storage is efficient, so that backup data blocks are logically distributed across relatively few units of storage media, those data blocks may be readily accessed and recovered during data restoration. Where data storage is less efficient, however, and backup data blocks are widely distributed across numerous units of storage media, data restoration may be a time-consuming and painstaking process, despite an otherwise adequate data backup procedure being in place. Consequently, inadequacies in either or both of the timing efficiency and the storage efficiency of the data backup process may render its effectiveness in enabling data restoration less than optimal.
FIG. 1 shows a diagram of a conventional system for performing data backup in a typical storage area network (SAN) environment. Data management system 100, in FIG. 1, includes servers 112a, 112b, and 112c, SAN 110, and computer controlled tape library 130. SAN 110 may be utilized to mediate transfer of data, such as data folders 114, 116, 118, and 120, from servers 112a, 112b, and 112c, where the data is produced and/or modified, to computer controlled tape library 130, where the data can be backed up. Computer controlled tape library 130 is shown to include tape drives 134a, 134b, 134c, and 134d for performing data backup under the control of backup software 132. Computer controlled tape library 130 is also shown to include processor 138 controlling the operation of computer controlled tape library 130 and execution of backup software 132.
The conventional system of FIG. 1 is configured to perform data backup in what is typically referred to as a single stream mode. As may be seen from FIG. 1, each of data folders 114, 116, 118, and 120 is locked to a single tape drive and delivered to that recording device as respective single data streams 124, 126, 128, and 130. A substantial disadvantage of the conventional system of FIG. 1 is that the timing efficiency of the data backup process may be very poor. In the example shown in FIG. 1, for instance, there is considerable disparity in the sizes of the data folders. While data folders 114 and 118 are quite large, holding respectively three terabytes and four terabytes of data, data folders 116 and 120 are much smaller, holding respectively two hundred gigabytes and four hundred gigabytes of data. Such a distribution of data folder sizes may not be uncommon in computing environments used to produce television or animation content, for example.
Unfortunately, from the standpoint of timing efficiency, the single stream mode data backup performed by data management system 100 does not adequately account for those data folder size disparities. Tape drive 134b, dedicated to smallest data folder 116, may operate for a relatively short period of time, for example, two hours, while tape drive 134c, dedicated to largest data folder 118, may operate for a vastly longer period, for example, approximately forty hours. As a result, the backup process, tied as it is to the time taken to back up largest data folder 118, may require forty hours to complete, and only tape drive 134c is fully utilized for that entire period. By contrast, tape drive 134a may operate for approximately seventy-five percent of the total backup period, while the data storage resources represented by tape drives 134b and 134d are much less fully utilized, resulting in a high degree of timing inefficiency.
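The timing figures in the single stream example can be checked with a short calculation. The following is a minimal sketch, not part of the conventional system itself: the throughput of one hundred gigabytes per hour is an assumption inferred from the two-hour figure given for the two hundred gigabyte folder, and the function and variable names are hypothetical.

```python
# Folder size in gigabytes handled by each tape drive in the FIG. 1 example:
# 134a <- folder 114 (3 TB), 134b <- folder 116 (200 GB),
# 134c <- folder 118 (4 TB), 134d <- folder 120 (400 GB).
drive_to_gb = {"134a": 3000, "134b": 200, "134c": 4000, "134d": 400}

# Assumption: a uniform drive throughput, inferred from "200 GB in two hours".
ASSUMED_GB_PER_HOUR = 100


def single_stream_schedule(drive_to_gb, gb_per_hour):
    """Return per-drive hours, total backup hours, and per-drive utilization."""
    hours = {drive: gb / gb_per_hour for drive, gb in drive_to_gb.items()}
    # In single stream mode the backup ends when the slowest drive finishes.
    total_hours = max(hours.values())
    utilization = {drive: h / total_hours for drive, h in hours.items()}
    return hours, total_hours, utilization


hours, total_hours, utilization = single_stream_schedule(
    drive_to_gb, ASSUMED_GB_PER_HOUR
)
mean_utilization = sum(utilization.values()) / len(utilization)

for drive in drive_to_gb:
    print(f"{drive}: {hours[drive]:.0f} h, {utilization[drive]:.0%} utilized")
print(f"backup window: {total_hours:.0f} h, mean utilization {mean_utilization:.1%}")
```

Under the assumed throughput, the backup window is forty hours and the four drives are busy for roughly 75, 5, 100, and 10 percent of it, so on average less than half of the available drive time is used.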
FIG. 2 shows a diagram of another conventional system for performing data backup in a SAN environment. Data management system 200, in FIG. 2, is structurally very similar to data management system 100 in FIG. 1, and includes servers 212a, 212b, and 212c, SAN 210, and computer controlled tape library 230. Analogously to the situation in FIG. 1, in FIG. 2, SAN 210 may be utilized to mediate transfer of data folders 214 and 216 from servers 212a, 212b, and 212c, where they are produced and/or modified, to computer controlled tape library 230 for data backup. As in FIG. 1, computer controlled tape library 230, in FIG. 2, includes tape drives 234a, 234b, 234c, and 234d under the control of backup software 232. Computer controlled tape library 230 is also shown to include processor 238 controlling the operation of computer controlled tape library 230 and execution of backup software 232.
As may become apparent from comparison of FIGS. 1 and 2, the conventional system of FIG. 2 is configured to perform data backup in a multi-stream mode. Rather than locking individual data folders to individual tape drives, as in the single stream mode of FIG. 1, data folders 214 and 216 are algorithmically broken down into data blocks that are dispersed across tape drives 234a, 234b, 234c, and 234d in the multi-stream mode of FIG. 2. This may be done in an attempt to improve the timing efficiency of the backup process by utilizing all available tape drives more fully than is done in single stream mode. Backup software 232 typically breaks down the data from data folders 214 and 216 according to a predetermined algorithm coded into the software. As a result, the data contained in data folder 214 is broken down into multi-streams 224a, 224b, 224c, and 224d, delivered respectively to tape drives 234a, 234b, 234c, and 234d. Similarly, the contents of data folder 216 are delivered to tape drives 234a, 234b, 234c, and 234d, as respective multi-streams 226a, 226b, 226c, and 226d. 
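The multi-stream dispersal described above can be illustrated with a simple round-robin scheme. This is only a sketch under stated assumptions: the text does not specify the predetermined algorithm used by backup software 232, so round-robin assignment stands in for it here, and the folder sizes, the block size, and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def stripe_round_robin(folders_gb, drives, block_gb):
    """Split each folder into fixed-size blocks and deal them to drives in turn.

    Returns a mapping of drive -> list of (folder, block size in GB).
    Round-robin is an assumed stand-in for the unspecified backup algorithm.
    """
    placement = defaultdict(list)
    next_drive = cycle(drives)
    for folder, size_gb in folders_gb.items():
        remaining = size_gb
        while remaining > 0:
            chunk = min(block_gb, remaining)
            placement[next(next_drive)].append((folder, chunk))
            remaining -= chunk
    return placement


# Hypothetical sizes; the text gives no sizes for data folders 214 and 216.
folders_gb = {"214": 3000, "216": 200}
drives = ["234a", "234b", "234c", "234d"]

placement = stripe_round_robin(folders_gb, drives, block_gb=100)

# Total gigabytes written by each drive.
load_gb = {d: sum(gb for _, gb in placement[d]) for d in drives}
# Set of drives holding at least one block of each folder.
drives_per_folder = {
    f: {d for d in drives if any(name == f for name, _ in placement[d])}
    for f in folders_gb
}

print(load_gb)
print({f: sorted(ds) for f, ds in drives_per_folder.items()})
```

With these illustrative numbers the load is spread evenly across the four drives, which is the timing benefit of multi-stream mode, but the blocks of each folder end up on several drives rather than one.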
A substantial disadvantage of the conventional system of FIG. 2 is that while timing efficiency may be improved, any improvement comes at the price of reduced storage efficiency, which may become very poor for system 200. For example, a common outcome of the approach shown in FIG. 2 is that data folders 214 and 216 are highly fragmented and dispersed over many units of data storage media during backup, e.g., in this case many storage tapes. In addition, it is frequently the case that individual tapes are greatly underutilized, so that a storage tape having a capacity of four hundred gigabytes, for example, may have less than one gigabyte of relevant data stored on it. Consequently, because of the large number of units of storage media generated, and the dispersion of data fragments over those media, the approach shown by FIG. 2 may, while leading to faster backups, result in a challenging and time-consuming data restoration process.
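The two storage efficiency problems noted above, underutilized tapes and folders scattered over many tapes, can be quantified from a backup catalog. The sketch below uses an entirely hypothetical catalog; only the four hundred gigabyte tape capacity comes from the example in the text, and the tape labels, amounts, and function names are assumptions for illustration.

```python
TAPE_CAPACITY_GB = 400  # tape capacity used in the text's example

# Hypothetical catalog: tape label -> list of (folder, gigabytes written).
catalog = {
    "tape01": [("214", 0.8)],
    "tape02": [("214", 120.0), ("216", 0.5)],
    "tape03": [("216", 0.9)],
    "tape04": [("214", 35.0)],
}


def tape_utilization(catalog, capacity_gb):
    """Fraction of each tape's capacity that holds backup data."""
    return {
        tape: sum(gb for _, gb in entries) / capacity_gb
        for tape, entries in catalog.items()
    }


def tapes_needed(catalog, folder):
    """Tapes that must be mounted to restore every fragment of a folder."""
    return sorted(
        tape
        for tape, entries in catalog.items()
        if any(name == folder for name, _ in entries)
    )


utilization = tape_utilization(catalog, TAPE_CAPACITY_GB)
restore_214 = tapes_needed(catalog, "214")

for tape, fraction in utilization.items():
    print(f"{tape}: {fraction:.2%} of capacity used")
print(f"tapes required to restore folder 214: {restore_214}")
```

In this illustrative catalog, a tape holding under one gigabyte is less than a quarter of one percent full, and restoring a single folder requires locating and mounting three separate tapes.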
Accordingly, there is a need to overcome the drawbacks and deficiencies in the art by providing a solution that optimizes data backup by appropriately balancing timing efficiency and storage efficiency to facilitate data restoration in the aftermath of a loss event.