Background and Relevant Art
As computerized systems have increased in popularity, so has the need to store and back up electronic files and other communications created by the users and applications associated therewith. In general, computer systems and related devices create files for a variety of reasons, such as in the general case of creating a word processing document in a work setting, as well as creating a file used for more sophisticated database purposes. In addition, many of these documents can include valuable work product, or sensitive information that should be protected.
One will appreciate, therefore, that there are a variety of reasons why an organization will want to back up electronic files on a regular basis, and thereby be able to reliably restore an originally created file when needed. Generally, some of the challenges facing organizations implementing one or more such backup solutions relate to the choice of a particular replication mechanism. That is, there are many ways (i.e., replication mechanisms) to copy data to be protected from a production server volume to a backup storage volume at a backup server, which is where the protected data would reside for recovery purposes. One can appreciate that each replication mechanism carries with it certain advantages and disadvantages.
For example, one conventional replication mechanism involves the production server logging the names of files that have changed on a volume to be protected, and then sending the entire, updated files to a backup volume at the backup server that corresponds to the volume to be protected at the production server. Another, similar mechanism for doing this is for the production server to not only log the name(s) of file(s) that have changed, but also compare the file(s) that have changed at the production server with any corresponding backup copy(ies) of the file(s) at the backup server, and then send to the backup server only the differential, changed bytes.
In particular, the latter mechanism can allow for faster monitoring, in part because it may be done without use of a file system filter to monitor changes. Unfortunately, this replication mechanism may involve more resource overhead when comparing a prior copy of the file with an updated version. As such, both of these types of replication mechanism tend to be more effective with smaller files, or with large files in which only the same small set of bytes within a block of bytes changes frequently. Conversely, these replication mechanisms can be very inefficient for very large files, such as database files, particularly files that have sets of several bytes or byte blocks that change with relatively low frequency.
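By way of illustration only, the two file-level mechanisms described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular backup product: the names `full_file_payload`, `diff_runs`, and `apply_runs` are hypothetical, and volumes are modeled as in-memory dictionaries that map file names to file contents.

```python
def full_file_payload(production: dict[str, bytes], changed: set[str]) -> dict[str, bytes]:
    """First mechanism: send every logged file in its entirety."""
    return {name: production[name] for name in changed}


def diff_runs(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Second mechanism: compare the backup copy (old) with the updated
    file (new) and return (offset, data) runs covering only changed bytes."""
    runs = []
    start = None
    for i in range(len(new)):
        # Bytes past the end of the old copy always count as changed.
        differs = i >= len(old) or old[i] != new[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            runs.append((start, new[start:i]))
            start = None
    if start is not None:
        runs.append((start, new[start:]))
    return runs


def apply_runs(old: bytes, runs: list[tuple[int, bytes]], new_length: int) -> bytes:
    """Rebuild the updated file at the backup server from the old copy."""
    buf = bytearray(old[:new_length])          # handles truncation
    buf.extend(b"\x00" * (new_length - len(buf)))  # handles growth
    for offset, data in runs:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


old_copy = b"hello world"
updated = b"hello WORLD!"
runs = diff_runs(old_copy, updated)
restored = apply_runs(old_copy, runs, len(updated))
```

In this sketch the differential payload carries only the changed bytes, but computing it requires reading both copies of the file in full, which reflects the comparison overhead noted above.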
Another conventional replication mechanism involves identifying changes to files, rather than identifying only files that have changed. This mechanism of identifying changes to files typically relies on identifying files (e.g., names, types or locations) that are intended for replication, and identifying only the bytes that have changed in the file during the administrator-defined time intervals between replications. Thus, a backup agent (e.g., a “clone agent” in combination with a “file system filter” at the production server) logs only those changed bytes in the file, and ultimately communicates those changed bytes to the backup storage volume (i.e., “replica volume” on the storage medium). This replication mechanism tends to be more cost-effective from a resource expenditure standpoint for very large files or files that change infrequently between replication intervals. Unfortunately, it tends to be less cost-effective for files that change frequently or are entirely overwritten with each update.
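For purposes of illustration, the byte-level change logging described above might be sketched as follows. This is a minimal sketch under stated assumptions: the class `ByteChangeLog` and the function `apply_to_replica` are hypothetical names, the `on_write` method stands in for whatever notification a file system filter would provide, and the replica volume is modeled as an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ByteChangeLog:
    """Hypothetical log of changed bytes for files selected for replication."""
    protected: set[str]
    entries: list[tuple[str, int, bytes]] = field(default_factory=list)

    def on_write(self, path: str, offset: int, data: bytes) -> None:
        # Record only writes to files intended for replication.
        if path in self.protected:
            self.entries.append((path, offset, bytes(data)))

    def drain(self) -> list[tuple[str, int, bytes]]:
        # Called once per replication interval; returns and clears the log.
        pending, self.entries = self.entries, []
        return pending


def apply_to_replica(replica: dict[str, bytearray],
                     entries: list[tuple[str, int, bytes]]) -> None:
    """Replay the logged changes, in order, against the replica volume."""
    for path, offset, data in entries:
        buf = replica.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data


log = ByteChangeLog(protected={"db.dat"})
log.on_write("db.dat", 2, b"XY")    # logged
log.on_write("tmp.txt", 0, b"zz")   # ignored: not a protected file
replica = {"db.dat": bytearray(b"abcdef")}
apply_to_replica(replica, log.drain())
```

Because every write to a protected file adds an entry, a file that is rewritten many times within one interval can produce a log larger than the file itself, which is consistent with the cost trade-off described above.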
Still another type of replication mechanism, which could be considered a hybrid in some respects of both of the above-discussed replication mechanisms, involves identifying files in terms of “byte blocks.” Generally, “byte blocks” comprise fixed-size, contiguous blocks of bytes, of which there can be many in any given file. For example, a production server (or “file server”) can identify files as sets of multiple blocks, where each block contains a plurality of bytes. If any of the bytes change within a given block (i.e., are updated, written to, etc.), the replication mechanism might flag the changed block, and send the entire block to the replica volume at an appropriate time. As such, the replication agent can spend only those resources that may be necessary to identify a changed block of bytes, rather than each changed byte in the file. This can allow a given server to avoid incurring additional overhead even though multiple changes may be made to the same byte block. Nevertheless, while this can provide the replication agent with some resource-expenditure advantages over the aforementioned mechanisms, this mechanism may still be better suited for larger files, such as database files, or files whose byte blocks are changed more than once within the same replication cycle.
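As a further illustration, the byte-block mechanism described above might be sketched as follows. This is a minimal sketch under stated assumptions: the class `BlockTracker`, its methods, and the default block size of 4096 bytes are hypothetical choices made for the example, and the file is modeled as an in-memory byte string.

```python
BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration only


class BlockTracker:
    """Hypothetical tracker that flags changed fixed-size byte blocks."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.dirty: set[int] = set()

    def on_write(self, offset: int, length: int) -> None:
        # Flag every block touched by the write. Repeated writes to the
        # same block cost nothing extra, since the block is already flagged.
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def collect(self, data: bytes) -> dict[int, bytes]:
        # At replication time, send each flagged block in its entirety.
        size = self.block_size
        blocks = {i: data[i * size:(i + 1) * size] for i in sorted(self.dirty)}
        self.dirty.clear()
        return blocks


tracker = BlockTracker(block_size=4)
tracker.on_write(1, 1)   # block 0
tracker.on_write(2, 1)   # block 0 again: no additional bookkeeping
tracker.on_write(9, 1)   # block 2
payload = tracker.collect(b"abcdefghij")
```

In this sketch, two writes to the same block result in a single flagged block, while a one-byte change still causes the whole block to be sent, which illustrates both the advantage and the cost described above.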
Accordingly, an organization that is deciding which replication mechanism to use for its backup service may need to weigh several considerations. Complicating this is the notion that, even though an organization may make a determination based on its present file generation/change needs, such a determination may nevertheless be inadequate in the future. For example, the organization's determination of a particular replication mechanism will typically be applied to all files to be protected, without regard to indicia that may make the determination more applicable for some files than for others, such as file type, size, location, or the like. Thus, the determination may be based on what the organization feels is best for its current environment, such as the set of most common file types, and/or commonly used applications.
Of course, if the predominant file type(s) and/or application type(s) change at a later point, then it is possible that the initially chosen replication mechanism may need to be replaced. This possibility can make it particularly difficult for the organization, both at the outset when trying to project what replication mechanism will be preferred, as well as at a later point from a resource expenditure perspective if or when needing to change. For example, the organization could insist that the bulk of applications used in the organization use a certain file type and/or application type that is suited to the chosen replication mechanism, or alternatively commit itself to changing its replication mechanism periodically. Both of these scenarios, of course, can lead to significant cost and resource expenditure problems for the organization.