Entities often generate and use data that is important in some way to their operations. This data can include, for example, business data, financial data, and personnel data. If this data were lost or compromised, the entity could suffer significant adverse financial and other consequences. Accordingly, many entities have chosen to back up some or all of their data so that, in the event of a natural disaster, unauthorized access, or other event, the entity can recover any data that was lost or compromised and then restore that data to one or more locations, machines, and/or environments.
While there is little question about the need to back up important data, the ongoing accumulation of multiple versions of files or other objects can result in the storage of a vast amount of data, much of which may never be accessed or used. This accumulation may be of little concern at the level of an individual file, where a user might press ‘SAVE’ several times while creating or editing a document, but it can become a significant problem, for example, at the enterprise level.
In particular, saving specific versions of backed-up files can be valuable, but preserving every possible version makes the system increasingly inefficient and difficult to manage, burying the few useful versions among many irrelevant ones. To illustrate, if a database file is updated continuously such that it is backed up once every minute, 1440 backups (60 × 24) will be created each day for that database file. Depending upon the retention policy, the user may keep all 1440 daily backups for every day of the retention period, totaling over half a million versions in a year (1440 × 365 = 525,600).
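The arithmetic in the database example can be checked with a small helper. This is a minimal sketch assuming backups are taken at a fixed interval and that every backup is kept for the full retention period with no pruning; the function name `versions_retained` is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def versions_retained(backup_interval_minutes: int, retention_days: int) -> int:
    """Count the versions kept when a backup is taken every
    `backup_interval_minutes` and none is deleted before `retention_days` pass."""
    if backup_interval_minutes <= 0 or retention_days < 0:
        raise ValueError("interval must be positive and retention non-negative")
    total_minutes = retention_days * MINUTES_PER_DAY
    return total_minutes // backup_interval_minutes


if __name__ == "__main__":
    print(versions_retained(1, 1))    # 1440 backups in one day
    print(versions_retained(1, 365))  # 525600 backups in one year
```

Lengthening the interval or shortening the retention period reduces the count only linearly, which is why fixed-interval retention alone does little to curb growth.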
A further complication is the way in which a particular user or customer may choose to back up their data. For example, some backups may be performed continuously; that is, the backup system backs up changes as soon as it detects them. This continuous backup approach can rapidly generate large amounts of backed-up data, particularly in an enterprise context.
As another example of data storage methodologies, objects such as files may be backed up independently of each other, or may be backed up as a group as part of a container backup. The former approach can be used where files need to remain independent of one another. For example, different files may be stored in different respective containers so that backups can be performed at a relatively high level of granularity. However, managing the respective backups of multiple independent files may be significantly more complicated than managing a single container that holds multiple files. Conversely, managing backed-up data at the container level is relatively easy to implement but reduces flexibility, since the files in the container are managed as a group rather than individually.
Moreover, some data backup customers prefer to maintain backups that extend back over relatively long periods, such as a year or more. If all of those backups are retained and backups are performed continuously, millions of versions of a single file may accumulate. The amount of backed-up data can therefore grow rapidly, resulting in ever-increasing storage costs for the enterprise, often with little or no return on the investment, since the majority of backed-up versions may never be accessed or used.
One possible approach to gaining a measure of control over the amount of stored data involves assigning expiration times/dates to one or more of the backups. For example, a backup can be set to expire 30 days after it is created. Backups would then be deleted automatically on an ongoing basis, according to their expiration dates.
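The creation-date expiration approach amounts to a one-line rule. The sketch below is a minimal illustration only, assuming a hypothetical `BackupVersion` record and hypothetical `expires_at` and `prune_expired` helpers (not part of any real backup product), with the 30-day period from the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

RETENTION = timedelta(days=30)  # example policy: expire 30 days after creation


@dataclass(frozen=True)
class BackupVersion:
    object_id: str
    created: datetime


def expires_at(version: BackupVersion, retention: timedelta = RETENTION) -> datetime:
    # Expiration depends only on this version's own creation time.
    return version.created + retention


def prune_expired(versions: List[BackupVersion], now: datetime,
                  retention: timedelta = RETENTION) -> List[BackupVersion]:
    """Return the versions that survive a creation-date-only expiration rule."""
    return [v for v in versions if expires_at(v, retention) > now]


if __name__ == "__main__":
    only_backup = [BackupVersion("db.sqlite", datetime(2024, 1, 1))]
    # The sole backup expired on 2024-01-31, so nothing survives.
    print(prune_expired(only_backup, now=datetime(2024, 2, 15)))  # []
```

Because the rule never looks at other versions of the same object, it removes a lone backup once 30 days have elapsed, even when that backup is the only copy.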
However, setting the expiration date of an object version based solely on that version's creation date has a drawback: although attractive for its simplicity, this approach does not take into account how many versions of the object exist. For example, if a backup set to expire in 30 days is the only backup of a file and/or the most recent backup of that file, it makes little sense to delete that backup, even though 30 days have passed.
As this example illustrates, a backed-up version of a file may be deleted too early because the approach does not account for the number of backed-up versions of the file. A related problem is that a version may be deleted too late, for example, only after multiple additional versions, not all of which are needed, have been created and stored.
In view of these and other problems, it would be useful to define and implement rules and policies that determine the point in time at which a particular version of an object can be deleted from storage. It would also be useful to define rules and policies that, when implemented, automatically reduce the number of stored versions of an object as the current version of that object ages. Finally, it would be useful to determine when a particular version of an object expires and can be deleted from backup based not only on the creation time of that version but also on the creation time of the next successive version.
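One way to make expiration depend on both the creation time of a version and the creation time of its next successive version is sketched below. This is an assumed policy for illustration, not necessarily the rules and policies contemplated above: each version expires at the later of its own creation time plus a minimum age (`min_age`) and its successor's creation time plus a grace period (`grace`), and the most recent version never expires. All names and parameters (`BackupVersion`, `expiration_time`, `expirations`, `prune`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BackupVersion:
    object_id: str
    created: datetime


def expiration_time(created: datetime, successor_created: Optional[datetime],
                    min_age: timedelta, grace: timedelta) -> Optional[datetime]:
    """Hypothetical rule: expire at the later of (own creation + min_age) and
    (successor's creation + grace). The latest version (no successor) never expires."""
    if successor_created is None:
        return None
    return max(created + min_age, successor_created + grace)


def expirations(versions: List[BackupVersion], min_age: timedelta,
                grace: timedelta) -> Dict[BackupVersion, Optional[datetime]]:
    # Versions of a single object, ordered oldest to newest.
    ordered = sorted(versions, key=lambda v: v.created)
    successors: List[Optional[datetime]] = [v.created for v in ordered[1:]]
    successors.append(None)  # the newest version has no successor
    return {v: expiration_time(v.created, nxt, min_age, grace)
            for v, nxt in zip(ordered, successors)}


def prune(versions: List[BackupVersion], now: datetime, min_age: timedelta,
          grace: timedelta) -> List[BackupVersion]:
    """Return the versions still retained at time `now`, oldest first."""
    exp = expirations(versions, min_age, grace)
    return [v for v in sorted(versions, key=lambda v: v.created)
            if exp[v] is None or exp[v] > now]


if __name__ == "__main__":
    history = [
        BackupVersion("report.doc", datetime(2024, 1, 1)),
        BackupVersion("report.doc", datetime(2024, 1, 2)),
        BackupVersion("report.doc", datetime(2024, 2, 1)),
    ]
    kept = prune(history, datetime(2024, 2, 5), timedelta(days=30), timedelta(days=7))
    print([v.created.date().isoformat() for v in kept])  # ['2024-01-02', '2024-02-01']
```

Because the newest version has no successor, it is always retained, so the only or most recent backup of a file is never removed merely because a fixed period has elapsed, while superseded versions age out once they are both old enough and replaced for long enough.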