Background and Relevant Art
As computerized systems have increased in popularity, so has the need to store and back up electronic files and other communications created by the users and applications associated therewith. In general, computer systems and related devices create files for a variety of reasons, such as in the general case of creating a word processing document in a work setting, as well as creating data used for more sophisticated database purposes. In addition, many of these documents can include valuable work product, or sensitive information that should be protected. One will appreciate, therefore, that there are a variety of reasons why an organization will want to back up electronic files on a regular basis, and thereby create a reliable restoration of an originally created file when needed.
Unfortunately, it is not a simple matter to just back up files, especially in larger organizations. In particular, not all files are created equal. For example, word processing and spreadsheet files tend to be relatively small, and thus require relatively little storage space. By contrast, certain types of media files, such as image files, video or music files, or the like, tend to be much larger, and accordingly tend to take up much more storage space. In addition, some files are managed individually, while other files are managed only within the context of particular applications. For example, certain types of database data might otherwise appear as a “glob” of unintelligible information without the benefit of the database application to interpret the glob.
As a result, it is fairly common for organizations to implement policies that limit the types and amount of data that particular users can store on any given production server (e.g., a mail, file, or database server). For example, the organization might allow users to store any type of word processing or spreadsheet file on a production server, but allow the users to create and/or store media files on a production server only if those files are work-related (or not at all). Along similar lines, the organization might even desire to limit users to specific quotas on certain file types.
Unfortunately, policies such as these can be fairly difficult to enforce with meaningful efficiency. For example, an administrator might desire to determine how much production server space a user is using for limited (or otherwise prohibited) types of media files. As such, the administrator might periodically search for all files with a particular media extension, such as “.au,” “.mp3,” “.wma,” “.wmv,” “.aif,” “.aac,” “.avi,” “.divx,” “.mov,” “.ra,” “.ram,” or “.wav.” The administrator might thus determine that the user is not using production server space for such files if no files with those extensions are found. The user, however, may have simply changed the file extensions on those certain media files, and thus configured them to avoid detection by the administrator's filter.
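The extension-based search described above can be sketched as follows. This is a minimal illustration rather than any particular administrator's tool; the function name find_media_by_extension and the directory-walk approach are assumptions made for the example, while the extension list is the one given in the text.

```python
from pathlib import Path

# Extensions taken from the list in the paragraph above.
MEDIA_EXTENSIONS = {
    ".au", ".mp3", ".wma", ".wmv", ".aif", ".aac",
    ".avi", ".divx", ".mov", ".ra", ".ram", ".wav",
}


def find_media_by_extension(root):
    """Return files under root whose extension marks them as media.

    Hypothetical helper: it looks only at the file name, never at the
    file contents.
    """
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            matches.append(path)
    return sorted(matches)
```

Because the filter inspects names only, a media file renamed from "song.mp3" to "song.doc" would not be returned, which is the evasion the paragraph describes.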
An organization might work around this sort of thing with one or more mechanisms at the production server that monitor file contents, rather than just file extensions. Furthermore, the production server might be configured with one or more mechanisms to perform real-time monitoring of specific writes and deletions made by a user. For example, the production server might identify each time a write or deletion is made on a production server volume, associate that write/deletion with a particular user, and then identify whether the write is for a limited (or otherwise prohibited) file type. The logic might then need to tally whether this particular write or deletion positions the user within or outside of a particular quota, and then make real-time adjustments to the user's privileges. One way this could be done is by indexing rich information about each file on one or more production server volumes, and then periodically scanning the indexes to identify whether the user-directed file writes (or deletions) fit within a particular user quota. Usage that appears to exceed a particular quota can then be reported to the production server administrator, who can then make corresponding adjustments to the user's privileges.
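A content-based check combined with a per-user tally, as outlined above, might look roughly like the following sketch. It is illustrative only: the names sniff_media_type and QuotaTracker are hypothetical, the signature table covers just a few well-known leading-byte patterns (RIFF/WAVE, RIFF/AVI, ID3-tagged MP3, and ".snd" for .au files), and a real production server would need a far more complete classifier.

```python
from collections import defaultdict


def sniff_media_type(header):
    """Guess a media type from a file's leading bytes; None if unknown.

    Deliberately small, illustrative signature set.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[:3] == b"ID3":
        return "mp3"
    if header[:4] == b".snd":
        return "au"
    return None


class QuotaTracker:
    """Tally media bytes per user from observed writes and deletions."""

    def __init__(self, media_quota_bytes):
        self.media_quota_bytes = media_quota_bytes
        self.media_bytes = defaultdict(int)

    def record_write(self, user, header, size):
        # Count the write only if the contents look like media.
        if sniff_media_type(header) is not None:
            self.media_bytes[user] += size

    def record_delete(self, user, header, size):
        # Deletions of media reduce the tally, never below zero.
        if sniff_media_type(header) is not None:
            self.media_bytes[user] = max(0, self.media_bytes[user] - size)

    def over_quota(self):
        """Return users whose media usage exceeds the quota."""
        return sorted(
            user
            for user, used in self.media_bytes.items()
            if used > self.media_quota_bytes
        )
```

Here the file name plays no role, so renaming a file does not hide it. The cost is that every write and deletion must be attributed to a user and its contents inspected.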
Unfortunately, one can appreciate that these types of workarounds can be computationally expensive. For example, it can be expensive to simply identify that a particular write is associated with a particular user, much less identify in real-time what that write or deletion contains. In addition, it can be computationally expensive to sum up data deletions within the context of a particular quota in real-time, and then make real-time prohibitions or allowances to a user account. Furthermore, analyzing larger file writes (e.g., certain media files) can consume more computational resources than analyzing smaller file writes. Still further, analyzing writes and deletions within a database can often require additional interfacing through the particular database application. These and other complications can be further exacerbated when trying to maintain consistency for a user's data access privileges, especially when considering multiple volumes on potentially multiple production servers in the organization.
By contrast, a backup server is generally not equipped to handle this sort of processing. For example, conventional backup servers typically operate as flat file servers, and may not, therefore, have installed the appropriate context for understanding certain types of application data, such as database application data. Rather, the backup server may simply receive backup data and return backup data corresponding to specific backup events. Such configurations can encumber the discoverability of specific data forms at a given backup server. For example, users who want to retrieve specific data backups are generally limited to requesting data saved or created within a specific period, such as files created and/or backed up “between dates x and y.” Only once provided the data corresponding to the desired range are the users able to search for specific files within the recovered data.
As such, there are a number of difficulties associated with data accessibility and management between production servers and backup systems that can be addressed.