Data servers host critical production data in their storage systems. The storage systems are usually required to provide a level of data availability and service availability. Data and service are usually required to be resilient to a variety of failures, which could range from media failures to data center failures. Typically this requirement is addressed in part by a range of data protection schemes that may include tape-based backup of all or some of the production data.
In addition, there is typically a need for other servers to concurrently access this same critical production data. These applications include data protection applications, site replication applications, search applications, discovery applications, analysis applications, and monitoring and supervision applications. This need has been addressed by a range of data management schemes, including setting up a specialized analysis server with a replica of the critical production data. Typical data protection and management schemes have some well-known limitations. For example, in some cases, direct access to the server could result in instability and performance-affecting loads on the production servers. Other limitations are related to the serial and offline nature of traditional tape storage, which makes access to backed-up data time-consuming and inefficient.
Regardless of the type of backup storage used, some of the most significant limitations of conventional data protection and management stem from the characteristics of the dense data stored by the production system. FIG. 1 is a block diagram of a prior art system 100 that illustrates some of these limitations. System 100 includes a production system and a utility system. The production system includes one or more production servers and production databases storing large amounts of dense production data. Dense data is typically stored and transferred in large quantities and is usually in a hard-to-read format that is not amenable to manipulation by applications or entities other than the production system or applications specifically designed to interface with the production system. Dense data is also referred to as bulk data. On the other hand, item data typically includes smaller data items in a variety of application formats. An example of item data is an Adobe Acrobat™ file or an email message, but there are many other examples. Item data is also referred to as brick data.
An example of a production system is a messaging system such as Microsoft Exchange™. In the case of Exchange™, client applications that access Exchange™ servers through item interface application programming interfaces (APIs) include application programs (also referred to as applications) such as Outlook™. When a user wishes to access an item, such as an individual email, using Outlook™, the protocol used is one of the messaging application programming interface (MAPI) protocol, Post Office Protocol version 3 (POP3), Internet Message Access Protocol (IMAP or IMAP4), or others. This type of access is appropriate at the item level, but is extremely slow for accessing large numbers of items or performing transfer, search, or audit functions directed to items stored in bulk (or in dense data format) on the production server.
For performing backup functions, the production system includes a backup interface to the production server and database, as well as backup applications. The backup applications are used by a utility system to perform a bulk backup (also referred to as bulk transfer or bulk copy) of the entire production database file. The transferred production database file is stored on the utility system (or elsewhere, but typically off of the production system) as a bulk backup. In order to restore the production database file in the case of failure, the backup applications are used to transfer the bulk backup to the production server.
If data at the item level is required to be accessed from a bulk backup (for example to recover a particular “lost” email), the bulk backup must be transferred to the production server, or another location where the backed-up production database can be accessed using the item interface APIs. This is extremely slow, inefficient, and error-prone.
To address the problem of access to backed-up items, the conventional utility system may also perform a separate brick backup by using the protocols previously mentioned to access the bulk production data through the item interface APIs (MAPI, SMTP, POP3, etc.). Because this is again very slow, it is typically done on a very limited basis. For example, only executives in an enterprise might have their messaging communications protected by brick backup. Brick backup involves accessing the production database directly using MAPI or SMTP, for example, to retrieve item data. This is a slow process that loads the server and may affect server performance. When an item in the brick backup is required to be accessed or restored, it is accessed using the item interface APIs and protocols previously listed.
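The contrast between bulk backup and brick backup described above can be illustrated with a toy model. The following Python sketch is not an Exchange™ interface or any real backup API: the class ToyProductionStore, its methods, and the request counters are all hypothetical names introduced only for illustration. It assumes that a bulk backup is one transfer of an opaque blob, that a brick backup issues one request per item, and that the request counts stand in for load on the production server rather than measuring it.

```python
import json
import zlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyProductionStore:
    """Hypothetical stand-in for a production database (illustration only)."""
    items: Dict[str, str]
    item_requests: int = 0   # requests served through the item interface
    bulk_requests: int = 0   # requests served through the backup interface

    def list_item_ids(self) -> List[str]:
        # Item interface: enumerating the items costs one request.
        self.item_requests += 1
        return list(self.items)

    def fetch_item(self, item_id: str) -> str:
        # Item interface: one request per item (stands in for MAPI/POP3/IMAP access).
        self.item_requests += 1
        return self.items[item_id]

    def export_bulk(self) -> bytes:
        # Backup interface: the whole database as a single dense blob.
        self.bulk_requests += 1
        return zlib.compress(json.dumps(self.items).encode("utf-8"))


def bulk_backup(store: ToyProductionStore) -> bytes:
    """Copy the entire database in one transfer; the result is not item-addressable."""
    return store.export_bulk()


def brick_backup(store: ToyProductionStore, item_ids: List[str]) -> Dict[str, str]:
    """Copy items one at a time through the item interface."""
    return {item_id: store.fetch_item(item_id) for item_id in item_ids}


def restore_bulk(blob: bytes) -> Dict[str, str]:
    """Reading any single item requires unpacking the whole bulk backup first."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    store = ToyProductionStore({f"msg-{n}": f"body {n}" for n in range(1000)})
    blob = bulk_backup(store)
    bricks = brick_backup(store, store.list_item_ids())
    print(store.bulk_requests, store.item_requests, len(bricks))
```

In this model, backing up 1000 items in bulk costs the store one request, while a brick backup of the same items costs one request per item plus the listing request. Recovering a single item from the bulk copy still requires restoring the whole blob, which mirrors the limitation described above.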
Accessing the production database separately for both bulk backups and brick backups increases the load on the production system and may negatively impact performance. Also, maintaining two sets of backup data (bulk and brick) that are not reconciled is error-prone, and may not satisfy various compliance requirements. Further, in conventional systems, there is no mechanism for individual enterprise users to find and restore their own lost or deleted data. This increases workload for information technology (IT) personnel.
Conventional utility systems are not able to completely back up production data while at the same time allowing efficient access to that data in a usable (item) format. Some existing applications are designed specifically to perform functions such as auditing and legal discovery, but these typically read data off of the production system, negatively impacting its performance. Other existing applications painfully and slowly build archives of brick items extracted from the production database using MAPI or SMTP.
There is an increasingly urgent need for the ability to completely back up production data while at the same time allowing efficient access to that data in a usable (item) format. For example, enterprises are increasingly required to preserve and provide access to production data for auditing purposes, monitoring purposes, legal discovery purposes, and other purposes.