Modern organizations generate and store large quantities of data. In many instances, organizations store much of their important data at a centralized data storage system. It is frequently important that such organizations be able to quickly access the data stored at the data storage system. In addition, it is frequently important that data stored at the data storage system be recoverable if the data is written to the data storage system incorrectly or if portions of the data stored at the repository are corrupted. Furthermore, it is important that data can be backed up to provide security in the event of device failure or other catastrophic event.
The large scale data centers managed by such organizations typically require mass data storage structures and storage area networks capable of providing both long-term mass data storage and access capabilities for application servers using that data. Some data security measures are usually implemented in such large data storage networks, and are intended to ensure proper data privacy and prevent data corruption. Typically, data security is accomplished via encryption of data and/or access control to a network within which the data is stored. Data can be stored in one or more locations, e.g. using a redundant array of inexpensive disks (RAID) or other techniques.
One example of an existing mass data storage system 10 is illustrated in FIG. 1. As shown, an application server 12 (e.g. a database or file system provider) connects, via a direct connection 15, an IP-based network 16, and a storage area network 18, to a number of storage devices 14-1 through 14-N, which provide mass storage of data that is to remain accessible to the application server. Each of the storage devices 14 can host disks 20 of various types and configurations useable to store this data.
The physical disks 20 are made visible/accessible to the application server 12 by mapping those disks to addressable ports using, for example, logical unit numbering (LUN), internet SCSI (iSCSI), or common internet file system (CIFS) connection schemes. In the configuration shown, five disks are made available to the application server 12, bearing assigned letters I-M. Each of the assigned drive letters corresponds to a different physical disk 20 (or at least a different portion of a physical disk) connected to a storage device 14, and has a dedicated addressable port through which that disk 20 is accessible for storage and retrieval of data. Therefore, the application server 12 directly addresses data stored on the physical disks 20.
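The mapping described above can be pictured with a minimal sketch. It assumes hypothetical device, port, and disk labels (none of these names come from FIG. 1) and is intended only to illustrate the one-letter-to-one-port relationship, not any particular LUN, iSCSI, or CIFS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskTarget:
    # All three labels are hypothetical placeholders for illustration.
    storage_device: str  # stands in for one of the storage devices 14
    port: str            # stands in for a dedicated addressable port
    disk: str            # stands in for a physical disk 20 (or a portion of one)


# Five disks made available to the application server under letters I-M,
# each reachable through its own port.
DRIVE_MAP = {
    "I": DiskTarget("device-1", "port-0", "disk-a"),
    "J": DiskTarget("device-1", "port-1", "disk-b"),
    "K": DiskTarget("device-2", "port-2", "disk-c"),
    "L": DiskTarget("device-2", "port-3", "disk-d"),
    "M": DiskTarget("device-3", "port-4", "disk-e"),
}


def resolve(path: str) -> DiskTarget:
    """Return the disk target that a path such as 'K:/db/table.dat' addresses."""
    letter, sep, _ = path.partition(":")
    if not sep or letter.upper() not in DRIVE_MAP:
        raise KeyError(f"no disk mapped for path {path!r}")
    return DRIVE_MAP[letter.upper()]
```

In this sketch every path resolves to exactly one disk through exactly one port, which mirrors the direct addressing of the physical disks 20 by the application server 12.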
A second typical data storage arrangement 30 is shown in FIG. 2. The arrangement 30 illustrates a typical data backup configuration useable to tape-backup files stored in a data network. The network 30 includes an application server 32, which makes a snapshot of data 34 to send to a backup server 36. The backup server 36 stores the snapshot, and operates a tape management system 38 to record that snapshot to a magnetic tape 40 or other long-term storage device.
These data storage arrangements have a number of disadvantages. For example, in the network 10, a number of data access vulnerabilities exist. An unauthorized user can steal a physical disk 20, and thereby obtain access to sensitive files stored on that disk. Alternatively, the unauthorized user can exploit network vulnerabilities to observe data stored on disks 20 by monitoring the data passing over any of the connections and networks 15, 16, 18 between an authorized application server 12 or other authorized user and the physical disk 20. The network 10 also has inherent data loss risks. In the network 30, physical data storage can be time-consuming, and physical backup tapes can be subject to failure, damage, or theft.
To overcome some of these disadvantages, systems have been introduced which duplicate and/or separate files and directories for storage across one or more physical disks. The files and directories are typically stored or backed up as a monolith, meaning that the files are logically grouped with other like data before being secured. Although this provides a convenient arrangement for retrieval, in that a common security construct (e.g. an encryption key or password) is related to all of the data, it also provides additional risk exposure if the data is compromised. Furthermore, similar data is typically stored encrypted with a common encryption key, thereby rendering the data vulnerable if the key is obtained.
For these and other reasons, improvements are desirable.