1. Field of the Invention
The invention generally relates to data storage within a storage area network, and more particularly to a read/write object storage system that manages data storage based on the information content across data blocks within the system.
2. Description of the Related Art
Within this application several publications are referenced by Arabic numerals within parentheses. Full citations for these and other publications may be found at the end of the specification immediately preceding the claims. The disclosures of all these publications in their entireties are hereby expressly incorporated by reference into the present application for the purposes of indicating the background of the present invention and illustrating the state of the art.
There are many copies of the same data in the world. For example, many PC users have duplicate copies of the same applications, programs, and data installed and stored on their computers. In addition, when email and attachments are forwarded, different users end up storing the same email and attachments. As computing and storage become more centralized, servers increasingly store the same data for many different users/organizations. Furthermore, many critical applications such as snapshots, time travel, and data archival require the system to maintain multiple copies of largely identical data. Although the different copies of a data object (e.g., a file) may not be identical, many of the data blocks that make up the object are identical. Therefore, storage systems that blindly store identical blocks waste increasing amounts of storage space and incur unnecessary cost to maintain the extra storage. An example of a typical file storage system is illustrated in FIG. 1, wherein the system contains a storage component 2 connected to a file allocation table (FAT) 3, which maps the name of a file (or file ID) to the locations (addresses) of the storage blocks belonging to that file. A free space map (FSM) 4 keeps track of the unallocated storage blocks, i.e., the storage blocks that do not belong to any file. Finally, read/write/create/delete operations 5 are used in conjunction with the system to manipulate the stored data.
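By way of illustration only, the FAT/FSM arrangement of FIG. 1 can be sketched as follows. The class and method names are illustrative, not part of any actual system, and the block size is kept artificially small:

```python
BLOCK_SIZE = 4  # artificially small block size, for illustration


class SimpleFileStore:
    """Sketch of the system of FIG. 1: a storage component, a file
    allocation table (FAT) mapping file names to block addresses, and
    a free space map (FSM) tracking unallocated blocks."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks       # storage component
        self.fat = {}                          # file name -> block addresses
        self.free = set(range(num_blocks))     # free space map

    def create(self, name, data):
        chunks = [data[i:i + BLOCK_SIZE]
                  for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(self.free):
            raise OSError("out of space")
        addrs = [self.free.pop() for _ in chunks]
        for addr, chunk in zip(addrs, chunks):
            self.blocks[addr] = chunk
        self.fat[name] = addrs

    def read(self, name):
        return b"".join(self.blocks[a] for a in self.fat[name])

    def delete(self, name):
        # Deleting a file simply returns its blocks to the free space map.
        for addr in self.fat.pop(name):
            self.free.add(addr)
```

Note that such a system stores every block it is given, with no awareness of whether an identical block already exists elsewhere.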
One way to reduce the amount of storage consumed is to compress the data using an algorithm such as that described in Lempel-Ziv (1). However, this conventional approach introduces additional complexity because the data objects and/or data blocks become variably sized. Furthermore, performance and reliability suffer because data must be compressed when written and uncompressed when read. Moreover, such compression has limited effectiveness at reducing the storage taken up by duplicate data blocks because it is local in scope, exploiting only similar bit patterns that are near one another.
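The locality limitation can be demonstrated with a standard Lempel-Ziv implementation such as zlib, used here purely as an illustration. Duplicate blocks compressed independently are not deduplicated at all, and even within a single stream, repeats are exploited only when they fall inside the compressor's sliding window (32 KB for zlib):

```python
import os
import zlib

block = os.urandom(4096)            # one incompressible data block

# Compressing each duplicate block independently (as a block store
# would) cannot exploit the duplication at all.
per_block = sum(len(zlib.compress(block)) for _ in range(8))

# Compressing the copies as one stream lets LZ match the repeats,
# but only because they lie within the 32 KB sliding window.
whole = len(zlib.compress(block * 8))

# With more than 32 KB of unrelated data between two copies, the
# second copy falls outside the window and is stored in full again.
far_apart = len(zlib.compress(block + os.urandom(40000) + block))
```

The second copy of `block` in the last case consumes nearly its full size again, despite being byte-for-byte identical to data already in the stream.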
Another conventional technique that reduces storage consumption is copy-on-write. When a snapshot of a file system is taken with copy-on-write, no data blocks are actually copied. Only when a block is updated is a copy of the block made. The assumption here is that an updated block differs from the original block. However, copy-on-write only reduces the likelihood of having multiple copies of the same data block at the same offset in different versions of a file; it does not address identical blocks at different offsets or in unrelated files.
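The copy-on-write behavior described above can be sketched as follows; the names are illustrative. A snapshot copies only the list of block references, and a block is duplicated only at the moment it is updated:

```python
class CowFile:
    """Copy-on-write sketch: a snapshot copies no data blocks; a block
    is duplicated only when it is later updated."""

    def __init__(self, blocks):
        self.blocks = list(blocks)      # live version's block list
        self.snapshots = []

    def snapshot(self):
        # Only the list of block references is copied -- every data
        # block is shared between the snapshot and the live file.
        self.snapshots.append(list(self.blocks))

    def write(self, index, data):
        # The "copy" happens implicitly here: snapshots keep referencing
        # the old block while the live file points at the new one.
        self.blocks[index] = data
```

Note that if the new data happens to be identical to the old block, the system nonetheless ends up holding two identical blocks, since nothing compares their contents.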
There has also been some work on a write-once block storage system that identifies data blocks by a unique hash of the data (2). The drawbacks of such a system include: (1) it does not allow data to be deleted; (2) it requires a new interface that identifies data blocks by a hash value rather than by the address where they are stored; (3) it offers poor performance because data blocks that are logically related (e.g., blocks of the same file) may not be stored together; (4) it incurs substantial inline overhead in computing the hash; and (5) it is less reliable in the sense that the loss of a single block could impact a large amount of user data, because a block of data that occurs many times is stored only once.
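The hash-identified, write-once scheme can be sketched as follows, assuming SHA-256 as the hash function; the names are illustrative. Identical blocks collapse to a single stored copy, but the caller must retain hash values instead of storage addresses, the hash must be computed inline on every write, and without reference tracking no block can safely be deleted:

```python
import hashlib


class WriteOnceStore:
    """Sketch of a write-once block store addressed by content hash:
    identical blocks hash to the same key and are stored only once."""

    def __init__(self):
        self.by_hash = {}

    def put(self, block):
        # Inline hashing cost is paid on every write.
        key = hashlib.sha256(block).hexdigest()
        self.by_hash.setdefault(key, block)   # duplicate blocks collapse
        return key    # the caller must keep this key, not an address

    def get(self, key):
        return self.by_hash[key]

    # No delete(): without reference tracking, removing a block could
    # destroy data still referenced elsewhere.
```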
An orthogonal approach to reducing the cost of storing data is to use less expensive storage devices. For example, this can be achieved with a hierarchical storage manager that pushes some of the data predicted to be less active onto cheaper and slower storage (e.g., tapes). However, such an approach entails considerable complexity and is cost-effective only on a large scale. Moreover, performance is very poor when the prediction of which data is less likely to be used is wrong, as is often the case.
Another drawback of existing storage systems is that although they may contain many copies of the same data, they cannot determine how many copies exist or where those copies are located. Therefore, when a given copy is lost due to hardware problems (e.g., media failure, which is especially a problem when low-cost desktop drives are used), the system may be unable to repair that copy even though many other identical copies of the data exist elsewhere in the system. Moreover, conventional solutions rely on maintaining redundancy of all the blocks, which is very costly, rather than controlled redundancy of the actual data, which is what is really needed.
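A content index of the kind existing systems lack can be sketched as follows; the function name is illustrative. With such an index, a system would know exactly how many copies of each distinct block exist and at which addresses, so that a copy lost to media failure could be repaired from an identical copy elsewhere:

```python
import hashlib
from collections import defaultdict


def block_redundancy(blocks):
    """Map each distinct block (keyed by content hash) to the addresses
    of all its copies -- the knowledge needed to repair a lost copy
    from an identical one elsewhere, or to apply controlled redundancy
    per distinct datum rather than per block."""
    index = defaultdict(list)
    for addr, block in enumerate(blocks):
        index[hashlib.sha256(block).hexdigest()].append(addr)
    return index
```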
Therefore, there remains a need for a novel system and method to reliably store data in a compact and inexpensive manner, which allows the data to be accessed quickly, and which does not store unnecessary duplicates of the same data within the system.