A distributed data storage system typically comprises cache memories that are coupled to a number of disks wherein the data is permanently stored. The disks may be in the same general location, or be in completely different locations. Similarly, the caches may be localized or distributed. The storage system is normally used by one or more hosts external to the system.
Using more than one cache and more than one disk leads to a number of very practical advantages, such as protection against complete system failure if one of the caches or one of the disks malfunctions. Redundancy may be incorporated into a multiple cache or multiple disk system, so that failure of a cache or a disk in the distributed storage system is not apparent to one of the external hosts, and has little effect on the functioning of the system.
While distribution of the storage elements has undoubted advantages, it typically leads to increased overhead compared to a local system having a single cache and a single disk. Inter alia, the increased overhead is required to manage the larger number of system components, to equalize or attempt to equalize usage of the components, to maintain redundancy among the components, to operate a backup system in the case of a failure of one of the components, and to manage addition of components to, or removal of components from, the system. A reduction in the required overhead for a distributed storage system is desirable.
An article titled “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web,” by Karger et al., in the Proceedings of the 29th ACM Symposium on Theory of Computing, pages 654-663, (May 1997), whose disclosure is incorporated herein by reference, describes caching protocols for relieving “hot spots” in distributed networks. The article describes a hashing technique known as consistent hashing, and the use of a consistent hashing function. Such a function allocates objects to devices so as to spread the objects evenly over the devices, so that there is a minimal redistribution of objects if there is a change in the devices, and so that the allocation is consistent, i.e., is reproducible. The article applies a consistent hashing function to read-only cache systems, i.e., systems where a client may only read data from the cache system, not write data to the system, in order to distribute input/output requests to the systems. A read-only cache system is used in much of the World Wide Web, where a typical user is only able to read from sites on the Web having such a system, not write to such sites.
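The three properties attributed to a consistent hashing function above (even spread, minimal redistribution on a change of devices, and reproducibility) can be illustrated with a minimal sketch. It is not the construction given by Karger et al.; it assumes a ring of hash points with several virtual points per device, and the names `HashRing`, `replicas`, and the device labels are hypothetical.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a ring position with a stable (reproducible) hash."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring; all names here are assumptions."""

    def __init__(self, devices, replicas=64):
        self._replicas = replicas  # virtual points per device, to even the spread
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> device name
        for device in devices:
            self.add(device)

    def add(self, device):
        for i in range(self._replicas):
            p = _point(f"{device}#{i}")
            if p not in self._owner:  # ignore the (very unlikely) collision
                self._owner[p] = device
                bisect.insort(self._points, p)

    def remove(self, device):
        self._points = [p for p in self._points if self._owner[p] != device]
        self._owner = {p: d for p, d in self._owner.items() if d != device}

    def lookup(self, key):
        """Return the device owning the first ring point after the key."""
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"object-{n}" for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("cache-b")
after = {k: ring.lookup(k) for k in keys}

# Only objects that lived on the removed device should have moved.
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, removing a device reassigns only the objects that device held; objects on the remaining devices keep their allocation.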
An article titled “Differentiated Object Placement and Location for Self-Organizing Storage Clusters,” by Tang et al., in Technical Report 2002-32 of the University of California, Santa Barbara (November, 2002), whose disclosure is incorporated herein by reference, describes a protocol for managing a storage system where components are added to or removed from the system. The protocol uses a consistent hashing scheme for placement of small objects in the system. Large objects are placed in the system according to a usage-based policy.
An article titled “Compact, Adaptive Placement Schemes for Non-Uniform Capacities,” by Brinkmann et al., in the August, 2002, Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA), whose disclosure is incorporated herein by reference, describes two strategies for distributing objects among a heterogeneous set of servers. Both strategies are based on hashing systems.
U.S. Pat. No. 5,875,481 to Ashton, et al., whose disclosure is incorporated herein by reference, describes a method for dynamic reconfiguration of data storage devices. The method assigns a selected number of the data storage devices as input devices and a selected number of the data storage devices as output devices in a predetermined input/output ratio, so as to improve data transfer efficiency of the storage devices.
U.S. Pat. No. 6,317,815 to Mayer, et al., whose disclosure is incorporated herein by reference, describes a method and apparatus for reformatting a main storage device of a computer system. The main storage device is reformatted by making use of a secondary storage device on which is stored a copy of the data stored on the main device.
U.S. Pat. No. 6,434,666 to Takahashi, et al., whose disclosure is incorporated herein by reference, describes a memory control apparatus. The apparatus is interposed between a central processing unit (processor) and a memory device that stores data. The apparatus has a plurality of cache memories to temporarily store data which is transferred between the processor and the memory device, and a cache memory control unit which selects the cache memory used to store the data being transferred.
U.S. Pat. No. 6,453,404 to Bereznyi, et al., whose disclosure is incorporated herein by reference, describes a cache system that allocates memory for storage of data items by defining a series of small blocks that are uniform in size. The cache system, rather than an operating system, assigns one or more blocks for storage of a data item.
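The general idea of a cache that assigns uniform small blocks to data items itself, rather than relying on an operating system allocator, can be sketched as follows. This is an illustrative sketch only and is not taken from the cited patent; the class `BlockPool`, its methods, and the block sizes used are assumptions.

```python
class BlockPool:
    """Illustrative cache that stores items in uniform, preallocated blocks."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(block_count)]
        self.free = list(range(block_count))  # indices of unused blocks
        self.items = {}                       # key -> (block indices, item length)

    def put(self, key, data):
        if key in self.items:
            raise KeyError(f"{key!r} is already stored")
        needed = -(-len(data) // self.block_size)  # ceiling division
        if needed > len(self.free):
            raise MemoryError("not enough free blocks")
        indices = [self.free.pop() for _ in range(needed)]
        for n, idx in enumerate(indices):
            chunk = data[n * self.block_size:(n + 1) * self.block_size]
            self.blocks[idx][:len(chunk)] = chunk
        self.items[key] = (indices, len(data))

    def get(self, key):
        indices, length = self.items[key]
        raw = b"".join(bytes(self.blocks[i]) for i in indices)
        return raw[:length]  # drop padding in the last block

    def delete(self, key):
        indices, _ = self.items.pop(key)
        self.free.extend(indices)  # blocks become reusable immediately


pool = BlockPool(block_size=4, block_count=4)
pool.put("a", b"hello world")  # 11 bytes occupy 3 blocks of 4 bytes
```

Because every block has the same size, freeing an item returns blocks that any later item can reuse, so the pool does not fragment in the way a variable-size allocator can.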
A number of different types of storage system are known in the art. In a storage area network (SAN), data is accessed in blocks at a device level, and the data is transferred in blocks. Typically, the basic unit of data organization is a logical unit (LU), which consists of a sequence of logical block addresses (LBAs).
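As a small illustration of block-level access, the sketch below maps a byte range within a logical unit to the LBAs that must be transferred. The 512-byte block size and the function name `lba_span` are assumptions made for the example, not values given in the text.

```python
def lba_span(byte_offset, length, block_size=512):
    """Return (first LBA, number of blocks) covering a byte range.

    block_size=512 is an assumed value; real logical units may differ.
    """
    if byte_offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0; length and block_size must be > 0")
    first = byte_offset // block_size
    last = (byte_offset + length - 1) // block_size
    return first, last - first + 1


# A 100-byte read starting at byte 1000 touches two whole blocks.
span = lba_span(1000, 100)
```

Even a short request is served by transferring whole blocks, which is the sense in which the data is both accessed and transferred in blocks.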
In a network attached storage (NAS) system, data is accessed as file data or file meta-data (parameters of the file). The basic unit of organization is typically a file.
In an object storage architecture (OSA), the basic unit of storage is a storage object, which comprises file data together with meta-data. The latter comprise storage attributes such as data layout and usage information.
Content addressed storage (CAS) is a particular case of OSA, designed for data that is intended to be stored and not changed. CAS assigns a unique identifier to the stored data, the identifier depending on the contents of the data.
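A content-dependent identifier of this kind can be sketched by hashing the stored bytes. The use of SHA-256 and the names `ContentStore`, `put`, and `get` are assumptions for illustration; the text does not specify how a CAS system derives its identifiers.

```python
import hashlib


class ContentStore:
    """Illustrative write-once store keyed by a hash of the contents."""

    def __init__(self):
        self._objects = {}  # identifier -> immutable bytes

    def put(self, data: bytes) -> str:
        ident = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same identifier, so they are stored once.
        self._objects.setdefault(ident, bytes(data))
        return ident

    def get(self, ident: str) -> bytes:
        data = self._objects[ident]
        # The identifier doubles as an integrity check on retrieval.
        if hashlib.sha256(data).hexdigest() != ident:
            raise ValueError("stored data does not match its identifier")
        return data

    def __len__(self):
        return len(self._objects)


store = ContentStore()
id_first = store.put(b"archived record")
id_again = store.put(b"archived record")
id_other = store.put(b"archived record, revised")
```

Because the identifier depends only on the contents, storing the same data twice yields the same identifier, while any change to the data yields a different one.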