1. Field of the Invention
This invention relates to data storage. More particularly, this invention relates to aggregate reduction of data access latency in distributed data storage entities.
2. Description of the Related Art
Data storage systems generally store data on physical media in a manner that is transparent to host computers. From the perspective of a host computer, data is stored at logical addresses located on file systems or logical volumes of the storage system. To function, data storage systems map the logical addresses to addressable physical locations on storage media, such as direct access hard disks. In distributed systems, requests for data access may be queued with other requests in one or more queues. Many queueing strategies are known in the art.
The slow access time, of the order of 5-10 ms, for an input/output (I/O) transaction performed on a disk has led to the need for a caching system between a host generating the I/O transaction and the disk. A cache, a fast access time medium, stores a portion of the data contained in the disk. The I/O transaction is first routed to the cache, and if the data required by the transaction exists in the cache, it may be used without accessing the disk.
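The cache lookup described above can be illustrated with a minimal sketch. It assumes a least-recently-used eviction policy and a dictionary standing in for the disk; the class name ReadThroughCache, its parameters, and the eviction policy are hypothetical choices made for illustration and are not taken from any cited system.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical cache placed between a host and a slow backing store."""

    def __init__(self, backing_read, capacity):
        self._backing_read = backing_read  # callable: address -> data (the "disk")
        self._capacity = capacity          # maximum number of cached blocks
        self._entries = OrderedDict()      # address -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # The I/O transaction is routed to the cache first.
        if address in self._entries:
            self._entries.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return self._entries[address]
        # Cache miss: only now is the slow medium accessed.
        self.misses += 1
        data = self._backing_read(address)
        self._entries[address] = data
        if len(self._entries) > self._capacity:
            # Assumed policy: evict the least recently used block.
            self._entries.popitem(last=False)
        return data


# Usage: a dictionary plays the role of the disk.
disk = {address: f"block-{address}" for address in range(8)}
cache = ReadThroughCache(disk.__getitem__, capacity=2)
cache.read(3)  # miss, fetched from the disk
cache.read(3)  # hit, served from the cache
```

In this sketch only the first read of a given address reaches the backing store; repeated reads are served from the cache until the block is evicted.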
Using more than one cache and more than one disk can improve access time, and leads to a number of practical advantages, such as protection against complete system failure if one of the caches or one of the disks malfunctions. Redundancy may be incorporated into a multiple cache or multiple disk system, so that failure of a cache or a disk in the distributed storage system is not apparent to any of the external hosts, and has little effect on the functioning of the system. U.S. Pat. No. 6,457,102, issued to Lambright et al., whose disclosure is incorporated herein by reference, describes a system for storing data in a cache memory that is divided into a number of separate portions. Exclusive access to each of the portions is provided by software or hardware locks. The system may be used for choosing which data is to be erased from the cache in order to make room for new data.
A data storage system is typically set up to be as evenly loaded as possible, in terms of activity performed by the system elements. Such load balancing enhances the ability of the data storage system to perform efficiently. Methods are known in the art for effecting and maintaining load balancing. An article titled Compact, Adaptive Placement Schemes for Non-Uniform Capacities, by Brinkmann et al., in the August 2002 Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA), whose disclosure is incorporated herein by reference, describes two strategies for distributing objects among a heterogeneous set of servers. Both strategies are based on hashing systems.