1. Field of the Invention
The present invention generally relates to data storage systems and methods and, more particularly, to a methodology for distributing failure-induced workload among a plurality of backup managers using a canonical name-based manager-naming scheme.
2. Description of Related Art
With increasing reliance on electronic means of data communication, different models for efficiently and economically storing large amounts of data have been proposed. A data storage mechanism requires not only a sufficient amount of physical disk space to store data, but also various levels of fault tolerance or redundancy (depending on how critical the data is) to preserve data integrity in the event of one or more disk failures. Fault tolerance is almost mandatory in modern high-end data storage systems. One group of schemes for fault-tolerant data storage includes the well-known RAID (Redundant Array of Independent Disks) levels or configurations. A number of RAID levels (e.g., RAID-0, RAID-1, RAID-3, RAID-4, RAID-5, etc.) are designed to provide varying degrees of fault tolerance and redundancy for different data storage applications (RAID-0, which only stripes data across disks, provides none). A data file in a RAID environment may be stored in any one of the RAID configurations, depending on how critical the content of the data file is vis-à-vis how much physical disk space is affordable to provide redundancy or backup in the event of a disk failure.
Another approach to fault tolerance in existing storage systems is clustering. In a clustering environment, two servers are bound together (i.e., electronically linked as a pair), and one server takes over the full workload of the other server should that other server fail. The “backup” server in the server pair typically does not serve data processing requests so long as the “primary” server is operating in the fault-free state. Rather, the backup server simply keeps its state up to date (i.e., the backup server keeps its state synchronized with the most recent state of the primary server) so that it can take over should the primary server fail.
In the above-described clustering approach, the available processing power of the backup server is wasted during the fault-free state because the backup server does not actively perform data processing as long as the primary server is fault free. All the backup server does in the clustering configuration is maintain a replica of the primary server's state. This waste of available processing power multiplies when a data storage system contains a large number of primary and backup servers.
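By way of illustration only, the paired primary/backup arrangement described above may be sketched as follows. The class names (Server, ClusterPair), the in-memory dictionary standing in for server state, and the request counter are assumptions introduced here for clarity and do not appear in any particular prior system; the sketch merely shows that the backup server mirrors state without serving requests until the primary server fails.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical stand-in for one server in a clustered pair.
    name: str
    alive: bool = True
    state: dict = field(default_factory=dict)
    requests_served: int = 0


class ClusterPair:
    """Hypothetical active-passive pair: the backup only mirrors state."""

    def __init__(self, primary: Server, backup: Server):
        self.primary = primary
        self.backup = backup

    def active(self) -> Server:
        # The backup serves requests only after the primary has failed.
        return self.primary if self.primary.alive else self.backup

    def write(self, key: str, value: str) -> str:
        server = self.active()
        server.state[key] = value
        server.requests_served += 1
        if server is self.primary and self.backup.alive:
            # Replicate the primary's latest state to the backup.
            # Replication is not counted as a served request.
            self.backup.state = dict(self.primary.state)
        return server.name


pair = ClusterPair(Server("primary"), Server("backup"))
pair.write("a", "1")
pair.write("b", "2")

# While the pair is fault free, the backup has served no requests.
idle_backup_requests = pair.backup.requests_served

# Simulate a primary failure; the backup takes over the full workload.
pair.primary.alive = False
handler_after_failure = pair.write("c", "3")
```

In this sketch the backup server's request counter remains at zero for as long as the primary server is alive, which corresponds to the idle processing capacity noted above; only after the simulated failure does the backup server begin serving requests.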
Therefore, it is desirable to devise a data storage technique that allows a backup server to be used as a primary server for some other portion of a data storage system, thereby making use of that backup server's available processing power. It is further desirable to implement the backup server-based fault tolerance in a multi-server, object-based data storage environment.