1. Technical Field
The present invention relates generally to techniques for highly available, reliable, and persistent data storage in a distributed computer network.
2. Description of the Related Art
A need has developed for the archival storage of “fixed content” in a highly available, reliable and persistent manner that replaces or supplements traditional tape and optical storage solutions. The term “fixed content” typically refers to any type of digital information that is expected to be retained without change for reference or other purposes. Examples of such fixed content include, among many others, e-mail, documents, diagnostic images, check images, voice recordings, film and video, and the like. The Redundant Array of Independent Nodes (RAIN) storage approach has emerged as the architecture of choice for creating large online archives for the storage of such fixed content information assets. By allowing nodes to join and exit a cluster as needed, RAIN architectures insulate a storage cluster from the failure of one or more nodes. By replicating data on multiple nodes, RAIN-type archives can automatically compensate for node failure or removal. Typically, RAIN systems are delivered as hardware appliances designed from identical components within a closed system.
Prior art archival storage systems typically store metadata for each file as well as its content. Metadata is a component of data that describes the data. Metadata typically describes the content, quality, condition, and other characteristics of the actual data being stored in the system. In the context of distributed storage, metadata about a file includes, for example, the name of the file, where pieces of the file are stored, the file's creation date, retention data, and the like. While reliable file storage is necessary to achieve storage system reliability and availability of files, the integrity of metadata is also an important part of the system. In the prior art, however, it has not been possible to distribute metadata across a distributed system of potentially unreliable nodes. The present invention addresses this need in the art.