A fixed-content object is a container of digital information that, once created, remains fixed. Examples of objects that could be fixed include medical images, PDF documents, photographs, document images, static documents, financial records, e-mail, audio, and video. Once stored, a fixed-content object is immutable: altering it results in the creation of a new fixed-content object rather than a change to the stored one.
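The immutability rule can be illustrated with a minimal Python sketch. It assumes, for illustration only, that an object's identifier is derived from a SHA-256 hash of its content; the class `FixedContentObject` and the helper `alter` are hypothetical names, not part of any described system. Attempting to modify a stored object raises an error, and "altering" it yields a distinct new object.

```python
import hashlib
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FixedContentObject:
    """Hypothetical model of a fixed-content object: the payload cannot be
    reassigned after creation (frozen dataclass)."""
    payload: bytes

    @property
    def object_id(self) -> str:
        # Assumption: identifier is a content hash, so new content => new id.
        return hashlib.sha256(self.payload).hexdigest()


def alter(obj: FixedContentObject, new_payload: bytes) -> FixedContentObject:
    """Hypothetical helper: never mutates `obj`; returns a new object instead."""
    return replace(obj, payload=new_payload)


if __name__ == "__main__":
    original = FixedContentObject(b"scan-0001")
    try:
        original.payload = b"edited"  # in-place modification is rejected
    except FrozenInstanceError:
        print("stored object is immutable")
    edited = alter(original, b"scan-0001 (annotated)")
    print(original.object_id != edited.object_id)
```

Content addressing is only one way to tie identity to content; a system could equally assign opaque identifiers and simply refuse in-place writes.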
Fixed-content objects are often subject to regulatory requirements for availability, confidentiality, integrity, and retention over periods of many years. As a result, fixed-content data stores grow without bound, and storing these digital assets over long periods of time presents significant logistical and economic challenges. Long retention times produce both large data volumes and enormous numbers of objects. In many applications, fixed-content data is accessed in a multi-facility environment whose sites are linked by wide area networks of limited bandwidth. In these environments, network, hardware, or software failures should not prevent access to the fixed-content data.
To address the economic and logistical challenges of storing an ever-growing volume of information for long periods of time, fixed-content storage systems implement a multi-tier storage hierarchy and apply Information Lifecycle Management (ILM) policies that determine the number of copies of each object, the location of each object, the retention time for each object, and the storage tier for each object. These policies vary with the content of each object, its age, and its relevance to business processes.
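An ILM policy can be thought of as a function from an object's attributes (content, age, business relevance) to a placement decision (copies, locations, retention time, tier). The Python sketch below illustrates only that shape. Every rule, site name, tier name, and duration in it is a hypothetical placeholder, not drawn from any regulation or from the system described here.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ObjectMetadata:
    content_type: str        # e.g. "medical-image", "email" (hypothetical labels)
    age: timedelta           # time since the object was ingested
    business_critical: bool  # stand-in for "relevance to business processes"


@dataclass(frozen=True)
class Placement:
    copies: int
    sites: tuple[str, ...]
    retention: timedelta
    tier: str


HYPOTHETICAL_SITES = ("site-a", "site-b", "site-c")


def evaluate_ilm(meta: ObjectMetadata) -> Placement:
    """Apply placeholder ILM rules to one object's metadata."""
    # Retention depends on content (placeholder durations).
    if meta.content_type == "medical-image":
        retention = timedelta(days=365 * 30)
    else:
        retention = timedelta(days=365 * 7)
    # Number of copies (and therefore sites) depends on business relevance.
    copies = 3 if meta.business_critical else 2
    sites = HYPOTHETICAL_SITES[:copies]
    # Storage tier depends on age: recent objects stay on faster media.
    tier = "disk" if meta.age < timedelta(days=90) else "tape"
    return Placement(copies, sites, retention, tier)


if __name__ == "__main__":
    scan = ObjectMetadata("medical-image", timedelta(days=10), True)
    print(evaluate_ilm(scan))
```

Because the policy is re-evaluated as an object ages, the same object can migrate between tiers over its lifetime without its content ever changing.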
A large-scale, distributed, multi-site, multi-tier fixed-content storage system is needed, for example, to store multiple billions of fixed-content data objects. Such a system ensures the integrity, availability, and authenticity of stored objects while enforcing Information Lifecycle Management and regulatory policies. Examples of regulatory policies include retention times and version control.
In general, large-scale distributed systems are composed of components (nodes) that may be inherently unreliable. As the distributed system grows in capacity and its nodes become more geographically dispersed, the probability that all nodes are both reachable and operational decreases rapidly. Assuming that nodes fail independently, the probability that all nodes are functional (i.e., reachable and operational) equals the probability that an individual node is functional raised to the power of the number of nodes. Consequently, as the number of nodes increases, the probability that all nodes are functional becomes very small. When considering data integrity, it should therefore be assumed that some nodes may be either non-operational or inaccessible.
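If each node is functional with probability p and failures are independent, the probability that all N nodes are functional is P_all = p^N. The short Python sketch below evaluates this expression; the function name `prob_all_functional` and the sample value p = 0.999 are illustrative assumptions.

```python
def prob_all_functional(p: float, n: int) -> float:
    """Probability that all n nodes are functional, assuming each node is
    independently functional with probability p (P_all = p ** n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


if __name__ == "__main__":
    # Even highly reliable nodes (p = 0.999) yield a low P_all at scale.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_all_functional(0.999, n), 4))
```

With p = 0.999, the probability that every node is functional drops to roughly 0.90 at 100 nodes and roughly 0.37 at 1,000 nodes, which is why a design should treat partial unavailability as the normal case.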