1. Technical Field
This application relates to the field of storing data, and more particularly to the field of storing and retrieving data in a large data storage system.
2. Description of Related Art
The information added annually to the digital universe is estimated to be around 988 billion gigabytes, which is about eighteen million times the information in all the books ever written. The type of information that is stored includes rich digital media and unstructured business content. There is also an ongoing conversion from analog to digital formats: film to digital image capture, analog to digital voice, and analog to digital TV. Rich digital media and unstructured business content have unique characteristics and storage requirements that differ from those of structured data types (e.g., database records).
Many conventional storage systems are highly optimized to deliver high-performance I/O for small chunks of data. Furthermore, these systems were designed to support gigabyte- and terabyte-sized information stores. However, rich digital media and unstructured business content have greater capacity requirements (petabyte versus gigabyte/terabyte sized systems), less predictable growth and access patterns, large file sizes, billions of objects, high throughput requirements, single-writer, multiple-reader access patterns, and a need for multi-platform accessibility. In some cases, conventional storage systems have met these needs in part by using specialized hardware platforms to achieve required levels of performance and reliability. Unfortunately, the use of specialized hardware results in higher customer prices and may not support volume economics as capacity demands grow large.
Some of these issues have been addressed using cloud storage, such as the cloud storage system provided by EMC Corporation of Hopkinton, Mass. Such a system is disclosed, for example, in U.S. patent application no. 20090112789 (the '789 application), which is incorporated herein by reference. The '789 application provides a system where data objects are distributed among different servers in different locations. Conventional hierarchical directory structures may be supported by having some of the data objects represent subdirectories that contain pointers to other data objects that represent either data files or additional subdirectories. Thus, for example, a data file corresponding to the file path specification “C:\ABC\DEF\GHI.doc” may be provided by a first object corresponding to the root volume, “C:\”, that points to an object that corresponds to the subdirectory “ABC”, that points to an object that corresponds to the subdirectory “DEF”, that points to an object that corresponds to the data file “GHI.doc”.
Accessing the object corresponding to “GHI.doc” includes beginning at the object corresponding to the root node and then traversing, in order, the objects corresponding to the subdirectory nodes “ABC” and “DEF”. However, if one or more of the subdirectory objects becomes unavailable, then it may be difficult or impractical to find the object corresponding to “GHI.doc” even if that object is itself available. Accordingly, it is desirable to provide a system that efficiently locates data objects corresponding to files in a hierarchical directory structure in instances where objects corresponding to subdirectory nodes become unavailable.
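The traversal described above, and the way it fails when a subdirectory object is unavailable, may be illustrated with the following minimal sketch. The sketch is not taken from the '789 application; the names StoredObject, ObjectStore, ObjectUnavailable, resolve, and build_example_store, as well as the object identifiers such as "obj-abc", are hypothetical and are used only for illustration. An in-memory dictionary stands in for the distributed servers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class StoredObject:
    """Hypothetical stored object: a subdirectory or a data file."""
    object_id: str
    is_directory: bool
    # For a subdirectory object: child name -> identifier of the child object.
    children: Dict[str, str] = field(default_factory=dict)
    # For a data file object: the file contents.
    payload: bytes = b""


class ObjectUnavailable(Exception):
    """Raised when a requested object cannot currently be reached."""


class ObjectStore:
    """In-memory stand-in (an assumption) for objects spread across servers."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}
        self._offline: Set[str] = set()

    def put(self, obj: StoredObject) -> None:
        self._objects[obj.object_id] = obj

    def set_offline(self, object_id: str) -> None:
        # Simulates the server holding this object becoming unreachable.
        self._offline.add(object_id)

    def get(self, object_id: str) -> StoredObject:
        if object_id in self._offline:
            raise ObjectUnavailable(object_id)
        return self._objects[object_id]


def resolve(store: ObjectStore, root_id: str, components: List[str]) -> StoredObject:
    """Walk from the root object through each path component in turn."""
    current = store.get(root_id)
    for name in components:
        # Each step depends on the previous object being readable.
        current = store.get(current.children[name])
    return current


def build_example_store() -> ObjectStore:
    """Builds objects for the path C:\\ABC\\DEF\\GHI.doc used in the text."""
    store = ObjectStore()
    store.put(StoredObject("obj-root", True, {"ABC": "obj-abc"}))
    store.put(StoredObject("obj-abc", True, {"DEF": "obj-def"}))
    store.put(StoredObject("obj-def", True, {"GHI.doc": "obj-ghi"}))
    store.put(StoredObject("obj-ghi", False, payload=b"contents"))
    return store
```

In this sketch, resolve(store, "obj-root", ["ABC", "DEF", "GHI.doc"]) returns the data file object while all objects are reachable. After store.set_offline("obj-def"), the same call raises ObjectUnavailable, even though store.get("obj-ghi") would still succeed for a caller that already knew the identifier of the data file object. The path alone does not yield that identifier, which is the difficulty noted above.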