1. Technical Field
This application relates to the field of storing data, and more particularly to the field of data storage services in a scalable, high-capacity system where data is accessed between different types of systems.
2. Description of Related Art
It has been estimated that the amount of digital information created, captured, and replicated in 2006 was 161 exabytes, or 161 billion gigabytes, which is about three million times the information in all the books ever written. It is predicted that between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes. The types of information responsible for this massive growth are rich digital media and unstructured business content. There is also an ongoing conversion from analog to digital formats: film to digital image capture, analog to digital voice, and analog to digital TV.
Rich digital media and unstructured business content have unique characteristics and storage requirements that differ from those of structured data types (e.g., database records), for which many of today's storage systems were specially designed. Many conventional storage systems are highly optimized to deliver high-performance I/O for small chunks of data. Furthermore, these systems were designed to support gigabyte and terabyte sized information stores.
In contrast, rich digital media and unstructured business content have greater capacity requirements (petabyte versus gigabyte/terabyte sized systems), less predictable growth and access patterns, large file sizes, billions of objects, high throughput requirements, single-writer, multiple-reader access patterns, and a need for multi-platform accessibility. Conventional storage systems have met these needs in part by using specialized hardware platforms to achieve required levels of performance and reliability. Unfortunately, the use of specialized hardware results in higher customer prices and may not support volume economics as capacity demands grow large, which is a differentiating characteristic of rich digital media and unstructured business content.
In addition, legacy systems exist that access data using content addressable storage (CAS) where at least a portion of an identifier of a stored object is based on the content of the object. These systems may not be readily scalable. Accordingly, in some cases, CAS systems are being migrated to large distributed storage systems, such as the ATMOS storage system provided by EMC Corporation of Hopkinton, Mass. One difficulty associated with such a migration is that object IDs used by the CAS system may be different from object IDs used by the large distributed storage system. In such a case, an object ID translation table may be needed by a CAS emulator provided on the large scale storage system for clients that expect to store data using a CAS. However, such a translation table may not scale well, thus somewhat defeating the purpose of using a large distributed storage system, especially for new data objects created using the emulator.
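The translation-table difficulty described above can be illustrated with a minimal sketch. It is not the implementation of any particular product: the use of SHA-256 for the content-derived identifier, the random native identifiers, and the names `cas_object_id`, `NativeStore`, and `CasEmulator` are all assumptions made for illustration only.

```python
import hashlib
import uuid


def cas_object_id(content: bytes) -> str:
    """Derive an identifier from the object's content.

    SHA-256 is an assumption; the text only says that at least a
    portion of the identifier is based on the content.
    """
    return hashlib.sha256(content).hexdigest()


class NativeStore:
    """Hypothetical stand-in for a distributed store that assigns its own IDs."""

    def __init__(self):
        self._objects = {}

    def create(self, content: bytes) -> str:
        native_id = uuid.uuid4().hex  # assumed: ID is unrelated to the content
        self._objects[native_id] = content
        return native_id

    def read(self, native_id: str) -> bytes:
        return self._objects[native_id]


class CasEmulator:
    """Hypothetical emulator that keeps a CAS-ID -> native-ID translation table."""

    def __init__(self, store: NativeStore):
        self._store = store
        self.translation = {}  # grows by one entry per distinct object

    def put(self, content: bytes) -> str:
        cas_id = cas_object_id(content)
        if cas_id not in self.translation:
            self.translation[cas_id] = self._store.create(content)
        return cas_id

    def get(self, cas_id: str) -> bytes:
        # Every read must first consult the translation table.
        return self._store.read(self.translation[cas_id])
```

In this sketch, every distinct object stored through the emulator adds an entry to `translation`, and every read passes through it, which suggests why such a table may become a bottleneck as the number of objects grows.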
Thus, it would be desirable to provide a storage system that addresses difficulties associated with emulating a CAS storage system on a large distributed storage system.