1. Technical Field
This application relates to the field of storing data, and more particularly to the field of data storage services in a scalable high capacity system.
2. Description of Related Art
It has been estimated that the amount of digital information created, captured, and replicated in 2006 was 161 exabytes, or 161 billion gigabytes, which is about three million times the information in all the books ever written. It is predicted that between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes. The type of information responsible for this massive growth is rich digital media and unstructured business content. There is also an ongoing conversion from analog to digital formats: film to digital image capture, analog to digital voice, and analog to digital TV.
The rich digital media and unstructured business content have unique characteristics and storage requirements that are different from those of structured data types (e.g., database records), for which many of today's storage systems were specially designed. Many conventional storage systems are highly optimized to deliver high performance I/O for small chunks of data. Furthermore, these systems were designed to support gigabyte and terabyte sized information stores.
In contrast, rich digital media and unstructured business content have greater capacity requirements (petabyte versus gigabyte/terabyte sized systems), less predictable growth and access patterns, large file sizes, billions of objects, high throughput requirements, single-writer, multiple-reader access patterns, and a need for multi-platform accessibility. Conventional storage systems have met these needs in part by using specialized hardware platforms to achieve required levels of performance and reliability. Unfortunately, the use of specialized hardware results in higher customer prices and may not support volume economics as the capacity demands grow large, which is a differentiating characteristic of rich digital media and unstructured business content.
Some of the cost issues have been addressed with tiered storage, which attempts to reduce the capital and operational costs associated with keeping all information on a single high-cost storage tier. However, tiered storage comes with a complex set of decisions surrounding technology, data durability, functionality, and even the choice of storage vendor. Tiered storage solutions may introduce unrelated platforms, technologies, and software titles having non-zero operational costs and management requirements that become strained as the quantity of data increases.
In addition, tiered storage may cause data replica incoherence, which results in multiple, disjoint copies of information existing across the tiers of storage. For example, storage management software handling data backup and recovery may make multiple copies of information sets on each storage tier (e.g., snapshots, backup sets, etc.). Information Lifecycle Management (ILM) software dealing with information migration from one tier to another may create additional and often overlapping copies of the data. Replication software may make an extra copy of the information set within a particular tier in order to improve performance for the applications accessing it. Each of these functions typically runs autonomously from the others. The software may be unable to recognize and/or take advantage of the multiple replicas of the same information set.
In addition, for large scale unstructured information stores, it may be difficult to maintain a system and manage the environment as components fail. For example, a two petabyte information store may consist of eight thousand 250-gigabyte disk drives. Disk failures should be handled in a different manner in a system of this scale so that the system continues to operate relatively smoothly whenever one or only a few of the disk drives fail.
The problems set forth above are addressed in published U.S. patent application no. 20090112789 titled POLICY BASED FILE MANAGEMENT, which is assigned to the assignee of the present application and is incorporated by reference herein. The system described therein provides a multi-petabyte offering for building cloud storage that combines massive scalability with automated data placement to deliver content and information services anywhere in the world. The system operates as a single entity using metadata and business policy constructs to direct content to locations and users.
In some cases, it may be desirable to federate data from two or more clouds in a way that causes the data to appear to an end user as being from a single cloud. However, this may be difficult when attempting to join public and private clouds and/or to join clouds provided by different vendors that have different structures. In addition to any technical constraints, there may be security issues that need to be addressed when a private cloud containing sensitive data is combined with a public cloud.
Thus, it would be desirable to provide a system that facilitates joining different clouds and addresses security issues associated with joining public and private clouds.