Technical Field
This application relates generally to data storage.
Background of the Related Art
In data centers across the world, data is growing at an alarming rate. With the digitization of content, the paperwork of the world is turning into data bits that must be saved, protected and managed. For example, businesses that once had thick physical files and cabinets full of paper now have terabytes of data increasing at a 50% compound annual growth rate (CAGR). What was once a single MRI image is now 5 gigabytes of data for a medical firm to store and protect. The explosive growth in data is felt at all levels, from consumers to large enterprises. There are different types of data, and the invention focuses specifically on the growth of unstructured files, considered to be about 60% of the overall data, as opposed to structured data such as that found in databases, block storage devices and the like.
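To make the 50% CAGR figure concrete, the following is a minimal sketch of the compound-growth arithmetic. The 10-terabyte starting size and the five-year horizon are illustrative assumptions only, and the function names are hypothetical.

```python
import math


def projected_size(initial_tb: float, cagr: float, years: int) -> float:
    """Return the projected data size after compounding annually at `cagr`."""
    return initial_tb * (1.0 + cagr) ** years


def doubling_time(cagr: float) -> float:
    """Return the number of years for data to double at a given CAGR."""
    return math.log(2.0) / math.log(1.0 + cagr)


if __name__ == "__main__":
    # Assumed starting point: 10 TB growing at the 50% CAGR cited above.
    start_tb = 10.0
    for year in range(6):
        print(f"year {year}: {projected_size(start_tb, 0.5, year):.2f} TB")
    print(f"doubling time: {doubling_time(0.5):.2f} years")
```

Under these assumptions, 10 terabytes grows to roughly 76 terabytes in five years, and the data doubles about every 1.7 years.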
Unstructured file data is typically stored in local file systems or on network-attached storage (NAS) file systems. NAS devices can be built from commercially or freely available software (for example, Windows Server 2003 and OpenFiler). NAS devices also can be provided in physical or virtual (e.g., a VMware image) forms. NAS devices can connect flexibly to direct-attached storage or to storage area network (SAN) attached storage to meet their storage needs.
The storage industry has also seen the introduction and growth of storage service providers (SSPs). In recent years, scalable distributed storage devices using commodity hardware have been created by a number of companies. These systems provide a number of basic and advanced attributes, including capacity scalability, self-healing, performance scaling, duplicate elimination, simple interfaces, and the like. Some of these systems were designed and intended for large enterprises to store their fixed-content (archive) information internally, but some are being connected to the Internet to provide generic storage services. For example, Amazon's S3 service is the leading service of this nature and is being used by many Web 2.0 companies to store their data and scale their capacity without having to provide their own storage. Storage service providers are essentially utility companies for storage, and they bill their customers based on the amount of data stored within their service. Amazon's S3 service has been growing rapidly, showing the demand for storage provided as a service.
It is also known in the prior art to provide backup services that replicate data to services provided over the Internet. These services use software installed on a client to send data to an Internet service in a proprietary format. These are special-purpose SSPs. In addition to these backup offerings, some companies are now providing generic unstructured file services to allow data to be copied to the SSP. These services either provide direct access to the SSP or synchronize files to the SSP. Each one supports a single target SSP and is generally provided as a software application or software service within the computer operating system. Often, both of these types of remote storage include provisions for versioning (keeping older copies) of the data and a method to access the data online as well as locally.