1. Field of the Invention
The present invention relates, in general, to the storage, sharing, and recovery of data among networked data storage and other computing devices, and, more particularly, to a method of data storage and recovery that involves fingerprinting data objects to reduce the amount of integration required for storage devices to securely share data objects and to make data recovery more efficient.
2. Relevant Background
The amount and types of stored data are rapidly expanding, and data management is rapidly becoming a significant cost for many businesses or enterprises. In particular, enterprise data is growing exponentially, and today's businesses need a way to dramatically reduce the costs associated with data storage and management. Enterprises also have to provide proper data backup to meet their needs, such as servicing clients and complying with regulations and laws that require data to be maintained for relatively long periods of time. A complication for most businesses is that enterprise data may be highly dispersed over many machines, data storage centers, and interconnected networks/systems.
Data deduplication may be used to lower the overall cost of physical data storage by storing only a single instance of unique data (e.g., only one copy of particular data such as a file or data object is stored) for an enterprise or group sharing access to data. Deduplication is fast becoming a standard feature in many data storage systems, but existing data deduplication techniques have a number of limitations, including the use of a database, an in-memory index, or a similar mechanism to store the information that is needed to retrieve a specific instance of unique data. The term data deduplication generally refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted to leave only one copy or instance of the data to be stored.
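By way of illustration only, the single-instance storage described above may be sketched as follows. The sketch assumes a SHA-256 digest as the key identifying unique data and uses an in-memory index of the kind noted above as a limitation of existing techniques. The class name `DedupStore` and its methods are hypothetical and are not taken from any particular product or from the present description.

```python
import hashlib


class DedupStore:
    """Toy single-instance store (hypothetical, for illustration only)."""

    def __init__(self):
        # In-memory index: content digest -> the single stored copy.
        self.blocks = {}
        # Mapping from object name to the digest of its content.
        self.names = {}

    def put(self, name: str, data: bytes) -> str:
        """Store data under name, keeping only one copy of identical content."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the existing copy when the digest is already indexed.
        self.blocks.setdefault(digest, data)
        self.names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Retrieve the content for name by looking it up through the index."""
        return self.blocks[self.names[name]]


store = DedupStore()
first = store.put("report_v1.txt", b"quarterly results")
second = store.put("report_copy.txt", b"quarterly results")
print(first == second, len(store.blocks))
```

In this sketch, two names refer to one stored instance, and retrieval depends entirely on the index, which is the dependency that the limitation noted above concerns.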
A number of issues remain for the data storage industry with regard to data distributed among a number of networked storage and computer devices. For example, the sharing of data among devices typically requires close integration of the various devices such that the data transmitted from other devices may be trusted (e.g., the networked devices are trusted devices), and such close integration can lead to many complexities in the data storage devices and/or data management software.
A further complication with data storage is the challenge of providing effective and inexpensive recovery of data, such as disaster recovery. For example, it may be important for an enterprise to provide processes, policies, and procedures related to recovery or continuation of the enterprise's technology infrastructure after a natural or human-induced disaster that may cause loss of all or portions of its data. Conventional disaster recovery would then involve the enterprise requesting all of its data from a single source that provides a backup service for that enterprise, often using tape storage devices. Unfortunately, performing a full backup or data restore from tape is a slow process that involves streaming all of the backed-up data, which does not allow the enterprise system to verify what data is being received or its integrity. Also, tape and other conventional disaster recovery techniques do not allow random selection or access of files or portions of data, which may be useful when only portions or subsets of the enterprise's data have been lost in the disaster or other event causing data loss.