1. Field of the Invention
This invention relates to a method and apparatus for securely archiving data at lower cost.
2. Related Art
With the rapid increase in the amount of digital data, the amount of fixed digital content is also growing rapidly. Generally, fixed digital data refers to digital data that does not undergo changes or editing, such as email, x-ray images, voice archives, and so on. Much of this fixed content must be retained for a long term, safely, and at low cost. For example, fixed data may be required to be preserved for regulatory compliance, litigation, etc. Therefore, such data needs to be stored securely and at low cost, while remaining readily available for future retrieval.
Recently, content addressed storage (CAS) systems have been developed for archiving fixed data. As noted above, archived data must be kept unchanged for a long term. Therefore, to guarantee data immutability, certain CAS systems use a content address mechanism. A content address is a unique identifier of archived data, derived from the data content itself. The CAS system calculates a hash value of the archived data using a hash algorithm such as MD5, SHA-1, or SHA-256. Other features of CAS include, for example, write prohibition, retention period management, customer metadata addition, data integrity checks, etc. In addition, the CAS system makes at least one replica of the archived data and stores the data and its replicas in distributed storage systems to protect the archived data for a long term. For more information about CAS systems and addressing, the reader is directed to U.S. Pat. No. 7,096,342.
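The content address mechanism described above can be illustrated with a minimal sketch. It is not taken from any particular CAS product; the names `content_address` and `ToyContentStore`, the in-memory dictionaries standing in for distributed storage, and the default of two copies are all assumptions made for illustration. The sketch shows an address derived from a hash of the data, write-once storage, replication, and an integrity check on retrieval.

```python
import hashlib


def content_address(data: bytes, algorithm: str = "sha256") -> str:
    """Derive an identifier from the data content itself (hex digest)."""
    return hashlib.new(algorithm, data).hexdigest()


class ToyContentStore:
    """Hypothetical in-memory stand-in for a CAS system (illustration only)."""

    def __init__(self, copies: int = 2):
        # Each dict stands in for one storage system holding a copy.
        self._copies = [dict() for _ in range(copies)]

    def put(self, data: bytes) -> str:
        address = content_address(data)
        for copy in self._copies:
            # Write prohibition: an existing object is never overwritten.
            copy.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        for copy in self._copies:
            data = copy.get(address)
            # Integrity check: the stored bytes must still hash to the address.
            if data is not None and content_address(data) == address:
                return data
        raise KeyError(address)


store = ToyContentStore()
address = store.put(b"archived email body")
restored = store.get(address)
```

Because the address is computed from the content, any change to the stored bytes makes the recomputed hash differ from the address, which is how such a scheme can detect corruption and fall back to another copy.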
The CAS system is required to provide high accessibility to archived data, low cost retention, and data protection functionality. However, it is difficult to reduce the retention cost with current CAS solutions, because they use expensive, high-performance storage systems, such as hard disk drive (HDD) arrays, to provide high accessibility. In addition, current solutions cannot eliminate the risk of data loss, because they provide data protection features only within the CAS system itself, irrespective of the operation of the storage system, e.g., the network attached storage (NAS), or other CAS systems coupled to it. Therefore, a solution to these problems is strongly needed.