Data has proven to be an important enterprise asset, and its rapid growth confronts enterprises with unprecedented challenges. Meanwhile, cost pressure from the rapidly changing world economic situation and fierce competition forces enterprises to consider how to reduce IT costs while still meeting their growing storage needs.
Existing storage architectures can be classified into two types. The first is a proprietary architecture dedicated to one party, such as DAS (Direct Attached Storage), SAN (Storage Area Network) and NAS (Network Attached Storage). Such storage systems are used exclusively by one party and give users very good control, reliability and performance, but because of their poor scalability they are not suited to large-scale deployment. In this mode it is difficult for users to spend storage budgets flexibly, since a one-time investment is needed to buy storage equipment, and cost control becomes harder as storage capacity increases.
The second is a multi-party shared architecture, that is, the cloud storage architecture. According to service scope, cloud storage is classified into private cloud and public cloud. Built on network technologies (internet and intranet), cloud storage lets users purchase or lease storage space on demand and configure the service on demand; usually a third party, or a separate department within the enterprise, provides the storage apparatus and specialized maintenance personnel. Through the storage service, enterprises or departments within them can significantly reduce their internal storage requirements and the corresponding administrative costs, and thereby balance sharply rising storage requirements against business cost pressure. Users of the storage service can be individuals, enterprises, or departments and branch offices within enterprises.
In addition to the difference in service target and scope, one key differentiator of a public cloud storage system from a private cloud storage system or a traditional enterprise network backup system is that public cloud storage service providers (data centers) are mutually independent. Constrained by concerns about data security and compatibility, service providers have little need to exchange information or communicate with each other.
Public cloud storage service can cut storage costs for enterprises and individuals and increase flexibility in meeting data storage requirements. In practice, however, various unpredictable causes can make the cloud storage service unavailable, or even cause data in the provider's data center to be lost or illegally modified. This concern hinders enterprises and individuals from using public cloud storage service, especially for storing critical business data. Typical cases include the following. The cloud storage service provider goes bankrupt or out of service for other reasons, which leads to a risk of data loss. Attracted by a lower service price, an enterprise chooses a smaller cloud storage provider, which then goes bankrupt due to poor management. A force majeure event, such as an earthquake or flood, destroys the data in the provider's data center. A fault such as a power failure makes the data in the data center temporarily unavailable (cloud storage service providers usually promise that their service is 99.99% or 99.999% available). Human error, for example during a merger or acquisition, causes loss of data in the data center. Or a virus or hacker attack causes data in the cloud storage data center to be lost or illegally modified.
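To put the availability figures quoted above in perspective, the following minimal sketch converts an availability percentage into the maximum downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; actual service agreements may define the measurement period differently.

```python
def max_downtime_minutes_per_year(availability_percent, days_per_year=365):
    """Return the downtime (in minutes per year) allowed by an availability figure.

    Hypothetical helper for illustration; assumes a 365-day year by default.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    minutes_per_year = days_per_year * 24 * 60
    unavailable_fraction = (100 - availability_percent) / 100
    return minutes_per_year * unavailable_fraction


for level in (99.99, 99.999):
    minutes = max_downtime_minutes_per_year(level)
    print(f"{level}% available: up to about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% availability still allows roughly 52.6 minutes of unavailability per year, and 99.999% allows roughly 5.3 minutes, so even a provider that meets its promise can leave data temporarily unreachable.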
Given the importance of data to enterprises, or because of legal requirements, it is necessary to strengthen the fault tolerance of public cloud storage systems, especially for critical corporate or personal data.
The traditional method of improving the availability and fault tolerance of cloud storage data service usually depends on an SLA (service level agreement) signed between the enterprise and the cloud storage service provider, and on the RAID mechanism (mainly RAID 2 to RAID 6). For reference, see David A. Patterson, Garth Gibson, and Randy H. Katz: A Case for Redundant Arrays of Inexpensive Disks (RAID), ACM 1988. A RAID 2 to RAID 6 system stores parity values for data already transmitted to the storage media, keeping enough redundancy that a failure at the storage hardware level does not lead to data loss.
The RAID mechanism is usually used for data protection at the data receiving end in network storage. By creating data redundancy with parity values at the receiving end, it prevents data loss caused by storage hardware damage. However, RAID can only create data redundancy across interconnected storage media within one data center. In summary, RAID protects data only against damage at the storage hardware level, such as hard disks and tapes.
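The parity-based redundancy described above can be illustrated with a minimal sketch. It assumes a single XOR parity block computed over equally sized data blocks, in the style of RAID levels 3 to 5; this is a simplification, since RAID 2 uses Hamming codes and RAID 6 keeps a second, independent parity. The function names are hypothetical and the blocks stand in for stripes on separate disks.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the i-th byte of every block; XOR them together.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Reconstruct the single lost block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three data blocks, each imagined to live on a different disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_parity(data)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild_missing([data[0], data[2]], parity)
assert recovered == data[1]
```

The sketch also shows the limitation noted above: recovery works only while the surviving blocks and the parity block remain reachable, which holds for a disk failure inside one data center but not when the whole data center or provider becomes unavailable.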
However, damage at the non-storage-hardware level may make the cloud storage service temporarily unavailable so that user data cannot be retrieved as expected, and may expose user data to the risk of being lost or illegally modified. To address such damage, a new method is needed to enhance the availability and fault tolerance of public cloud storage data.