A workload, e.g. MongoDB, operates in a cloud-based service system that has a cluster of nodes. The workload may run on a single node or on multiple nodes in the cloud-based service system. Each of the nodes assigns at least one disk to store data for access. When the workload runs on a single node and the assigned disk fails, the workload cannot be executed until backup data are restored. When the workload runs on multiple nodes and one of the assigned disks fails, or even one node goes out of order, performance of the cloud-based service system may be degraded because data need to be rebalanced to a new node. Performance of the workload is affected, too. Clearly, the health of the disks in the cloud-based service system and a well-planned archive for data restoration are key factors in data protection for workloads.
In fact, many techniques provide solutions to the requirement above. Most of these solutions concern prediction of the lifespan of storage devices. For example, a traditional method for monitoring the lifespan of storage devices may include the steps of: setting up a database which records a number of training data, wherein each of the training data includes operating behavior information and a corresponding operating life value; fetching operating behavior information from corresponding storage devices; building a storage device lifespan prediction model according to the operating behavior information and corresponding operating life value of the training data; and inputting the operating behavior information of the storage devices into the storage device lifespan prediction model to generate a predicted life value for each individual storage device. The storage device lifespan prediction model can be rebuilt using the predicted life value as well. When a first storage device among the storage devices is damaged, the real lifespan of the first storage device is recorded and used to rebuild the storage device lifespan prediction model.
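The steps of such a traditional method can be sketched as follows. This is only an illustrative sketch, not the implementation of any particular prior-art system: the single feature `daily_writes_tb`, the straight-line least-squares model, and all sample values are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class TrainingRecord:
    daily_writes_tb: float  # hypothetical operating-behavior feature
    life_days: float        # corresponding operating life value


def build_model(records):
    """Fit life_days = intercept + slope * daily_writes_tb by least squares."""
    n = len(records)
    mean_x = sum(r.daily_writes_tb for r in records) / n
    mean_y = sum(r.life_days for r in records) / n
    sxx = sum((r.daily_writes_tb - mean_x) ** 2 for r in records)
    sxy = sum((r.daily_writes_tb - mean_x) * (r.life_days - mean_y)
              for r in records)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_life(model, daily_writes_tb):
    """Return the predicted life value for one storage device."""
    intercept, slope = model
    return intercept + slope * daily_writes_tb


# Step 1: database of training data (made-up values for illustration).
database = [
    TrainingRecord(1.0, 1800.0),
    TrainingRecord(2.0, 1500.0),
    TrainingRecord(4.0, 900.0),
]

# Steps 2-4: build the model and predict for a device writing 3 TB per day.
model = build_model(database)
predicted = predict_life(model, 3.0)

# A device is damaged: record its real lifespan and rebuild the model.
database.append(TrainingRecord(3.0, 1100.0))
rebuilt = build_model(database)
```

In this sketch the rebuilt model shifts its prediction toward the observed real lifespan, which is the feedback loop the traditional method relies on.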
Although the lifespan of storage devices may be predicted so that data protection can be carried out with the predicted results, applying such a method still encounters several challenges. First, the failure chance of a storage device (HDD or SSD) increases dramatically when the storage device approaches the end of its lifecycle. However, the aforementioned method relies on training data of operating life values, so a sudden failure of the storage device before the end of its designed lifecycle is hard to avoid. Second, failure of the storage device is a result of the workloads applied to it. Namely, the higher the usage a workload demands, the shorter the lifespan of the storage device. The influence of workloads is not taken into consideration in previous methods. In addition, data protection should include a proper plan for back-ups of the data stored in the storage devices. If data back-ups are processed too often, the performance of related workloads may be reduced. If they are processed too rarely, a systematic collapse of the workloads may occur. This problem can be settled if a predicted lifespan of the storage devices is available.
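The first challenge above, that the failure chance rises sharply near the end of the lifecycle, can be illustrated with a small numerical sketch. It assumes a Weibull lifetime model with a shape parameter greater than 1, a common textbook choice for wear-out failures; this model, the parameter values, and the function names are illustrative assumptions and are not taken from the methods discussed here.

```python
import math


def survival(age_days, scale_days, shape):
    """Weibull survival function: probability a device lasts beyond age_days."""
    return math.exp(-((age_days / scale_days) ** shape))


def failure_within(age_days, horizon_days, scale_days, shape):
    """Probability of failing within horizon_days, given survival to age_days."""
    later = survival(age_days + horizon_days, scale_days, shape)
    now = survival(age_days, scale_days, shape)
    return 1.0 - later / now


SCALE_DAYS = 1500.0  # assumed characteristic life
SHAPE = 3.0          # assumed wear-out behavior (shape > 1)

# Chance of failing in the next 30 days for a young and an old device.
young = failure_within(100, 30, SCALE_DAYS, SHAPE)
old = failure_within(1400, 30, SCALE_DAYS, SHAPE)
print(f"young device: {young:.4%}, old device: {old:.4%}")
```

Under these assumptions the 30-day failure chance of the old device is far larger than that of the young one, whereas a shape of exactly 1 would make the chance independent of age.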
Therefore, a method for data protection in a cloud-based service system is disclosed. The present invention is a solution to the problems mentioned above. Most importantly, the present invention introduces the concept of “near failure probability”, which considers the probability of failure when a disk is close to the end of its life. Thus, the present invention can provide a more precise prediction of the time at which a disk may fail and is an innovative method for data protection in a cloud-based service system.