Entities typically generate and use data that is important in some way to their operations. This data can include, for example, business data, financial data, and personnel data. Accordingly, entities create and store backups of their important data that can later be used in a data restore process if necessary. Such backups are often stored in a cloud storage environment. The use of cloud storage provides convenience and cost advantages, for example, but also introduces various problems.
The creation and storage of backups is typically performed by one or more data protection entities of a data protection environment. These backup processes typically impose significant overhead on the data protection entities in terms of their memory, storage, and CPU resources, for example. This is particularly so where backups are performed relatively frequently and/or the backup datasets are relatively large. As well, backup processes performed in connection with the data protection entities may impose a significant load on network/bandwidth resources. This is often a concern where an enterprise must transmit its backups offsite, such as to a cloud storage platform.
A related concern with cloud based storage is that some enterprises primarily employ virtual machines (VMs) to perform data protection, rather than using a purpose-built backup appliance (PBBA). The VMs are somewhat disadvantageous relative to a PBBA in that parameters of the VMs are relatively harder to control, and lack flexibility. For example, the Amazon Elastic Compute Cloud (EC2) environment permits only limited configurability in terms of the data protection VMs that can be employed by an enterprise. For instance, CPU, memory, and network resources cannot readily be added to such a VM by the user. Consequently, these VMs are limited in terms of their functionality and capability. Some users have attempted to address this problem by using VMs of more significant capability; however, this approach results in increased costs to the user, as well as underutilized capacity and capabilities.
The use of cloud storage resources also presents concerns with respect to data integrity. Accordingly, enterprises need to employ data integrity checks in connection with their backup data. However, performing such data integrity checks may place significant demands on computing resources such as CPU, IOPs, memory, and network resources. As suggested above, existing computing systems and environments are not well suited to take on the workload imposed by data integrity checks without significant impact to system performance.
In more detail, the performance of data integrity checks can impose costs on enterprise data protection systems and/or on cloud storage resources in a variety of areas. For example, performing data integrity checks may result in increased CPU cycles, a need for more and/or faster memory, and an increased need for input/output operations per second (IOPs) capability and network bandwidth.
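To illustrate why such checks are costly, consider a minimal sketch of one common form of backup integrity verification: each stored chunk is re-read and re-hashed, and the resulting digest is compared against the digest recorded at backup time. Every chunk must be read back (consuming IOPs and, for cloud storage, network bandwidth) and hashed (consuming CPU cycles), so the cost scales with the size of the backup dataset. The chunk size and the function names below are illustrative assumptions, not part of any particular system described herein.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB backup chunks (illustrative)

def chunk_digest(chunk: bytes) -> str:
    """Compute the SHA-256 digest of one backup chunk (CPU-bound work)."""
    return hashlib.sha256(chunk).hexdigest()

def verify_backup(chunks, recorded_digests):
    """Return the indices of chunks whose current digest does not match
    the digest recorded when the backup was created.

    Note that every chunk must be read and hashed in full, which is the
    source of the IOPs, network, and CPU load discussed above.
    """
    corrupted = []
    for i, (chunk, expected) in enumerate(zip(chunks, recorded_digests)):
        if chunk_digest(chunk) != expected:
            corrupted.append(i)
    return corrupted
```

In practice the chunks would be streamed from backup storage rather than held in memory, but the essential cost structure, one full read and one full hash per chunk, is the same.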
In view of circumstances such as those just noted, a consequent technological problem is that current data protection environments and associated entities are not well suited to take on additional functionalities, such as data integrity checks for example. This is a matter of concern, particularly where such functionalities are important and thus desirable to implement. Moreover, even where such functionalities can be implemented, doing so can cause a significant reduction in the performance of the data protection entities and/or the data protection environment. Thus, there is a disincentive to impose additional workloads on the data protection entities and the data protection environment.