The present invention relates generally to a backup and archival policy method, and more particularly, but not by way of limitation, to a system, method, and recording medium for analyzing data under a backup system and subsequently performing correlation analytics on data classification results and a backup policy to transform data protection for an enterprise.
Ninety percent of the data in the world was created in the last two years, and data volumes are rising faster than storage prices are declining. Therefore, cheap storage is no longer the only answer to controlling the costs associated with data growth and backup policies. Data classification analytics can play an important role in discovering, recognizing, and subsequently acting on data in place, transforming a modern-day enterprise into a data-driven one by identifying relevant data. Conventional data classification processes help find the data that matters, eventually leading to the removal of old, obsolete data and the identification of sensitive content.
However, data classification analysis has conventionally been limited to the domain of operational data, that is, active file systems and active applications such as e-mail, document management systems, content management systems, etc. A significant amount of similar irrelevant data accumulates over time in a data protection system as a result of the backup and retention policy of the data protection system (it is noted that “backup” and “data protection” are used interchangeably and mean substantially the same in the context of this application).
Conventionally, backup of data, or data protection, is performed because the user wants to protect the enterprise from physical or online data corruption. An enterprise specifies a backup policy, such as a daily or weekly backup policy. Under the policy, all the files are scanned to determine whether each file was updated. If a file was updated, a backup of the file is created; if the file was not updated, no backup is created for the file.
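By way of illustration only, the conventional scan-and-copy behavior described above may be sketched as follows. The sketch assumes that "updated" means a file modification time later than the time of the previous backup pass, and that a backup is a plain file copy. The function names `files_updated_since` and `run_backup_pass` and their parameters are hypothetical and are not taken from any particular backup product.

```python
import os
import shutil


def files_updated_since(source_dir, last_backup_time):
    """Yield paths (relative to source_dir) modified after last_backup_time.

    Assumption: "updated" is judged by modification time alone.
    """
    for root, _dirs, names in os.walk(source_dir):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_backup_time:
                yield os.path.relpath(path, source_dir)


def run_backup_pass(source_dir, backup_dir, last_backup_time):
    """Copy only the files updated since the previous pass.

    Files that were not updated are skipped, so no backup is created
    for them. Returns the sorted list of relative paths copied.
    """
    copied = []
    for rel_path in files_updated_since(source_dir, last_backup_time):
        target = os.path.join(backup_dir, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(source_dir, rel_path), target)
        copied.append(rel_path)
    return sorted(copied)
```

A daily or weekly policy would invoke `run_backup_pass` on the corresponding schedule, passing the time of the preceding pass as `last_backup_time`. It is noted that such a pass decides only on whether a file changed and does not examine the relevance of the file content.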