1. Technical Field
The present invention relates to tracking provenance information and, more particularly, to tracking provenance information in distributed systems by automatic embedding.
2. Description of the Related Art
Enterprises are increasingly interested in data provenance, which involves tracking the lineage of data in a computing system. Understanding the pedigree of data is important when determining whether to trust that data, a determination that is involved in many enterprise activities such as maintaining data retention compliance, auditing business processes, and tracking data security. Provenance assists in understanding how data evolves; provenance systems can keep information about how data is created, transformed, and replicated across different nodes in a distributed system.
In existing provenance systems, the provenance tracking capability is deliberately added as a data management system that runs in parallel with the system being observed. Access to the internals of the observed system is needed to insert tracking code that is specifically configured for that system. In some cases, provenance tracking can be accomplished with less invasive integration, e.g., by deriving provenance from observed network traffic. However, this approach is limited in scope, so more invasive approaches are needed to collect detailed provenance information that may be required, e.g., the version history of a data item that is never sent over a network.
Furthermore, existing provenance tracking systems are typically applied to enterprise applications using enterprise storage, where access to the internals of the storage systems is available. Enterprises, however, increasingly allow their employees to use consumer devices and applications (a practice sometimes called Bring-Your-Own-Device or BYOD). In this environment, it is useful to track provenance even if the user is using a non-enterprise application and storage provider. Existing consumer applications and devices do not support this type of provenance tracking, and there is no prospect that the developers of such applications will introduce such features.