1. Field of the Invention
The present invention relates generally to systems and methods for tracking provenance of data objects through workflows and more particularly to the application of such systems and methods to multiple, related workflows.
2. Description of the Related Art
Provenance has been widely acknowledged and discussed in the e-Science field. In this regard, articles have been written about the collection, modeling, representation, storage, and application of provenance information.
The tasks, procedural steps, organizations or people, required input and output information, and tools needed for each step in a business process constitute a workflow. Depending on the nature of a particular enterprise, the workflow may be performed using local or distributed resources and may be performed using various software applications which may be referred to as workflow engines.
In certain workflows, the workflow engine itself includes functionality such that provenance information is automatically logged during the workflow execution. In this regard, such a workflow engine may capture provenance at several levels: a process level, a data level, an organization level and a knowledge level. This type of system may further include semantic web technologies to link domain knowledge with the provenance information. The information so developed can be used for data quality verification, for example.
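By way of illustration only, engine-coupled logging of the kind described above might be sketched as follows. The names `ProvenanceRecord` and `LoggingEngine` are hypothetical and are not drawn from any particular workflow engine; only the process and data levels are shown, and the organization and knowledge levels are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ProvenanceRecord:
    """One logged fact about a workflow step (hypothetical structure)."""
    level: str   # "process" or "data" in this sketch
    step: str    # name of the workflow step
    detail: Any  # what was observed at that level


@dataclass
class LoggingEngine:
    """Toy engine that logs provenance as a side effect of running steps."""
    log: List[ProvenanceRecord] = field(default_factory=list)

    def run(self, steps: List[Tuple[str, Callable[[Any], Any]]], value: Any) -> Any:
        for name, func in steps:
            result = func(value)
            # Process level: which step was executed.
            self.log.append(ProvenanceRecord("process", name, "executed"))
            # Data level: what the step consumed and produced.
            self.log.append(
                ProvenanceRecord("data", name, {"input": value, "output": result})
            )
            value = result
        return value


engine = LoggingEngine()
final = engine.run(
    [("double", lambda x: x * 2), ("increment", lambda x: x + 1)],
    5,
)
```

Because the log is populated inside `run`, the provenance exists only where this engine executes the steps, which is the tight coupling discussed in the paragraphs that follow.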
In another example, the workflow is defined in a proprietary data language and cataloged. A schema, or provenance model, can be queried by a user to review the provenance data for a particular workflow product.
In these provenance projects, the provenance data capture schemes are generally tightly coupled to their workflow execution environments. Provenance information can be captured automatically during the workflow execution because of the existence of a workflow engine. However, when running workflows in an open, distributed environment, such an approach may not be practical. In this regard, one approach has been to wrap each workflow component as a web service, and to define an open protocol among these web services to capture provenance.
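By way of illustration only, the wrapping approach might be sketched as follows. This is an in-process stand-in rather than an actual web service: the decorator `with_provenance`, the list `PROVENANCE_SINK`, and the JSON message fields are all hypothetical, with the JSON message standing in for whatever open protocol the wrapped services would agree on.

```python
import functools
import json
from typing import Any, Callable, List

# Hypothetical stand-in for a provenance store that wrapped services
# would reach over the network in a distributed setting.
PROVENANCE_SINK: List[str] = []


def with_provenance(component: str, workflow_id: str) -> Callable:
    """Wrap a workflow component so each call emits a provenance message."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)
            # Assumed message format; any agreed-upon schema would serve.
            message = {
                "workflow": workflow_id,
                "component": component,
                "inputs": {"args": list(args), "kwargs": kwargs},
                "output": result,
            }
            PROVENANCE_SINK.append(json.dumps(message, sort_keys=True))
            return result
        return wrapper
    return decorator


@with_provenance(component="normalize", workflow_id="wf-1")
def normalize(values):
    """Example component: scale values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


normalized = normalize([1, 3])
```

In this sketch the component reports its own provenance, so no workflow engine is needed to observe the call. Each message carries a single workflow identifier, however, and nothing in the message relates one workflow instance to another.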
This approach, however, has not generally been applicable to provenance beyond a single workflow instance. It has not generally been able to integrate provenance across workflow instances or to capture an integration relationship between workflow instances.
To address the issue of multiple workflow instances, an approach has been applied to different instances of a common workflow that may have minor variations. In this approach, differences between instances of a single workflow are monitored and collected, but this approach has not been applied to different workflows.