The concept of “provenance” generally refers to the source or sources of a given item. “Data provenance,” accordingly, refers to determining the source or sources of given data. “Provenance data” is, therefore, data that is used to derive other data, or data that serves as a source of other data.
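As a purely illustrative sketch of this definition (not drawn from any particular system), the Python fragment below derives one output value from several input items and records the identifiers of those inputs alongside the result. The names `Reading`, `Derived`, and `average` are hypothetical and chosen only for this example; the recorded identifiers play the role of provenance data for the derived value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical input item, e.g. a single sensor reading."""
    item_id: str
    value: float


@dataclass(frozen=True)
class Derived:
    """Hypothetical output item that remembers which inputs produced it."""
    value: float
    provenance: Tuple[str, ...]  # identifiers of the source items


def average(readings: List[Reading]) -> Derived:
    """Derive one value from several inputs and record their ids as provenance."""
    if not readings:
        raise ValueError("need at least one reading")
    mean = sum(r.value for r in readings) / len(readings)
    return Derived(value=mean, provenance=tuple(r.item_id for r in readings))


readings = [Reading("r1", 10.0), Reading("r2", 20.0), Reading("r3", 30.0)]
result = average(readings)
print(result.value, result.provenance)
```

Here, determining the provenance of `result` amounts to reading back the stored identifiers; in this toy case every derived value carries a reference to each of its sources.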
While data provenance has been used in decision support or data warehouse systems to uncover the interdependencies between data, little, if any, work has considered provenance in the context of data streaming systems. Supporting data provenance in such systems creates novel challenges because data volumes are orders of magnitude larger than in conventional systems and, therefore, the efficiency of provenance query evaluation quickly becomes an issue.