The amount of data generated has been growing exponentially. As computer technologies are increasingly adopted and improved in various fields, vast amounts of data are being generated by various systems. This data also moves between systems and can pass through various processes along the way. Such movement can lead to a loss of data integrity and validity. Data lineage techniques aim to address this problem by tracing the flow of data across systems. By tracing data flow, the validity and integrity of data can be ensured, outdated copies of data can be identified and decommissioned, complex data flows can be reengineered to improve data quality, and regulatory compliance audits can be facilitated.
Currently available data lineage techniques can be considered primitive, as they only produce a visual representation of data flow. They offer no easy way to trace data flows from one system to another.