As data moves from one system to the next through files, streams, or APIs, the names of elements change, new elements are created, and new file formats are introduced.
Existing data lineage solutions are not fine-grained and depend on manual documentation entered by end users. To document data lineage, end users must enter the from and to fields and the transforms, and must write up the transform logic. Sometimes the manual documentation is not updated when the code changes, so the data lineage records no longer match what is running in production.
Existing solutions log every transformation, producing an overwhelming number of logs to review. End users are not able to review every log in sufficient depth. When data involved in a transformation process, such as the input data or output data of the process, is incorrect, existing solutions are not able to determine the cause of the problem.
In addition, an existing data lineage solution is sometimes highly specific and limited to one technical implementation of a proprietary software product.
In view of the foregoing, a need exists for a data lineage solution that works with multiple environments, provides more fine-grained control to ensure accuracy of data transformation, and performs rule-based, trigger-initiated logging so as to avoid logging of every data transformation.
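By way of illustration only, the contrast between logging every transformation and rule-based, trigger-initiated logging can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular solution: the names TransformEvent, Rule, LineageLogger, and run_transform, the field names, and the two example rules are all hypothetical and do not come from the text above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TransformEvent:
    """One field-level transformation (hypothetical structure)."""
    source_field: str
    target_field: str
    input_value: Any
    output_value: Any


@dataclass
class Rule:
    """A named trigger: returns True when the event should be logged."""
    name: str
    trigger: Callable[[TransformEvent], bool]


@dataclass
class LineageLogger:
    """Records an event only when at least one rule fires."""
    rules: List[Rule]
    records: List[Dict[str, Any]] = field(default_factory=list)

    def observe(self, event: TransformEvent) -> bool:
        fired = [rule.name for rule in self.rules if rule.trigger(event)]
        if fired:
            self.records.append({
                "from": event.source_field,
                "to": event.target_field,
                "input": event.input_value,
                "output": event.output_value,
                "rules": fired,
            })
        return bool(fired)


def run_transform(logger: LineageLogger, source_field: str, target_field: str,
                  fn: Callable[[Any], Any], value: Any) -> Any:
    """Apply fn to value and let the logger decide whether to record it."""
    output = fn(value)
    logger.observe(TransformEvent(source_field, target_field, value, output))
    return output


# Hypothetical rules: log only suspicious outputs, not every transformation.
logger = LineageLogger(rules=[
    Rule("null_output", lambda e: e.output_value is None),
    Rule("negative_amount",
         lambda e: isinstance(e.output_value, (int, float)) and e.output_value < 0),
])

ok = run_transform(logger, "amt", "amount_usd", lambda v: v * 2, 10)    # no rule fires
bad = run_transform(logger, "amt", "amount_usd", lambda v: v - 50, 10)  # negative_amount fires
```

In this sketch, two transformations run but only the second produces a lineage record, and that record carries the from and to fields together with the input and output values, so the failing step can be located without reviewing a log entry for every transformation.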