Organizations that manage large amounts of data often wish to obtain data lineage for at least some of the data being managed. Data lineage for a set of data being managed may include information indicating how the set of data was obtained, how the set of data may change over time, and/or how the set of data may be used by one or more data processing systems and/or processes. Data lineage for a set of data may include upstream lineage information indicating how the set of data was obtained. For example, upstream lineage information may identify one or more data sources from which the set of data was obtained and/or one or more data processing operations that have been applied to the set of data. Additionally or alternatively, data lineage for a set of data may include downstream lineage information indicating one or more other datasets, processes, and/or applications that depend on and/or use the set of data. An organization may wish to obtain lineage information for any suitable set of data such as, for example, one or more data records, one or more tables of data in a database, one or more spreadsheets of data, one or more files of data, a single data value, data used to produce one or more reports, data accessed by one or more application programs, and/or any other suitable set of data.
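As a non-limiting illustration of the distinction between upstream and downstream lineage, the following minimal sketch models lineage as a directed graph in which each edge points from an input to something derived from it. The class name `LineageGraph`, its methods, and the dataset names are hypothetical and chosen only for this example; they are not drawn from any particular data processing system.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: edges point from an input to what is derived from it."""

    def __init__(self):
        self._consumers = defaultdict(set)  # node -> nodes derived directly from it
        self._producers = defaultdict(set)  # node -> nodes it was derived directly from

    def add_dependency(self, source, target):
        """Record that `target` is obtained from, or uses, `source`."""
        self._consumers[source].add(target)
        self._producers[target].add(source)

    def _reach(self, start, edges):
        """Breadth-first search returning every node reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # exclude the start node if a cycle leads back to it
        return seen

    def upstream(self, node):
        """Everything `node` was obtained from, directly or indirectly."""
        return self._reach(node, self._producers)

    def downstream(self, node):
        """Everything that depends on or uses `node`, directly or indirectly."""
        return self._reach(node, self._consumers)


# Illustrative (made-up) datasets and processes.
graph = LineageGraph()
graph.add_dependency("crm_export.csv", "clean_customers")
graph.add_dependency("clean_customers", "customers_table")
graph.add_dependency("orders_db", "customers_table")
graph.add_dependency("customers_table", "quarterly_report")

upstream_of_table = graph.upstream("customers_table")
downstream_of_table = graph.downstream("customers_table")
```

In this sketch, the upstream lineage of `customers_table` contains the two sources and the intermediate cleaning step, while its downstream lineage contains only `quarterly_report`. Storing both edge directions is one possible design choice that keeps either query a simple traversal.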
There are many uses of lineage information about the data managed by an organization's data processing systems. Examples of such uses include, but are not limited to, risk reduction, verification of compliance with regulatory obligations, streamlining of business processes, safeguarding data, tracing errors back to their sources, and determining whether changes to data may lead to downstream errors. In some cases, incomplete or incorrect lineage information can lead to negative practical effects on the organization, such as records being handled incorrectly, inaccurate data being provided to members of the organization, inefficient system operation, system failures, inadvertent introduction of errors, inefficient resolution of errors, and difficulty complying with regulatory processes. For a business organization, such effects can quickly lead to customer and/or regulator dissatisfaction. Accordingly, it is important that lineage information be both correct and complete.