Data lineage refers to the life cycle of data, including the data's origins, its destinations, and where it moves over time. Data lineage can also describe what happens to data as it passes through various processes. For example, data lineage for a particular document can include information indicating the locations where the document has been stored, where the document has been transmitted or received, and any alterations to the document that may have occurred at each location.
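As an illustration only, the document example above could be modeled as a time-ordered list of events attached to a document. The sketch below is not drawn from any particular lineage system; the names `LineageEvent`, `DocumentLineage`, the event kinds, and the sample locations are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class LineageEvent:
    """One thing that happened to a document (hypothetical schema)."""
    timestamp: datetime
    kind: str        # assumed kinds: "stored", "transmitted", "received", "altered"
    location: str    # where the event took place
    detail: str = ""


@dataclass
class DocumentLineage:
    """Lineage for a single document, kept as a time-ordered event list."""
    document_id: str
    events: List[LineageEvent] = field(default_factory=list)

    def record(self, event: LineageEvent) -> None:
        # Keep events sorted so the history always reads oldest to newest.
        self.events.append(event)
        self.events.sort(key=lambda e: e.timestamp)

    def locations(self) -> List[str]:
        """Distinct locations, in order of first appearance."""
        seen: List[str] = []
        for e in self.events:
            if e.location not in seen:
                seen.append(e.location)
        return seen

    def alterations_at(self, location: str) -> List[LineageEvent]:
        """Alterations that occurred at one location."""
        return [e for e in self.events
                if e.kind == "altered" and e.location == location]


# Example history with made-up locations and times.
lineage = DocumentLineage("report-001")
lineage.record(LineageEvent(datetime(2024, 1, 5, 9, 0), "stored", "laptop-A"))
lineage.record(LineageEvent(datetime(2024, 1, 5, 9, 30), "altered", "laptop-A", "added summary"))
lineage.record(LineageEvent(datetime(2024, 1, 5, 10, 0), "transmitted", "laptop-A", "sent to file-server"))
lineage.record(LineageEvent(datetime(2024, 1, 5, 10, 1), "received", "file-server"))
```

Storing lineage as an append-only event list is one possible design choice; it keeps the full history available, so questions such as "where has this document been?" or "what changed at this location?" reduce to simple filters over the events.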
Data lineage tracking and visualization are sometimes used in the field of business intelligence, which involves gathering data and drawing conclusions from that data. For example, data lineage tracking can be used to determine how sales information has been collected and to identify what role it could play in new or improved processes within a business or organization. Data lineage can also be useful in designing such processes.
Another use of data lineage is in safeguarding data and reducing risk. By collecting large amounts of data, businesses and organizations expose themselves to various legal, business, and/or security risks. For example, a security breach on a business server could result in the release of confidential or sensitive data, such as credit card numbers or personal information of users. Data lineage collection and analysis can be used to mitigate some of this risk by identifying the locations of various items of data at different points in time.
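To make the idea of identifying the locations of items of data at different points in time more concrete, the following sketch assumes lineage has already been collected as a set of residency records, each saying that an item was present at a location over some interval. The `Residency` record, the helper functions, and the sample data are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional, Set


class Residency(NamedTuple):
    """An item of data was present at a location over an interval (hypothetical)."""
    item_id: str
    location: str
    arrived: datetime
    removed: Optional[datetime]  # None means the item is still there


def _present(r: Residency, when: datetime) -> bool:
    # Interval is treated as [arrived, removed): present from arrival up to removal.
    return r.arrived <= when and (r.removed is None or when < r.removed)


def locations_at(records: Iterable[Residency], item_id: str, when: datetime) -> Set[str]:
    """Locations holding a given item at a given time."""
    return {r.location for r in records if r.item_id == item_id and _present(r, when)}


def items_at(records: Iterable[Residency], location: str, when: datetime) -> Set[str]:
    """Items held at a given location at a given time, e.g. when a breach occurred."""
    return {r.item_id for r in records if r.location == location and _present(r, when)}


# Made-up lineage records for illustration.
records = [
    Residency("card-numbers", "web-server", datetime(2024, 1, 1), datetime(2024, 3, 1)),
    Residency("card-numbers", "archive", datetime(2024, 2, 1), None),
    Residency("user-profiles", "web-server", datetime(2024, 1, 15), None),
]

# Which items were on the web server at two hypothetical breach times?
exposed_feb = items_at(records, "web-server", datetime(2024, 2, 10))
exposed_mar = items_at(records, "web-server", datetime(2024, 3, 15))
```

With these sample records, a breach of the web server on February 10 would involve both items, whereas a breach on March 15 would involve only the user profiles, because the card numbers had been removed from that server by then.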