This description relates to summarization in data lineage diagrams.
In data processing systems, it is often desirable for certain types of users to have access to a visual representation of the lineage of data as it passes through the system. Such “data lineage diagrams” can include graphical representations of the data, of the entities in the system that process that data, and of the dependency relationships among them. Among other uses, such data lineage diagrams can be used to reduce risk, verify compliance with regulatory obligations, streamline business processes, and safeguard data. It is important that data lineage diagrams are both correct and complete.
Some systems capable of generating and displaying data lineage diagrams are able to automatically present an end-to-end data lineage diagram showing data items and the items representing the processing entities that consume or generate those data items. A path upstream from a particular item is sometimes called a “dependency analysis” for that item, and a path downstream from a particular item is sometimes called an “impact analysis” for that item. As used herein, a “data lineage diagram” may include an upstream dependency analysis and/or a downstream impact analysis relative to any given item. Some such systems allow users to collapse logical and/or physical groups of items in a data lineage diagram into a single element. Some such systems are able to enhance data lineage diagrams with enriched data information, such as data quality scoring.
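For illustration only, the upstream and downstream paths described above can be sketched as reachability over a directed graph. The sketch below assumes a representation in which an edge (src, dst) means that data flows from src to dst. The class `LineageGraph`, its methods `dependency_analysis`, `impact_analysis`, and `collapse`, and the example item names are all hypothetical and are not taken from any particular system. The `collapse` method illustrates grouping several items into a single element by renaming them and dropping edges that become internal to the group.

```python
from collections import deque


class LineageGraph:
    """Hypothetical lineage graph; an edge (src, dst) means data flows from src to dst."""

    def __init__(self, edges):
        self.edges = set(edges)
        self.downstream = {}
        self.upstream = {}
        for src, dst in self.edges:
            self.downstream.setdefault(src, set()).add(dst)
            self.upstream.setdefault(dst, set()).add(src)

    def _reach(self, start, adjacency):
        # Breadth-first search collecting every item reachable from start.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # exclude the starting item (matters if it lies on a cycle)
        return seen

    def dependency_analysis(self, item):
        # Upstream path: every item that the given item depends on.
        return self._reach(item, self.upstream)

    def impact_analysis(self, item):
        # Downstream path: every item affected by the given item.
        return self._reach(item, self.downstream)

    def collapse(self, group, label):
        # Replace every item in group with a single element named label,
        # dropping edges that become internal to the collapsed group.
        def rename(node):
            return label if node in group else node

        renamed = {(rename(s), rename(d)) for s, d in self.edges}
        return LineageGraph((s, d) for s, d in renamed if s != d)


if __name__ == "__main__":
    # Hypothetical items: data sets (*.csv, *.pdf) and processing entities (*_job).
    g = LineageGraph([
        ("raw.csv", "clean_job"),
        ("clean_job", "clean.csv"),
        ("clean.csv", "report_job"),
        ("ref.csv", "report_job"),
        ("report_job", "report.pdf"),
    ])
    print(sorted(g.dependency_analysis("report.pdf")))
    # ['clean.csv', 'clean_job', 'raw.csv', 'ref.csv', 'report_job']
    print(sorted(g.impact_analysis("clean.csv")))
    # ['report.pdf', 'report_job']
    collapsed = g.collapse({"clean_job", "clean.csv"}, "cleaning")
    print(sorted(collapsed.impact_analysis("raw.csv")))
    # ['cleaning', 'report.pdf', 'report_job']
```

In this sketch, a single breadth-first search serves both analyses; only the direction of the adjacency map differs. Storing both directions up front keeps each query linear in the size of the reachable subgraph.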