A data pipeline system is a set of one or more coupled data pipeline subsystems (“subsystems”) that process and/or transform data extracted from data sources and provide the processed data to data sinks. Data that passes through the data pipeline system may undergo multiple data transformation operations (“transformations”). A transformation can depend on one or more transformations that precede it. The data involved in any transformation must remain meaningful to the downstream subsystems. For example, if elements of an upstream transaction data set, such as a credit card number or expiration date, are to be obfuscated or replaced, and one of the downstream applications includes a feature that performs credit card validation checks, then that feature must still operate as expected and without error on the obfuscated data.
According to conventional approaches for maintaining data pipelines, a system administrator configures and updates the subsystems so that the data involved in any transformation is meaningful to the application logic of the downstream subsystems. If a transformation is added, removed, or changed in an upstream subsystem, the system administrator must manually reconfigure the downstream subsystems to compensate for the change. Conventional approaches for maintaining data pipelines therefore require significant human effort.
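The credit card example can be illustrated with a minimal sketch. It assumes that the downstream validation check is the Luhn checksum commonly applied to card numbers; the text above does not name a specific algorithm. The function names `luhn_is_valid`, `luhn_check_digit`, and `obfuscate_card_number` are hypothetical and chosen only for illustration. The upstream transformation replaces every digit with a random one and then recomputes the final check digit, so the obfuscated value still passes the downstream check.

```python
import random


def luhn_is_valid(number: str) -> bool:
    """Downstream check (assumed): Luhn checksum over a string of digits."""
    total = 0
    # Walk the digits right to left, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(payload: str) -> str:
    """Return the digit that makes payload + digit pass the Luhn check."""
    total = 0
    # Once the check digit is appended, the rightmost payload digit
    # becomes a "second" digit, so doubling starts at index 0 here.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def obfuscate_card_number(number: str, rng: random.Random) -> str:
    """Upstream transformation (hypothetical): replace the card number with
    random digits of the same length that still satisfy the Luhn check."""
    payload = "".join(str(rng.randrange(10)) for _ in range(len(number) - 1))
    return payload + luhn_check_digit(payload)


if __name__ == "__main__":
    original = "4111111111111111"  # widely used test card number
    masked = obfuscate_card_number(original, random.Random())
    print(masked, luhn_is_valid(masked))
```

If the upstream transformation instead replaced the digits without recomputing the check digit, the downstream check would reject most obfuscated values. That is the kind of upstream change a system administrator would otherwise have to compensate for manually.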
The headings provided herein are merely for convenience and do not necessarily affect the scope or meaning of the terms used.