The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
A data pipeline system is a series of jobs, each of which takes data as input, applies business logic to the data, and outputs the results, typically to another job in the pipeline for further processing. A data pipeline system can be complex, requiring many interdependent jobs. Configuring a data pipeline system can be time-consuming because each job in the data pipeline system must be customized. Such customization can require manually programming or implementing each job in a programming language. However, different data pipeline system deployments often rely on a common subset of similar jobs. For example, deduplication of data records can be implemented in one or more jobs, and deduplication is often needed across various data pipeline system deployments. Likewise, configuration of a machine learning system can be implemented in one or more jobs and is often needed across multiple data pipeline system deployments. What is needed is a way to easily configure a data pipeline system and to reuse common jobs across data pipeline system deployments.
While each of the figures illustrates a particular embodiment for purposes of providing a clear example, other embodiments may omit, add to, reorder, and/or modify any of the elements shown in the figures.