In the era of big data and artificial intelligence (AI), intelligent use of data has become an important factor in the success of many businesses. Data often forms a foundation for advanced analytics, AI, and business operation efficiency. As more businesses become data-driven and data volume grows rapidly, there is an increasing need to manage and execute complicated data processing pipelines that extract data from various sources, transform it for consumption (e.g., extracting features and training AI models), and store it for subsequent use. Workflow engines are often used to manage data workflow pipelines at scale.
Despite the benefits of workflow engines, fully utilizing them remains burdensome because of steep learning curves and the effort needed to author complicated workflow pipelines. A directed acyclic graph (DAG) defines a workflow pipeline for exploiting data to produce some desired results. Typically, users (e.g., engineers and scientists) either interact with a graphical user interface (GUI) to compose DAGs manually or must learn a special syntax to generate DAGs programmatically. Neither approach is natural or intuitive, and both are prone to error. Generating, adapting, and reviewing DAGs can introduce significant overhead as DAGs become large and complex.
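For illustration only, a DAG-defined pipeline of the kind described above might be sketched as follows. This minimal example uses the Python standard library's `graphlib` module rather than any particular workflow engine; the task names (`extract`, `transform`, `train_model`, `store`) and the `execution_order` helper are hypothetical and chosen only to mirror the extract, transform, and store steps mentioned earlier.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# The task names are illustrative assumptions, not taken from a real engine.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "train_model": {"transform"},
    "store": {"transform", "train_model"},
}


def execution_order(dag):
    """Return one valid order in which the tasks of the DAG can run.

    TopologicalSorter raises graphlib.CycleError if the graph contains
    a cycle, i.e., if it is not actually a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Even in this small sketch, every dependency must be spelled out by hand, which hints at how the authoring and review effort grows as a DAG gains tasks and edges.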