The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Many large-scale data analytics systems are designed to run large-scale data transformation jobs efficiently. Such systems apply transformations to one or more input datasets to generate one or more output datasets. Such data analytics systems include multiple tightly coupled subsystems, which makes it difficult to add new transformation features because changes to support such features must be made across many subsystems. For example, expanding existing workflows to support a new programming language requires changes to many tightly coupled subsystems. Furthermore, transformations may operate on data that is subject to security restrictions, and developers may wish to author transformations in any of several programming languages. Auditors may wish to track the origin of columns in transformation output, yet with current systems it is difficult to determine which transformation, or which elements thereof, contributed to particular columns of an output dataset. Thus, there is a need for a distributed data analytics system that allows new features to be developed and deployed efficiently to one or more subsystems without requiring changes to all subsystems.
While each of the figures illustrates a particular embodiment for purposes of illustrating a clear example, other embodiments may omit, add to, reorder, and/or modify any of the elements shown in the figures.