Many large-scale data analytic systems are designed to run large-scale data processing jobs efficiently. For example, a traditional large-scale data analytic system is configured to execute such jobs on a cluster of commodity computing hardware. Such systems can typically execute job tasks in parallel at cluster nodes at or near where the data is stored, and aggregate and store intermediate and final results of task execution in a way that minimizes data movement between nodes, which would be operationally expensive given the large amount of data processed. Such systems also typically store data and job results in distributed file system locations specified by users, but do not provide extensive revision control management of data and job results.
Accordingly, the functionality of traditional large-scale data analytic systems is limited at least with respect to revision control of the data that is processed. Thus, there is a need for systems and methods that provide more or better revision control for data processed in large-scale data analytic systems. Such systems and methods may complement or replace existing systems and methods for data revision control in large-scale data analytic systems.