Query statements can be formed to obtain data from distributed storage and distributed processing resources. The distributed storage may be a distributed database or a distributed file system. Apache Hadoop® is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
To implement distributed computing, there are typically three main functional components: the client component, the management component and the worker component. The client component accepts input from the end user and converts it into requests that are sent to the management component. The management component splits a request into multiple sub-requests and sends the sub-requests to the worker components for processing in parallel. Each worker component processes its sub-request by operating on data. Typically, the management component serves as the first point of response, serving many requests from many clients simultaneously and coordinating multiple worker components for any given request. In some cases, another machine may be delegated to receive requests from the client application on behalf of the management component.
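The three-component flow described above can be illustrated with a minimal sketch. It is not Hadoop code: the names `client`, `manager` and `worker` are hypothetical, threads in a local pool stand in for worker machines, and summing integers is an assumed placeholder for whatever operation a worker performs on its data.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(sub_request):
    """Worker component (hypothetical): process one sub-request on its data."""
    return sum(sub_request)


def manager(request, num_workers=3):
    """Management component (hypothetical): split, fan out, combine."""
    # Split the request into roughly equal sub-requests, one per worker.
    chunk = max(1, -(-len(request) // num_workers))  # ceiling division
    sub_requests = [request[i:i + chunk] for i in range(0, len(request), chunk)]
    # Send the sub-requests to the workers for processing in parallel.
    # Threads stand in for separate worker machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(worker, sub_requests))
    # Combine the partial results into a single response for the client.
    return sum(partial_results)


def client(user_input):
    """Client component (hypothetical): turn end-user input into a request."""
    request = [int(token) for token in user_input.split()]
    return manager(request)
```

For example, `client("1 2 3 4 5 6 7")` would split the seven values into three sub-requests, process them concurrently, and return the combined total. In an actual cluster the manager and workers run on separate machines and communicate over the network, which this sketch does not model.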
This three-component paradigm is applicable across all major components of Hadoop, including the storage layer (e.g., Hadoop Distributed File System (HDFS)), the compute layer (i.e., MapReduce®) and the access layer (e.g., Hive®, Solr®, Spark®, Storm®, etc.), where there is a server component that responds to the requests from various clients.
One challenge of working with an open-source platform, such as Hadoop, is that the code base continuously changes due to contributions from the contributor community. Consequently, it is difficult to augment system functionality in a manner that is guaranteed to work with the continuously changing code base. Accordingly, it would be desirable to develop techniques for producing statement modifications that are certain to be operative with a continuously changing code base.