Serial data processing (herein “query processing”) is often utilized to handle large amounts of data. The map-reduce framework has become a popular way of parallelizing large scale data queries and has found use in a wide variety of scenarios. HADOOP software is an implementation of the map-reduce framework that manages communication across various computing nodes and the HADOOP distributed file system (HDFS) is utilized in this context by HADOOP applications. In the HDFS, disk space is shared across computing nodes/machines in a cluster, with a data file being distributed among machines of the cluster. (HADOOP is a registered trademark of The Apache Software Foundation in the United States and/or other countries.)