Various web applications can generate immense amounts of semi-structured hierarchical data, with JSON (JavaScript Object Notation) and XML (Extensible Markup Language) being typical default data models to exchange information across the web. The data is often processed later for various business analytics purposes. The map-reduce framework has been developed to provide a tool for processing massive amounts of data. Map-reduce is a parallel-processing, shared-nothing architecture (i.e., distributed computing architecture with each node essentially being independent and self-sufficient), with data-shuffling. (Background information on map-reduce architecture, which may be employed by way of better understanding a context of at least one embodiment of the invention, may be found in Dean, J., and Ghemawat, S., “MapReduce: Simplified Data Processing on Large Clusters”, Google, Inc., OSDI 2004 (6th Symposium on Operating System Design and Implementation, San Francisco, Calif.); available via http://labs.google.com/papers/mapreduce-osdiO4.pdf.
However, it has been found that processing hierarchical data in a map-reduce framework can be difficult and present obstacles to efficient operation. Particularly, it needs to be ensured that full parallelism and independent functioning among maps (a processing unit with a block of input data) is in place for producing key-value pairs to be processed by different map instances. If input data is such that input to the map function (a key-value pair) can be generated independently, then many difficulties can be averted. Without an effective arrangement to process hierarchical input data, conventional map-reduce frameworks can become highly unsatisfactory in their performance.