The performance of analyses of large data sets (e.g., what is commonly referred to as “big data”) is becoming increasingly commonplace in areas such as simulations, process monitoring, decision making, behavioral modeling and making predictions. Such analyses are often performed by grids of varying quantities of available node devices, while the data sets are often stored within a separate set of storage devices. This begets the challenge of efficiently generating indexes for such large data sets, so that the data sets can be searched across multiple node devices of a grid and specific pieces of data can be efficiently located and retrieved.