Parallel processing clusters of individual computing devices provide a framework for high-speed distributed processing of immense datasets. However, many datasets can easily reach or exceed multiple terabytes (TBs) in size, and a dataset may also have significant complexity in terms of file type, delimiters, data field length, and so on. The size and complexity of such datasets present a significant technical challenge to executing important processing tasks on a parallel processing cluster. Improvements in making large, complex datasets readily available to the many devices in the cluster will enhance the ability of parallel processing clusters to execute complex processing tasks.