Technological advances over the past several decades have dramatically increased the size of computer data files. Advances in processor speed, memory, storage capacity, and other areas have allowed computing devices to continue to process these larger files. Despite these advances, some data files are large enough that it is either necessary or beneficial to distribute the processing of a single large file by dividing the file into smaller portions and processing each portion individually, rather than attempting to process the entire file at one time on one computing device.
When a large data file is processed, the user who requests the processing often supplies the computer code or application that is to be run on the data file. Because that application is unknown to the distributed data processing system to which the data and application are sent, the system is unable to estimate or determine the amount of computing resources to allocate to each portion of the large data file. To prevent a computing device from running out of computing resources while it processes multiple portions of the large data file, distributed data processing systems today allocate an entire computing device to each portion of the large file. In many instances, however, the resources of an entire computing device are not necessary to process a particular portion of the large file. In such cases, the computing device is used inefficiently, and completion of the distributed processing is unnecessarily delayed because the computing device processes data portions one at a time even though sufficient computing resources are available to process several portions concurrently.
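By way of illustration only, the delay described above can be sketched with a simplified calculation. The sketch below is not taken from any particular system. The function names, the use of memory as the limiting resource, and the assumption that concurrently processed portions do not slow one another down are all hypothetical simplifications. It also assumes that the memory needed per portion is known, which is precisely the information that the systems described above lack.

```python
import math


def makespan_one_at_a_time(num_portions, num_devices, seconds_per_portion):
    """Total time when each device is reserved for a single portion at a time.

    Hypothetical model: every portion takes the same time to process.
    """
    rounds = math.ceil(num_portions / num_devices)
    return rounds * seconds_per_portion


def makespan_concurrent(num_portions, num_devices, seconds_per_portion,
                        device_memory_gb, memory_per_portion_gb):
    """Total time when each device runs as many portions as its memory allows.

    Hypothetical model: memory is the only limiting resource, and portions
    running side by side on one device do not slow each other down.
    """
    slots_per_device = device_memory_gb // memory_per_portion_gb
    if slots_per_device < 1:
        raise ValueError("a single portion does not fit on one device")
    rounds = math.ceil(num_portions / (num_devices * slots_per_device))
    return rounds * seconds_per_portion


if __name__ == "__main__":
    # Illustrative numbers only: 64 portions, 4 devices, 30 s per portion,
    # 16 GB of memory per device, 4 GB needed per portion.
    serial = makespan_one_at_a_time(64, 4, 30)
    packed = makespan_concurrent(64, 4, 30, 16, 4)
    print(f"one portion per device: {serial} s")
    print(f"concurrent portions:    {packed} s")
```

Under these illustrative numbers, reserving a whole device per portion takes sixteen rounds of work, whereas running four portions per device takes four rounds. The actual difference in any real system would depend on how portions contend for processor, memory, and storage resources.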