1. Field of the Invention
The present invention is in the field of data processing. In particular, the present invention provides a system and a method for processing a batch of data comprising a plurality of quanta of data. Such a system and method may be used as part of a text translation system.
2. Description of the Related Art
In the field of data processing, many batch processing systems are implemented using a queue structure. Typically, jobs or data to be processed are received by an entity and added to the queue structure to await processing. In a simple batch processing system, a processing element sequentially selects a job or data from the queue structure for processing.
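By way of illustration only, the simple queue-based arrangement described above may be sketched as follows. The names `run_batch` and `process` are hypothetical and are not taken from any particular system; the sketch assumes that all jobs are already present in the queue structure and that the single processing element selects them in order of arrival.

```python
from collections import deque


def run_batch(jobs, process):
    """Process jobs one at a time in arrival order (first in, first out).

    `jobs` is any iterable of jobs or data items; `process` is a
    hypothetical callable standing in for the processing element.
    """
    queue = deque(jobs)        # jobs wait here until they are selected
    results = []
    while queue:
        job = queue.popleft()  # the single processing element selects the next job
        results.append(process(job))
    return results


# Example: the "processing" here is simply measuring the length of each item.
lengths = run_batch(["a", "bb", "ccc"], len)
```

In such a sketch, a job cannot begin until every job ahead of it in the queue structure has completed.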
In batch processing systems with a large variety of jobs, it was found that a single processing element would generate a “bottleneck” within the batch processing system, wherein certain jobs would monopolise the processing resources of the system. In particular, the processing of large pieces of data would often slow down the system in a detrimental manner.
In the art, a compromise solution to this problem is to split the single processing element into multiple processing elements to create a parallel processing system. When implemented using a set amount of hardware or processing resources, for example a set number of central processing elements, a parallel processing system distributes the processing resources across the processing elements. In such a system, as each processing element now uses only a proportion of the processing resources that were available to the single processing element, each of the plurality of processing elements typically takes a longer period of time than the single processing element to complete a given processing job. This detrimental effect is accepted in the expectation that a job that monopolises processing power will detrimentally affect only one of the processing elements within the system.
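The trade-off described above may be illustrated with a simplified model. The following sketch is an assumption-laden illustration rather than a description of any particular prior art system: the function `simulate`, its parameters and the job sizes are hypothetical; all jobs are assumed to be queued at time zero; a fixed `total_capacity` is assumed to be divided equally between the processing elements; and each job is assumed to be taken, in queue order, by whichever processing element becomes free first.

```python
import heapq


def simulate(job_sizes, n_elements, total_capacity=1.0):
    """Return (start_times, finish_times) for jobs taken in queue order.

    Each of the `n_elements` processing elements is assumed to receive an
    equal share of `total_capacity`, so a job of a given size takes
    `n_elements` times longer on one element than on an undivided one.
    """
    rate = total_capacity / n_elements  # work completed per unit time, per element
    free_at = [0.0] * n_elements        # time at which each element next becomes free
    heapq.heapify(free_at)
    start_times, finish_times = [], []
    for size in job_sizes:
        start = heapq.heappop(free_at)  # earliest available processing element
        finish = start + size / rate
        heapq.heappush(free_at, finish)
        start_times.append(start)
        finish_times.append(finish)
    return start_times, finish_times


# One large job followed by three small jobs (hypothetical sizes).
single = simulate([10, 1, 1, 1], n_elements=1)
split = simulate([10, 1, 1, 1], n_elements=2)
```

Under these assumptions, the single processing element finishes the four jobs at times 10, 11, 12 and 13, whereas two processing elements finish them at times 20, 2, 4 and 6. The large job takes twice as long to complete, but it occupies only one processing element, so the small jobs are no longer held up behind it.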
However, when implementing such a parallel processing system, it has been found that such a system does not process data efficiently when there is a large number of jobs to be processed and/or where the resources required for each job vary widely between the received jobs. For example, such a system typically performs poorly when processing a large quantity of data files of a variety of sizes. Furthermore, even though such parallel processing systems may objectively reduce total processing time for a given number of jobs, this reduction may not translate into a perceived (i.e. subjective) increase in processing efficiency. For example, the time a job waits before it is processed may be high even if total throughput is also high.
Hence, there is a requirement in the art for an improved system for processing jobs or data that can efficiently process a large number of jobs of differing sizes and resource requirements.