Data processing workflows are common in many applications. In a typical scenario, a sender saves data to a file in a certain format and then sends the file to a recipient. Upon receiving the file, the recipient parses the content of the file and performs logical processing accordingly.
In the above workflow, if the file is not too big and the recipient does not have a strict processing time requirement, a single server or a single thread may be used for processing. Under that circumstance, the corresponding system may still operate normally, though the recipient may take a long time to process the data in the files. However, if the file is very big (and/or the number of files is large) and the recipient has a very strict processing time requirement (e.g., the recipient may require the data of a file transmitted from the sender to be completely processed within one minute or an even shorter period of time), a processing system using a single server or a single thread may not be able to satisfy this need.
In many instances, file data is transferred from a sender to a recipient on a regular basis, for example once every five minutes. In addition, the recipient may have a maximum delay tolerance for the data. If the recipient cannot finish processing the transmitted data within the corresponding interval, a vicious cycle may result: unfinished data from the previous period, combined with the arrival of new data, increases the data delay at the recipient in every period and eventually leads to a system collapse.
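The accumulation of delay described above can be illustrated with a short simulation. The following Python code is a minimal sketch under stated assumptions, not part of any described system: the function name `batch_delays` and the 360-second and 240-second processing times are hypothetical, and only the five-minute (300-second) interval comes from the example above. It models a single worker that processes each file in arrival order.

```python
def batch_delays(interval_s, processing_s, periods):
    """Return, for each file, the time in seconds between its arrival
    and the completion of its processing by one sequential worker."""
    delays = []
    free_at = 0  # time at which the worker finishes its current file
    for k in range(periods):
        arrival = k * interval_s
        # A file cannot start until it has arrived and the worker is free.
        start = max(arrival, free_at)
        free_at = start + processing_s
        delays.append(free_at - arrival)
    return delays


# Assumed case: processing (360 s) is slower than the arrival interval (300 s).
print(batch_delays(300, 360, 4))  # [360, 420, 480, 540]

# Assumed case: processing (240 s) fits within the interval.
print(batch_delays(300, 240, 4))  # [240, 240, 240, 240]
```

In the first case, each file is delayed by 60 seconds more than the previous one, so any fixed maximum delay tolerance is eventually exceeded. In the second case, the delay stays constant because processing completes before the next file arrives.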
The need to process such large volumes of data arises in a number of large-scale applications. Examples include reporting student data from schools to an education authority in the educational sector, web log processing for large-scale websites, and inter-system data synchronization. Therefore, a method for processing a large volume of data within a scheduled time is required to alleviate data processing delay.