1. Field
This disclosure relates generally to data processing using concurrent processes.
2. Background
The range of applications that require analyzing collections of documents is vast, and the available solutions are often specific to the type and size of the application. An exemplary application is selecting emails that contain certain keywords or expressions from a large email archive. Email archives may be analyzed for the presence of certain keywords or expressions for various purposes, including as part of electronic discovery in litigation. The email archive of a corporation, for example, may include millions of emails, and analyzing it may involve searching hundreds of millions of small documents (e.g., individual emails) for the presence of selected keywords and expressions.
Distributed and/or parallel processing is often used to speed up applications such as those mentioned above. For example, groups of documents may be searched in parallel. In some conventional parallel systems, the programmer specifies how subtasks are allocated to parallel processes. However, such static specification of subtask assignment may not scale to large numbers of subtasks or processes. In some other conventional parallel systems, a master process may break up a job into sub-jobs and assign each sub-job to a process.
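By way of illustration only, the conventional master-based arrangement described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation from this disclosure: the function names (split_into_sub_jobs, search_sub_job, master_search), the fixed sub-job size, and the case-insensitive substring match are all hypothetical, and a thread pool stands in for the separate processes to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sub_jobs(documents, sub_job_size):
    """Master step: break the full job into fixed-size sub-jobs (assumed policy)."""
    return [documents[i:i + sub_job_size]
            for i in range(0, len(documents), sub_job_size)]


def search_sub_job(sub_job, keywords):
    """Worker step: return the documents that contain at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [doc for doc in sub_job
            if any(k in doc.lower() for k in lowered)]


def master_search(documents, keywords, num_workers=4, sub_job_size=2):
    """Master: split the job, hand each sub-job to a worker, merge the results."""
    sub_jobs = split_into_sub_jobs(documents, sub_job_size)
    # Threads stand in for worker processes here (an assumption for brevity).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in sub-job order, so document order is preserved.
        results = pool.map(lambda job: search_sub_job(job, keywords), sub_jobs)
        return [doc for matches in results for doc in matches]


if __name__ == "__main__":
    emails = [
        "Quarterly merger update",
        "Lunch on Friday?",
        "Re: MERGER terms",
        "Holiday schedule",
        "Contract draft attached",
    ]
    for hit in master_search(emails, ["merger", "contract"]):
        print(hit)
```

In this sketch the master alone decides how the job is partitioned and where each sub-job runs, which is the centralized arrangement the preceding paragraph attributes to some conventional systems.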