Parallel processing provides economical, scalable, and high-availability approaches to computing solutions. For users of a computationally expensive non-threaded (i.e., single-threaded) legacy software program, there is a need to achieve the operational performance associated with parallel-processing techniques without substantially rewriting the legacy software program. In this way, the operational latency associated with the execution of the software program may be substantially reduced.
Parallel processing includes solutions that are largely hardware in nature such as, and by way of example only, massively parallel processing (MPP). Furthermore, parallel processing may be achieved in a multiple processing environment (MPE) using largely software techniques such as, and by way of example only, symmetric multiprocessing (SMP) environments, where a software job scheduler directs application programs to run on a particular processing element (PE). In SMP environments, the operating system (OS) treats a group of PEs as peers.
Conventional parallel processing systems fall generally into four categories: single instruction stream operating on multiple data streams (SIMD), same program operating on multiple data streams (SPMD), multiple instruction streams operating on multiple data streams (MIMD), and multiple instruction streams operating on a single data stream (MISD). Moreover, a system that is usually not classified as a parallel processing system is often referred to as a single instruction stream operating on a single data stream (SISD).
Additionally, software programs may be written such that they are threaded, that is, capable of having their internal instructions executed concurrently (e.g., in parallel) by multiple threads of execution. As will be apparent to those skilled in the art, developing a threaded program is an expensive exercise, because the logic is largely event driven rather than linear. Moreover, expensive error-handling logic must be developed for recovery purposes in the event of an unexpected failure during any particular execution sequence. As a result, very few legacy programs have been written in a threaded manner. For the most part, only OS kernel programs and large-scale distributed database programs are threaded. A program that is not threaded is often referred to as being singly threaded or non-threaded.
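By way of illustration only, the contrast between linear and threaded logic described above may be sketched as follows. The function names and the motif-counting task are hypothetical and are not drawn from any particular legacy program. The sketch shows only the additional coordination (partitioning, a shared total, a lock, and joining) that a threaded rewrite requires; in the standard CPython interpreter it is not expected to run faster than the linear version.

```python
import threading


def count_matches_linear(sequences, motif):
    """Non-threaded version: one linear pass over the input."""
    return sum(seq.count(motif) for seq in sequences)


def count_matches_threaded(sequences, motif, num_threads=4):
    """Threaded version of the same (hypothetical) task.

    The work must be partitioned, the shared total must be protected
    by a lock, and every thread must be joined before the result is read.
    """
    total = 0
    lock = threading.Lock()

    def worker(part):
        nonlocal total
        local = sum(seq.count(motif) for seq in part)
        with lock:  # without the lock, concurrent updates could be lost
            total += local

    threads = [
        threading.Thread(target=worker, args=(sequences[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Both functions return the same count for the same input; the threaded version simply carries more logic that must be written, tested, and recovered from on failure.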
In recent years, industry consortiums have promulgated a message-passing interface (MPI) standard that has increased the potential for achieving parallel processing results. By and large, MPI is primarily used to interface disparate software programs. In MPI, both the sending program and the receiving program must be capable of translating the MPI data. MPI is an interface standard rather than a software program, so legacy programs must be modified, or additional programs must be developed, to effectively process MPI data.
With MPI, input and output data passed among multiple programs are standardized, such that disparate computing environments or programs may be interfaced to one another. Yet each interfaced program must be capable of translating the MPI data; therefore, program customization is still required. Furthermore, MPI has not been embraced by, or interfaced to, the standard job-scheduling programs that reside in MPEs. These job schedulers provide efficient load balancing across the PEs and, in some instances, among individual programs residing on a single PE.
However, even with MPP, SMP, SIMD, SPMD, MIMD, MISD, and MPI, many computationally expensive legacy programs are still unable to improve their operational throughput because of the extreme development expense associated with rewriting the legacy programs to be threaded. For example, consider bioinformatic programs designed to compare protein or DNA sequences and to query multiple databases for matches to multiple protein or DNA sequences. These programs continue to experience large operational latency, and none of the present technologies has provided a cost-effective and timely solution.
Accordingly, there is a need to transparently permit computationally expensive legacy programs to execute in parallel in order to improve operational efficiency, and there is a need to minimize the development expense associated with achieving the same. Therefore, there is a need for improved apparatus, methods, data structures, and systems which transparently parallel process the operations associated with a computationally expensive non-threaded software program.
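By way of example only, and not as a description of any particular implementation, the need stated above may be pictured with the following minimal sketch. It assumes that the legacy program reads independent records from standard input and writes one result line per record, so that its input can be divided and its outputs concatenated. The command `LEGACY_CMD` is a hypothetical stand-in for an unmodified non-threaded program, and `split_evenly`, `run_legacy`, and `run_in_parallel` are illustrative names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an unmodified, single-threaded legacy program:
# it reads one sequence per line from stdin and prints "<sequence> <length>".
LEGACY_CMD = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    s = line.strip()\n"
    "    if s:\n"
    "        print(s, len(s))\n",
]


def split_evenly(items, parts):
    """Divide items into at most `parts` contiguous chunks of similar size."""
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_legacy(chunk):
    """Run one independent instance of the legacy program on one chunk."""
    result = subprocess.run(
        LEGACY_CMD,
        input="\n".join(chunk) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the legacy program exits with an error
    )
    return result.stdout.splitlines()


def run_in_parallel(sequences, workers=4):
    """Fan chunks out to separate OS processes; merge output in input order."""
    chunks = split_evenly(sequences, workers)
    merged = []
    # Each thread only waits on its own child process, so the legacy
    # program instances themselves execute concurrently.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        for lines in pool.map(run_legacy, chunks):
            merged.extend(lines)
    return merged
```

In this sketch the legacy program is never modified; only its input is partitioned and its output reassembled. Whether such partitioning is valid depends on the records being independent of one another, which is an assumption of the example.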