MATLAB® is a product of The MathWorks, Inc. of Natick, Mass., which provides engineers, scientists, mathematicians, and educators across a diverse range of industries with an environment for technical computing applications. As a desktop application, MATLAB® allows scientists and engineers to interactively perform complex analysis and modeling in their familiar workstation environment. As engineering and scientific problems require larger and more complex models, computations become correspondingly more resource intensive and time-consuming. A single workstation, however, can limit the size of the problem that can be solved, because its computing power may be insufficient to execute the compute-intensive iterative processing of a complex problem in a reasonable amount of time.
For example, a simulation of a large, complex aircraft model may take a reasonable amount of time to run on a single workstation with a specified set of parameters. However, the analysis of the problem may also require that the model be computed multiple times with different sets of parameters, e.g., at one hundred different altitude levels and fifty different aircraft weights, to understand the behavior of the model under varied conditions. This would require five thousand computations of the model (100 × 50) to analyze the problem as desired, and a single workstation would take an unreasonable or undesirable amount of time to perform these computations. Therefore, it is desirable to perform a computation concurrently using multiple workstations when the computation becomes so large and complex that it cannot be completed in a reasonable amount of time on a single workstation.
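The parameter sweep described above can be sketched as follows. This is only an illustration under stated assumptions: `simulate` is a hypothetical stand-in for one run of the aircraft model, the altitude and weight values are made up, and a local thread pool stands in for the multiple workstations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(altitude_m, weight_kg):
    """Hypothetical stand-in for one run of the aircraft model."""
    return altitude_m * 0.001 + weight_kg * 0.0001


# Illustrative parameter grids: 100 altitude levels and 50 aircraft weights.
altitudes = [100.0 * i for i in range(1, 101)]
weights = [50_000.0 + 500.0 * j for j in range(50)]

# Every (altitude, weight) pair is an independent computation.
cases = list(product(altitudes, weights))


def run_sweep(cases, workers=4):
    """Distribute the independent runs over a pool of workers.

    Threads are used here only as a stand-in for separate workstations.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda case: simulate(*case), cases))


results = run_sweep(cases)
print(len(cases), "model runs completed")
```

Because the runs do not depend on one another, this kind of sweep needs no communication between workers beyond handing out parameters and collecting results.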
In another example, an application can have a mathematical function that is to be integrated in parallel using a quadrature algorithm. In this case, the mathematical function must be evaluated a large number of times in order to calculate the integral to a sufficient degree of accuracy, and each evaluation of the mathematical function may take a large amount of time. To perform the integration in a reasonable amount of time, it would be desirable to have multiple workstations working on the integration in parallel, and communicating partial results with one another until a result with sufficient accuracy is reached.
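A minimal sketch of such a parallel integration is shown below. It makes several assumptions that are not part of the description above: the composite midpoint rule is used as the quadrature algorithm, the interval is split into one fixed sub-interval per worker, threads stand in for workstations, and the names `midpoint_partial` and `parallel_integrate` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def midpoint_partial(f, a, b, n):
    """Midpoint-rule estimate of the integral of f over [a, b] using n panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def parallel_integrate(f, a, b, workers=4, tol=1e-8, max_rounds=20):
    """Integrate f over [a, b], one sub-interval per worker.

    Each round, every worker returns a partial result for its sub-interval.
    The partial results are summed, and the panel count is doubled until two
    successive estimates agree to within tol.
    """
    edges = [a + (b - a) * k / workers for k in range(workers + 1)]
    panels = 8
    previous = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_rounds):
            futures = [
                pool.submit(midpoint_partial, f, edges[k], edges[k + 1], panels)
                for k in range(workers)
            ]
            estimate = sum(future.result() for future in futures)
            if previous is not None and abs(estimate - previous) < tol:
                return estimate
            previous = estimate
            panels *= 2
    raise RuntimeError("integration did not converge")


# The integral of sin(x) over [0, pi] is exactly 2.
value = parallel_integrate(math.sin, 0.0, math.pi)
print(value)
```

Unlike an independent parameter sweep, each round here depends on combining the partial results from all workers, which is why the workers must exchange messages.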
Applications that are traditionally used as desktop applications, such as MATLAB®, need to be modified to be able to utilize the computing power of concurrent computing, such as parallel computing and distributed computing. Each machine or workstation needs to have its own local copy of the application, or at least the part of the application that provides the functionality the machine or workstation needs to perform concurrent computing and the requested computations. There also needs to be a way for the different instances of the application to communicate and pass messages to one another, so that the multiple machines or workstations in the concurrent computing environment can collaborate with each other.
Message passing is a form of communication used in concurrent computing that allows different processes, on the same or different machines/workstations, to communicate with each other in the concurrent computing environment. Communication takes place by sending messages from one machine/workstation to another. Forms of messages include function invocations, signals, and data packets. One example of a message passing method that establishes a communication channel between machines or workstations is the Message Passing Interface (MPI).
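The basic send/receive pattern can be sketched as follows. This is not MPI itself. As a simplifying assumption, two threads in one process play the role of two workstations, and an in-memory queue per worker plays the role of the communication channel. The helper names `send` and `receive` are hypothetical.

```python
import queue
import threading

# One inbox per worker; sending a message places it in the receiver's inbox.
inboxes = {0: queue.Queue(), 1: queue.Queue()}


def send(dest, source, payload):
    """Deliver a data-packet message from worker `source` to worker `dest`."""
    inboxes[dest].put({"source": source, "payload": payload})


def receive(me, timeout=5.0):
    """Block until a message arrives for worker `me` (or the timeout expires)."""
    return inboxes[me].get(timeout=timeout)


def worker(rank, results):
    if rank == 0:
        # Worker 0 sends data to worker 1 and waits for the reply.
        send(1, 0, [1, 2, 3])
        reply = receive(0)
        results[0] = reply["payload"]
    else:
        # Worker 1 receives the data, computes on it, and sends back the result.
        message = receive(1)
        send(0, 1, sum(message["payload"]))


results = {}
threads = [threading.Thread(target=worker, args=(rank, results)) for rank in (0, 1)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results[0])
```

Each send must be paired with a matching receive on the other side; the collaboration only works when both workers agree on who sends what, and when.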
When developing concurrent computing programs, such as parallel programs, especially in the “single program, multiple data” model, it is possible to introduce communication mismatches among the multiple nodes in a concurrent computing environment. A communication mismatch can be due to a send/receive inconsistency caused by an error in program execution flow, for example when a message is not sent because one of the processes exits a loop prematurely. A mismatch can also be due to an incorrect sender or receiver. A bug in the parallel program can also cause a communication mismatch. Some errors are non-deterministic, such as those caused by differences in execution times that result from different data inputs. Errors can easily occur when there is a change in the execution environment, such as a change in parallel platform. A communication mismatch in one part of an application may result in errors becoming apparent in a separate part of the application, because the mismatch may leave some undelivered messages in a pending state; when these messages are eventually received, they will not be what the receiver expects. A communication mismatch can also cause a deadlock, which causes the application to hang. As many-core multi-processor systems and clusters become more popular, debugging a communication mismatch in a concurrent computing program becomes increasingly difficult.
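The way a mismatch in one part of a program surfaces in a later part can be illustrated with a small, deterministic sketch. Everything here is an assumption made for illustration: a single in-memory queue stands in for the channel between a sender and a receiver, the two sides are run sequentially so the outcome is reproducible, and the tag check in `receive` is one possible way to expose the problem, not a technique taken from the description above.

```python
import queue

# Hypothetical one-way channel from a sender to a receiver.
channel = queue.Queue()


def send(tag, payload):
    """Send a message labeled with the program phase it belongs to."""
    channel.put((tag, payload))


def receive(expected_tag):
    """Receive the next message and check that it is the one expected."""
    tag, payload = channel.get(timeout=1.0)
    if tag != expected_tag:
        raise RuntimeError(
            f"communication mismatch: expected {expected_tag!r}, got {tag!r}"
        )
    return payload


# Phase 1: the sender sends three values, but the receiver exits its loop
# one iteration early, so one message is left pending in the channel.
for i in range(3):
    send("phase1", i)
phase1_values = [receive("phase1") for _ in range(2)]

# Phase 2: the receiver expects a phase-2 message but gets the leftover
# phase-1 message, so the error appears far from the loop that caused it.
send("phase2", "done")
try:
    receive("phase2")
    error = None
except RuntimeError as exc:
    error = str(exc)

print(error)
```

Without the tag check, the receiver would silently treat the stale value as phase-2 data. Had the receiver instead waited for a message that was never sent, the blocking `get` would stall until its timeout, which corresponds to the hang described above.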