MATLAB® is a product of The MathWorks, Inc. of Natick, Mass., which provides engineers, scientists, mathematicians, and educators across a diverse range of industries with an environment for technical computing applications. MATLAB® is an intuitive, high-performance language and technical computing environment that provides mathematical and graphical tools for mathematical computation, data analysis, visualization, and algorithm development. MATLAB® integrates numerical analysis, matrix computation, signal processing, and graphics in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation, without traditional programming. MATLAB® is used to solve complex engineering and scientific problems by developing mathematical models that simulate the problem. A model is prototyped, tested, and analyzed by running it under multiple boundary conditions, data parameters, or just a number of initial guesses. In MATLAB®, one can easily modify the model, plot a new variable, or reformulate the problem in a rapid, interactive fashion that is typically not feasible in a non-interpreted programming language such as Fortran or C.
As a desktop application, MATLAB® allows scientists and engineers to interactively perform complex analysis and modeling in their familiar workstation environment. As engineering and scientific problems require larger and more complex models, the computations become correspondingly more resource-intensive and time-consuming. A single workstation can limit the size of the problem that can be solved, because its computing power may fall short of what is needed to execute computing-intensive iterative processing of complex problems in a reasonable time. For example, a simulation of a large, complex aircraft model may take a reasonable time to run as a single computation with a specified set of parameters. However, the analysis of the problem may also require that the model be computed multiple times with different sets of parameters, e.g., at one hundred different altitude levels and fifty different aircraft weights, to understand the behavior of the model under varied conditions. This would require five thousand computations to analyze the problem as desired, and a single workstation would take an unreasonable or undesirable amount of time to perform these simulations. Therefore, it is desirable to perform a computation concurrently using multiple workstations when the computation becomes so large and complex that it cannot be completed in a reasonable amount of time on a single workstation.
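The size of the sweep in the aircraft example can be checked with a short sketch. It is written in Python purely for illustration. The grid values and the `simulate` function are hypothetical placeholders rather than an actual aircraft model, and the thread pool only stands in for a set of cooperating workstations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical parameter grids; only the counts (100 and 50) come from the text.
altitudes = [100.0 * i for i in range(100)]          # 100 altitude levels
weights = [50_000.0 + 500.0 * j for j in range(50)]  # 50 aircraft weights


def simulate(altitude, weight):
    """Placeholder for one run of the model (assumption, not a real model)."""
    return altitude * 0.001 + weight * 0.0001


# Every (altitude, weight) pair is an independent computation.
cases = list(product(altitudes, weights))
print(len(cases))  # 5000


def run_sweep(cases, workers=4):
    """Run all cases on a pool of workers, preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda case: simulate(*case), cases))


results = run_sweep(cases)
```

Because each run depends only on its own pair of parameters, the cases can be divided among workers in any order, which is what makes this kind of analysis a natural candidate for concurrent computing.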
Applications that are traditionally used as desktop applications, such as MATLAB®, need to be modified to be able to utilize the computing power of concurrent computing, such as parallel computing and distributed computing. Each machine or workstation needs to have its own local copy of the application, and the different instances of the application need a way to communicate and pass messages to one another so that the multiple machines or workstations in the concurrent computing environment can collaborate with each other.
One example of a message passing method that establishes a communication channel between machines or workstations is Message Passing Interface (MPI). MPI is a standard for a message passing interface that has been used between parallel machines or workstations in concurrent computing systems. In conventional concurrent computing systems, computing applications that make use of MPI communications must be launched using a launcher program (usually called “mpirun” or “mpiexec”). An example of the syntax for calling mpirun is as follows.
mpirun -np <number of processes> <application name and arguments>
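As a minimal sketch of how the placeholders in this syntax are filled in, the following Python helper assembles the launcher command as an argument list. The helper name `build_mpirun_command`, the application path `./my_app`, and its arguments are hypothetical and are used only for illustration.

```python
import shlex


def build_mpirun_command(num_processes, application, *app_args):
    """Return the argument list for launching an application with mpirun."""
    if num_processes < 1:
        raise ValueError("number of processes must be at least 1")
    # Layout: mpirun -np <number of processes> <application name and arguments>
    return ["mpirun", "-np", str(num_processes), application, *app_args]


# Hypothetical application name and arguments.
cmd = build_mpirun_command(4, "./my_app", "--input", "data.txt")
print(shlex.join(cmd))  # mpirun -np 4 ./my_app --input data.txt
```

The resulting list could be handed to a process-spawning call such as `subprocess.run(cmd)` on a system where an MPI launcher is installed.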
If an error occurs after an application has been launched using the above MPI method on a concurrent computing system, the default behavior is to abort all the parallel processes immediately and disconnect the communication channel established between the multiple machines and workstations. This behavior is not desirable because the connections must be re-established before concurrent computing can be utilized again.