Over the years, scientists and engineers have built a large software base of numerical modeling codes for scientific computing. These legacy codes are very valuable because of the solid pedigree associated with them. (Since they have been run and checked for accuracy many times by many different users, most of the errors have been found and removed.) The term "legacy code" as used herein refers generally to existing computer programs, and particularly to such programs (i.e., code) that were written some time ago. Typically, the programmers who worked on such code are no longer available, or have forgotten much of the reasoning used to generate the code. Legacy code is particularly valuable when it solves large and complex problems (for example, weather-forecasting models), and when much time and effort was spent to generate it. It is prohibitively expensive to recreate solutions to such problems from scratch in languages designed for the distributed-memory parallel systems that are available today. In particular, it is very difficult and expensive to verify that such new programming exactly duplicates the operation and results of the legacy codes. The physics, mathematics, and numerical methods used together in large legacy code create a complexity that makes automatic parallelization difficult. Therefore, in the past, such code was converted manually by human programmers who examined the code, tracked the use of variables by hand, and inserted changes to facilitate parallelization. Such manual conversion is tedious, time consuming (and thus expensive), and error prone (it can be compared to solving, by hand, numerical problems that require very large numbers of calculations).
The errors that occur in such manual conversion processes are particularly problematic because they propagate: the evidence of an error tends to appear far from its source, which makes the source very difficult to track down.
Difficulties in manual parallelization point to a need for automation. Several automatic and semi-automatic tools have been developed {J. J. Dongarra and B. Tourancheau, Environments and tools for parallel scientific computing, Advances in Parallel Computing, 6 (1993); [18]}. Doreen Cheng has published an extensive survey {Cheng, A survey of parallel programming languages and tools, Tech. Rep. RND-93-005, NASA Ames Research Center, Moffett Field, Calif. 94035, 1993} with 94 entries for parallel programming tools, of which nine are identified as "parallelization tools to assist in converting a sequential program to a parallel program." In spite of considerable efforts, attempts to develop fully automatic parallelization tools have not succeeded. Several years of research suggest that full automation of the parallelization process is an intractable problem. Consequently, the emphasis of recent research has been on developing interactive tools that require assistance from the user. The interactive D-editor {S. Hiranandani, K. Kennedy, C.-W. Tseng, and S. Warren, The d editor: A new interactive parallel programming tool, in Proceedings of Supercomputing Conference, 1994, pp. 733-742} and Forge {R. Friedman, J. Levesque, and G. Wagenbreth, Fortran Parallelization Handbook, Applied Parallel Research, 1995} are examples of state-of-the-art interactive parallelization tools. However, they too have a number of weaknesses and limitations when it comes to parallelizing legacy codes.
There is thus a need for an automatic method and apparatus that converts these legacy codes and other programs (such as those that were initially written to be run on a uniprocessor) into a form that allows the efficient use of modern distributed-memory parallel computers and/or networks of computers to run these computer programs.