This invention relates to the field of computer programs that generate computer code, and more specifically to a method and apparatus for generating code that better utilizes distributed-memory parallel computers to execute programs, particularly pre-existing programs that were written to run on sequential computers.
Over the years, scientists and engineers have built a large software base of numerical modeling codes for scientific computing. These legacy codes are very valuable because of the solid pedigree associated with them. (Since they have been run and checked for accuracy many times by many different users, most of the errors have been found and removed.) The term "legacy code" as used herein refers generally to existing computer programs, and particularly to such programs (i.e., code) which were programmed some time ago. Typically, the programmers who worked on such code are no longer available, or have forgotten much of the reasoning used to generate the code. Legacy code is particularly valuable when it solves large and complex problems (for example, weather-forecasting models), and when much time and effort was spent to generate it. It is prohibitively expensive to recreate solutions to such problems from scratch in languages designed for the distributed-memory parallel systems that are available today. In particular, it is very difficult and expensive to verify that such new programming exactly duplicates the operation and results of the legacy codes. The physics, mathematics, and numerical methods used together in large legacy code create a complexity which makes automatic parallelization difficult. Therefore, in the past, such code was manually converted by human programmers who examined the code, manually tracked the use of variables, and inserted changes to facilitate parallelization. Such manual conversion is tedious, time consuming (and thus expensive), and error prone (such manual processes could be compared to manually solving numerical problems that include very large numbers of manual calculations).
The errors that occur in such manual conversion processes are particularly problematic since the errors propagate (and thus evidence of the error tends to be quite separated from the source of the error), and it is very difficult to track down the source of the error.
Difficulties in manual parallelization point to a need for automation. Several automatic and semi-automatic tools have been developed {J. J. Dongarra and B. Tourancheau, Environments and tools for parallel scientific computing, Advances in Parallel Computing, 6 (1993); [18]}. Doreen Cheng has published an extensive survey {Cheng, A survey of parallel programming languages and tools, Tech. Rep. RND-93-005, NASA Ames Research Center, Moffett Field, Calif. 94035, 1993} with 94 entries for parallel programming tools, out of which nine are identified as "parallelization tools to assist in converting a sequential program to a parallel program." In spite of considerable efforts, attempts to develop fully automatic parallelization tools have not succeeded. Several years of research suggest that full automation of the parallelization process is an intractable problem. Consequently, the emphasis of recent research has been on developing interactive tools requiring assistance from the user. Interactive D-editor {S. Hiranandani, K. Kennedy, C.-W. Tseng, and S. Warren, The d editor: A new interactive parallel programming tool, in Proceedings of Supercomputing Conference, 1994, pp. 733-7421} and Forge {R. Friedman, J. Levesque, and G. Wagenbreth, Fortran Parallelization Handbook, Applied Parallel Research, 1995} are examples of the state-of-the-art interactive parallelization tools. However, they too have a number of weaknesses and limitations when it comes to parallelizing legacy codes.
There is thus a need for an automatic method and apparatus that converts these legacy codes and other programs (such as those that were initially written to be run on a uniprocessor) into a form that allows the efficient use of modern distributed-memory parallel computers and/or networks of computers to run these computer programs.
The present invention provides a computer-implemented method and apparatus for parallelizing input computer-program code (the "input code") based on class-specific abstractions. The method includes the steps of providing a class-specific abstraction (CSA) of an underlying numerical method used in the input code, and generating parallelization code based on the CSA and the input code. Other aspects include checking the input code for compliance with the CSA, performing a dependency analysis of the input code for compliance with the CSA, analyzing the control flow of the input code based on the CSA, and generating a block-based representation of a control flow based on index variables in the input code and on the CSA. In one embodiment, the CSA includes a computational-set template, a dependency template, and a set of allowed index-variable access patterns. Yet other aspects include generating synchronization points based on the CSA, mapping a computational set to a virtual array of parallel processors, and mapping the virtual array of parallel processors to a physical array of parallel processors. Other features include outputting a representation of communications flow between processors of data related to index variables in the input code. Another embodiment includes the steps of identifying to the computer a numerical-method class used in the input code, and identifying a mapping of an index variable used in the input code to spatial coordinates. Other aspects include performing dependency analysis to determine communication-synchronization points, and minimizing the number of such points for data transmitted between processors.
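By way of illustration only, checking input code for compliance with a set of allowed index-variable access patterns might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the invention: the class name ClassSpecificAbstraction, the function check_compliance, the representation of an access as an (array name, index offset) pair, and the three-point stencil used as the example class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSpecificAbstraction:
    """Hypothetical, simplified CSA: a name plus the allowed index offsets."""
    name: str
    allowed_offsets: frozenset  # e.g. {-1, 0, 1} permits u(i-1), u(i), u(i+1)


def check_compliance(csa, accesses):
    """Return the accesses whose offset is not an allowed access pattern.

    Each access is an (array_name, offset) pair, where offset is the
    constant added to the index variable in the array reference.
    """
    return [(name, off) for name, off in accesses
            if off not in csa.allowed_offsets]


# Assumed example: a three-point stencil class and four array references,
# the last of which (u(i+2)) falls outside the allowed patterns.
stencil = ClassSpecificAbstraction("three-point stencil", frozenset({-1, 0, 1}))
accesses = [("u", -1), ("u", 0), ("u", 1), ("u", 2)]
violations = check_compliance(stencil, accesses)
```

In this sketch a non-empty result would mark the input code as non-compliant with the chosen class, so that the offending references could be reported to the user rather than parallelized incorrectly.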
Other aspects include a storage medium having a computer program stored thereon for causing a computer to parallelize input code by a method of the present invention.
Another embodiment of the present invention includes a computerized system for class-specific parallelizing of input computer-program code. The system includes a computer, and receivers in the computer that receive input identifying to the computer a numerical-method class used in the input code and input identifying to the computer one or more index variables in the input code that are associated with spatial coordinates of the numerical-method class. The system also includes a synchronization-point generator in the computer that generates synchronization points for the input code based on the numerical-method class and the index variables, and a mapper in the computer that generates a global-to-local index variable mapping based on the numerical-method class and the index variables.
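A global-to-local index variable mapping of the kind mentioned above can be illustrated with a simple one-dimensional block decomposition. The sketch below is an assumption made for illustration: the text does not specify a block distribution, and the function names block_bounds and global_to_local are hypothetical.

```python
def block_bounds(n, nprocs, rank):
    """Half-open global index range [start, stop) owned by processor rank.

    The n indices are split into nprocs contiguous blocks; the first
    (n % nprocs) processors each receive one extra index.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return start, start + size


def global_to_local(i, n, nprocs):
    """Map global index i to (owning processor, local index on that processor)."""
    for rank in range(nprocs):
        start, stop = block_bounds(n, nprocs, rank)
        if start <= i < stop:
            return rank, i - start
    raise IndexError(f"global index {i} is outside 0..{n - 1}")


# Assumed example: 10 grid points over 3 processors gives the blocks
# [0, 4), [4, 7) and [7, 10).
owner, local = global_to_local(5, 10, 3)
```

With such a mapping, a loop written over the global index range in the sequential code could be rewritten to run over each processor's local range only.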
Yet another aspect of the present invention provides apparatus and a method that includes the steps of: identifying to the computer a numerical-method class used in the input code, identifying to the computer a mapping of a numerical-method-class space into a variable used in the input code, generating in the computer synchronization points for the input code based on the numerical-method class and the mapping, and generating in the computer a local array variable conversion based on the numerical-method class and the mapping. Other aspects of the present invention include generating in the computer a block-based representation of the control flow based on the index variable, performing in the computer dependency analysis to determine communication-synchronization points, and minimizing in the computer the number of communication-synchronization points for data transmitted between processors. Yet other aspects of the present invention include mapping in the computer an array space to a virtual array of parallel processors, and mapping in the computer the virtual array of parallel processors to a physical array of parallel processors.
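The two-stage mapping described above (array space to a virtual array of processors, then virtual array to a physical array) can be illustrated for the second stage as follows. This is a sketch under assumptions: it supposes that the virtual array is folded onto the physical array in contiguous blocks and that each virtual dimension is an exact multiple of the corresponding physical dimension. The function name virtual_to_physical is hypothetical.

```python
def virtual_to_physical(v, virtual_shape, physical_shape):
    """Map virtual processor coordinates to physical processor coordinates.

    Contiguous blocks of virtual processors are assigned to one physical
    processor along each dimension.
    """
    coords = []
    for vi, vn, pn in zip(v, virtual_shape, physical_shape):
        if vn % pn != 0:
            raise ValueError("virtual size must be a multiple of physical size")
        if not 0 <= vi < vn:
            raise IndexError("virtual coordinate out of range")
        coords.append(vi // (vn // pn))  # block of vn // pn virtual procs each
    return tuple(coords)


# Assumed example: a 4 x 4 virtual array folded onto a 2 x 2 physical array.
p = virtual_to_physical((3, 1), (4, 4), (2, 2))
```

Keeping the virtual array separate from the physical array in this way would allow the same decomposition of the computation to be reused on machines with different processor counts.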
Still other aspects of the present invention include selectively turning off one or more subroutine calls related to the index variable, and outputting a representation of communications flow between processors of data related to the index variable. The present invention also includes providing to the computer a computational-set template of allowed types of computations related to the index variable, and providing to the computer a dependency template of allowed types of dependencies related to the index variable.