1. Field of the Invention
The present invention is a computer-based system and method for optimizing parsed statements to be used for generating low level code, and for generating that low level code for a target computer having a scalar and a parallel portion.
2. Related Art
Over the past several years, society has placed an increasingly large demand on the speed at which it expects computers to operate. This is due in large part to the desire to solve larger and more complex problems, which has driven computer manufacturers to design faster, more complex computers to keep up with these demands.
Most computers in the marketplace today have been designed using the traditional von Neumann architecture. This type of architecture basically contemplates using a single central processing unit (CPU) to execute the instructions of a computer program. Many computer companies desiring to create faster machines have concentrated on developing faster CPUs and continuing to design devices using von Neumann architecture.
In recent years, much interest has developed in the area of parallel processing. This concept contemplates using possibly tens of thousands of processors simultaneously to solve a specific problem. Using this scheme, the specific problem is divided up into tasks, and each of these tasks is sent to a specific processor to be executed. The result from each of these processors is then brought together again, yielding a final result. Of course, this is more complicated than it at first sounds, and there have been many problems in developing efficient and robust parallel processing devices.
In a typical computer program (and the specific problem being solved) there are pieces that can be efficiently divided up into parallel tasks to be handled by a plurality of processors, and there are pieces where such division would be inefficient or highly impractical. Consequently, many parallel processing devices (each referred to hereafter as a "target computer") contain a scalar portion for executing scalar code (that is, that portion of the program which is best executed on a single processor device). Another portion of the target computer then contains multiple processors which can be used simultaneously to solve different portions of a particular problem. These two portions can be two different interconnected machines, or they can be, for example, a single target device where one of the processors acts as the separate scalar processor.
The importance of software in allowing the various processors in a parallel processing environment to communicate effectively with one another to solve a problem cannot be overestimated. This is because it is the software that divides the problem into various portions, and then assembles a final result from the results produced by each processor.
One way in which tasks might be divided so that they can be resolved in parallel is shown with regard to FIG. 1A. Referring now to FIG. 1A, two arrays (also called "parallel variables") are shown as variable A and variable B. One way that the tasks might be divided is along the lines of the indices. Thus, one processor would be responsible for handling the contents of the arrays having an index of 1, another processor would be responsible for handling the contents of the arrays having an index of 2, etc. This is indicated by the dotted lines in FIG. 1A. Thus, if A and B were added together and the result put into a parallel variable C, then A(1) would be added to B(1) by processor 1 and the result would be put into C(1). Similarly, A(2) would be added to B(2) by a second processor, etc. In this way, tasks can be broken up for each processor.
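By way of illustration only, the index-wise division described above can be sketched in Python. The sketch simulates one "processor" per index with an ordinary loop; the array length of 8, the sample values, and the name elemental_add are assumptions made for this example and do not appear in FIG. 1A.

```python
# Hypothetical sketch: one simulated "processor" per array index (1-based, as in the text).
# The array length (8) and the values are illustrative assumptions.
A = {i: 10 * i for i in range(1, 9)}
B = {i: i for i in range(1, 9)}

def elemental_add(a, b):
    """Add two parallel variables index by index.

    Each index i stands for one simulated processor that reads only
    its own A(i) and B(i), so no data crosses a processor boundary.
    """
    c = {}
    for i in a:              # conceptually, all indices are handled at the same time
        c[i] = a[i] + b[i]   # processor i computes C(i) = A(i) + B(i)
    return c

C = elemental_add(A, B)
```

In this model every operand needed for C(i) already resides with the processor handling index i, which is why the addition divides cleanly along the dotted lines.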
The situation becomes more complicated when an operation is to be performed that requires one processor to communicate with another processor. For example, if an answer is desired for C(i)=A(i)+B(i+1) where i=1 to 7, then communication will be required among the different processors to resolve the equation. Based upon the way that the tasks would be divided in FIG. 1A, this equation is not as quickly resolvable as the one in the previous example.
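The communication requirement can be made concrete with a small Python sketch, offered only as an illustration. It assumes that element i of every array resides on simulated processor i (the division of FIG. 1A), and it simply counts the operands that must be fetched from a different processor; the function name offset_add, the array length, and the sample values are hypothetical.

```python
def offset_add(a, b, n, offset):
    """Compute C(i) = A(i) + B(i + offset) for i = 1..n.

    Assumes element i of each array lives on simulated processor i.
    Returns the result and the number of operands that had to be read
    from a processor other than the one computing C(i).
    """
    c = {}
    remote_reads = 0
    for i in range(1, n + 1):
        owner_of_b = i + offset      # processor holding B(i + offset)
        if owner_of_b != i:          # operand is not local to processor i
            remote_reads += 1
        c[i] = a[i] + b[i + offset]
    return c, remote_reads

# Illustrative data (assumed): 8 elements so that B(i+1) exists for i = 7.
A = {i: 10 * i for i in range(1, 9)}
B = {i: i for i in range(1, 9)}

C_local, local_cost = offset_add(A, B, 7, 0)      # C(i) = A(i) + B(i)
C_shift, shift_cost = offset_add(A, B, 7, 1)      # C(i) = A(i) + B(i+1)
```

Under these assumptions the unshifted addition needs no remote reads, while the shifted one needs one remote read for each of the seven values of i.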
One way to minimize communication in situations where the processors need to communicate with one another is to change the boundaries along which tasks are created. For example, one could assign A(1) and B(2) to one processor, A(2) and B(3) to another processor, etc. However, constantly changing these boundaries in accordance with the problem being solved would create considerable overhead, and that would result in a slower system.
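The trade-off can be sketched as follows, purely as an illustration. The Python fragment counts non-local reads under two hypothetical layouts, each expressed as a function from an array index to the processor that holds it; the name remote_reads and the ownership functions are assumptions introduced for this example, and the cost of physically moving data between layouts is not modeled.

```python
def remote_reads(n, owner_a, owner_b, offset):
    """Count reads of B(i + offset), for i = 1..n, that are not local
    to the processor holding A(i), under the given ownership functions."""
    return sum(1 for i in range(1, n + 1)
               if owner_b(i + offset) != owner_a(i))

# Boundaries as in FIG. 1A: element i of each array is on processor i.
default_shifted = remote_reads(7, lambda i: i, lambda j: j, 1)

# Changed boundaries: B(i+1) is placed on the processor holding A(i).
realigned_shifted = remote_reads(7, lambda i: i, lambda j: j - 1, 1)

# The same changed boundaries applied to C(i) = A(i) + B(i).
realigned_plain = remote_reads(7, lambda i: i, lambda j: j - 1, 0)
```

In this simplified model the changed boundaries remove every remote read for C(i)=A(i)+B(i+1), but they introduce remote reads for C(i)=A(i)+B(i), which suggests why the boundaries would have to be changed again from one statement to the next.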
It is noted that an entity (e.g., an equation or "statement") which utilizes parallel variables and does not require communication among processors is termed "elemental," an entity which utilizes parallel variables and does require communication among processors is termed "non-elemental," and an entity which does not utilize parallel variables is termed "scalar."
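These three terms can be illustrated with a short Python sketch. It rests on a deliberately simplified assumption: a statement is described only by its operands, and communication is taken to be required exactly when some parallel operand is indexed at a nonzero offset from the destination index. The Operand type and the classify function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operand:
    name: str
    parallel: bool    # True for a parallel variable (array), False for a scalar
    offset: int = 0   # index offset relative to the destination, e.g. 1 for B(i+1)

def classify(operands):
    """Classify a statement as 'scalar', 'elemental', or 'non-elemental'.

    Simplifying assumption: a parallel operand with a nonzero offset
    is the only source of inter-processor communication.
    """
    if not any(op.parallel for op in operands):
        return "scalar"
    if any(op.parallel and op.offset != 0 for op in operands):
        return "non-elemental"
    return "elemental"

# C(i) = A(i) + B(i)
kind_add = classify([Operand("A", True), Operand("B", True)])
# C(i) = A(i) + B(i+1)
kind_shift = classify([Operand("A", True), Operand("B", True, offset=1)])
# z = x + y, with no parallel variables
kind_scalar = classify([Operand("x", False), Operand("y", False)])
```

Here the first statement is classified as elemental, the second as non-elemental, and the third as scalar, matching the usage of those terms above.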
Thus, it can be seen that non-elemental entities (or portions thereof) are not as quickly executable as elemental ones, since communication between processors is necessary. What is needed, therefore, is a scheme to minimize the effect of the non-elemental entities, while maximizing the efficiencies of elemental ones.
In addition to the problems of non-elemental statements, conventional technology generates a single assembly language code stream from source code for using both the parallel and scalar portions of a target computer. This creates difficulties, since a single mechanism needs to evaluate each line of this assembly language code stream to determine whether it is scalar or parallel. Thus, what is also needed is some mechanism to alleviate this difficulty.