This invention relates generally to random access memory circuits, and, more particularly, to random access memory circuits for use in high-speed computers employing a parallel processing architecture, or employing concurrent processing techniques usually referred to as pipelining. Although there may be other practical uses for the memory device of the invention, it will be described here in relation to its application in parallel and concurrent processors.
Modern parallel processors have a number of processing elements, such as arithmetic multipliers and adders, which can operate simultaneously. Such computers are useful in addressing certain classes of computational problems, for the most part in scientific applications, which could not be solved fast enough using a more conventional serial processor, in which arithmetic and logical operations must be performed one at a time. A serial computer is typically controlled by a program of which each instruction is stored in a single "word" of memory. Since there is only one processing element in a serial computer, it is logical to store a program in this manner. The computer retrieves the program word-by-word and thus executes the program of instructions.
By way of contrast, the parallel processor in general can handle at one time as many instructions as there are processing elements. The instructions are usually stored and retrieved in a single, relatively wide word, hence the term "horizontal" processor. It can readily be understood that, depending on the complexity of the computational problem to be addressed, the preparation of an efficient program for such a computer may be a very difficult task. For a serial computer, programming is relatively straightforward, since there is only one processing element to be concerned about. For the horizontal processor, the programmer's goal is not only to program the necessary steps to arrive at a desired arithmetic result, but to do so with the most efficient utilization of the parallel nature of the processing elements.
Complicating the programming task is the nature of the computational problem, which is generally not completely parallel. For example, the output from an adder may be one of two input quantities to be multiplied in a multiplier circuit, but the other required input to the multiplier may have to be derived from the adder output by first adding to it an additional quantity. Thus the first multiplier input must be stored until the second is also available. In the past, such temporary storage was provided by "scratchpad" memory devices. In this simple example, the first available input to the multiplier would be stored in a scratchpad memory, then later retrieved when the second input was available and the multiplier was also available.
Basically, then, the task of programming a horizontal processor is one of scheduling the times of operation of the available computer resources, to make the best use of those resources in a parallel manner. The computer includes not only the resources or processing elements, but an interconnection circuit, by means of which outputs from selected processing elements can be connected as inputs to other selected processing elements. When an output has to be temporarily stored in a scratchpad memory, the memory is, in effect, another computer resource, the use of which has to be scheduled by the programmer. Unfortunately, there is a large class of scientific computations for which these scheduling steps are not only non-trivial, but can be performed only on a time-consuming trial-and-error basis.
The class of computations referred to is that involving iterative techniques. Iterative computations, in which an identical computational loop is repeated many times to obtain a result, account for a major fraction of the execution time in scientific computations. Scheduling iterative computations for parallel processors is, therefore, of considerable importance. Subject to data dependencies between them, successive iterations can be scheduled in any manner that does not result in conflict for the use of the resources. One way of overlapping iterations is to use identical schedules for successive iterations, and to initiate successive iterations spaced by a fixed interval, referred to as the initiation interval.
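By way of illustration only, the overlapping of iterations at a fixed initiation interval may be sketched as follows. The sketch assumes that each resource exists as a single unit occupied for one cycle per use; the function name `find_conflicts`, the resource names, and the example loop body are hypothetical and are not taken from any particular machine.

```python
from collections import defaultdict


def find_conflicts(schedule, interval, iterations):
    """Return the (cycle, resource) slots claimed by more than one iteration.

    schedule   -- list of (resource, offset) pairs for ONE iteration, where
                  offset is the number of cycles after that iteration starts
    interval   -- the initiation interval, in cycles
    iterations -- how many overlapped iterations to simulate
    """
    usage = defaultdict(list)
    for k in range(iterations):
        start = k * interval  # every iteration reuses the identical schedule
        for resource, offset in schedule:
            usage[(start + offset, resource)].append(k)
    return sorted(slot for slot, users in usage.items() if len(users) > 1)


# Hypothetical loop body: the adder is used at cycles 0 and 2,
# and the multiplier at cycle 3, relative to the start of the iteration.
loop_schedule = [("adder", 0), ("adder", 2), ("multiplier", 3)]

print(find_conflicts(loop_schedule, interval=2, iterations=3))
print(find_conflicts(loop_schedule, interval=3, iterations=3))
```

With an interval of two cycles, the second adder use of one iteration collides with the first adder use of the next; with an interval of three cycles, the same schedule overlaps without conflict.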
The minimum initiation interval is the smallest initiation interval for which a schedule without conflicts can be formulated. Use of the minimum initiation interval results in optimum utilization of the computer resources. An optimum schedule can be arrived at only if the minimum initiation interval is known. This interval will depend in part on the usage that is made of scratchpad memories, since these are also computer resources. But the scratchpad usage can only be determined when a schedule is known. In practice, the apparent circularity of this problem is avoided by trial and error. The programmer selects a likely candidate for the minimum initiation interval and tries to formulate a schedule with no conflicts for the usage of resources. If a schedule cannot be found, the programmer may either keep trying, or may increase the estimated minimum initiation interval and try again, bearing in mind that a schedule using the increased interval will not be as efficient.
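The trial-and-error procedure described above may be sketched, in much simplified form, as follows. The sketch assumes that the schedule for one iteration is held fixed and only the candidate interval is raised, that each resource is a single unit occupied for one cycle per use, and that scratchpad usage is ignored; a programmer would in practice also rearrange the schedule itself. The names `conflict_free` and `smallest_interval` are hypothetical.

```python
from collections import Counter


def conflict_free(schedule, interval):
    """True if iterations started every `interval` cycles never collide.

    Two uses of the same resource collide exactly when their offsets are
    equal modulo the interval.
    """
    seen = set()
    for resource, offset in schedule:
        slot = (resource, offset % interval)
        if slot in seen:
            return False
        seen.add(slot)
    return True


def smallest_interval(schedule, limit=64):
    """Raise the candidate interval until the fixed schedule fits."""
    # Starting estimate: the busiest resource needs one cycle per use.
    lower_bound = max(Counter(r for r, _ in schedule).values())
    for candidate in range(lower_bound, limit + 1):
        if conflict_free(schedule, candidate):
            return candidate
    return None  # no conflict-free interval found within the limit


print(smallest_interval([("adder", 0), ("adder", 2), ("multiplier", 3)]))
print(smallest_interval([("adder", 0), ("adder", 1), ("multiplier", 3)]))
```

The first schedule cannot be run at the starting estimate of two cycles and is accepted only at three, whereas the second, which differs only in the placement of one adder operation, is accepted at two. This illustrates why the interval and the schedule cannot be chosen independently of one another.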
An important consequence of these scheduling difficulties is the lack of availability of an efficient compiler for horizontal machines. A compiler is a computer program whose only function is to translate a program written in a higher-level programming language into instructions in "machine language" for execution by a particular machine. The higher-level language is designed to be easily understood by scientists or engineers who might use it, and it requires no detailed knowledge of the machine that will be used ultimately to execute the program. In the case of horizontal machines, a compiler should be capable of performing the function of scheduling the activities of the computer resources for optimum parallel utilization. Since the scheduling task, as already discussed, has been accomplished by trial-and-error methods, an efficient compiler to achieve the same result has proved to be an elusive goal.
An important factor in efficient scheduling of operations on horizontal machines, or on machines employing pipelining principles, is the manner in which memory used for temporary storage is organized. A conventional random access register file, such as might be used for a scratchpad memory, allows writing into a specified address and reading from a specified address. Although this arrangement would seem to provide the ultimate in programming flexibility, in fact it poses a serious limitation to the production of efficient machine-language programs. For example, suppose that quantities Q1, Q2 and Q3, derived from the first, second and third iterations of a computation, are saved temporarily in register file locations #1, #2 and #3, respectively. At some subsequent point in the computation schedule for each iteration, the quantities will be retrieved from the register file. However, one of the efficiencies of iterative processing is sacrificed, since different iterations will need different programs to be able to decide where to store the quantity, and to "remember" where to retrieve it. If three storage locations are needed to store corresponding quantities from different iterations, then three different versions of the program will be required. For example, one version would be used for the first, fourth, seventh, and so forth, iterations, and would use location #1 as the storage and retrieval address. Since the programs employ very wide instruction words, additional copies of a program are a significant demand on available program storage facilities.
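The correspondence between iterations and program versions in this example may be sketched as follows. The sketch assumes that locations are assigned to successive iterations in rotation, as in the example of locations #1, #2 and #3; the function names are hypothetical.

```python
def storage_location(iteration, num_locations=3):
    """Map a 1-based iteration number to a 1-based register file location."""
    return (iteration - 1) % num_locations + 1


def iterations_by_version(total_iterations, num_locations=3):
    """Group iterations by the program version (fixed address) each needs."""
    versions = {loc: [] for loc in range(1, num_locations + 1)}
    for i in range(1, total_iterations + 1):
        versions[storage_location(i, num_locations)].append(i)
    return versions


for location, iterations in iterations_by_version(9).items():
    print(f"version using location #{location}: iterations {iterations}")
```

Because each version has its storage and retrieval address fixed in its instruction words, the number of program copies grows with the number of locations needed.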
Although a shift register would provide one possible solution to this specific problem, a shift register might not meet other temporary storage requirements of a program. Other memory configurations, such as a "push-down stack," in which items are stored and retrieved at the same location, are also not suitable solutions to the problem. A push-down stack operates on a last-in-first-out basis, which is not usually an appropriate requirement for iterative computation.
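The mismatch between last-in-first-out retrieval and iterative computation may be illustrated by the following sketch. It assumes that quantities Q1, Q2 and Q3 are all stored before any is retrieved, and that the iterations then require them in the order in which they were produced; the function name `retrieval_order` is hypothetical.

```python
from collections import deque


def retrieval_order(values, last_in_first_out):
    """Store all values in order, then read them all back."""
    store = deque(values)
    out = []
    while store:
        if last_in_first_out:
            out.append(store.pop())      # push-down stack: newest item first
        else:
            out.append(store.popleft())  # order of production: oldest first
    return out


quantities = ["Q1", "Q2", "Q3"]
print(retrieval_order(quantities, last_in_first_out=True))
print(retrieval_order(quantities, last_in_first_out=False))
```

The stack returns Q3 first, although the first iteration, which is the earliest to resume, requires Q1.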
It will be appreciated from the foregoing that there is still a significant need for improvement in the field of random access memory devices, especially as they might be applied to computers of the horizontal type or those employing pipelining principles. The present invention fulfills this need.