Many data processing tasks involve extensive arithmetic manipulation of ordered arrays of data. Commonly, this type of manipulation or "vector" processing involves performing the same operation repetitively on each successive element of a set of data. In order to increase processing speed and hardware efficiency when dealing with ordered arrays of data, vector computing machines have been developed. A vector machine is one which deals with ordered arrays of data by virtue of its hardware organization, thus attaining a higher speed of operation than scalar machines.
Computer processing speed and efficiency in both scalar and vector machines can be further increased through the use of multiprocessing techniques. Multiprocessing involves the use of several, or even hundreds or thousands of, processors sharing system resources, such as main memory. Independent tasks of different jobs or related tasks of a single job may be run on the multiple processors. Each processor obeys its own set of instructions, and the processors execute their instructions in parallel. By increasing the number of processors and operating them in parallel, a system can do more work in a shorter period of time.
Although multiprocessing can increase performance speed, the increase is not linearly related to the number of processors employed. This is largely due to two factors: overhead and lockout. Significant overhead is introduced in a multiprocessor environment because of the increased level of control and synchronization required to coordinate the processors and processor functions. Communication between and control of all the processors introduces performance degradation into multiprocessing systems. When several processors are cooperating to perform a task, data dependencies and the passing of data between processors are inevitable. Processor idle time is introduced when one processor must wait for data to be passed to it from another processor. This processor idle time results in a reduction in system performance.
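The nonlinear relationship between processor count and speed can be illustrated with a toy model. The sketch below is an assumption, not a model taken from this document: it supposes that the work divides perfectly among `p` processors and that coordination and synchronization add a fixed cost for each additional processor. The function `speedup` and its parameters `work` and `sync_cost` are hypothetical names chosen for illustration.

```python
def speedup(p, work=1000.0, sync_cost=5.0):
    """Toy speedup model (illustrative assumption only).

    The parallel run time is the evenly divided work plus a coordination
    overhead that grows linearly with the number of extra processors.
    """
    if p < 1:
        raise ValueError("need at least one processor")
    parallel_time = work / p + sync_cost * (p - 1)
    return work / parallel_time


for p in (1, 10, 100):
    print(p, round(speedup(p), 2))
```

With these illustrative numbers, 10 processors yield well under a tenfold speedup, and 100 processors do worse than 10, because the coordination overhead grows faster than the per-processor share of the work shrinks.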
The other significant cause of multiprocessor system degradation is processor lockout, or blocking, associated with multiple processors sharing common resources. This occurs when one processor attempts to access a shared resource, such as shared memory, which another processor is already using. The processor is thus blocked from using the shared resource and must wait until the other processor is finished. Again, processor idle time occurs and system performance is reduced.
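Lockout can be sketched as a small cycle-by-cycle simulation. This is a minimal sketch under stated assumptions that the document does not specify: each processor needs exactly one access to one shared resource (for example, a memory bank), each resource serves at most one processor per cycle, conflicts are won by the lowest-numbered processor, and blocked processors retry on the next cycle. The function `simulate_lockout` is hypothetical.

```python
def simulate_lockout(requests):
    """Return the total processor-cycles lost to blocking.

    requests[i] is the id of the shared resource that processor i needs
    for a single access. Each cycle, every resource serves at most one
    waiting processor (lowest processor number first); blocked processors
    sit idle and retry on the next cycle.
    """
    pending = dict(enumerate(requests))  # processor -> resource still needed
    idle_cycles = 0
    while pending:
        busy = set()   # resources already granted during this cycle
        served = []
        for proc in sorted(pending):
            resource = pending[proc]
            if resource in busy:
                idle_cycles += 1  # locked out: another processor holds it
            else:
                busy.add(resource)
                served.append(proc)
        for proc in served:
            del pending[proc]
    return idle_cycles


print(simulate_lockout([0, 1, 2, 3]))  # no shared targets
print(simulate_lockout([0, 0, 0, 0]))  # all four want the same resource
```

When every processor targets a different resource, no cycles are lost; when all four target the same one, they are served one per cycle and accumulate 3 + 2 + 1 = 6 idle processor-cycles.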
Closely tied to the concepts of overhead and lockout, and also affecting overall machine performance, is the processor-to-memory interface. One example of a multiprocessor interface can be found in the Monarch parallel multiprocessing computer, designed by BBN Systems and Technologies Corporation. The Monarch is a scalar, single-threaded multiprocessing architecture that uses a circuit switching technique to communicate between processors and memory. Under this circuit switching technique, all processors share the same path to memory. When a processor in the Monarch design has a memory request, the entire path from the processor network to memory is opened up and is kept open until the communication between the memory and the processor is completed. This scheme can choke off other processors that are attempting to reference memory through the circuit switching network, limiting the reference transfer rate and resulting in a large amount of processor idle time. Such a design is therefore not practical for multiprocessor, multithreaded vector processing, in which an inherently large volume of data must be passed between the processors and the memory.
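The cost of holding a single shared path open for an entire transaction can be shown with a short queueing sketch. It is not a model of the actual Monarch hardware; it assumes one shared path, requests listed in arrival order, and a path that stays reserved from the moment it is granted until the whole memory transaction completes. The function `circuit_switched_waits` is hypothetical.

```python
def circuit_switched_waits(requests):
    """Return how many cycles each request waits for the shared path.

    requests is a list of (arrival_cycle, hold_cycles) tuples in arrival
    order. Once granted, the path stays open for hold_cycles (the full
    request/response transaction) and no other request can use it.
    """
    path_free_at = 0
    waits = []
    for arrival, hold in requests:
        start = max(arrival, path_free_at)  # wait until the path is released
        waits.append(start - arrival)
        path_free_at = start + hold
    return waits


print(circuit_switched_waits([(0, 10)] * 4))
```

Four processors that issue requests on the same cycle and each hold the path for 10 cycles wait 0, 10, 20, and 30 cycles respectively, so the last processor sits idle for three complete transactions before its reference even starts.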
Another example of a multiprocessor memory interface can be found in the HORIZON routing scheme. The HORIZON interface network uses a scheme called desperation routing, or hot potato routing. HORIZON's desperation routing uses a multi-stage network that has multiple inputs and an equal number of outputs. This routing scheme requires that every input be routed to an output on every network cycle. For example, if there are four input references and all four want to go to the same output, one of them goes to the desired output and the other three go to some other, undesired output. This means that three of the four references take a much longer path through the network. The HORIZON desperation network is routed in such a fashion that these other three references eventually come back to the desired input and have another opportunity to get to the desired output. So that references are not lost in the network forever, the HORIZON routing scheme gives the highest priority to the references that have been in the network the longest, so that they eventually win out over contending references for the same output. Those skilled in the art will readily recognize that such a routing scheme gives a single reference multiple possible routes to the desired end point, and that many references can spend a very long time fighting traffic in the network before they arrive at their destination. Thus, the HORIZON desperation routing scheme is also not desirable for use in multiprocessing machines.
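A single switching stage of a deflection ("hot potato") router in the spirit of the scheme described above can be sketched as follows. This is not the HORIZON implementation; it assumes one switching element with no buffering, age-based priority with ties broken by reference id, and losing references deflected onto the remaining free outputs in order. The function `route_stage` and the `(ref_id, desired_output, age)` tuple layout are hypothetical.

```python
def route_stage(refs, num_outputs):
    """Route every reference to some output in one network cycle.

    refs is a list of (ref_id, desired_output, age) tuples. Nothing is
    buffered, so every reference must leave on some output this cycle.
    The oldest reference wins a contested output; the losers are
    deflected onto whatever outputs remain free.
    Returns a dict mapping ref_id -> (assigned_output, was_deflected).
    """
    if len(refs) > num_outputs:
        raise ValueError("more references than outputs")
    assignment = {}
    taken = set()
    losers = []
    # Oldest first; ties broken by ref_id so the result is deterministic.
    for ref_id, want, _age in sorted(refs, key=lambda r: (-r[2], r[0])):
        if want not in taken:
            taken.add(want)
            assignment[ref_id] = (want, False)
        else:
            losers.append(ref_id)
    free = [out for out in range(num_outputs) if out not in taken]
    for ref_id, out in zip(losers, free):
        assignment[ref_id] = (out, True)  # deflected to an undesired output
    return assignment


# Four references all want output 0; "a" has been in the network longest.
print(route_stage([("a", 0, 3), ("b", 0, 1), ("c", 0, 2), ("d", 0, 0)], 4))
```

In the example, only the oldest reference reaches output 0 and the other three are deflected to outputs 1, 2, and 3, mirroring the four-way contention case in the text. In a full network, the deflected references would have their age incremented and re-enter contention on a later cycle.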
Another important concept in multiprocessing systems is scalability. Scalability refers to the ability of a system to be scaled to a variety of different sizes to meet the needs of different users. For example, while a full-blown system may have 1024 processors, it is desirable to make scaled-down versions available with 512 processors, 256 processors, or some other configuration. It is important that the basic building blocks which make up the biggest system can be used without modification to create the smallest, and vice versa. A scalable system is therefore far more flexible, and such a system can be expanded to meet a user's changing needs.
Therefore, there is a need in the art for a processor-to-memory interconnect network which, among other things, allows the processors to issue memory references without contention, reduces contention among memory references in the network, and reduces the amount of time any one reference spends in the network, thereby decreasing processor idle time and increasing system performance. There is also a need for a modular interconnect network that is easily scalable to fit multiprocessing systems having any number of processors and differently sized memories without the need to redesign the individual modules that make up the interconnect network.