1. Field of the Invention
The present invention relates to information processing systems, particularly high performance systems using plural processors that can function simultaneously to execute a common program. The invention also relates particularly to a central processing unit architecture that can serve as the vector unit of a vector processor.
2. Background Discussion
To increase the performance of large scientific computers, the number of processors has had to be increased so that they can work simultaneously. This method, called "parallelization," theoretically makes it possible to achieve a total cycle time equal to the cycle time of a basic processor divided by the number of processors in the system.
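The ideal relationship stated above can be illustrated with a short sketch. The function name and the nanosecond unit are assumptions chosen for illustration. The model assumes perfect distribution of work across the processors, which real systems only approach.

```python
def effective_cycle_time(basic_cycle_time_ns: float, num_processors: int) -> float:
    """Ideal total cycle time: basic processor cycle time divided by processor count.

    Hypothetical helper for illustration; assumes the work is spread perfectly
    over all processors with no coordination overhead.
    """
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return basic_cycle_time_ns / num_processors


# Four processors with a 10 ns basic cycle time give an ideal 2.5 ns cycle time.
print(effective_cycle_time(10.0, 4))  # 2.5
```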
Another way of cutting the cycle time is to use several basic processors organized in a "pipeline."
In reality, the performance of a vector processor also depends on the compiler's vectorization rate for a given application. This problem is essentially one of programming and compiling techniques and consequently goes beyond the scope of this invention. In the following discussion, we shall therefore disregard this question and concern ourselves with the physical architecture.
The performance of a system also depends on the performance of the memories with which the processors communicate. The performance of a memory or a group of memories is defined by its access time and its cycle time. The access time is defined as the period of time between the sending of a request by one of the processors and the appearance of an acknowledgement signal indicating that the request has been accepted by the memory and a new request can be sent. The cycle time is defined as the period of time between the receipt of a request by the memory and the availability of the response in the memory's output register.
Current developments in the design of large computers have made it necessary to have memories with increasingly large capacities. However, the memories associated with the processors must also have a performance compatible with that of the processors. Designers are therefore trying to produce memories with the shortest possible access and cycle times. For this, one standard solution is to use a memory made up of several modules and to interleave the addressing of those modules. With this interleaving technique, successive or simultaneous requests sent by the processors are addressed successively or simultaneously to different modules in the memory.
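The interleaving technique described above can be sketched as follows. This is a minimal illustration assuming low-order interleaving, in which the module is selected by the address modulo the number of modules. The source does not specify the mapping, and the function name `locate` is hypothetical.

```python
def locate(address: int, num_modules: int) -> tuple[int, int]:
    """Map a linear address to (module index, offset within that module).

    Assumes low-order interleaving (an illustrative choice, not taken from
    the source): consecutive addresses fall in consecutive modules.
    """
    if address < 0 or num_modules < 1:
        raise ValueError("address must be >= 0 and num_modules >= 1")
    return address % num_modules, address // num_modules


# Eight successive addresses spread over four modules, so that successive
# requests are directed to different modules.
modules = [locate(a, 4)[0] for a in range(8)]
print(modules)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Under this mapping, a module receives a new request only once every `num_modules` successive addresses, which is how the interleaved group can present a shorter apparent cycle time than any single module.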
Using the pipeline and interleaving techniques, the cycle times of the processors and the memories have been reduced.
To further increase the performance of the system in vector mode, greater parallelism in its operation is sought. For this, a number of pipeline-type processors and a number of interleaved memories are used. However, the implementation of such a system must take into account the very short cycle times of the memories and the processors. One problem, in particular, is connecting the processors and the memories.
The solution of a bus shared by the processors and the memories prohibits simultaneous exchanges between several processors and several memories. The bus is thus not adapted for parallel functioning. The solution used most is to provide a crossbar-type interconnection network or derivative that makes those simultaneous connections possible. This solution is, however, limited by the increase in complexity of the interconnection device when the cycle time decreases and when the rate of parallelization increases. Indeed, a crossbar network implies a centralization of paths which has the following unfavorable effects:
increasing the connecting means, even more so when the paths are wide;
lengthening the connections when the size of the memories increases, with an unfavorable effect on throughput and access times;
difficulty in integrating, since the proportion of interconnections is high in relation to the associated logic functions;
need for centralized command management, which entails difficulties in the management of data flows and conflicts;
absence of modularity;
difficulty in using redundancies to permit reconfigurations.
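The growth in connecting means noted in the list above can be made concrete with a rough counting model. The model is an assumption introduced for illustration and is not taken from the source: it supposes one switch point per processor-memory pair for each bit of path width, and the function name is hypothetical.

```python
def crossbar_switch_points(num_processors: int, num_memories: int,
                           path_width_bits: int) -> int:
    """Count switch points in a full crossbar under a simple assumed model:
    one switch point per (processor, memory) pair per bit of path width."""
    if min(num_processors, num_memories, path_width_bits) < 1:
        raise ValueError("all arguments must be at least 1")
    return num_processors * num_memories * path_width_bits


# Doubling both the processors and the memories quadruples the count.
print(crossbar_switch_points(4, 4, 64))  # 1024
print(crossbar_switch_points(8, 8, 64))  # 4096
```

Under this assumed model the count grows with the product of the number of processors, the number of memories, and the path width, which is consistent with the observation that wide paths make the increase in connecting means more severe.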