The invention relates generally to computer architecture methods and apparatus and, in particular, to a bus structure method and apparatus for a high speed parallel processing computer.
Substantial improvements in processing time have been made in computer technology during the past few years. The advances have come both from improved semiconductor technology, which allows a higher level of integration, greater flexibility, and increased speed, and from computer architectures which provide, for example, parallel processing. Parallel processing derives its substantial speed advantage by spreading a computation over a plurality of computational elements. The number of elements can vary significantly depending upon the problem being solved as well as the computer interconnection hardware. A particular limitation of parallel processors, however, is that their performance reaches a maximum only when the data which they process can be arranged into data sets which are independent of each other and which can then be processed simultaneously. Machines which exhibit this characteristic have often been termed single-instruction-multiple-data (SIMD) machines or, more commonly, vector machines.
Another approach has been to provide multiple-instruction-multiple-data (MIMD) machines, often called multiprocessors. In this architecture, processing units execute different instructions simultaneously and each processor executes an instruction which does not depend upon the results of another processor operating on a different instruction in the same machine cycle. Accordingly, the performance of the MIMD machine is program-dependent and the machines are fastest when the program can be broken into multiple discrete self-contained tasks.
It has been found that both vector machines and multiprocessors are best suited to specialized environments. Further, certain programs can be fine-tuned toward the strengths and away from the weaknesses of a particular machine. However, such manipulation of code is performed by hand and is generally too costly and difficult to implement. Moreover, even if a computer programmer can optimize a program by creating large data vectors, or by giving it so-called "coarse-grained" parallelism, the programmer is still faced with certain parts of the code that cannot be made to execute faster. This so-called "junk code" takes up almost half of the execution time for a typical program. Thus the increase in speed due to very fast semiconductor circuits and specialized architecture tends to be overshadowed by "junk code" which executes at a relatively slow pace.
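The effect of "junk code" on overall speed can be illustrated with a short calculation. The sketch below applies the standard relation commonly known as Amdahl's law, which the present description does not itself name; the function name `overall_speedup`, the choice of 0.5 as the accelerated fraction (taken from the "almost half" figure above), and the sample acceleration factors are assumptions made for illustration only.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the original run time is accelerated.

    accelerated_fraction: share of the original run time that benefits
        (hypothetical parameter; 0.5 models "junk code" taking about half).
    factor: how many times faster the accelerated share runs.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # New run time, as a fraction of the original run time:
    # the untouched share plus the accelerated share divided by its speedup.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / factor
    return 1.0 / remaining


if __name__ == "__main__":
    # Assumed scenario: half the run time is vectorizable or parallelizable,
    # the other half is "junk code" that runs at its original pace.
    for factor in (2, 10, 100, 1000):
        gain = overall_speedup(0.5, factor)
        print(f"{factor:>5}x on half the run time gives {gain:.2f}x overall")
```

Under these assumptions the overall gain stays below a factor of two however fast the parallel portion becomes, which is the sense in which the slow "junk code" overshadows the gains from faster circuits and specialized architecture.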
Apparatus for overcoming this traditional processing bottleneck has been the subject of much investigation. These investigations have been directed to so-called "fifth generation" computers. One particularly advantageous method relates to the use of a reduced instruction set consisting primarily of basic level register and memory operations and related computational hardware that maintains a relatively simple configuration. A particularly useful method has been referred to as a Trace Scheduling computer and relates to a data processor whose compiler generates object code that makes the best use of the memory and computational resources which are available to it. In this way, execution time for both loop-oriented, easily parallelized code and for the "junk code" that appears frequently, especially in scientific code, is reduced significantly compared with traditional as well as modern data-dependent or code-dependent design strategies.
Even the Trace Scheduling computer, however, must have an architecture which allows it to take full advantage of the Trace Scheduling philosophy and concept.
Accordingly, a primary object of the invention is a computer architecture which enables a Trace Scheduling computer to operate near its theoretical high capacity and reliability. Other objects of the invention are a computing method and apparatus for increasing the throughput of a data processing apparatus, for providing extensive and useful high bandwidth data paths between components of the computing apparatus, for enabling high speed refilling of the instruction cache when an instruction cache miss occurs, and for providing high bandwidth memory access in a parallel computing environment.