Distributed computer system designs based on clusters of relatively inexpensive microprocessors have become popular. However, there is still a need for vector processing computer systems that can handle calculation-intensive problems on large amounts of data. Traditional vector systems do not scale to a large number of processors because of their system architectures. Previous vector machines tended to have a limited number of processors clustered around a shared memory, which was developed to minimize communication costs when sharing data between processors.
Microprocessor-based machines, on the other hand, are limited in the number of outstanding memory references they can handle. This makes it difficult for microprocessor-based machines to tolerate high memory access latencies. In addition, microprocessor-based machines use a memory subsystem based on cache-line granularity, which is inefficient when accessing single words because an entire cache line must be transferred to obtain one word. What is needed is a computer system structure that is scalable to a large number of processors yet can tolerate hundreds of outstanding memory references.