As processors run at faster speeds, memory latency looms as an increasingly large problem. Commercially available microprocessors have addressed this problem by decoupling the address computation of a memory reference from the memory reference itself. In addition, these processors decouple memory references from the execution that depends on those references.
The memory latency problem is even more critical in vector processing. Vector processors often transfer large amounts of data between memory and the processor. In addition, each vector processing node typically has two or more processing units: one is typically a scalar unit, and another is a vector execution unit. In the past, the scalar, vector load/store, and vector execution units were coupled together in order to avoid memory conflicts between the units. It has therefore been difficult to extend the decoupling mechanisms of commercially available microprocessors to vector processing computers.
What is needed is a system and method for hiding memory latency in a vector processor that limits the coupling between the scalar, vector load/store and vector execution units.