1. Technical Field
The present invention relates generally to the field of memories for digital computers, and more particularly to the fetching of vector data from memory. Specifically, the present invention relates to the use of a cache memory in connection with a vector processor.
2. Background Art
Throughout the evolution of the digital computer, the cost of memory has been a major factor in the overall system price. Although memories have become relatively inexpensive, memories with varying storage capacities and performance characteristics are available at different costs. It is desirable to use the fastest memory available so as not to limit the execution speed of the data processor. The fastest memory available, however, is the most expensive, and most applications cannot justify the cost of a high-speed memory of sufficient capacity to hold a complete program and its associated data. Therefore, the memory of a typical digital computer includes a relatively small portion of high-speed memory for frequent access by the data processor, and a relatively large portion of slower-speed memory for less frequent access. The slower-speed memory serves as the "main memory" of the computer.
In the typical computer as described above, the user could be permitted to load the high-speed memory with the most frequently accessed code and data, and if necessary transfer code and data between the high-speed memory and main memory during execution, to obtain the optimum execution speed for the particular program. A major advance occurred, however, when it was discovered that near-optimum performance will usually result if a block of data or code of a certain size is automatically transferred from the main memory to the high-speed memory whenever the processing unit references a piece of data or code that is not currently stored in the high-speed memory. The near-optimum performance is due to the principle of locality of memory references in time and space; there is a relatively high probability that subsequent program or data references will occur at a memory address within the same block of addresses as the previous program or data reference. For automatic transfer of data and program blocks, the high-speed memory is organized as an associative memory called the "cache." The configuration and operation of the cache are further described in Chapter 11 of Levy and Eckhouse, Jr., Computer Programming and Architecture--The VAX-11, Digital Equipment Corporation (1980), pp. 351-368.
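The block-transfer behavior described above can be illustrated with a minimal sketch. It is not taken from the cited reference: the class name `BlockCache`, the fully associative organization, and the least-recently-used replacement policy are all assumptions chosen for brevity. The sketch only shows that, when references fall within the same block of addresses, one block transfer satisfies many subsequent references.

```python
from collections import OrderedDict


class BlockCache:
    """Toy fully associative cache (hypothetical; LRU replacement is an assumption)."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks    # capacity of the cache, in blocks
        self.block_size = block_size    # addresses per block
        self.blocks = OrderedDict()     # resident block numbers, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Reference one address; return True on a hit, False on a miss."""
        block = address // self.block_size
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)       # mark as most recently used
            return True
        # Miss: the whole block is transferred from main memory.
        self.misses += 1
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)      # evict the least recently used block
        self.blocks[block] = True
        return False


# Sequential references: each miss brings in a block that serves
# the next block_size - 1 references.
cache = BlockCache(num_blocks=4, block_size=8)
for address in range(32):
    cache.access(address)
print(cache.misses, cache.hits)
```

In this sketch only the first reference to each block misses; the remaining references to that block are served from the high-speed memory.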
Vector processing is an application in which the use of a cache has provided only marginal benefits. In a vector processor, an arithmetic unit and associated registers are provided for accelerating repetitive operations on sequential data elements. Vector processing is disclosed, for example, in Cray, Jr., U.S. Pat. No. 4,128,880; Chen et al., U.S. Pat. No. 4,636,942; and Chen et al., U.S. Pat. No. 4,661,900. The vector data elements are stored in relatively large data structures in main memory. These structures often exceed the size of the cache. In vector processing, the structures are usually accessed in a linear, rather than clustered, manner, so the common caching algorithms do not work well. Due to these considerations, many vector processors either do not include a cache, or they do not cache vector data. An example of the first instance is the CRAY-1 computer, in which the entire memory system is constructed of the fastest available memory. In the second instance, high-speed memory is provided for storing vector data, and separate memory and an associated cache are provided for storing the program.
It would be desirable to use a typical cache and main memory in association with a vector processor. In such a case, the cache and memory could be used for both vector processing and scalar processing; the vector processor and the scalar processor could then share the same cache and main memory. A typical cache and main memory, however, are designed for scalar processing, and therefore have a relatively large block size and do not handle multiple independent or interleaved memory requests. For vectors, it is difficult to select a block size that is suited to the various ways in which the vector elements are dispersed in the memory space. Although a large block size would more often include most of the vector elements so as to reduce the frequency of fetching blocks from the main memory, the time required for each fetch increases with increased block size. In addition, the vectors may be stored in non-contiguous locations, so that a vector may be spread over many blocks and there could be only one vector element per block fetched. Therefore, the cache cannot be made to work well for vectors merely by attempting to select an optimum block size.
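The block-size difficulty can be made concrete with a small calculation. The following is a sketch under stated assumptions: the function names `blocks_fetched` and `words_transferred` are hypothetical, the vector elements are assumed to be one word each and separated by a constant stride, and the cache is assumed large enough that no fetched block is evicted before it is used.

```python
def blocks_fetched(base, stride, length, block_size):
    """Number of distinct blocks touched by a vector of `length` one-word
    elements starting at word address `base` and spaced `stride` words apart."""
    return len({(base + i * stride) // block_size for i in range(length)})


def words_transferred(base, stride, length, block_size):
    """Words moved from main memory, assuming each touched block is fetched once."""
    return blocks_fetched(base, stride, length, block_size) * block_size


# Contiguous vector: every fetched word is a vector element.
contiguous_blocks = blocks_fetched(base=0, stride=1, length=64, block_size=8)

# Stride equal to the block size: only one vector element per block fetched.
strided_blocks = blocks_fetched(base=0, stride=8, length=64, block_size=8)

# A larger block size does not help when the stride is larger still,
# and each fetch now moves more words.
wide_stride_words = words_transferred(base=0, stride=64, length=64, block_size=32)

print(contiguous_blocks, strided_blocks, wide_stride_words)
```

Under these assumptions, the contiguous vector of 64 elements needs 8 block fetches, while the same vector with a stride of 8 words needs 64, one per element. Enlarging the block to 32 words for a stride of 64 still requires 64 fetches and transfers 2048 words to deliver 64 useful elements.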