1. Field of the Invention.
This invention relates generally to computer architecture and more particularly to computers having multiple memory modules which are accessed in parallel to store and retrieve elements of a common data structure. The invention specifically relates to a dynamic storage scheme for such a computer.
2. Description of the Related Art.
The performance of a heavily pipelined vector processor is highly dependent on the speed at which data can be transferred between the vector processor and main memory. Because processor speeds are substantially higher than memory speeds, it is necessary to use memory systems having multiple memory modules that are accessed in parallel to provide a data transfer rate that matches the processor speed.
To permit parallel access to the memory modules, the vectors are stored in respective regions of memory address space that are allocated among the multiple memory modules. The mapping of the memory addresses referenced by the processor to the physical storage locations of the memory modules is often referred to as the "storage scheme" of the system.
For the storage scheme to be effective in providing a high data transfer rate, successive memory addresses referenced by the processor must be mapped to different ones of the memory modules. Otherwise, a memory module will be busy storing or retrieving data for a previous reference when the processor references the subsequent memory address. This undesirable condition is known generally as a "memory conflict" and more specifically as a "collision".
It is not possible to entirely eliminate memory conflict by the judicious selection of a single storage scheme unless the freedom of the vector processor to reference the memory address space is also limited. In a typical system, the vector processor references successive memory addresses that differ by an amount S known as the "stride" of the memory access. In other words, the vector processor references the successive addresses $a_0$, $a_0 + S$, $a_0 + 2S$, $a_0 + 3S$, . . . . If the storage scheme is fixed and the processor is permitted to access the memory without limitation as to the stride, then there is always a possibility of a memory conflict occurring when the stride happens to coincide with a difference between two addresses mapped to the same memory module.
The simplest storage scheme is known as the "low-order interleaved" scheme which maps the address a into the memory module a mod N, where N is the number of memory modules in the system. In such a storage scheme, however, memory collisions will occur whenever the stride S of the memory access is divisible by N, because the successive memory references will cause only one memory module to be accessed.
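The low-order interleaved mapping described above can be illustrated with a minimal sketch. The function names, the base address of 0, and the choice of N=4 are assumptions made for illustration only.

```python
def module_of(address, num_modules):
    """Low-order interleaving: address a is mapped to module a mod N."""
    return address % num_modules


def modules_for_stride(base, stride, count, num_modules):
    """Modules referenced by the addresses base, base+S, base+2S, ..."""
    return [module_of(base + k * stride, num_modules) for k in range(count)]


# Stride 1 (sequential access) cycles through all N = 4 modules.
print(modules_for_stride(0, 1, 8, 4))  # [0, 1, 2, 3, 0, 1, 2, 3]

# A stride divisible by N references the same module every time.
print(modules_for_stride(0, 4, 8, 4))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

In the second case every reference collides with the previous one, so the transfer proceeds at the rate of a single module.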
The performance of a system using the "low-order interleaved" scheme is also degraded whenever the stride S and the number of memory modules N have a common factor greater than 1 (i.e., S and N are not relatively prime), because in this case some of the memory modules will not be accessed at all. Consequently, if N is prime, access is conflict-free for a larger set of strides than when N is composite. The use of a prime number of modules, however, has several disadvantages, the primary one being that the mapping of addresses to storage locations becomes computationally expensive.
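The effect of a common factor between S and N can be checked with a short sketch under the low-order interleaved mapping. The helper name and the example values N=8 and N=7 are illustrative assumptions. Under this mapping the number of distinct modules referenced equals N divided by the greatest common divisor of S and N.

```python
from math import gcd


def modules_touched(stride, num_modules):
    """Distinct modules referenced by addresses 0, S, 2S, ... under a mod N.

    The module sequence repeats with a period that divides N, so N
    references are enough to see every module that is ever used.
    """
    return sorted({(k * stride) % num_modules for k in range(num_modules)})


# N = 8 (composite): stride 6 shares the factor 2 with N.
print(modules_touched(6, 8))       # [0, 2, 4, 6]
print(8 // gcd(6, 8))              # 4 modules in use

# N = 7 (prime): every stride that is not a multiple of 7 uses all modules.
print(all(len(modules_touched(s, 7)) == 7 for s in range(1, 7)))  # True
```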
For any storage scheme, the average data transfer rate between the processor and main memory cannot exceed the rate of a single module multiplied by the number of modules that are actually accessed during the data transfer. Therefore, to achieve memory access at the maximum possible data transfer rate for vectors having a length well in excess of N, it is necessary for all N of the memory modules to be accessed during each memory cycle except at the beginning and end of data transfer. By definition, this condition is known as "conflict-free" memory access.
For any storage scheme, it is desirable to provide conflict-free access for a stride of 1. Access having a stride of 1 is known as "sequential access" and it is typically used to initially load vector data into the main memory. Once loaded, the data are typically accessed at a different stride in addition to a stride of 1.
Matrix operations, for example, have been performed by sequential access of main memory to store the matrix elements by row. In other words, for a $p \times q$ matrix A(i,j), the matrix has its respective elements stored at sequential memory addresses $a_{ij} = a_{00} + q(i-1) + (j-1)$. After the memory is loaded, it can be accessed sequentially to obtain a selected row vector $A_r(i)$ having its elements at successive addresses $a_{00} + q(i-1)$, $a_{00} + q(i-1) + 1$, $a_{00} + q(i-1) + 2$, . . . , $a_{00} + q(i-1) + (q-1)$. But it is also desirable to access memory with a stride of $S = q$ to obtain a selected column vector $A_c(j)$ having its elements at successive addresses $a_{00} + (j-1)$, $a_{00} + q + (j-1)$, $a_{00} + 2q + (j-1)$, . . . , $a_{00} + q(p-1) + (j-1)$.
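The row-major addressing just described can be sketched as follows. Indices are 1-based as in the text; the function names and the default base address of 0 are assumptions made for illustration.

```python
def element_address(i, j, q, base=0):
    """Address of A(i, j) for a matrix with q columns stored by row."""
    return base + q * (i - 1) + (j - 1)


def row_addresses(i, q, base=0):
    """Addresses of row i: successive addresses, i.e. stride 1."""
    return [element_address(i, j, q, base) for j in range(1, q + 1)]


def column_addresses(j, p, q, base=0):
    """Addresses of column j: successive addresses differ by the stride q."""
    return [element_address(i, j, q, base) for i in range(1, p + 1)]


# A 3 x 4 matrix (p = 3, q = 4) stored from address 0.
print(row_addresses(2, q=4))            # [4, 5, 6, 7]
print(column_addresses(3, p=3, q=4))    # [2, 6, 10]
```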
The problem of providing a storage scheme that is conflict-free for sequential access and conflict-free most of the time for a specified stride S is the subject of Kogge U.S. Pat. No. 4,370,732. In col. 3, line 30 to col. 4, line 60, Kogge teaches that to facilitate row and column access of a $p \times q$ matrix stored in an N-way interleaved memory, the address of each element in the ith row of the matrix should be circularly shifted by s(i-1) positions, where "s" is a "skew factor". The circularly shifted rows are then stored in memory by row as before. For a $p \times q$ matrix with a skew factor s, element A(i,j) is stored at memory location:
$$a_{ij} = a_{00} + q(i-1) + \big((j-1) + (i-1)s\big) \bmod q$$
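The skewed address formula above can be explored with a small sketch. It assumes that the N-way interleaved memory selects the module as the address mod N; the function names, the base address of 0, and the example values p=q=N=4 are likewise illustrative assumptions and are not taken from Kogge.

```python
def skewed_address(i, j, q, s, base=0):
    """Address of A(i, j) when row i is circularly shifted by s(i-1)."""
    return base + q * (i - 1) + (j - 1 + (i - 1) * s) % q


def column_modules(j, p, q, s, num_modules, base=0):
    """Module referenced by each element of column j (assumed: a mod N)."""
    return [skewed_address(i, j, q, s, base) % num_modules
            for i in range(1, p + 1)]


# 4 x 4 matrix in a 4-way interleaved memory.
print(column_modules(1, p=4, q=4, s=0, num_modules=4))  # [0, 0, 0, 0]
print(column_modules(1, p=4, q=4, s=1, num_modules=4))  # [0, 1, 2, 3]

# The shift only permutes elements within a row, so a row still
# occupies the same q consecutive addresses.
print(sorted(skewed_address(3, j, 4, 1) for j in range(1, 5)))  # [8, 9, 10, 11]
```

With no skew the column falls entirely in one module; with a skew factor of 1 the same column is spread over all four modules.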
Kogge states that s should be chosen such that, when a column is accessed, any group of N consecutive column elements falls in N different memory modules. Kogge further states that when N is a power of 2, s should initially be chosen as 0 when q is odd and as 1 when q is even.
For the storage scheme of Kogge wherein N is a power of 2, column access having a stride of S=q will be conflict-free when q is odd, because in this case s=0, the mapping is the same as for a low-order interleaving scheme, and q is relatively prime with respect to N. (Note that since q is odd, it does not have a factor of 2, whereas N, being a power of 2, has only factors of 2.) For the case of N=4 and q being an even number, Kogge's storage scheme is conflict-free for q=2, q=4 and q=6, but not for q=8.
As Kogge acknowledges in col. 2, lines 56-60, his storage scheme is a "skewing scheme" based on the work of P. Budnik and D. J. Kuck as described in Budnik and Kuck, "The Organization and Use of Parallel Memories," IEEE Transactions on Computers C-20(12) pp. 1566-1569 (December 1971). Skewing schemes are further described in H. D. Shapiro, "Theoretical Limitations on the Efficient Use of Parallel Memories," IEEE Transactions on Computers C-27(5) pp. 421-428 (May 1978); H. Wijshoff and J. van Leeuwen, "The Structure of Periodic Storage Schemes for Parallel Memories," IEEE Transactions on Computers C-34(6) pp. 501-505 (June 1985); and H. Wijshoff and J. van Leeuwen, "On Linear Skewing Schemes and d-Ordered Vectors," IEEE Transactions on Computers C-36(2) pp. 233-239 (February 1987). For a system where the number of memory modules N is a power of 2, however, no single storage scheme has been found which allows conflict-free access to vectors for all strides.