Memories interleaved for parallel access are arranged in independently accessible banks. Address distribution techniques for parallel access to data at some number of addresses require that the number of banks in a paged memory equal some multiple, M, of H raised to some power B, that is, M·H^B, where M>0, B>1 and H is the number of logic states (H=2 in binary). For each of the M wings, the number of banks commonly interleaved for some number of pages, and thus equally used by each page, is 2^B. Typically, an interleaved memory contains enough banks for one or more complete interleaves.
If each memory bank contains 2^C data cells, as is required for physical memory to be of contiguous data addresses, each interleave contains 2^(B+C) data cells, where C>>B. Therefore, the number of data addresses needed to address each data cell within each wing is 2^(B+C).
Each data address contains different bit ranges of select values according to the different types and/or uses of the memory parts selected. Each wing is selected using a wing select value from the series 0, 1, . . . , M−1. Each bank of a selected wing is selected using a bank select value from the series 0, 1, . . . , 2^B−1. Each cell of a selected bank is selected using a cell select value from the series 0, 1, . . . , 2^C−1.
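The three select values can be illustrated with a minimal sketch. The text does not fix the order of the bit ranges within an address, so the layout used here (bank select in the low B bits, cell select in the next C bits, wing select above them) is an assumption, as is the helper name split_address.

```python
def split_address(addr, M, B, C):
    """Split a flat data address into (wing, bank, cell) select values.

    Assumed layout, not taken from the text:
      bits [0, B)      bank select, 0 .. 2**B - 1
      bits [B, B + C)  cell select, 0 .. 2**C - 1
      bits above B + C wing select, 0 .. M - 1
    """
    if not 0 <= addr < M * (1 << (B + C)):
        raise ValueError("address outside the M * 2**(B+C) address range")
    bank = addr & ((1 << B) - 1)          # low B bits
    cell = (addr >> B) & ((1 << C) - 1)   # next C bits
    wing = addr >> (B + C)                # remaining high bits
    return wing, bank, cell


# Example: M = 2 wings, B = 2 (4 banks per wing), C = 3 (8 cells per bank).
print(split_address(0b110110, M=2, B=2, C=3))  # (1, 2, 5)
```

With this layout, consecutive addresses fall in consecutive banks of the same wing, which is the behavior of the classic interleave.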
A classic interleave scheme, such as presently used in most interleaved memory systems, distributes all 2^(B+C) addresses among 2^B banks at the rate of one address per bank, repeating a simple pattern 2^C times. Thus, the classic interleave scheme employs a power-of-two number of data banks for each interleave repeat. However, if the only non-unitary prime factor of the number of repeated data banks is 2, as in the case of a classic interleave scheme employing 2^B banks for each wing, the efficiency of the interleave scheme is significantly lower than that of a scheme interleaving a plurality of data banks where the repeated number of banks has a non-unitary odd prime factor (e.g., an odd number of repeated data banks greater than 2).
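As a rough illustration of why a power-of-two bank count suffers with some strides, the sketch below maps addresses to banks with the low B address bits and lists the banks visited by a fixed-stride access sequence. The function names are hypothetical, and the low-bits mapping is the usual reading of a classic interleave rather than a detail stated in the text.

```python
def classic_map(addr, B):
    """Return (bank, cell) for a classic interleave over 2**B banks."""
    return addr % (1 << B), addr >> B


def banks_touched(stride, B, count):
    """Sorted list of distinct banks visited by `count` accesses at `stride`."""
    return sorted({classic_map(i * stride, B)[0] for i in range(count)})


# Eight banks (B = 3), eight accesses per stride.
for stride in (1, 2, 4, 8):
    print(stride, banks_touched(stride, B=3, count=8))
# 1 [0, 1, 2, 3, 4, 5, 6, 7]
# 2 [0, 2, 4, 6]
# 4 [0, 4]
# 8 [0]
```

Every even stride shares a factor of 2 with the bank count, so only a fraction of the banks can work in parallel.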
One interleave scheme employing an odd number of data banks greater than 2 is the Ranade interleave, which distributes every set of 2^B−1 addresses of (2^B−1)·2^C total addresses among 2^B−1 banks at the rate of one address per bank for each of 2^C repeats of the pattern. The Ranade scheme thus employs an odd number of data banks (2^B−1) for each interleave repeat. A Ranade interleave scheme employing seven repeated data banks has an efficiency of 87.8%, meaning that, on average over a large range of fixed stride values, 87.8% of the banks hold the data being accessed. By comparison, a classic interleave scheme employing eight repeated data banks has an efficiency of 67.2%.
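The two efficiency figures can be reproduced under one plausible reading of "efficiency": the fraction of banks visited by a fixed-stride sequence, averaged over one full period of stride values. The sketch below assumes a simple modulo mapping (bank = address mod N) for both the seven-bank and the eight-bank case; that mapping and the function names are assumptions for illustration, not a statement of how Ranade's scheme is actually built.

```python
from fractions import Fraction
from math import gcd


def banks_used(stride, num_banks):
    """Distinct banks visited by a long fixed-stride sequence under
    bank = address mod num_banks (assumed mapping)."""
    return num_banks // gcd(stride, num_banks)


def efficiency(num_banks):
    """Mean fraction of banks used over strides 1 .. num_banks.

    Bank usage depends only on stride mod num_banks, so one period of
    strides is enough to represent any large stride range.
    """
    total = sum(Fraction(banks_used(s, num_banks), num_banks)
                for s in range(1, num_banks + 1))
    return total / num_banks


print(efficiency(7), round(float(efficiency(7)) * 100, 1))  # 43/49 87.8
print(efficiency(8), round(float(efficiency(8)) * 100, 1))  # 43/64 67.2
```

With seven banks only strides that are multiples of 7 collapse onto a single bank, whereas with eight banks every even stride loses at least half of the banks.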
One problem with the Ranade scheme is that it can only be applied to unpaged memories having either a single interleave for a single wing of banks or a discontinuous physical address space. Thus, one complete Ranade interleave of a contiguous range of data addresses requires use of all the banks of an entire memory. Moreover, the Ranade scheme is limited to memories having 2^B−1 banks and cannot be used in memories having 2^B banks per wing, as in paged and bit-addressed memories.
Other attempts have been made to increase the efficiency of data distribution while preserving paging. One such approach is a pseudo-random interleave scheme proposed by B. R. Rau in “Pseudo-Randomly Interleaved Memory,” Proceedings of the Association for Computing Machinery, September 1991. The Rau interleave employs a tree composed of Exclusive-OR gates (XORs) to distribute addresses in a pseudo-random fashion. Yet Rau's use of XORs introduces latency due to the serial computation of bank selects and virtual-to-physical address association. Pseudo-random interleaves, such as Rau's, have an efficiency of about 66% for distributing data of a fixed stride value among eight banks.
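To give a feel for XOR-based bank selection, the sketch below folds an address into a bank number by XORing its successive B-bit fields. This is a generic XOR hash chosen for brevity; it is not Rau's actual gate tree, and the function name is hypothetical.

```python
def xor_fold_bank(addr, B):
    """Bank select formed by XORing successive B-bit fields of addr.

    Generic illustration of an XOR-based bank hash, not the specific
    XOR tree described by Rau.
    """
    mask = (1 << B) - 1
    bank = 0
    while addr:
        bank ^= addr & mask   # fold in the next B-bit field
        addr >>= B
    return bank


# Stride 8 over eight banks: a classic interleave would hit bank 0 every
# time, while the XOR fold spreads the same addresses over all banks.
print(sorted({xor_fold_bank(i * 8, 3) for i in range(8)}))
# [0, 1, 2, 3, 4, 5, 6, 7]
```

Each output bit depends on several address bits, so the bank select must wait for the XOR chain to settle, which is the kind of added latency the text attributes to XOR-based schemes.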
Rau also describes a “prime-degree” interleave scheme that is used with a non-power-of-two number of memory banks (specifically, a non-unitary odd prime number of banks). But the odd number of memory banks used in the prime-degree interleave scheme described by Rau makes it unsuitable for paged memories, because paged and bit-addressed memories require a power-of-two number of memory banks per wing.
When dealing with parallel high-rate streaming of multiple vectors of memory data, prior memory schemes using a single serial data-address stream (e.g., one address of one vector at a time) required that all elemental data of each vector be at consecutive memory addresses, and that all streams of all vectors (e.g., one stream per vector) flow in the same direction (e.g., from smaller to larger addresses of the data of each vector) and at the same basic data rate (e.g., number of data bits per unit of time). For a single such address stream to service multiple such data streams without interrupting their data flow, a single address value of a single data stream addressed a single block of data containing enough consecutive elemental data to sustain that stream at the full rate for a number of consecutive address cycles that it did not itself need. Like-sized data blocks (e.g., the same total bits of data per block) for the other vectors were addressed serially using only the one address stream. Thus, all the data of each data block of each stream, and thus of each vector, were necessarily consecutive.
Where the memory was banked and interleaved, each bank (or each set of banks so dedicated) was capable of accessing a streaming-sized block of data when accessed by a single address value taken serially from the address stream. Also, a single bank could not be accessed more often than the time needed to serially address a number of other such banks. Finally, enough banks were interleaved to allow each bank to be sequentially addressed for accessing the data of each flowing stream. Thus, for each bank, every stream accessed the bank once before the same stream accessed it again, with all accesses occurring in an order determined per case but maintained for at least most of the duration of each case. Because the one dominant order per case was necessarily maintained, two directions of addressing were not possible in the same case.
To allow for the worst-case time separation between each stream's accesses to the same bank, each data stream required large data buffers because, as measured in time, one data stream had to be allowed to stream ahead of all others, and all the variously leading read data streams and trailing write data streams had to be buffered until the last data stream began streaming data to memory. Buffers are costly in both computer resources and delay, in proportion to their size. Also, all data streams had to address memory in the same direction and at the same full data rate so that those streaming at full rate would not experience continual bank usage conflicts resulting in interruptions of data flow. There is a need, therefore, for a vector streaming technique in which vector data are pre-read from and post-written to the memory so that vector data see minimal delay using minimal buffers while moving the data at the rates, directions and fixed spacings demanded by the processor unit.