1. Field of the Invention
The invention relates generally to computer systems, and more particularly to resolving bank conflicts in a shared interleaved memory.
2. Description of Related Art
Without limiting the scope of the invention, this background information is provided in the context of a specific problem to which the invention has application.
Shared memory organization principally takes one of two forms: multiple (dual) ported, or interleaved single ported. It is well known that dual ported memory devices carry an overhead in the form of increased size, cost, and power consumption in exchange for a slight performance increase over single ported interleaved memory. On the other hand, interleaved memory organization, while preferable from a size, cost, and power consumption standpoint, typically requires elaborate bank conflict resolution circuitry, which complicates speed path issues in the data and control paths.
By way of background, a typical shared interleaved memory is organized with "M" memory banks, each bank having a width of "W" bytes. The values for M and W are based, for the most part, on the data bus width, the nominal length of the operand data stored therein, and, as discussed hereinbelow, the complexity of the bank conflict resolution logic. For example, in a variable length instruction machine, an instruction cache memory should preferably have a minimum bank width W equal to the length of its smallest instruction. Operand fetches from the instruction cache would then be efficient, since the minimum "slice" would equal the smallest length instruction; in some cases, however, an operand fetch would require access to two or more memory banks. On the other hand, if the bank width were made larger, operation would be less efficient, since larger chunks of operands would be fetched even if not all were needed. Accordingly, it can be seen that the more numerous and the narrower the memory banks, the more plentiful and granular the selection and, consequently, the lower the likelihood of a conflict between multiple independent accesses.
An example of multiple independent accesses in a scalar processor would be concurrent data and instruction accesses (data read, instruction fetch, or data write) to a unified cache. In a superscalar processor, an example of multiple independent accesses could be two or more contemporaneous independent data reads to the unified cache.
One disadvantage of employing a large number of narrow banks in a shared interleaved memory is that it complicates the bank conflict resolution circuitry. Specifically, the bank conflict resolution circuitry must take into account not only the addresses and the address size sought by multiple accesses, but also the size of the operand (byte, word, double-word, etc.) in determining which banks, and how many, are required for the memory access.
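By way of illustration only, the dependence of the required banks on both the address and the operand size can be sketched in Python as follows. The sketch assumes low-order interleaving, in which consecutive W-byte slices map to consecutive banks modulo M; that mapping, the function name `bank_request_mask`, and its parameters are hypothetical and are not taken from any particular prior art circuit.

```python
def bank_request_mask(addr: int, size: int, num_banks: int, bank_width: int) -> int:
    """Return a bit mask in which bit i is set if bank i is needed by the access.

    Assumption (not from the source text): low-order interleaving, i.e. the
    W-byte slice at index s resides in bank s % M.
    """
    if size < 1:
        raise ValueError("size must be at least one byte")
    first_slice = addr // bank_width               # slice holding the first byte
    last_slice = (addr + size - 1) // bank_width   # slice holding the last byte
    mask = 0
    for s in range(first_slice, last_slice + 1):
        mask |= 1 << (s % num_banks)
    return mask


# Sixteen banks, each two bytes wide (M = 16, W = 2).
print(bin(bank_request_mask(0x06, 4, 16, 2)))  # aligned double-word: banks 3 and 4
print(bin(bank_request_mask(0x05, 4, 16, 2)))  # misaligned double-word: banks 2, 3 and 4
print(bin(bank_request_mask(0x05, 1, 16, 2)))  # single byte: bank 2 only
```

Under these assumptions the same starting address can require one, two, or three banks depending on the operand size and alignment, which is the information the conflict resolution circuitry must derive for every requester.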
More importantly, however, the bank conflict resolution circuitry can induce significant delay into the data path and skew the control and data paths. More specifically, in resolving bank conflicts and determining priority, prior art approaches have typically generated some sort of "overlap" signal by comparing each and every bank request from each and every independent requester and logically ORing the results of each comparison to form a single overlap signal. If there is an overlap, the overlap signal is used by prioritizing logic to exclusively grant all banks to the highest priority requester and to inhibit the lower priority requester(s).
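The overlap approach described above may be modeled, purely as a behavioral sketch, in Python. The bit-mask representation of bank requests, the function names `overlap_signal` and `arbitrate`, and the convention that index 0 is the highest priority requester are assumptions made for illustration; the sketch models the logical behavior only, not the timing of any actual circuit.

```python
from itertools import combinations


def overlap_signal(masks: list[int], num_banks: int) -> bool:
    """OR together every per-bank comparison between every pair of requesters."""
    comparisons = [
        bool((a >> bank) & (b >> bank) & 1)   # both requesters want this bank
        for a, b in combinations(masks, 2)
        for bank in range(num_banks)
    ]
    return any(comparisons)                   # the single overlap signal


def arbitrate(masks: list[int], num_banks: int) -> list[int]:
    """Return the granted bank masks; masks[0] is assumed to be highest priority."""
    if overlap_signal(masks, num_banks):
        # Overlap: only the highest priority requester proceeds.
        return [masks[0]] + [0] * (len(masks) - 1)
    return list(masks)                        # no overlap: every request proceeds


print(arbitrate([0b0011, 0b0110], 4))  # bank 1 is contested -> [3, 0]
print(arbitrate([0b0011, 0b1100], 4))  # disjoint requests   -> [3, 12]
```

Note that every per-bank comparison feeds the one OR reduction in `overlap_signal`, so the number of terms to be combined grows with both the number of banks and the number of requesters.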
By way of further background, it is well accepted that a standard "single gate delay" logic circuit typically can accept only three to five inputs while maintaining safe operating parameters. Accordingly, if it is necessary to compare contention among many banks in a multiple banked interleaved memory, the logic required to generate the overlap signal will have multiple levels of gates, adding delay to the data path. It can be seen, for example, that if the memory is interleaved into sixteen banks, the logical ORing which forms the single overlap signal would induce at least two gate delays into the data path. In the past, in order to avoid significant data path delays, tradeoffs were made to limit the number of banks, thus increasing the width of each bank, decreasing the resolution, and increasing the probability of a conflict between two independent accesses.
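The gate delay figure above can be checked with a short calculation, sketched here in Python. The sketch assumes that the OR reduction is built as a tree of gates, each limited to a fixed maximum number of inputs, and that one per-bank comparison result is produced for each of the sixteen banks; the function name `or_tree_levels` is hypothetical.

```python
def or_tree_levels(num_inputs: int, max_fan_in: int) -> int:
    """Count the gate levels needed to OR num_inputs signals into one output,
    assuming each gate accepts at most max_fan_in inputs."""
    if num_inputs < 1 or max_fan_in < 2:
        raise ValueError("need at least one input and a fan-in of two or more")
    levels = 0
    signals = num_inputs
    while signals > 1:
        signals = -(-signals // max_fan_in)  # ceiling division: gates at this level
        levels += 1
    return levels


for fan_in in (3, 4, 5):
    print(fan_in, or_tree_levels(16, fan_in))
# 3 3
# 4 2
# 5 2
```

With sixteen comparison results and a fan-in of three to five, the reduction takes two or three levels, consistent with the "at least two gate delays" stated above. With only four banks and a fan-in of four, a single level suffices, which illustrates why limiting the number of banks was the usual tradeoff.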
From the foregoing, it can be seen that there is a need for a shared interleaved memory system having a relatively large number of banks and associated conflict resolution circuitry without inducing significant delay into the data path.