Many computer systems exhibit a propensity to access instructions and data from selected regions of memory in a highly sequential order. If a memory boundary should fall within such a region, resulting in data wrap-around from the highest memory address to the lowest, the natural sequence of addressing by increments is destroyed. Thus, accessing the highest address followed by the lowest address requires the introduction of special conditions to the memory accessing procedure, causing decreased efficiency in the utilization of memory.
For example, modern microprocessor chips often contain an on-chip cache memory for instruction and data in order to increase the effective memory bandwidth of a bulk storage main memory system.
The increase in apparent bandwidth of the slower bulk storage main memory is achieved by taking advantage of a characteristic that is generally found in instruction programs, namely: most references to memory, particularly over a short interval of time, tend to access certain neighborhoods of instruction memory. These neighborhoods, or reference localities, tend to move about as the program progresses. By storing these reference localities in a fast cache memory, the apparent speed (bandwidth) of memory can be increased by virtue of the reduced access time. From time to time, new reference localities must be transferred from bulk storage to cache as the pertinent reference locality shifts in the bulk storage main memory or when jumps must be made to accommodate subroutines, for example. Whenever a required instruction program segment is not found in cache memory, an access must be made to main memory and the pertinent data transferred to cache. This new pertinent data generally displaces an older segment.
FIG. 1 shows a typical memory map from a main memory to a cache memory as used in a virtual memory system. FIG. 1, portion a, shows the accessible bulk memory segmented into 2.sup.N pages. Each page may typically be 2 or 4 B (bytes) wide and 1 or 2 KB deep. The relative beginning address of each page is indicated to the left.
FIG. 1, portion b, shows a cache memory map for a cache memory having a capacity to store an integer number of pages, in which pages m and m+1 are stored in the page locations indicated by the solid arrows emanating from main memory. The two pages, shown as being transferred to contiguous locations in cache memory, may in the general case be stored in any order. However, it is not uncommon for the reference locality to span adjacent pages, as indicated by the shaded regions of pages m and m+1, so that it is desirable to store contiguous main memory pages in contiguous segments of cache memory in order to simplify addressing of cache memory by its associated microprocessor (CPU). FIG. 1, portion c, shows a common situation in which page m is stored at the upper edge of cache memory and consequently page m+1 is "rolled-over" into lower memory, as shown by the dashed lines emanating from the main memory of FIG. 1, portion b, to cache memory of FIG. 1, portion c. If this situation occurs, addressing problems arise because of the discontinuity in addresses within the current reference locality. This situation is particularly undesirable when split-wordline read, rather than OR-line read, is used as an accessing scheme to cache memory.
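The roll-over condition of FIG. 1, portion c, can be illustrated with a short sketch. The page depth, the number of cache page slots, and the function names below are hypothetical values chosen for readability and are not taken from the figures.

```python
PAGE_LINES = 4    # lines per page (hypothetical, kept small for readability)
CACHE_PAGES = 4   # page slots in the cache (hypothetical)
CACHE_LINES = PAGE_LINES * CACHE_PAGES


def cache_addresses(first_slot, n_pages=2):
    """Cache line addresses used by n_pages consecutive pages.

    The pages are stored starting at page slot first_slot and wrap
    around to address 0 when they run past the top of the cache.
    """
    start = first_slot * PAGE_LINES
    return [(start + k) % CACHE_LINES for k in range(n_pages * PAGE_LINES)]


def is_contiguous(addresses):
    """True when every address is exactly one more than its predecessor."""
    return all(b == a + 1 for a, b in zip(addresses, addresses[1:]))


# Pages m and m+1 stored in the middle of the cache: addresses stay sequential.
contiguous = cache_addresses(first_slot=1)

# Page m stored in the top slot: page m+1 rolls over to the bottom of the cache.
rolled = cache_addresses(first_slot=3)
```

In the second case the address sequence jumps from the highest cache address back to 0, which is the discontinuity within the reference locality described above.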
FIG. 2 shows a typical 2-bank RAM. Bank-0 comprises a memory unit 20, m-write amplifier unit 22, and m-sense amplifier unit 24. Similarly, bank-1 comprises memory unit 21, m-write amplifier unit 23, and m-sense amplifier unit 25. Each memory unit (20, 21) has an m-bit wide line and N=2.sup.n lines. The memory units are addressed by an n-bit code (A.sub.n-1 A.sub.n-2 . . . A.sub.0) through line select decoder 10, which has N output lines. The input line code (A.sub.n-1 A.sub.n-2 . . . A.sub.0) designates one-out-of-N of the output lines to be activated, so that the activated line enables a read or write operation to the corresponding line. In this two-bank configuration, a read causes sense amplifiers 24 and 25 to read a line from bank-0 and bank-1, respectively, so that either or both outputs may be used.
FIGS. 3(a) and (b) show the logical configuration of a typical line select decoder 10. In FIG. 3(a), input line code terminals (A.sub.n-1 A.sub.n-2 . . . A.sub.0) apply the input line code to a logic array comprising N multiple-input AND-gates, one for each of the output lines labeled 0 through 2.sup.n -1. Each AND-gate input 12, designated in FIG. 3 by the x-symbol, has one input for every bit of input line code. Inverter 11 is used to generate the logical complement of each input line code bit. A particular line with binary code index of (a.sub.n-1 a.sub.n-2 . . . a.sub.0), where each a.sub.k is a binary bit, is selected to be active by applying the corresponding binary code to input terminals A.sub.n-1 A.sub.n-2 . . . A.sub.0, so that line a.sub.n-1 a.sub.n-2 . . . a.sub.0 is true (high) if, and only if, code a.sub.n-1 a.sub.n-2 . . . a.sub.0 is applied. For example, line 6 (binary 0 . . . 0110) is active when the input code at A.sub.n-1 A.sub.n-2 . . . A.sub.0 is 0 . . . 0110.
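The decoding logic described above may be sketched in software as follows. This is only a behavioral illustration of a one-out-of-N decoder built from AND-gates and inverters; the function name and the most-significant-bit-first ordering of the input list are assumptions made for the example.

```python
def decode_line(code_bits):
    """Behavioral model of a one-out-of-N line select decoder.

    code_bits holds the input line code as 0/1 values, most significant
    bit first (A_(n-1) ... A_0). The result is a list of N = 2**n outputs
    in which exactly one entry is 1.
    """
    n = len(code_bits)
    complements = [1 - b for b in code_bits]  # role of the inverters
    outputs = []
    for line in range(2 ** n):
        # Bits of this output line's index, most significant bit first.
        index_bits = [(line >> (n - 1 - k)) & 1 for k in range(n)]
        # Each AND-gate takes the true input where its index bit is 1
        # and the complemented input where its index bit is 0.
        gate_inputs = [
            code_bits[k] if index_bits[k] else complements[k]
            for k in range(n)
        ]
        outputs.append(int(all(gate_inputs)))
    return outputs


# Input code 110 (decimal 6) on a 3-bit decoder activates output line 6 only.
decoded = decode_line([1, 1, 0])
```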
FIG. 3(b) shows a 3-bit line decoder 10 with the multiple-input AND-gate 27 shown explicitly for producing 8 fully decoded output lines. Another useful mode for operating a memory is called the split-line mode. To explain the difference between OR-line and split-line modes, refer to FIG. 2. If the input code corresponds to line i, and the memory is in the read mode, line-select decoder 10 activates line i, causing line i to be read from each bank as described above. However, if the memory is being operated as an instruction cache, the split-line operating mode provides a means for reading data from, say, line i of bank-0 and line i+1 of bank-1 with a minimum delay. This mode is important because of the locality property of memory references described above.
One obvious way to implement a split-line read is to address line i and read, say, bank-0 and then increment the input line-code by one, and then read line i+1 from bank-1. The problem with this method is that the access requires sequential addressing and an incrementing or add-operation with the associated inherent propagation delay due to carries.
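The difference between an OR-line read and the sequential form of split-line read just described can be sketched as follows. The bank contents, the value of N, and the function names are hypothetical; the wrap of line N-1 to line 0 is likewise an assumption of the sketch.

```python
N = 8  # lines per bank (hypothetical size)

# Two banks, each holding N lines; the strings stand in for m-bit line data.
bank0 = [f"b0-line{i}" for i in range(N)]
bank1 = [f"b1-line{i}" for i in range(N)]


def or_line_read(i):
    """Read the same line i from both banks in a single access."""
    return bank0[i], bank1[i]


def sequential_split_line_read(i):
    """Read line i of bank-0, then line i+1 of bank-1.

    The second access cannot start until the incremented address is
    available, so the increment sits on the critical path.
    """
    first = bank0[i]
    next_line = (i + 1) % N  # increment step; the wrap is assumed here
    second = bank1[next_line]
    return first, second


same_line = or_line_read(3)
split_pair = sequential_split_line_read(3)
wrapped_pair = sequential_split_line_read(N - 1)
```

The explicit increment between the two reads corresponds to the add-operation whose carry propagation delay is identified above as the drawback of this method.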
In order to achieve maximum effective bandwidth of the memory, a line address decoder of the type shown in FIG. 4 has been used. (It should be noted that in the following example, the memory is limited to a 3-bit line address, or N=8 decoded lines, for the sake of clarity in explanation; this does not imply a limitation as to size. Extension of the principles to N>8 will be apparent to those practicing the art.) The line decoder 40 is the same as shown previously as decoder 10 of FIG. 3(a) and (b) except that the decoded output lines 0-7 are further operated on by split line selector 45, which comprises one 2-input selector 13 per output line. For the selector associated with output terminal i, one input is connected to output line i of decoder 10, normally associated with that terminal, while the other is connected to line i-1 of decoder 10. The output is selected by control line 41, which connects the output to the lower input of each selector 13 when in one state and to the upper input of each selector 13 when in the other state. The lower input of the line 0 selector is connected to the i-1=N-1 line (line 7 in the example shown). Thus, these connections effectively arrange the outputs to correspond to a modulo-N ring counter by causing (N-1)+1 to be equal to N mod N, that is, 0. The disadvantage of decoder 40 is that for large memory arrays having on the order of 2K lines (or more), the circuit layout required to accommodate the connection from line N-1 of decoder 10 to the lower input of selector 13 associated with line 0 introduces an undesirably long lead and causes a loss in effective bandwidth as a result. The present invention is directed to providing both OR-line and split-line cache memory operation without the limitations described above.
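The behavior of split line selector 45 may be modeled as a sketch under stated assumptions: the decoder output is represented as a one-hot list, and a hypothetical boolean `split` stands in for the state of control line 41 that selects the lower (i-1) input of each selector. The function names are illustrative only.

```python
def one_hot(i, n_lines):
    """Decoder output with only line i active."""
    return [1 if k == i else 0 for k in range(n_lines)]


def split_line_selector(decoded, split):
    """Model of one 2-input selector per decoded output line.

    With split False, each output follows its own decoder line i.
    With split True, each output follows decoder line i-1, and the
    selector for line 0 follows line N-1 (the long wrap-around lead).
    """
    n_lines = len(decoded)
    if not split:
        return list(decoded)
    return [decoded[(i - 1) % n_lines] for i in range(n_lines)]


N = 8
normal = split_line_selector(one_hot(3, N), split=False)        # line 3 active
shifted = split_line_selector(one_hot(3, N), split=True)        # line 4 active
wrapped = split_line_selector(one_hot(N - 1, N), split=True)    # line 0 active
```

No addition is performed in this arrangement: the shift from line i to line i+1 is obtained purely by wiring, at the cost of the single connection from line N-1 back to the line 0 selector.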