Digital data processing systems, and in particular computer processors, continually press for higher operating frequencies, as typically measured by clock rates. With processors having multiple execution units, it is not at all unusual to have instructions executed completely within one clock cycle. In those contexts, instructions and data must be accessible at the same clock rate if processor stalls are to be avoided. In furtherance of this objective, modern computer systems have processors with cache memories, often of multiple levels, the memory designs being refined to exhibit access rates approaching the processor clock rates. A generalized example of such a system is schematically depicted in FIG. 1, where processor 1 having onboard level 1 (L1) cache 2 is connected through level 2 (L2) cache 3 to system bus 4. As commonly implemented, system bus 4 also has attached thereto input/output communication resource 6 as well as main memory 7. Furthermore, in contemporary computer systems it is not at all unusual to have multiple processors, such as processor 8, also connected to system bus 4.
Given the contention for access to bus 4 in pursuit of instructions and data originally stored in main memory 7, processor 1 utilizes L2 cache 3 and L1 cache 2 to improve the match between the clock rate capability of processor 1 and the read and write access capabilities of the various memory devices storing instructions and data. Since cache architectures are well known to those routinely practicing in the contemporary computer processor technologies, it should be sufficient to note that lines of data A from main memory 7 are stored as cache lines in L2 cache 3, generally at 9, with their associated address tags, generally at 11. Some cache lines and related address tags are also stored in L1 cache 2, the contents of which are a subset of what is in the larger L2 cache 3, which is itself a subset of the information stored in main memory 7. Note that L1 cache 2 is shown within the boundaries of processor 1, the depiction representing a typical modern processor architecture in which the L1 cache is on a common integrated circuit chip with the processor. In contrast, the materially larger L2 cache resides on one or more separate integrated circuit chips.
FIG. 2 schematically depicts the elements of a typical L1 cache, where the cache is physically located on the same integrated circuit chip with a processor and is two-way set associative to optimize the match between the clock rate capability of the processor and the access capability of the cache memory. The architecture is commonly referred to as a "late select" architecture, in that data is simultaneously extracted from both banks of the cache and stored in a set of registers for selective transfer to the processor late in the clock cycle. The select signal, which defines which of the data sets on bus 1 and bus 2 is provided to the processor, is derived from a search of the address tags in the two halves of the cache by the search engine. With the processor clock rates being so high, there is insufficient time to search the address tags, to identify which section of the cache holds the data for the specified address, and then to access and transfer that data to the processor, all within the clock cycle. Instead, the practice depicted in FIG. 2 transmits both data sets potentially defined by the requested address to a late select multiplexer just outside the processor and, in parallel, resolves the search of the address tags to decide, and late select, between the two.
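By way of illustration only, the late select behavior described above can be modeled in software as follows. This is a minimal behavioral sketch, not a description of any actual circuit: the names `Way`, `make_ways`, `split_address`, and `late_select_read`, as well as the set count and line size, are assumptions introduced for the example, and the parallel hardware operations are necessarily modeled as sequential steps.

```python
from dataclasses import dataclass

NUM_SETS = 4     # assumed number of sets (illustrative value only)
LINE_BYTES = 8   # assumed cache line size in bytes (illustrative value only)


@dataclass
class Way:
    """One bank (half) of the two-way set associative cache."""
    tags: list   # tags[i] is the address tag stored at set i, or None
    data: list   # data[i] is the cache line stored at set i, or None


def make_ways():
    """Build two empty banks (hypothetical helper for the example)."""
    return [Way([None] * NUM_SETS, [None] * NUM_SETS) for _ in range(2)]


def split_address(addr):
    """Split a byte address into (tag, set_index)."""
    line_number = addr // LINE_BYTES
    return line_number // NUM_SETS, line_number % NUM_SETS


def late_select_read(ways, addr):
    """Return the line for addr, or None on a miss."""
    tag, index = split_address(addr)
    # Both banks drive their candidate lines (bus 1 and bus 2) without
    # waiting for the outcome of the tag search.
    candidates = [way.data[index] for way in ways]
    # The tag search of both banks proceeds in parallel in hardware;
    # it is modeled here as a separate, sequential step.
    hits = [way.tags[index] == tag for way in ways]
    # Late select multiplexer: pass on the candidate whose tag matched.
    for candidate, hit in zip(candidates, hits):
        if hit:
            return candidate
    return None
```

In this model the data for both banks is already available when the tag comparison completes, so only the final multiplexer selection depends on the search result.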
Though the architecture depicted in FIG. 2 is suitable for most on-chip applications, it is not considered suitable for designs in which the processor and cache memory reside on separate chips, in that two full 64-bit bus lines (assuming a 32-bit processor architecture) must extend across the printed circuit board for each section of the cache. The duplication of bus lines from the cache sections to a location immediately proximate or actually into the processor chip is not considered viable from the perspectives of printed circuit board real estate, processor pin-out, and cost.
FIG. 3 schematically illustrates the conventional implementation of a 2-way set associative cache in which the cache memory chip or chips are connected through a common board level bus to the processor, the processor itself residing on a separate chip. Though the common bus avoids the bus line duplication noted earlier, it does not provide the capability for late selection, in that the data must move over a single bus. Therefore, in the context of the architecture in FIG. 3, designers often have the search engine make a speculative selection between the cache sections, place that data on the board level data bus for access by the processor, and, late in the clock cycle, confirm the validity of the data. When the speculative architecture in FIG. 3 chooses the wrong cache section, a multiple clock cycle delay is needed to replace the data with that from the other cache section. Since the run lengths and capacitive loading of board level wiring lines are significantly greater than those associated with cache lines internal to an integrated circuit chip, it is not possible to switch between cache sections within a clock cycle when a speculative choice is found to be incorrect.
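The cost of an incorrect speculative choice can be illustrated with a simple cycle-count model. The sketch below is hypothetical: the function names and the default penalty of two clock cycles are assumptions made for the example, the text above stating only that the delay spans multiple clock cycles.

```python
def speculative_access_cycles(predicted_way, actual_way, mispredict_penalty=2):
    """Clock cycles for one access under speculative way selection.

    mispredict_penalty is an assumed value; the actual delay depends on
    the board level bus and is not specified here.
    """
    if predicted_way == actual_way:
        return 1                       # correct guess: data valid in one cycle
    return 1 + mispredict_penalty      # wrong guess: data must be replaced


def average_cycles(accesses, mispredict_penalty=2):
    """Mean cycles per access over (predicted_way, actual_way) pairs."""
    total = sum(
        speculative_access_cycles(predicted, actual, mispredict_penalty)
        for predicted, actual in accesses
    )
    return total / len(accesses)


# One wrong guess in four accesses, with the assumed two-cycle penalty.
trace = [(0, 0), (0, 1), (1, 1), (1, 1)]
print(average_cycles(trace))
```

Under these assumptions, even a modest rate of incorrect speculative choices raises the average access time above the single clock cycle that late selection would provide.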
Therefore, what is needed is an architecture for an associative cache in which the cache memory and the processor reside on separate integrated circuit chips while sharing a common board level bus, yet can within a clock cycle provide the data from the correct cache section to the processor in the context of having to complete an address tag search of both cache sections. In a more generalized sense, there is a need for resources by which multilevel signals originating in different devices can be transmitted simultaneously over the same line and decoded at a common receiving end for appropriate binary level attribution to the respective transmitting devices. The desired multilevel transmission and conversion back to binary form should avoid the common problems attributable to multilevel voltage transmission as experienced in the prior art.
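The generalized need stated above, namely attributing a multilevel signal on a shared line back to the binary values of the respective transmitting devices, can be pictured with the following sketch. It is offered only as one hypothetical numerical illustration and does not purport to describe the circuit of any particular implementation: the binary weighting of the two devices, the four nominal levels, and the names `encode` and `decode` are all assumptions introduced for the example.

```python
WEIGHT_A = 2   # assumed contribution of device A to the line level
WEIGHT_B = 1   # assumed contribution of device B to the line level


def encode(bit_a, bit_b):
    """Combine two simultaneously transmitted bits into one of four levels."""
    return WEIGHT_A * bit_a + WEIGHT_B * bit_b


def decode(level):
    """Recover (bit_a, bit_b) from a possibly noisy line level."""
    # Snap the received level to the nearest of the nominal levels 0..3.
    nearest = min(range(4), key=lambda nominal: abs(level - nominal))
    # With weights 2 and 1, the quotient is bit A and the remainder is bit B.
    return divmod(nearest, WEIGHT_A)


# Every combination of transmitted bits is recovered at the receiving end.
for bit_a in (0, 1):
    for bit_b in (0, 1):
        assert decode(encode(bit_a, bit_b)) == (bit_a, bit_b)
```

The narrow spacing between adjacent levels in such a scheme is one source of the noise sensitivity commonly associated with multilevel voltage transmission.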