1. Technical Field
The present invention generally relates to system address transmission and in particular to address buses for transmitting system addresses to storage devices for data access operations. Still more particularly, the present invention relates to an index based system address bus transmitting data access addresses in a manner optimized for use by storage devices.
2. Description of the Related Art
High-performance data processing systems typically include a number of levels of caching between the processor(s) and system memory to improve performance by reducing latency in data access operations. When utilized, multiple cache levels are typically employed in progressively larger sizes, with a trade-off of progressively longer access latencies. Smaller, faster caches are employed at levels within the storage hierarchy closer to the processor or processors, while larger, slower caches are employed at levels closer to system memory. Smaller amounts of data are maintained in upper cache levels, but may be accessed faster.
Within such systems, an indexed cache organization is commonly employed. Cache lines, the minimum unit of data which an associated coherency state describes, are stored within the cache in congruence classes. Congruence classes are sets of cache lines for which a portion of the corresponding system addresses (usually a group of higher order bits) are identical. The portion of the system address which identifies the congruence class to which a cache line belongs is the index field. Another field within the system address, the tag, is utilized to distinguish members of a congruence class.
FIGS. 6A and 6B depict an addressing scheme and corresponding cache organization, respectively, which might be employed in accordance with the known art. In the example shown, bits 0 . . . 35 of a 56-bit cache line address are the tag, bits 36 . . . 46 are the index, and the remaining bits (47 . . . 55) are an intra-cache line address. The index field of the address is employed by the cache directory 504 and the cache memory 506 to locate congruence classes. Cache directory 504 stores tags for cache lines contained within cache memory 506 within the congruence class, and compares the tag of a target address to the tags within the congruence class. If a match is identified, the corresponding cache line within cache memory 506 is the target data.
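The field split and directory lookup described above can be illustrated with a minimal Python sketch. It assumes that bit 0 is the most significant bit of the 56-bit address (the text does not state the numbering convention), and the names `split_address` and `Directory`, as well as the four-way associativity, are hypothetical choices made only for illustration.

```python
ADDR_BITS = 56
TAG_BITS = 36                                     # bits 0..35
INDEX_BITS = 11                                   # bits 36..46
OFFSET_BITS = ADDR_BITS - TAG_BITS - INDEX_BITS   # bits 47..55 (9 bits)


def split_address(addr):
    """Split a 56-bit address into (tag, index, offset).

    Assumption: bit 0 is the most significant bit, so the tag occupies
    the top 36 bits and the intra-cache line address the bottom 9 bits.
    """
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 56 bits")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class Directory:
    """Toy cache directory: one list of stored tags per congruence class."""

    def __init__(self, ways=4):  # associativity is an assumed value
        self.classes = [[None] * ways for _ in range(1 << INDEX_BITS)]

    def install(self, addr, way):
        tag, index, _ = split_address(addr)
        self.classes[index][way] = tag

    def lookup(self, addr):
        """Return the matching way within the congruence class, or None."""
        tag, index, _ = split_address(addr)
        for way, stored_tag in enumerate(self.classes[index]):
            if stored_tag == tag:
                return way
        return None
```

In this sketch the index alone selects the congruence class, and the tag is consulted only in the final comparison loop.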
When a cache miss occurs in a higher level cache, the cache line address is transmitted on a bus such as a system bus to lower level caches and/or system memory to retrieve the corresponding cache line. Typically the entire address is transmitted in a single bus cycle if enough address lines are available in the bus. If not, the address is transmitted over multiple consecutive bus cycles with the lower-numbered bits (i.e., the tag in the example shown) being transmitted first.
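As a rough illustration of the conventional multi-cycle transmission, the following Python sketch splits a 56-bit address into consecutive bus beats, lowest-numbered bits first. The 28-line bus width used in the usage note, the function names, and the assumption that bit 0 is the most significant bit are all hypothetical and not taken from the text.

```python
ADDR_BITS = 56


def conventional_beats(addr, bus_width):
    """Split an address into bus beats, lowest-numbered bits first.

    Assumption: bit 0 is the most significant bit. If the address does
    not fill the last beat, that beat is padded with zeros at the end.
    """
    n_beats = -(-ADDR_BITS // bus_width)          # ceiling division
    padded_bits = n_beats * bus_width
    padded = addr << (padded_bits - ADDR_BITS)    # left-align the address
    mask = (1 << bus_width) - 1
    beats = []
    for beat in range(n_beats):
        shift = padded_bits - (beat + 1) * bus_width
        beats.append((padded >> shift) & mask)
    return beats


def beat_carrying_bit(bit, bus_width):
    """Return the 1-based bus cycle in which a given address bit is sent."""
    return bit // bus_width + 1
```

With an assumed 28-line bus, `beat_carrying_bit(36, 28)` and `beat_carrying_bit(46, 28)` both return 2, so the whole index field (bits 36 . . . 46) arrives only in the second bus cycle, after the first 28 tag bits.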
In contemporary data processing systems, caches and system memories generally operate at higher frequencies than the buses which couple the caches or system memories to other portions of the storage hierarchy. Usually these storage devices operate at internal frequencies which are some multiple of the bus operating frequency. The storage devices are also generally pipelined, allowing several accesses to be handled concurrently. Multiple accesses may be received by the storage device during a single bus cycle and staged to access the cache in a pipeline fashion.
Within the storage devices, the cache directory and memory lookup utilizing the index field is the first and greatest source of latency. Identifying and selecting the appropriate congruence class based on the index, and transmitting the tags and data from that congruence class, may take several internal clock cycles to complete. Once the tags from the directory are sent to the comparators, comparison with the address tag to determine a cache hit or miss should take only one clock cycle. The address tag is therefore not required until the end of the directory lookup.
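The two phases of the lookup can be sketched in Python as follows. The three-cycle lookup latency is an assumed figure (the text says only "several" cycles), and `PendingLookup`, `begin_lookup`, and `finish_lookup` are hypothetical names; the directory is modeled as a plain mapping from index to the list of tags in that congruence class.

```python
from dataclasses import dataclass

LOOKUP_CYCLES = 3    # assumed internal cycles to read out a congruence class
COMPARE_CYCLES = 1   # per the text, the tag comparison takes one cycle


@dataclass
class PendingLookup:
    class_tags: list   # tags read out of the selected congruence class
    ready_cycle: int   # internal cycle at which class_tags become available


def begin_lookup(directory, index, start_cycle):
    """Phase 1: select the congruence class. Only the index is needed."""
    return PendingLookup(directory[index], start_cycle + LOOKUP_CYCLES)


def finish_lookup(pending, tag, tag_arrival_cycle):
    """Phase 2: compare the address tag against the class tags.

    Returns (way or None, internal cycle at which hit or miss is known).
    The comparison cannot start before both the class tags and the
    address tag are available.
    """
    compare_start = max(pending.ready_cycle, tag_arrival_cycle)
    if tag in pending.class_tags:
        way = pending.class_tags.index(tag)
    else:
        way = None
    return way, compare_start + COMPARE_CYCLES
```

In this model, an address tag that arrives at any cycle up to `ready_cycle` yields the same completion time, which is the sense in which the tag is not required until the end of the directory lookup.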
In conventional designs, however, the address tag is transmitted on the system bus together with the associated index, or even before the associated index. This merely contributes to delay, and is not optimized for the manner in which the address is utilized. It would be desirable, therefore, to provide a system address bus for transmitting addresses for operations in a manner consistent with usage of the address in performing data access operations in a storage device.
It is therefore one object of the present invention to provide improved system address transmission.
It is another object of the present invention to provide improved address buses for transmitting system addresses to storage devices for data access operations.
It is yet another object of the present invention to provide an index based system address bus transmitting data access addresses in a manner optimized for use by storage devices.
The foregoing objects are achieved as is now described. Following a cache miss by an operation, the address for the operation is transmitted on the bus coupling the cache to lower levels of the storage hierarchy. A portion of the address including the index field is transmitted during a first bus cycle, and may be employed to begin directory lookups in lower level storage devices before the address tag is received. The remainder of the address is transmitted during subsequent bus cycles, which should be in time for address tag comparisons with the congruence class elements. To allow multiple directory lookups to proceed concurrently in a pipelined directory, portions of multiple addresses for several data access operations, each portion including the index field for the respective address, may be transmitted during the first bus cycle or staged in consecutive bus cycles, with the remainder of each address, including the cache tag, transmitted during the subsequent bus cycles. This allows directory lookups utilizing the index fields to be processed concurrently within a lower level storage device for multiple operations, with the address tags being provided later, but still timely for tag comparisons at the end of the directory lookup. Where the lower level storage device operates at a higher frequency than the bus, overall latency is reduced and directory bandwidth is more efficiently utilized.
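The latency benefit of sending the index first can be estimated with a simple Python model. The clock ratio and lookup latency below are assumed values chosen only for illustration, the function names are hypothetical, and the model ignores contention between operations inside the pipelined directory.

```python
BUS_RATIO = 2        # assumed internal clock cycles per bus cycle
LOOKUP_CYCLES = 3    # assumed internal cycles for the index-based lookup
COMPARE_CYCLES = 1   # one cycle for the tag comparison


def completion_cycle(index_beat, tag_beat):
    """Internal cycle at which hit or miss is known.

    index_beat and tag_beat are the 1-based bus cycles in which the
    index field and the address tag are transmitted.
    """
    index_arrival = index_beat * BUS_RATIO
    tag_arrival = tag_beat * BUS_RATIO
    class_ready = index_arrival + LOOKUP_CYCLES
    return max(class_ready, tag_arrival) + COMPARE_CYCLES


def index_first_schedule(operations):
    """Build a bus schedule for several (tag, index) operations.

    The first beat carries the index fields of all operations; each
    later beat carries the tag of one operation.
    """
    first_beat = [("index", op_id, index)
                  for op_id, (_, index) in enumerate(operations)]
    tag_beats = [[("tag", op_id, tag)]
                 for op_id, (tag, _) in enumerate(operations)]
    return [first_beat] + tag_beats
```

Under these assumed numbers, `completion_cycle(index_beat=2, tag_beat=1)` (tag first) returns 8, while `completion_cycle(index_beat=1, tag_beat=2)` (index first) returns 6, because the directory lookup overlaps with the transmission of the tag.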
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.