1. Field of the Invention
This invention generally relates to multiprocessor computer systems that employ cache memory subsystems and, more particularly, to a cache memory subsystem that allows concurrent accesses of cache line tags stored within a cache memory.
2. Description of the Relevant Art
A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between a slower system memory and a processor. A cache typically stores recently used data to improve effective memory transfer rates to thereby improve system performance. The cache is usually implemented by semiconductor memory devices having speeds that are comparable to the speed of the processor, while the system memory utilizes a less costly, lower speed technology.
A cache memory typically includes a plurality of memory locations that each stores a block or a “line” of two or more words. Each line in the cache has associated with it an address tag that is used to uniquely identify the address of the line. The address tags are typically included within a tag array memory device. Additional bits may further be stored for each line along with the address tag to identify the coherency state of the line.
A processor may read from or write directly into one or more lines in the cache if the lines are present in the cache and if the coherency state allows the access. For example, when a read request originates in the processor for a new word, whether data or instruction, an address tag comparison is made to determine whether a valid copy of the requested word resides in a line of the cache memory. If the line is present, a cache “hit” has occurred and the data is used directly from the cache. If the line is not present, a cache “miss” has occurred and a line containing the requested word is retrieved from the system memory and may be stored in the cache memory. The requested line is simultaneously supplied to the processor to satisfy the request.
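The read path described above can be illustrated with a minimal sketch of a direct-mapped cache. The line size, the number of lines, the direct-mapped organization, and all names (`Cache`, `split_address`, `read`) are assumptions chosen for illustration; the text does not prescribe any particular organization.

```python
# Illustrative direct-mapped cache. All sizes and names are assumptions.
LINE_WORDS = 4   # words per line (assumed)
NUM_LINES = 8    # number of lines in the cache (assumed)


def split_address(addr):
    """Split a word address into (tag, index, offset)."""
    offset = addr % LINE_WORDS
    index = (addr // LINE_WORDS) % NUM_LINES
    tag = addr // (LINE_WORDS * NUM_LINES)
    return tag, index, offset


class Cache:
    def __init__(self, memory):
        self.memory = memory                # slower system memory (list of words)
        self.tags = [None] * NUM_LINES      # tag array; None marks an invalid line
        self.data = [[0] * LINE_WORDS for _ in range(NUM_LINES)]

    def _fill(self, tag, index):
        """Retrieve the whole line from system memory and store it in the cache."""
        base = (tag * NUM_LINES + index) * LINE_WORDS
        self.data[index] = self.memory[base:base + LINE_WORDS]
        self.tags[index] = tag

    def read(self, addr):
        """Return (word, hit). The tag comparison decides hit or miss."""
        tag, index, offset = split_address(addr)
        hit = self.tags[index] == tag
        if not hit:
            self._fill(tag, index)
        return self.data[index][offset], hit
```

In this sketch, reading word 5 misses and fills the line holding words 4 through 7, so a following read of word 6 hits. Word 37 maps to the same index with a different tag and therefore replaces that line.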
Similarly, when the processor generates a write request, an address tag comparison is made to determine whether the line into which data is to be written resides in the cache. If the line is present, the data may be written directly into the cache (assuming the coherency state for the line allows for such modification). If the line does not exist in the cache, a line corresponding to the address being written may be allocated within the cache, and the data may be written into the allocated line.
Because two or more copies of a particular piece of data can exist in more than one storage location within a cache-based computer system, coherency among the data is necessary. Various coherency protocols and specialized bus transfer mechanisms may be employed for this purpose depending on the complexity of the system as well as its requirements. For example, coherence between the cache and the system memory during processor writes may be maintained by employing either a “write-through” or a “write-back” technique. The former technique guarantees consistency between the cache and the system memory by writing the same data to both locations. The latter technique handles coherency by writing only to the cache, and by marking the entry in the cache as being modified. When a modified cache entry is later removed during a cache replacement cycle (or is required by a device other than the processor), the modified data is typically written back to the system memory (and/or provided to the requesting device).
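The difference between the two write techniques can be shown with a small sketch. The class name `TinyCache`, the one-word-per-line simplification, and the use of a dictionary for system memory are assumptions made only to keep the example short.

```python
class TinyCache:
    """One word per line, keyed by address. Illustrative only."""

    def __init__(self, memory, write_back):
        self.memory = memory            # system memory: dict of address -> value
        self.write_back = write_back    # True: write-back, False: write-through
        self.lines = {}                 # address -> [value, modified_flag]

    def write(self, addr, value):
        if self.write_back:
            # Write only to the cache and mark the entry as modified.
            self.lines[addr] = [value, True]
        else:
            # Write the same data to both the cache and system memory.
            self.lines[addr] = [value, False]
            self.memory[addr] = value

    def evict(self, addr):
        """Remove a line; modified data is written back to system memory."""
        value, modified = self.lines.pop(addr)
        if modified:
            self.memory[addr] = value
```

With write-through, system memory is current immediately after each write. With write-back, system memory holds stale data until the modified line is evicted.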
In a multiprocessor shared-memory computer system, separate caches associated with each of the processors may simultaneously store data corresponding to the same memory location. Thus, memory coherency within such systems must typically be handled using somewhat more elaborate and complex schemes. For example, coherency in multiprocessor shared-memory systems may be maintained through employment of either a directory-based protocol or a snooping protocol. In a directory-based protocol, a directory is maintained that indicates which processors have copies of each cache line. This directory is used to limit the processors that must monitor, and possibly respond to, a given request for a cache line. The use of directories reduces snoop traffic and thus allows larger systems to be built. However, the use of directories typically increases the system's latency (which is caused by the directory lookup), as well as the system's hardware complexity and cost.
In a snooping protocol, each processor broadcasts all of its requests for cache lines to all other processors. In many systems, this may be done through a common shared bus. The cache associated with each processor stores along with its address tags coherency information indicating the state of each of its stored lines. Each processor snoops the requests from other processors and responds accordingly by updating its cache tags and/or by providing the data. Thus, each request from another processor may require that a given processor access its own cache's tags to determine if the line exists within the cache, and to update the tag and/or provide the data if necessary. In systems that store cache tags off-chip, the rate at which these cache tags can be accessed can put a limit on the rate at which snoops can be processed. Unfortunately, this snoop bandwidth limit in turn limits the number of processors that can be supported in a system.
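The per-request tag access that snooping imposes on every cache can be sketched as follows. The three coherency states (invalid, shared, modified) and the names `SnoopingCache`, `snoop`, and `broadcast` are assumptions for illustration; the text does not name a specific protocol or state encoding.

```python
INVALID, SHARED, MODIFIED = "I", "S", "M"   # assumed three-state scheme


class SnoopingCache:
    def __init__(self):
        self.tags = {}   # line address -> [state, data]

    def snoop(self, line_addr, is_write):
        """Handle a request broadcast by another processor.

        Every call costs one tag lookup. Returns the line data if this
        cache must supply it, otherwise None.
        """
        entry = self.tags.get(line_addr)
        if entry is None or entry[0] == INVALID:
            return None                       # line not present; nothing to update
        state, data = entry
        supplied = data if state == MODIFIED else None
        # Update the tag: a remote write invalidates, a remote read shares.
        entry[0] = INVALID if is_write else SHARED
        return supplied


def broadcast(requester, caches, line_addr, is_write):
    """Deliver a request to every other cache and collect any supplied data."""
    replies = [c.snoop(line_addr, is_write) for c in caches if c is not requester]
    return [r for r in replies if r is not None]
```

Each broadcast triggers a tag lookup in every other cache, even when the line is absent, which is why tag access rate bounds the snoop rate in such a system.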
One solution to this problem is to store the cache tags on-chip (on the same chip as the processor), even for cache lines that are stored off-chip. However, this solution suffers from several serious drawbacks, including the large amount of processor area that must be devoted to maintain these cache tags, the lack of flexibility in changing off-chip cache sizes and organizations, and an increased latency when the data is present in the off-chip cache. Therefore, a cache memory subsystem is desirable that may allow significantly increased snoop bandwidth without requiring the use of directories or on-chip cache tags.
The problems outlined above may in large part be solved by a cache memory subsystem that enables the concurrent accessing of multiple cache tags in response to a plurality of snoop requests. In one embodiment, the cache memory subsystem includes a cache controller coupled to a cache memory. The cache memory includes a plurality of memory chips, or other separately addressable memory sections, which are configured to collectively store a plurality of cache lines. Each cache line includes data and an associated cache tag. The cache tag may include an address tag which identifies the line as well as state information indicating the coherency state for the line. Each cache line is stored across the memory chips in a row formed by corresponding entries (i.e., entries accessed using the same index address). The plurality of cache lines is grouped into separate subsets based on index addresses, thereby forming several separate classes of cache lines. The cache tags associated with cache lines of different classes are stored in different memory chips. During operation, the cache controller may receive multiple snoop requests corresponding to, for example, transactions initiated by various processors residing on a shared bus. The cache controller is configured to concurrently access the cache tags of multiple lines in response to the snoop requests if the lines correspond to differing classes. In this manner, multiple snoop requests may be serviced simultaneously to thereby significantly increase snoop bandwidth.
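The grouping of cache lines into classes, with each class's tags held in a different memory chip, can be sketched as below. The index width, the number of class bits, the choice of upper-order index bits as the class, and the function names are all assumptions made for illustration.

```python
INDEX_BITS = 10                 # assumed width of the index address
CLASS_BITS = 2                  # assumed number of index bits selecting the class
NUM_CHIPS = 1 << CLASS_BITS     # one tag-holding chip per class (assumed)


def line_class(index):
    """Class of the line at this index, taken from the upper-order index bits."""
    return index >> (INDEX_BITS - CLASS_BITS)


def tag_chip(index):
    """Memory chip that stores the cache tag for the line at this index."""
    return line_class(index) % NUM_CHIPS


def can_access_concurrently(indices):
    """Tags can be read together only if every line is in a different class."""
    classes = [line_class(i) for i in indices]
    return len(set(classes)) == len(classes)
```

Under these assumptions, indices 0x000, 0x100, and 0x200 fall in classes 0, 1, and 2, so their tags reside in three different chips and can be read at the same time. Indices 0x000 and 0x0FF share class 0 and would contend for the same chip.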
In one particular embodiment, in response to receiving a plurality of snoop requests corresponding to various transactions occurring on a system bus, the cache controller determines the class to which each request belongs. The class to which a particular request belongs may be based upon, for example, certain bits of the address associated with the request. For example, in one embodiment, the class is determined by certain upper order bits of an index portion of the address of a snoop request. The cache controller subsequently drives the index addresses for requests of different classes simultaneously to the address lines of respective memory chips to thereby perform a number of cache tag read operations simultaneously. If none of the reads require accessing the data or changing the cache tags, the snooping for those requests is complete. If one or more of the snoop requests require that the corresponding cache tags be updated, such updates may be performed in parallel for cache lines of different classes. Finally, if any snoop requests require that corresponding data be read from a particular cached line, a separate access may be performed to read the data. Such a data read operation may be performed simultaneously with a tag access. Because most snoops do not require changing the cache tag and/or reading the cache line data, a substantial increase in snoop bandwidth may be advantageously attained.
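One way a controller might group pending snoop requests so that each set of simultaneous tag reads contains at most one request per class is sketched below. The greedy packing policy, the bit widths, and the names `request_class` and `schedule_tag_reads` are assumptions; the text does not specify how requests of the same class are ordered or deferred, and an actual system may impose ordering constraints that this sketch ignores.

```python
OFFSET_BITS = 6    # assumed: 64-byte lines
INDEX_BITS = 10    # assumed width of the index portion of the address
CLASS_BITS = 2     # assumed: class taken from the top 2 index bits


def request_class(addr):
    """Class of a snoop request, from upper-order bits of the index portion."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    return index >> (INDEX_BITS - CLASS_BITS)


def schedule_tag_reads(addresses):
    """Pack snoop addresses into batches with at most one request per class.

    Each batch models one set of tag reads driven to different memory
    chips simultaneously. Requests that collide on a class are deferred
    to a later batch.
    """
    pending = list(addresses)
    batches = []
    while pending:
        used, batch, deferred = set(), [], []
        for addr in pending:
            cls = request_class(addr)
            if cls in used:
                deferred.append(addr)
            else:
                used.add(cls)
                batch.append(addr)
        batches.append(batch)
        pending = deferred
    return batches
```

For example, with the widths assumed here, addresses 0x0000, 0x4000, and 0x8000 belong to classes 0, 1, and 2 and form one batch, while 0x0040 (also class 0) waits for the next batch. Four snoops are thus serviced in two tag-access steps rather than four.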