1. Technical Field
The present invention relates in general to data processing and, in particular, to cache management in a data processing system. Still more particularly, the present invention relates to a data processing system, cache, and method of cache management having an O state for memory-consistent cache lines of unknown coherency.
2. Description of the Related Art
A conventional multiprocessor data processing system may comprise a system bus coupled to a system memory and to a number of processing units, each of which includes a processor and one or more levels of cache memory. To obtain valid execution results in such a multiprocessor data processing system, a single view of the contents of memory must be provided to all of the processors by maintaining a coherent memory hierarchy.
A coherent memory hierarchy is maintained through the implementation of a selected coherency protocol, such as the conventional MESI protocol. According to the MESI protocol, an indication of a coherency state is stored in association with each coherency granule (e.g., cache line or sector) of at least all upper level (i.e., cache) memories. Each coherency granule can have one of four states, modified (M), exclusive (E), shared (S), or invalid (I), and the state is typically indicated by two bits in the cache directory. The modified state indicates that a coherency granule is valid only in the cache storing the modified coherency granule and that the value of the modified coherency granule has not been written to (i.e., is inconsistent with) system memory. When a coherency granule is indicated as exclusive, the cache holding the coherency granule in the exclusive state is the only cache at that level of the memory hierarchy in which the coherency granule is resident. Unlike data in the modified state, however, data in the exclusive state is consistent with system memory. If a coherency granule is marked as shared in a cache directory, the coherency granule is resident in the associated cache and in at least one other cache at the same level of the memory hierarchy, all of the copies of the coherency granule being consistent with system memory. Finally, the invalid state generally indicates that the data and address tag associated with a coherency granule are both invalid.
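The four states described above can be summarized in a short Python sketch. The two-bit encodings and the helper names (`is_valid`, `consistent_with_memory`, `only_copy_at_level`) are assumptions made for illustration only; the description states merely that two bits are typical.

```python
from enum import Enum


class MESI(Enum):
    # Hypothetical two-bit encodings; the actual bit assignment is not specified.
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def is_valid(state):
    """A granule holds usable data in every state except invalid."""
    return state is not MESI.INVALID


def consistent_with_memory(state):
    """Exclusive and shared copies match system memory; a modified copy does not."""
    return state in (MESI.EXCLUSIVE, MESI.SHARED)


def only_copy_at_level(state):
    """Modified and exclusive granules reside in a single cache at that level."""
    return state in (MESI.MODIFIED, MESI.EXCLUSIVE)
```

Under these assumptions, `consistent_with_memory(MESI.MODIFIED)` is false while `only_copy_at_level(MESI.MODIFIED)` is true, which mirrors the distinction the description draws between the modified and exclusive states.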
The state to which each coherency granule is set can be dependent upon a previous state of the coherency granule, the type of memory access sought by processors to the associated memory address, and the state of the coherency granule in other caches. Accordingly, maintaining cache coherency in the multiprocessor data processing system requires that processors communicate messages across the system bus indicating an intention to read or write memory locations. For example, when a processing unit requires data not resident in its cache(s), the processing unit issues a read request on the system bus specifying a particular memory address. The read request is interpreted by its recipients as a request for only a single coherency granule in the lowest level cache in the processing unit. The requested cache line is then provided to the requestor by a recipient determined by the coherency protocol, and the requestor typically caches the data in one of the valid states (i.e., M, E, or S) because of the probability that the cache line will again be accessed shortly.
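As a rough illustration of how the requestor's resulting state can depend on the type of access and on the states held by other caches, the following Python sketch picks a state for a newly filled granule. The function name `state_after_read` and its parameters are hypothetical, and the rule set is a simplification of conventional MESI behavior: it ignores the accompanying state changes in the peer caches (e.g., invalidation or write-back of a modified copy).

```python
def state_after_read(peer_states, intent_to_modify=False):
    """Choose the requestor's state for a granule it has just obtained.

    peer_states: iterable of 'M', 'E', 'S', or 'I', one entry per peer cache
    (hypothetical representation of the snoop results).
    """
    if intent_to_modify:
        # Simplification: a request made in order to store ends in modified.
        return "M"
    if any(state != "I" for state in peer_states):
        # At least one peer holds a valid copy, so the granule is shared.
        return "S"
    # No peer holds the granule, so the requestor has the only cached copy.
    return "E"
```

For instance, under these assumptions a plain read with every peer reporting invalid yields the exclusive state, whereas a single peer reporting shared yields the shared state.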
The present invention recognizes that the conventional read request/response scenario for a multiprocessor data processing system outlined above is subject to a number of inefficiencies. First, given the large communication latency associated with accesses to lower levels of the memory hierarchy (particularly to system memory) in state of the art systems and the statistical likelihood that data adjacent to a requested cache line in lower level cache or system memory will subsequently be requested, it is inefficient to supply only the requested coherency granule in response to a request.
Second, a significant component of the overall access latency to system memory is the internal memory latency attributable to decoding the request address and activating the appropriate word and bit lines to read out the requested cache line. In addition, it is typically the case that the requested coherency granule is only a subset of a larger data set that must be accessed at a lower level cache or system memory in order to source the requested coherency granule. Thus, when system memory receives multiple sequential requests for adjacent cache lines, the internal memory latency is unnecessarily multiplied, since multiple adjacent cache lines of data could be sourced in response to a single request at approximately the same internal memory latency as a single cache line.
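The multiplication of internal memory latency can be made concrete with a small Python calculation. The function name, its parameters, and the cycle counts used below are hypothetical; the sketch assumes, as the paragraph above suggests, that one access can source several adjacent cache lines at approximately the internal latency of a single line.

```python
def total_internal_latency(num_lines, internal_latency, lines_per_access=1):
    """Total internal memory latency needed to read num_lines adjacent lines.

    internal_latency: cost of one decode-and-read-out access (arbitrary units).
    lines_per_access: adjacent lines sourced by a single access (assumed).
    """
    accesses = -(-num_lines // lines_per_access)  # ceiling division
    return accesses * internal_latency


# Hypothetical figures: 4 adjacent lines, 50 cycles of internal latency per access.
one_line_per_request = total_internal_latency(4, 50, lines_per_access=1)   # 200
four_lines_per_request = total_internal_latency(4, 50, lines_per_access=4)  # 50
```

With these illustrative numbers, four sequential single-line requests incur four times the internal latency of one request that sources all four lines.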
In view of the above and other shortcomings in the art recognized by the present invention, the present invention introduces the O cache consistency state, which permits unrequested memory-consistent and possibly non-coherent data to be stored in a cache, thereby reducing a processor's access latency to memory-consistent data.
A multiprocessor data processing system that can implement the present invention includes an interconnect, a plurality of processing units coupled to the interconnect, and at least one system memory and a plurality of caches coupled to the plurality of processing units. A cache suitable for use in such a multiprocessor data processing system includes data storage containing multiple granules of data and a number of state fields associated with the granules of data. Each state field has a plurality of possible states including an O state indicating that an associated granule is consistent with corresponding data in the memory and has unknown coherency with respect to peer caches in the data processing system. Thus, a cache is permitted to store memory-consistent, but possibly non-coherent data in order to offer processing units in the data processing system lower latency to an image of system memory.
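A minimal Python sketch of a cache whose state fields include the O state is given below. The class and method names (`State`, `Cache`, `fill`, `get_state`) are hypothetical, and two behaviors are assumptions of the sketch rather than statements of the description: that the extra granules arrive together with the requested granule in one memory access, and that an unrequested granule is stored only when the cache does not already hold a valid entry for that address.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"
    O = "memory-consistent, coherency unknown"


class Cache:
    def __init__(self):
        self.data = {}   # address -> granule of data
        self.state = {}  # address -> state field for that granule

    def get_state(self, addr):
        # Granules never stored are treated as invalid.
        return self.state.get(addr, State.I)

    def fill(self, requested_addr, granules, requested_state):
        """Store granules read from system memory (dict of address -> data)."""
        for addr, value in granules.items():
            if addr == requested_addr:
                # The requested granule takes its ordinary coherent state.
                self.data[addr] = value
                self.state[addr] = requested_state
            elif self.get_state(addr) is State.I:
                # Assumption: unrequested granules fill only invalid entries.
                # They match memory, but peer caches were not consulted.
                self.data[addr] = value
                self.state[addr] = State.O
```

For example, calling `fill(0x100, {0x100: b"a", 0x140: b"b"}, State.E)` on an empty cache leaves address 0x100 in the exclusive state and the unrequested neighbor at 0x140 in the O state.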
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.