FIG. 1 is a block diagram that illustrates a conventional multiprocessor computer system in which the present invention may be applied. In FIG. 1 reference numeral 10 generally indicates the multiprocessor computer system.
The multiprocessor computer system 10 includes a plurality of nodes, of which only two nodes (node 1—reference numeral 12—and node m—reference numeral 14) are shown in the drawing. The nodes of the multiprocessor computer system 10, including nodes 12 and 14, are connected to each other and to a main memory 16 via an interconnecting bus/network 18.
Because all of the nodes of the multiprocessor computer system 10 may be of the same construction, only the node 12 (node 1) is shown in detail.
The node 12 includes a processor 20 (processor p1). (It will be appreciated that each other node of the multiprocessor computer system 10 includes a respective processor.) Associated with the processor 20 is an upper level cache memory 22 (cache L1). A cache controller 24 is associated with and controls operation of the cache memory 22. The node 12 also includes lower level cache memories 26, 28. A respective cache controller 24 is associated with, and controls operation of, each of the cache memories 26, 28. The lower level cache memories 26, 28 may be dedicated for use by the processor 20 (processor p1). Alternatively, some or all of the lower level cache memories 26, 28 may be shared with another processor 30 (processor p2) which may, with its associated upper level cache 32, be part of the same node 12 (node 1).
As is well known to those who are skilled in the art, cache memories are provided to store data that is likely to be needed by the processor in the near future. Cache memories feature shorter access times than the main memory 16 of the multiprocessor computer system 10. For example, the upper level cache memory 22 may be provided on-chip with the processor 20, thereby providing the fastest access times. The lower level cache memories 26, 28 provide slower access times than the upper level cache memory 22, but are more closely associated with the processor 20 than is the main memory 16, and thereby provide faster access times than the main memory 16.
FIG. 2 is a block diagram that illustrates a typical one of the cache memories 22, 26, 28, 32. The cache memory of FIG. 2, which is denoted by reference numeral 34, includes a data storage facility 36 and a directory 38. As indicated at 40, the data storage facility 36 stores blocks of data 42. As illustrated at 44, the directory 38 stores, for each data block 42 in the data storage facility 36, a block address 46 and a “state” 48. The block addresses 46 are correlated with block addresses in the main memory 16. The states 48 are maintained to ensure data integrity using a cache coherency protocol in a multiprocessor environment.
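The organization of the cache memory 34 can be pictured with a minimal sketch. The class and method names below (`DirectoryEntry`, `CacheMemory`, `store`, `lookup`) are hypothetical and chosen only for illustration; the sketch assumes a simple mapping keyed by block address and does not model any particular hardware organization.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    block_address: int  # correlated with a block address in main memory
    state: str          # coherency state label kept for the block


@dataclass
class CacheMemory:
    # Illustrative stand-ins for the data storage facility and the directory.
    data_storage: dict = field(default_factory=dict)  # block address -> data block
    directory: dict = field(default_factory=dict)     # block address -> DirectoryEntry

    def store(self, block_address: int, data: bytes, state: str) -> None:
        """Place a data block in storage and record its address and state."""
        self.data_storage[block_address] = data
        self.directory[block_address] = DirectoryEntry(block_address, state)

    def lookup(self, block_address: int):
        """Return (data, state) if the directory lists the block, else None."""
        entry = self.directory.get(block_address)
        if entry is None:
            return None
        return self.data_storage[block_address], entry.state
```

In this sketch the directory is consulted first, and the data storage is read only when the directory holds an entry for the requested block address.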
As is familiar to those who are skilled in the art, a cache coherency protocol assures that consistent views of the data are maintained by the various processors of the multiprocessor computer system 10. According to one cache coherency protocol, known as the “MESI” protocol, the possible states for the memory blocks stored in a cache memory 34 are “modified”, “exclusive”, “shared”, and “invalid”.
A block of memory is assigned the “modified” state when it is the only valid copy among all the cache memories, and is not consistent with the copy in main memory. The “exclusive” state is assigned when the memory block is the only copy among the cache memories, and is consistent with the copy in main memory. The “shared” state is assigned when multiple copies of the memory block are present among the cache memories, and are consistent with the copy in main memory. The “invalid” state signifies that the cached copy is not valid, so that the memory block must be fetched from main memory or from another cache.
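The four state definitions can be restated as a small decision function. This is a sketch only: the function name `classify_block` and its three parameters are assumptions introduced for illustration, and a real protocol derives the state from bus or network transactions rather than from a global count of copies.

```python
def classify_block(valid_here: bool, valid_copies: int, matches_main_memory: bool) -> str:
    """Return the MESI state of a block as seen by one cache.

    valid_here          -- this cache holds a valid copy (hypothetical input)
    valid_copies        -- number of valid copies among all caches (hypothetical input)
    matches_main_memory -- the cached data is consistent with main memory
    """
    if not valid_here:
        # The block must be fetched from main memory or from another cache.
        return "invalid"
    if valid_copies == 1:
        # Sole valid copy: modified if it differs from main memory, else exclusive.
        return "exclusive" if matches_main_memory else "modified"
    if matches_main_memory:
        # Several caches hold copies that agree with main memory.
        return "shared"
    # Several copies that differ from main memory fit none of the four definitions.
    raise ValueError("combination not covered by the four MESI states")
```

For example, `classify_block(True, 1, False)` returns `"modified"`, and `classify_block(True, 3, True)` returns `"shared"`.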
The “shared” state may be sub-divided into the following sub-states: “shared read-only”, which means that the cache copies were never modified, and “shared read-write”, which means that the memory block was modified at least once prior to being shared.
It is known to improve performance of multiprocessor computer systems by pre-fetching data from a lower level cache memory for storage in an upper level cache memory. Pre-fetching means requesting data in advance so that it will be present in the upper level cache when needed by the associated processor.
A problem that may be encountered in pre-fetching data is that the data may be fetched too early. That is, the pre-fetched memory block may be invalidated before it is accessed by the processor. Such premature pre-fetches may unnecessarily utilize resources, thereby decreasing the bandwidth available for cache memory operations.
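The notion of a premature pre-fetch can be made concrete by counting, over an ordered trace of cache events, the pre-fetched blocks that are invalidated before the processor accesses them. The trace format and the function name `count_premature_prefetches` are hypothetical, introduced only for this sketch; the event names `"prefetch"`, `"access"`, and `"invalidate"` are likewise assumptions.

```python
def count_premature_prefetches(events):
    """Count pre-fetched blocks invalidated before their first access.

    events -- ordered list of (operation, block_address) pairs, where
              operation is "prefetch", "access", or "invalidate"
              (an assumed trace format, for illustration only).
    """
    pending = set()   # blocks pre-fetched but not yet accessed
    premature = 0
    for operation, block_address in events:
        if operation == "prefetch":
            pending.add(block_address)
        elif operation == "access":
            # The pre-fetch was useful; the block is no longer pending.
            pending.discard(block_address)
        elif operation == "invalidate":
            if block_address in pending:
                # Invalidated before use: the pre-fetch was wasted.
                premature += 1
                pending.discard(block_address)
    return premature
```

A trace in which block 0x40 is pre-fetched, invalidated, and only then accessed, while block 0x80 is pre-fetched and accessed before being invalidated, yields a count of one premature pre-fetch.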
It would be desirable to improve the efficiency of operation of multiprocessor computer systems in which pre-fetching is employed.