1. Field of the Invention
This invention generally relates to the management of caches in a computer system. More particularly, this invention relates to a method and an apparatus for reducing memory traffic between various components of a memory subsystem (e.g. between a write buffer and main memory, between a cache and main memory, and between processors in a multiprocessor system). More specifically, the present invention relates to a mechanism for avoiding unnecessary cache traffic related to handling cached data corresponding to pages marked as containing useless data (e.g. invalidated pages).
2. Discussion of the Prior Art
Frequently, a program generates and modifies data which at some point in time (even long before program completion) become absolutely "useless" for that program. In a memory-constrained environment, it is not unlikely for a page containing such useless (modified) data to be swapped out from main memory to a disk. There exist several techniques (e.g. the DIAGNOSE X'214' Pending Page Release call available on IBM S/390, and a mechanism described in copending application Ser. No. 09/636,049, filed Aug. 10, 2000, for "A Method and Apparatus for Efficient Virtual Memory Management," the disclosure of which is herein incorporated by reference) enabling a program to indicate to the OS whether the contents of certain pages (e.g. those containing only useless data) can be safely discarded (i.e. invalidated) should those pages be selected for eviction. The content of a page marked as invalid (useless) need not be written back to a disk even if the page is in a modified state. (The terms "modified" and "dirty" will be used interchangeably.)
However, when a page is invalidated, it is possible that some data (dirty or clean) associated with that page still resides in a processor cache (e.g. in a large write-back cache). This data is useless and occupies valuable cache resources. Moreover, handling this useless data (especially the data in a modified/dirty state) causes needless overhead:
a) Specifically, sooner or later, modified (dirty) useless data will be evicted from a cache (as a result of a cache line replacement while servicing a cache miss) and placed into a write buffer. From the write buffer, the data is written back to memory. Writing this data to memory generates unnecessary traffic on a memory bus. Because write buffers tend to be small, they can easily become filled with useless data.
b) In addition, in a set associative cache, a cache line containing useful data (i.e. the data that will be used in the future) can be replaced before a cache line (from the same set) containing useless (dirty or clean) data (e.g. according to the least recently used replacement policy). If the useful data is in a modified state, it will have to be written back to memory. Later on, as a result of a cache miss, the useful data will have to be re-fetched from memory. Re-fetching that useful data generates unnecessary traffic on a memory bus. Hence, in a set-associative cache, a cache line containing useful data can sometimes be evicted before a cache line (in the same set) containing useless data.
c) Also, in a multiprocessor system (e.g. a shared-memory multiprocessor), if some processor other than the one currently controlling the invalidated page (i.e. a page containing useless data) gets control of the invalidated page and starts initializing that page with new data, some useless data (dirty or clean) may be sent across a bus (or some other interconnect) by a coherence controller to that requesting processor (i.e. the processor performing the initialization of that page and thereby requesting that cached data from that page be transmitted to it). Hence, in a multiprocessor system (e.g. a shared-memory multiprocessor), a coherence controller sometimes transmits useless data to a requesting processor.
The instruction sets of many microprocessors include instructions for managing on-chip (or local) instruction and data caches. Some of these instructions (privileged instructions) are only available to supervisor-mode processes due to their destructive effect, while others can be used by user-level processes for effective cache management. For example, the following cache management instruction is implemented in the PowerPC 604e:
DCBI (Data Cache Block Invalidate) is a supervisor-level (privileged) instruction because it discards and invalidates the data stored in the cache block whether the block is in a clean (exclusive or shared) or in a modified state. Once the data is invalidated, other processors on the same bus are instructed to invalidate the specified cache block.
Note that it is the responsibility of a programmer to invalidate cache lines containing useless data. A single instruction can invalidate at most one line; hence, a number of such instructions must be issued to invalidate the multiple cache lines associated with a page. Finally, a supervisor-level (privileged) instruction cannot be used by a user-level program.
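The per-line cost described above can be sketched as follows, assuming a 4 KB page and 32-byte cache blocks (as on the PowerPC 604e); `dcbi()` here is a hypothetical stand-in for the privileged DCBI instruction, not a real API:

```c
#define PAGE_SIZE 4096u
#define LINE_SIZE 32u    /* 32-byte cache blocks assumed */

/* Hypothetical stand-in for the privileged DCBI instruction. */
static void dcbi_noop(unsigned addr) { (void)addr; }

/* Invalidate every cache block of one page, one instruction per line;
   returns the number of invalidate operations issued. */
unsigned invalidate_page(void (*dcbi)(unsigned addr), unsigned page_base)
{
    unsigned n = 0;
    for (unsigned a = page_base; a < page_base + PAGE_SIZE; a += LINE_SIZE) {
        dcbi(a);    /* one DCBI per cache block */
        n++;
    }
    return n;       /* 128 invalidations for a 4 KB page */
}
```

For a 4 KB page, 128 separate instructions are needed, which illustrates why a transparent, hardware-assisted mechanism is preferable.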
From a correctness standpoint, cache lines containing useless data (i.e. cache lines corresponding to pages that have been invalidated) do not have to be kept around, written back to memory, or transmitted to a (remote) requesting processor upon request.
Therefore, an object of this invention is to provide a method and an apparatus for reducing memory traffic between various components of a memory subsystem (e.g. between a write buffer and main memory, between a cache and main memory, and between processors in a multiprocessor system).
The invention reduces the traffic associated with handling cache lines containing useless data (i.e. cache lines corresponding to pages that have been invalidated). The invention makes it possible to do the following: to evict such cache lines earlier than when they would have been evicted according to an implemented replacement policy; to avoid write backs of useless data from such cache lines to main memory; and to avoid transmitting useless data from such cache lines to a requesting processor in a multiprocessor system.
The present invention describes a mechanism for invalidating cache lines containing useless data transparently and without the programmer's involvement. For efficiency, the content of a cache line containing useless data is invalidated only when the line is referenced or is about to be allocated for some new data.
One aspect of the present invention is an invalid page ID buffer (IPIDB) containing the IDs of pages that have been invalidated (i.e. pages containing useless data).
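A minimal software model of such a buffer might look as follows; the entry count, FIFO replacement, and 4 KB page size are illustrative assumptions, not details prescribed by the invention:

```c
#define IPIDB_ENTRIES 16   /* illustrative buffer size */
#define PAGE_SHIFT 12      /* 4 KB pages assumed */

/* Invalid page ID buffer: a small fully-associative table of page IDs. */
typedef struct {
    unsigned long page_id[IPIDB_ENTRIES];
    int valid[IPIDB_ENTRIES];
    int next;              /* FIFO replacement cursor */
} ipidb_t;

/* Record the page containing "addr" as invalidated. */
void ipidb_insert(ipidb_t *b, unsigned long addr)
{
    b->page_id[b->next] = addr >> PAGE_SHIFT;
    b->valid[b->next] = 1;
    b->next = (b->next + 1) % IPIDB_ENTRIES;
}

/* Does "addr" fall within a page recorded as invalidated? */
int ipidb_hit(const ipidb_t *b, unsigned long addr)
{
    unsigned long id = addr >> PAGE_SHIFT;
    for (int i = 0; i < IPIDB_ENTRIES; i++)
        if (b->valid[i] && b->page_id[i] == id)
            return 1;
    return 0;
}
```

The traffic saver devices described below would consult such a buffer on each write-back, eviction, or coherence request.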
Another aspect of this invention is a memory traffic saver device coupled with a write buffer. This device monitors the write buffer and uses the information from the invalid page ID buffer (IPIDB) to remove from the write buffer those entries that are associated with pages that have been invalidated (i.e. pages containing useless data), so that the data in those entries is not written to main memory.
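The filtering step can be sketched as follows; the entry layout and the single-page-ID parameter (standing in for an IPIDB lookup) are illustrative assumptions:

```c
#define WB_PAGE_SHIFT 12   /* 4 KB pages assumed */

/* One write-buffer entry: an address and the data waiting to be written. */
typedef struct { unsigned long addr; unsigned long data; } wb_entry;

/* Drain the write buffer, discarding entries that belong to the
   invalidated page; returns how many entries were written to memory. */
int wb_drain(const wb_entry *buf, int n, unsigned long invalid_page_id)
{
    int written = 0;
    for (int i = 0; i < n; i++) {
        if ((buf[i].addr >> WB_PAGE_SHIFT) == invalid_page_id)
            continue;   /* useless data: skip the memory-bus write */
        /* ... issue the write to main memory here ... */
        written++;
    }
    return written;
}
```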
Another aspect of the present invention is a memory traffic saver device coupled with a cache controller. This device examines the set from which a cache line is about to be evicted and uses the information from the invalid page ID buffer (IPIDB) to evict first those cache lines that are associated with pages that have been invalidated (i.e. pages containing useless data). The content of evicted cache lines with useless data is discarded and not written to memory.
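Victim selection under this scheme might be modeled as below; the 4-way set, LRU fallback, and single-page-ID check (in place of a full IPIDB lookup) are illustrative assumptions:

```c
#define SET_WAYS 4         /* 4-way set-associative cache assumed */
#define C_PAGE_SHIFT 12    /* 4 KB pages assumed */

/* One line in a cache set: its address, LRU age, and dirty bit. */
typedef struct { unsigned long tag_addr; int lru; int dirty; } cache_line;

/* Pick a victim way: prefer any line from an invalidated page (its data
   can be dropped without a write-back); otherwise fall back to LRU. */
int pick_victim(const cache_line set[SET_WAYS], unsigned long invalid_page_id,
                int *needs_writeback)
{
    int victim = 0;
    for (int w = 0; w < SET_WAYS; w++) {
        if ((set[w].tag_addr >> C_PAGE_SHIFT) == invalid_page_id) {
            *needs_writeback = 0;   /* useless data: discard even if dirty */
            return w;
        }
        if (set[w].lru > set[victim].lru)
            victim = w;             /* oldest line seen so far */
    }
    *needs_writeback = set[victim].dirty;
    return victim;
}
```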
Another aspect of the present invention is a memory traffic saver device coupled with a coherence controller. This device examines requests received by a local coherence controller from a remote coherence controller for data to be transmitted, and uses the information from the invalid page ID buffer (IPIDB) to avoid transmitting to the requesting party data from cache lines that are associated with pages that have been invalidated (i.e. pages containing useless data); at the same time, it invalidates those cache lines in the local cache.
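The snoop-response decision can be sketched as follows; the line layout, reply codes, and single-page-ID check (standing in for an IPIDB lookup) are illustrative assumptions:

```c
/* Possible replies to a remote request for a locally cached line. */
typedef enum { SUPPLY_DATA, NO_DATA } snoop_reply;

/* One locally cached line: its address, valid bit, and dirty bit. */
typedef struct { unsigned long addr; int valid; int dirty; } local_line;

/* Respond to a remote request. If the line's page has been invalidated,
   reply "no data" and drop the local copy instead of shipping useless
   bytes over the interconnect. 4 KB pages assumed. */
snoop_reply snoop_request(local_line *line, unsigned long invalid_page_id)
{
    if (!line->valid)
        return NO_DATA;
    if ((line->addr >> 12) == invalid_page_id) {
        line->valid = 0;    /* invalidate the useless local copy */
        line->dirty = 0;    /* no write-back, no transfer */
        return NO_DATA;
    }
    return SUPPLY_DATA;     /* normal cache-to-cache transfer */
}
```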