The invention relates generally to computer systems, and deals more particularly with a hierarchical cache system.
Previously known computer systems include a CPU, a main memory and a cache system interposed between the CPU and the main memory to expedite access to main memory data. A typical cache system comprises a data cache to store data fetched from or written to main memory, and a directory to store the main memory addresses of the data copied into the data cache. The processor can access the data cache faster than the main memory because the data cache is smaller than the main memory, is physically located closer to the processor, and is usually formed from faster but more expensive technology. Consequently, it is desirable to store in the cache the data that is currently needed by the CPU and the data likely to be needed next. An effective caching strategy relies on spatial and temporal locality of reference, i.e. the data likely to be needed next by the processor is either the data just requested or data stored in the main memory near the data currently requested. This is true, for example, when the processor requests to sequentially read lines of a file, and the lines of the file are stored in successive locations in main memory. Therefore, when the processor requests data, typically four or eight bytes per request, this data along with the remainder of a cache block (typically one line comprising 128 bytes of contiguous addresses) is fetched from main memory and loaded into the data cache. The time cost of fetching the entire block from the relatively remote main memory is recovered when the processor subsequently accesses the remainder of the block (or line) from the cache.
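The block-fetch behavior described above can be sketched in a few lines of simulation. This is a minimal illustrative model, not the patented hardware: the 128-byte line size follows the example in the text, while the dictionary-backed "main memory" and the class and function names are assumptions introduced here for clarity.

```python
LINE_SIZE = 128  # bytes per cache block (one line), as in the example above

def line_base(address):
    """Return the base address of the cache line containing `address`."""
    return address - (address % LINE_SIZE)

class SimpleCache:
    """Toy single-level cache: on a miss, the entire line of contiguous
    addresses is copied in, so later requests to nearby addresses
    (spatial locality) are satisfied from the cache."""
    def __init__(self, memory):
        self.memory = memory   # backing store: address -> byte value
        self.lines = {}        # directory + data: line base -> list of bytes

    def read(self, address, nbytes=8):
        base = line_base(address)
        if base not in self.lines:
            # Miss: fetch the whole 128-byte block, not just the request
            self.lines[base] = [self.memory.get(base + i, 0)
                                for i in range(LINE_SIZE)]
        off = address - base
        return self.lines[base][off:off + nbytes]
```

After one 8-byte (or, here, 4-byte) request misses, every other byte of the same 128-byte line is a cache hit, which is the recovered time cost the paragraph refers to.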
It was also previously known to connect a set of I/O processors to the cache system so that data residing on an external storage device could be accessed by the CPUs, via the I/O processors, from the cache system.
A hierarchical two-level cache system was also previously known. It includes a plurality of level one (L1) data caches and respective directories, each pair of L1 cache and directory serving one processor. A level two (L2) data cache and associated directory are coupled to and serve all the L1 caches and associated directories. The L2 data cache is also coupled to the main memory (or extended memory), and stores a copy of all data requested by any of the processors. If another CPU requests the same data, then it is available from the L2 cache and need not be fetched from main memory (which is more time consuming). When any processor modifies data, the modified data is written to the L2 cache, and control hardware associated with the L2 data cache notifies all other L1 caches that contain a copy of the data that their copy of the data is now invalid. Thus, the L2 cache serves as a central station for transferring data between the main memory and all the L1 caches.
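The prior-art hierarchy just described, with one shared L2 cache, one L1 cache per processor, and invalidation of stale L1 copies on a write, can be sketched as a small model. All names here are illustrative assumptions; the sketch captures only the data flow described in the text, not the directory hardware.

```python
class TwoLevelCache:
    """Toy model of the prior-art hierarchy: a shared L2 cache backed
    by main memory, and one private L1 cache per processor."""
    def __init__(self, n_cpus, memory):
        self.memory = memory                        # main memory: addr -> value
        self.l2 = {}                                # shared L2: addr -> value
        self.l1 = [dict() for _ in range(n_cpus)]   # per-CPU L1 caches

    def read(self, cpu, addr):
        if addr in self.l1[cpu]:                    # L1 hit
            return self.l1[cpu][addr]
        if addr not in self.l2:                     # fetch from memory once
            self.l2[addr] = self.memory[addr]
        self.l1[cpu][addr] = self.l2[addr]          # fill requesting L1
        return self.l1[cpu][addr]

    def write(self, cpu, addr, value):
        self.l2[addr] = value                       # modified data goes to L2
        self.l1[cpu][addr] = value
        for i, cache in enumerate(self.l1):
            if i != cpu:
                cache.pop(addr, None)               # invalidate stale copies
```

Note that once the first CPU's read has filled the L2 cache, a second CPU's read of the same address is served without touching main memory, which is the central benefit the paragraph attributes to the shared L2 cache.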
It was previously known to operate such a hierarchical cache system in either a "store through" mode or a "store in" mode. In the "store through" mode, the CPU requests to store data into the L2 cache only. If these memory locations are currently represented in the L2 cache, the data is stored in the L2 cache in these memory locations without accessing the main memory. If these memory locations are also represented in an L1 cache, these memory locations are invalidated in the L1 caches but the data is not written into them. If these memory locations are not currently represented in the L2 cache, then the contents of these memory locations and the associated memory page are copied from main memory into the L2 cache and then overwritten with the CPU data. In this last case, because the memory locations were not represented in the L2 cache, they were not present in any L1 cache either, and therefore no action is taken in any L1 cache.
In the "store in" mode, the CPU requests to store data in both the associated L1 cache and the L2 cache. If these memory locations are currently represented in the L2 cache, the data is stored in the L2 cache in these memory locations without accessing the main memory. Also, these memory locations represented in the associated L1 cache are updated with the new data. If these memory locations are represented in any other L1 caches, these memory locations in the other L1 caches are invalidated. If these memory locations are not currently represented in the L2 cache, then the contents of these memory locations and associated memory page are copied from main memory into the L2 cache and then overwritten with the CPU data. Then, the updated contents of these memory locations are written into the L1 cache of the requesting CPU.
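The two store modes described in the preceding paragraphs can be condensed into one sketch. This is an illustrative model only: the class, constant, and method names are assumptions, and a single address stands in for the memory locations and associated page discussed above.

```python
STORE_THROUGH, STORE_IN = "store-through", "store-in"

class HierarchicalStore:
    """Toy model of the two prior-art store modes."""
    def __init__(self, n_cpus, memory):
        self.memory = memory                        # main memory: addr -> value
        self.l2 = {}                                # shared L2 cache
        self.l1 = [dict() for _ in range(n_cpus)]   # per-CPU L1 caches

    def store(self, cpu, addr, value, mode):
        if addr not in self.l2:
            # L2 miss: old contents are first copied from main memory
            # into the L2 cache, then overwritten with the CPU data.
            self.l2[addr] = self.memory.get(addr, 0)
        self.l2[addr] = value
        for i, cache in enumerate(self.l1):
            if addr in cache and i != cpu:
                del cache[addr]                     # invalidate other L1 copies
        if mode == STORE_IN:
            self.l1[cpu][addr] = value              # update requester's L1 too
        else:
            self.l1[cpu].pop(addr, None)            # store-through: invalidate
```

The difference between the modes is confined to the last four lines: store-through invalidates the requesting CPU's L1 copy, while store-in updates it, with other L1 copies invalidated in both modes.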
In the foregoing hierarchical cache system of the prior art, if a CPU wanted to update data in main memory (the L3 level beneath the L2 cache), it was first necessary to copy the old data into the L2 cache as in the "store through" or "store in" mode, update the data in the L2 cache, and then, when required, request a "cast out" of the updated data back to main memory. This had the following drawback in cases where the CPU did not want to immediately read or further update the data. It was time consuming and burdensome for the CPU to fetch the old data from main memory into the L2 cache, write the updated data into the L2 cache, and then cast out the updated data. Even if the data resided in the L2 cache before the main memory update request was made, it was still time consuming to write the updates into the L2 cache and then cast out the updated data to main memory. Also, when the data was written into the L2 cache, some other data may have been cast out to make room for the new data, and the CPU may have needed the data that was just cast out.
A general object of the present invention is to provide a hierarchical cache system in which a CPU can write data into main memory without also writing the data into the L2 cache or requiring that the old data reside in the L2 cache.
Another general object of the present invention is to provide a hierarchical cache system in which a CPU can copy data from one location in main memory or extended memory to another location in main memory or extended memory without also writing the data into the L2 cache.
Another general object of the present invention is to provide a hierarchical cache system of the foregoing types which requires minimal control by the CPU.
Another object of the present invention is to provide an improved buffering system for implementing the foregoing data movement.