1. Technical Field
This invention generally relates to computer system memory and more specifically relates to cache memory management.
2. Background Art
Today, our society is heavily dependent upon computers for everyday activity. Computers are found in homes, in business offices, and in most production and manufacturing environments. Most computer systems are controlled by a central processing unit (CPU) and have various levels of memory which can be used by the CPU to perform the various functions for which it has been programmed. Typically, computer programs are loaded into the computer system's memory storage areas and executed by the CPU. The programs and data are stored in different areas of the computer system's memory depending on what type of function the CPU is performing. Traditionally, the computer system's memory has been classified as either main memory (primary or main storage) or secondary memory (secondary storage).
Programs and data need to be in main memory in order to be executed or referenced by a running program. Programs or data not needed immediately may be kept in secondary memory until needed and then brought into main storage for execution or reference. The individual storage locations within a given memory are often referred to as "lines of memory." Secondary memory media such as tape or disk are generally less costly than the main memory and have much greater capacity. Main memory may generally be accessed much faster than secondary memory.
In the 1960s it became clear that the traditional memory storage hierarchy could be extended by one more level with dramatic improvements in performance and utilization. This additional level, the "cache," is a high-speed memory that is much faster than the main memory. Cache storage is relatively expensive when compared with main memory and therefore, in a typical computer system, only relatively small amounts of cache memory are used. In addition, limiting the size of cache storage enhances the speed of the cache.
Cache memory imposes one more level of memory management overhead on the computer system. Programs or data in the main memory are shuttled or "swapped" into the high-speed cache before being executed or referenced. The programs or data that were previously residing in the cache must be "swapped" out, usually on a "least-recently-used" basis. This means that if there is no room in the cache and room is needed for additional instructions, then the information that has not been accessed for the longest period of time will be swapped out of the cache and replaced with the new information. In this manner, the most recently used information has the greatest likelihood of being available in the cache at any given time.
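The least-recently-used replacement policy described above can be illustrated with a minimal sketch. The class name `LRUCache`, the `access` method, and the `load_from_main_memory` callback are hypothetical names chosen for illustration only; an actual cache implements this policy in hardware, and the sketch assumes a fully associative cache of a fixed number of lines.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache holding at most `capacity` lines (assumed fully associative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Keys are addresses; the least recently used entry is kept first.
        self.lines = OrderedDict()

    def access(self, address, load_from_main_memory):
        """Return (value, hit) for `address`, swapping a line in on a miss."""
        if address in self.lines:
            # Hit: mark this line as the most recently used.
            self.lines.move_to_end(address)
            return self.lines[address], True
        if len(self.lines) >= self.capacity:
            # No room: swap out the line that has gone unused the longest.
            self.lines.popitem(last=False)
        value = load_from_main_memory(address)
        self.lines[address] = value
        return value, False
```

With a capacity of two lines, accessing addresses 1, 2, 1, and 3 in that order swaps out address 2 rather than address 1, because address 1 was referenced more recently.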
Cache memory generally operates faster than main memory, typically by a factor of five to ten, and may, under certain circumstances, approach the operational speed of the CPU itself. By keeping the most frequently accessed instructions and/or data in high-speed cache memory, the average overall memory access time for the system will approach the access time of the cache. There is a certain amount of overhead involved in shuttling information between various memory locations. This overhead is kept as small as possible so that it does not cancel out the performance increase achieved by utilizing cache storage. In addition, if the specific program instruction to be executed has been pre-loaded into the cache, the CPU may execute the program instruction without returning to either main memory or secondary memory, thereby significantly increasing the operational speed of the system. Whenever the CPU requests a specific instruction or item of data, the CPU generates a request which includes a tag as part of the address or location in memory where the instruction or data may be found. If the tag for the information requested by the CPU matches a tag for a line of memory currently residing in the cache, then the CPU can access the data or instruction from the cache. If the tag does not match any of the tags for the lines of memory in the cache, then the information must be fetched and loaded into the cache.
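The tag comparison described above may be sketched as follows. This is an illustration only, under assumptions that do not come from the description: a direct-mapped organization, a line size of 16 bytes, and a cache of 8 lines. The names `split_address` and `DirectMappedCache` are hypothetical.

```python
LINE_BYTES = 16  # assumed line size, for illustration only
NUM_LINES = 8    # assumed number of lines in the cache


def split_address(address):
    """Split an address into (tag, line index, byte offset within the line)."""
    offset = address % LINE_BYTES
    index = (address // LINE_BYTES) % NUM_LINES
    tag = address // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    """Tracks only the tag stored for each line; data storage is omitted."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def lookup(self, address):
        """Return True on a tag match; on a mismatch, load the line and return False."""
        tag, index, _ = split_address(address)
        if self.tags[index] == tag:
            return True
        # Tag mismatch: the line is fetched and its tag replaces the old one.
        self.tags[index] = tag
        return False
```

Two addresses that fall within the same 16-byte line share a tag and an index, so the second request is satisfied from the cache. An address that maps to the same index with a different tag displaces the earlier line.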
Cache memory may be subdivided into different categories based on what part of the computer system it is located on or associated with. "On-chip" cache memory is co-located on the microprocessor chip with the CPU and is usually referred to as Level 1 cache memory. Additional cache memory that is not located on the same chip with the microprocessor is usually referred to as Level 2 or Level 3 cache memory.
Even with a cache memory management scheme, there are additional, related problems that can cause system performance to suffer. For example, in data processing systems with several levels of memory storage, a great deal of shuttling goes on in which programs and data are moved back and forth between the various memory levels. This shuttling consumes system resources such as CPU time and bus bandwidth that could otherwise be put to more productive processing use. This problem has been exacerbated in recent years by the growing disparity between the processing speed of the CPU and the operational speeds of the different computer system components used to transfer information and instructions to the CPU. In the past few years, the processing speed of CPUs in general has increased tremendously while the operational speeds of related system components have not progressed as quickly.
For example, a few short years ago, CPU processing speeds in the range of 16 MHz-33 MHz were fairly common. Presently, however, CPUs operate at processing speeds in excess of 180 MHz with some CPUs exceeding even 200 MHz. In contrast, the processing speed of other data processing system components, particularly those components used to deliver data to the CPU for processing, has not kept pace. This has resulted in a well-known performance problem for computer systems. Specifically, even with a cache memory management system in place, it can take so long to deliver information to the cache that the CPU may spend a relatively long period of time waiting for required information to be loaded into the cache. Whenever the CPU needs to process data or instructions that are unavailable in the cache, the CPU "stalls" until the necessary information is loaded from the external memory bus into the cache. The CPU, in effect, is "starved" for data and wastes valuable processing time waiting for the necessary data or instructions to become available.
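The cost of such stalls can be estimated with the commonly used relation in which the average access time equals the hit time plus the miss rate multiplied by the miss penalty. The sketch below assumes a single level of cache; the function name `average_access_time` and the cycle counts are illustrative assumptions and are not figures taken from any particular system.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per memory access for a single-level cache."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    # Every access pays the hit time; a miss additionally pays the penalty.
    return hit_time + miss_rate * miss_penalty


# Assumed figures for illustration: 1-cycle hit, 8-cycle penalty on a miss.
for rate in (0.0, 0.25, 0.5):
    print(rate, average_access_time(1, rate, 8))
```

As the miss rate falls toward zero, the average approaches the access time of the cache alone. As the miss penalty grows relative to the CPU cycle time, each miss costs proportionally more stalled cycles.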
Cache memory is often used in high speed data processing system architectures which also often include multiple interrupt levels. An interrupt is a signal sent to the CPU which alerts the CPU that another task needs to be serviced. As is well known to those skilled in the art, an interrupt may be an "external" interrupt, for example from a keyboard, disk drive, or other peripheral unit, or may be an "internal" interrupt from an internally generated timer. Upon occurrence of an interrupt, the currently executing task is interrupted and a first (interrupting) task is performed. The interrupted task may be resumed after completion of the interrupting task.
Frequent task interruptions typically degrade the performance of a cache memory. When the currently executing task is interrupted, the cache has been loaded with the data and instructions necessary to process the specific task that is executing at the time the interrupt occurs. The interrupting task is typically unrelated to the previously executing task and therefore, the data and instructions in the cache are not the data and instructions required to perform the interrupting task. This means that the cache must be emptied out and loaded with the new data and instructions necessary to process the interrupting task. Once again, frequent trips to main memory may be required before the cache will be loaded with the data and instructions necessary to process the interrupting task. Accordingly, the performance of the CPU decreases dramatically and, correspondingly, overall system performance will be degraded.
Similarly, once the interrupting task has run to completion and the interrupted task resumes processing or some other scheduled task begins processing, the data and instructions loaded in the cache for the interrupting task are typically unrelated to the data and instructions necessary for processing the next task and the cache will have to be loaded with a different set of data and instructions once again. Obviously, the more frequently these interrupts occur, the more frequently the cache must be reloaded with data and instructions from main memory and, once again, overall system performance will suffer. Therefore, in order to improve system performance, the cache must ideally contain the data and instructions necessary for the CPU to complete any task that is currently being processed as quickly as possible.
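The effect of interrupts on cache contents can be illustrated with a small simulation. The sketch assumes a least-recently-used cache and two tasks with entirely separate sets of addresses. The function name `count_misses` and the address values are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses for a sequence of addresses on an LRU cache of `capacity` lines."""
    lines = OrderedDict()
    misses = 0
    for address in trace:
        if address in lines:
            lines.move_to_end(address)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= capacity:
                lines.popitem(last=False)  # evict the least recently used line
            lines[address] = True
    return misses


task = [0, 1, 2, 3]             # hypothetical addresses used by the interrupted task
handler = [100, 101, 102, 103]  # hypothetical addresses used by the interrupting task

uninterrupted = count_misses(task * 4, capacity=4)
interrupted = count_misses((task + handler) * 4, capacity=4)
print(uninterrupted, interrupted)
```

In the uninterrupted trace the task misses only on its first pass. When the interrupting task runs between passes and the cache holds only four lines, each task displaces the other's lines, so every pass must reload the cache from main memory.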
As explained above, the loading of data and instructions from main memory into a cache over the external memory bus can become a significant bottleneck, especially if the CPU frequently switches tasks and needs additional information loaded into the cache. Significant performance problems can occur because of this bottleneck. Once the cache is loaded with the necessary data, overall system operation can approach the maximum operational speed of the CPU and the cache. With the recent increase in data processing systems using multi-tasking operating systems, frequent interrupts are becoming even more of a problem for system performance. Without a way to efficiently load and optimize the contents of the cache for more effective processing by the CPU, the overall performance of data processing systems will continue to suffer.
Therefore, there exists a need to provide an apparatus and method to more effectively utilize cache memory and thereby improve the performance of a CPU-based data processing system. This apparatus and method should allow the cache to be loaded with the data and instructions most likely to be requested by the CPU at the times the CPU is most likely to request the given data and instructions. This apparatus and method should increase overall system performance by decreasing CPU stalls and by providing data and instructions to the cache in a more efficient manner than existing systems.