1. Field of the Invention
The present invention relates to cache memory utilization in a single or multi-processor computer system, and more particularly in determining the order of execution of multiple program process threads and the association of computer processors therewith.
2. Description of the Related Technology
Use of computers in business and at home is becoming more and more pervasive because the computer has become an integral tool of most information workers who work in the fields of accounting, law, engineering, insurance, services, sales and the like. Rapid technological improvements in the field of computers have opened up many new applications heretofore unavailable or too expensive for the use of older technology computers. A significant part of the ever increasing popularity of the computer, besides its low cost relative to just a few years ago, is its ability to run multiple application programs which appear to the user to be running concurrently. In a multi-processor computer system some of the application programs may run concurrently.
These application programs may be word processing, spreadsheet, database, graphics, computer aided design and engineering, and telecommunications to name a few. In the computer system there is a software operating system ("OS") that controls the functions of the computer processor(s) and peripheral components which make up the computer system. The OS program includes routines or an algorithm called a "scheduler" that decides which application program(s) is running on a processor(s), and what application program(s) will be running next.
The scheduler algorithm defines each part of the application program that is being executed on a processor as a "process" or "process thread." When a process thread is interrupted for whatever reason, the register values and unexecuted instructions of that process thread are saved, and the scheduler tells the processor to restore the register values of a different process thread and start execution of that process thread. This is called a "Context Switch." In a multi-processor computer system, multiple process threads may be executed on the multiple processors at the same time. However, only one process thread can run on one processor at any given time, so the number of process threads running concurrently cannot exceed the number of processors in the computer system. The application programs can nevertheless appear to the user to be running concurrently because the processor(s) switch(es) between the different threads very quickly, thus giving the impression that the application programs are running simultaneously.
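By way of illustration only, the context switch described above may be sketched in Python for a single processor. The Thread class, its work and registers fields, the round-robin ordering and the one-unit time quantum are all assumptions made for this example; they are not taken from any particular operating system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    work: int  # remaining time units of execution (hypothetical measure)
    registers: dict = field(default_factory=lambda: {"pc": 0})


def run_round_robin(threads, quantum=1):
    """Run threads on one simulated processor; return the order of execution."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        cpu = dict(thread.registers)   # restore the saved register values
        steps = min(quantum, thread.work)
        cpu["pc"] += steps             # "execute" for up to one quantum
        thread.work -= steps
        thread.registers = cpu         # save the register values again
        trace.append(thread.name)
        if thread.work > 0:
            ready.append(thread)       # interrupted: back into the ready queue
    return trace
```

Each pass through the loop corresponds to one context switch: the register values of the selected thread are restored, the thread runs, and its register values are saved before the next thread is selected.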
The processor or plurality of processors in a computer system run in conjunction with a high capacity, low-speed (relative to the processor speed) main memory, and a low capacity, high-speed (comparable to the processor speed) cache memory or memories (one or more cache memories associated with each of the plurality of processors).
Cache memory is used to reduce memory access time in mainframe computers, minicomputers, and microprocessors. The cache memory provides a relatively high speed memory interposed between the slower main memory and the processor to improve effective memory access rates, thus improving the overall performance and processing speed of the computer system by decreasing the apparent amount of time required to fetch information from main memory.
In today's single and multi-processor computer systems, there is typically at least one level of cache memory for each of the processors. The latest microprocessor integrated circuits may have a first level cache memory located in the integrated circuit package and closely coupled with the central processing unit ("CPU") of the microprocessor. Additional levels of cache may also be implemented by adding fast static random access memory ("SRAM") integrated circuits and a cache controller. Typical secondary cache size may be anywhere from 64 kilobytes to 8 megabytes, and the cache SRAM has an access time comparable with the processor clock speed.
In common usage, the term "cache" refers to a hiding place. The name "cache memory" is an appropriate term for this high speed memory that is interposed between the processor and main memory because cache memory is hidden from the user or programmer, and thus appears to be transparent. Cache memory, serving as a fast storage buffer between the processor and main memory, is not user addressable. The user is only aware of the apparently higher-speed memory accesses because the cache memory, instead of the slower main memory, is satisfying many of the requests.
Cache memory is smaller than main memory because cache memory employs relatively expensive high speed memory devices, such as static random access memory ("SRAM"). Therefore, cache memory typically will not be large enough to hold all of the information needed during program execution. As a process executes, information in the cache memory must be replaced, or "overwritten" with new information from main memory that is necessary for executing the process thread. The information in main memory is typically updated each time a "dirty" cache line is evicted from the cache memory (a process called "write back"). As a result, changes made to information in cache memory will not be lost when new information enters cache memory and overwrites information which may have been changed by the processor.
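The write back behavior described above may be sketched as follows. This is a minimal illustration only: the WriteBackCache class, its dictionary-based main memory, and the choice of evicting the oldest line first are assumptions of the example, not features taken from the text.

```python
class WriteBackCache:
    """Toy write back cache holding at most `capacity` lines."""

    def __init__(self, memory, capacity):
        self.memory = memory      # main memory: address -> value
        self.capacity = capacity
        self.lines = {}           # address -> [value, dirty]; insertion ordered

    def _fill(self, addr):
        if addr not in self.lines:
            if len(self.lines) >= self.capacity:
                victim = next(iter(self.lines))      # oldest line (assumed policy)
                value, dirty = self.lines.pop(victim)
                if dirty:
                    self.memory[victim] = value      # write back on eviction
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr]

    def read(self, addr):
        return self._fill(addr)[0]

    def write(self, addr, value):
        line = self._fill(addr)
        line[0] = value
        line[1] = True            # mark the line "dirty"; memory is not yet updated
```

In this sketch a write changes only the cached copy. Main memory receives the new value when the dirty line is later evicted, so the change is not lost when the line is overwritten.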
Information is only temporarily stored in cache memory during execution of the process thread. When process thread data is referenced by a processor, the cache controller will determine if the required data is currently stored in the cache memory. If the required information is found in cache memory, this is referred to as a "cache hit." A cache hit allows the required information to be quickly retrieved from or modified in the high speed cache memory without having to access the much slower main memory, thus resulting in a significant savings in program execution time. When the required information is not found in the cache memory, this is referred to as a "cache miss." A cache miss indicates that the desired information must be retrieved from the relatively slow main memory and then placed into the cache memory. Cache memory updating and replacement schemes attempt to maximize the number of cache hits, and to minimize the number of cache misses.
Information from main memory for a process thread is typically stored in "lines" of cache memory which contain a plurality of bytes or words from the main memory such as, for example, 16, 32 or 64 bytes of information. The plurality of bytes from main memory are stored sequentially in a line of cache memory. The cache memory comprises a plurality of lines of information that may store information for a plurality of process threads. Each line of cache memory has an associated "tag" that stores the physical address of the main memory locations containing the information in the cache line, as well as other information such as the state of the cache line. From the example above, if 16 bytes of information are stored in a cache line, the least significant 4 bits of the physical address of main memory are dropped from the main memory address stored in the tag register. In addition, the tag register may contain state information for a cache consistency protocol such as "MESI" (Modified, Exclusive, Shared and Invalid) to ensure data consistency in a multi-processor or bus master environment.
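The address arithmetic for the 16-byte example above may be sketched as follows. The names LINE_SIZE, OFFSET_BITS and split_line_address are hypothetical and are used only for this illustration.

```python
LINE_SIZE = 16                               # bytes per cache line, as in the example
OFFSET_BITS = LINE_SIZE.bit_length() - 1     # 4 bits select a byte within the line


def split_line_address(addr):
    """Split a physical address into (line address kept in the tag, byte offset)."""
    line_address = addr >> OFFSET_BITS       # drop the least significant 4 bits
    offset = addr & (LINE_SIZE - 1)          # the 4 bits that were dropped
    return line_address, offset
```

For example, the address 0x1234 splits into line address 0x123 and byte offset 0x4, and all sixteen addresses from 0x1230 to 0x123F share the same line address.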
A cache memory is said to be "direct mapped" if each byte of information can only be written to one place in the cache memory. The cache memory is said to be "fully associative" if a byte of information can be placed anywhere in the cache memory. The cache memory is said to be "set associative" if a group of blocks of information from main memory can only be placed in a restricted set of places in the cache memory, namely, in a specified "set" of the cache memory. Computer systems ordinarily utilize a variation of set associative mapping to keep track of the bytes of information that have been copied from main memory into cache memory.
The organization of a set associative cache memory resembles a matrix. That is, a set associative cache memory is divided into different "sets" (such as the rows of a matrix) and different "ways" (such as the columns of a matrix). Thus, each line of a set associative cache memory is mapped or placed within a given set (row) and within a given way (column). The number of columns, i.e., the number of lines in each set, determines the number of "ways" of the cache memory. Thus, a cache memory with four columns (four lines within each set) is deemed to be "4-way set associative."
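The three placement policies can be viewed as one scheme with different numbers of ways, as the following sketch suggests. The function candidate_lines is hypothetical, and the selection of a set by taking the block number modulo the number of sets is a conventional choice assumed for the example rather than something stated above.

```python
def candidate_lines(block, num_lines, ways):
    """Return the cache line numbers in which a main memory block may be placed.

    ways == 1          behaves as a direct mapped cache
    ways == num_lines  behaves as a fully associative cache
    anything between   behaves as a set associative cache
    """
    num_sets = num_lines // ways
    set_index = block % num_sets             # assumed mapping: modulo the set count
    first = set_index * ways
    return list(range(first, first + ways))
```

With eight lines, block 13 may occupy only line 5 when the cache is direct mapped, lines 2 and 3 when it is 2-way set associative, and any of the eight lines when it is fully associative.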
Set associative cache memories include addresses for each line in the cache memory. Addresses may be divided into three different fields. First, a "block-offset field" is utilized to select the desired information from a line. Second, an "index field" specifies the set of cache memory where a line is mapped. Third, a "tag field" is used for purposes of comparison. When a request originates from a processor for new information, the index field selects a set of cache memory. The tag field of every line in the selected set is compared to the tag field sought by the processor. If the tag field of some line matches the tag field sought by the processor, a "cache hit" is detected and information from the block is obtained directly from or modified in the high speed cache memory. If no match occurs, a "cache miss" occurs and the cache memory is typically updated. Cache memory is updated by retrieving the desired information from main memory and then mapping this information into a line of the set associative cache. When the "cache miss" occurs, a line is first mapped with respect to a set (row), and then mapped with respect to a way (column). That is, the index field of a line of information retrieved from main memory specifies the set of cache memory wherein this line will be mapped. A "replacement scheme" is then relied upon to choose the particular line of the set that will be replaced. In other words, a replacement scheme determines the way (column) where the line will be located. The object of a replacement scheme is to select for replacement the line of the set that is least likely to be needed in the near future so as to minimize further cache misses.
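The lookup sequence described above may be sketched as follows. The SetAssociativeCache class and its default sizes are hypothetical, only tags are stored (no data), and a simple first in, first out replacement is assumed so that the example stays short.

```python
from collections import deque


class SetAssociativeCache:
    """Tag-only model of a set associative cache."""

    def __init__(self, num_sets=4, ways=2, line_size=16):
        self.num_sets = num_sets
        self.line_size = line_size
        # One bounded queue of tags per set; a full queue drops its oldest tag.
        self.sets = [deque(maxlen=ways) for _ in range(num_sets)]

    def fields(self, addr):
        """Split an address into (tag field, index field, block-offset field)."""
        offset = addr % self.line_size
        index = (addr // self.line_size) % self.num_sets
        tag = addr // (self.line_size * self.num_sets)
        return tag, index, offset

    def access(self, addr):
        """Return True on a cache hit, False on a cache miss (line then filled)."""
        tag, index, _ = self.fields(addr)
        lines = self.sets[index]          # the index field selects the set
        if tag in lines:                  # compare against every tag in the set
            return True
        lines.append(tag)                 # miss: map the new line into the set
        return False
```

A second access to any address within the same line is a hit, while a third distinct tag that maps to a full 2-way set displaces the line that entered the set first.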
Several factors contribute to the optimal utilization of cache memory in computer systems: cache memory hit ratio (probability of finding a requested item in cache), cache memory access time, delay incurred due to a cache memory miss, and time required to synchronize main memory with cache memory (write back or write through). In order to minimize delays incurred when a cache miss is encountered, as well as improve cache memory hit rates, an appropriate cache memory replacement scheme is used.
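The interaction of hit ratio, access time and miss delay may be illustrated with the commonly used average access time model, in which every access pays the cache access time and each miss additionally pays the miss delay. The model and the function name effective_access_time are assumptions brought in for this example; the text above does not specify a formula.

```python
def effective_access_time(hit_ratio, cache_time, miss_penalty):
    """Average time per memory access under the assumed model.

    hit_ratio    -- probability of finding the requested item in cache (0..1)
    cache_time   -- cache memory access time
    miss_penalty -- additional delay incurred on a cache miss
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_time + (1.0 - hit_ratio) * miss_penalty
```

Under this model, with a cache access time of 10 time units and a miss delay of 100 time units, raising the hit ratio from 0.90 to 0.95 lowers the average access time from about 20 to about 15 time units.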
Set associative cache memory replacement schemes may be divided into two basic categories: non-usage based and usage based. Non-usage based replacement schemes, which include first in, first out ("FIFO") and "random" replacement schemes, make replacement selections on some basis other than memory usage. The FIFO replacement scheme replaces the line of a given set of cache memory which has been contained in the given set for the longest period of time. The random replacement scheme randomly replaces a line of a given set.
Usage based schemes, which include the least recently used ("LRU") replacement scheme, take into account the history of memory usage. In the LRU replacement scheme, the least recently used line of a given set is overwritten by the newest entry into that set. An LRU replacement scheme assumes that the least recently used line of a given set is the line that is least likely to be reused in the immediate future, and thus replaces that line with a new line of information that must be copied from main memory.
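The difference between a non-usage based scheme and a usage based scheme may be seen by replaying the same sequence of references against a single set under each scheme. The following is a sketch only; the function simulate_set and the reference sequence in the note after it are invented for the illustration.

```python
from collections import OrderedDict


def simulate_set(tags, ways, policy):
    """Count cache hits for a sequence of tags that all map to one set.

    policy is "FIFO" (replace the line resident longest) or
    "LRU" (replace the least recently used line).
    """
    if policy not in ("FIFO", "LRU"):
        raise ValueError("policy must be 'FIFO' or 'LRU'")
    lines = OrderedDict()                 # ordered from next victim to last victim
    hits = 0
    for tag in tags:
        if tag in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(tag)    # a use refreshes the line under LRU only
        else:
            if len(lines) >= ways:
                lines.popitem(last=False) # replace the line at the front
            lines[tag] = True
    return hits
```

For the reference sequence 1, 2, 1, 3, 1, 4, 1 in a 2-way set, the FIFO scheme yields two hits and the LRU scheme yields three, because LRU retains the frequently reused line 1 while FIFO replaces it once it has been resident longest.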
Regardless of the replacement scheme used, the scheduler algorithm will decide what process thread will be executed next in a single processor computer. In a multi-processor computer system, the scheduler algorithm decides what process threads are to run concurrently, and which processor will execute each of these process threads. The scheduler then determines the next appropriate process thread to be executed, and so on. The scheduler may cause the process threads to be executed in order of occurrence, or the order of execution may be determined by some software or hardware priority paradigm. The scheduler cannot determine, however, what the likely cache hit or miss outcome will be during execution of any given process thread. Some operating systems use a concept called "Strong Affinity" when scheduling threads. "Strong Affinity" schedulers attempt to execute a thread on the same processor it last ran on. The reason for doing this is that the cache of that processor is more likely to contain data that is relevant to the thread than the cache of some other processor in the system.
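The "Strong Affinity" preference described above may be sketched as a processor selection rule. The function pick_processor is hypothetical, and the fallback to the lowest numbered idle processor when the preferred processor is busy is an assumption of the example; actual schedulers differ in how they handle that case.

```python
def pick_processor(last_cpu, idle_cpus):
    """Choose a processor for a thread that is ready to run.

    last_cpu  -- number of the processor the thread last ran on, or None
    idle_cpus -- set of processor numbers that are currently idle
    Returns the chosen processor number, or None if no processor is idle.
    """
    if last_cpu in idle_cpus:
        return last_cpu        # affinity: that cache may still hold the thread's data
    if idle_cpus:
        return min(idle_cpus)  # assumed fallback: any idle processor
    return None                # the thread must wait
```

Such a rule relies only on where the thread last ran; it has no knowledge of whether that processor's cache actually still contains the thread's information.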
What is needed is a method and apparatus to improve the likelihood of cache hits during execution of a process thread. It is desired to improve the computer system efficiency by having the scheduler algorithm make an informed decision on which program thread would be most appropriate to run next and on what processor. In addition, it is desired to improve usage of cache memory by selecting locations to be written to that contain no longer needed process thread information.