A multiple-processor computer system is a system that has more than one processor. Each processor may have associated memory that nevertheless is accessible by all the processors within the system. The memory associated with a given processor is generally the closest memory to the given processor, such that the memory of the other processors is farther away from the given processor, physically, temporally, or both. The memory of a given processor within a multiple-processor system can be considered memory that is local to the given processor. The memory of the other processors within the system is thus considered memory that is remote to the given processor.
Computer programs that are executed or run on a computer system each usually have one or more tasks or processes that make up the program. In a multiple-processor computer system, a special sub-system of the operating system, known as a scheduler, assigns tasks to run on particular processors of the computer system. Therefore, each task or process is assigned a particular processor on which to run. Generally, the scheduler assigns tasks to different processors depending on their availability. A processor that has few tasks running on it as compared to the other processors of the system is thus a candidate for being assigned additional tasks.
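By way of illustration only, availability-based assignment of the kind described above may be sketched as follows. The function names and the use of a per-processor task count as the measure of availability are assumptions made for this sketch, and do not describe any particular operating system's scheduler.

```python
def pick_least_loaded(run_queue_lengths):
    """Return the index of the processor with the fewest running tasks.

    run_queue_lengths[i] is the (hypothetical) number of tasks currently
    assigned to processor i. Ties go to the lowest-numbered processor.
    """
    return min(range(len(run_queue_lengths)),
               key=lambda cpu: run_queue_lengths[cpu])


def assign_tasks(task_names, processor_count):
    """Assign each task to whichever processor is least loaded at that moment."""
    loads = [0] * processor_count
    placement = {}
    for name in task_names:
        cpu = pick_least_loaded(loads)
        placement[name] = cpu
        loads[cpu] += 1
    return placement
```

In this sketch the location of a task's memory plays no part in the choice of processor; only the number of tasks already assigned is considered.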
A task or process is usually allocated both virtual memory and physical memory for it to be run properly. The virtual memory allocated for the task or process is a portion of the total addressable memory of the system, where the total addressable memory of the system may be greater than the physical memory of the system. For example, a given computer system may be able to address four gigabytes of memory, but only have three gigabytes of physical memory. The gigabyte of addressable memory that does not correspond to physical memory can be written onto a storage device, such as a hard disk drive. When some of this gigabyte of addressable memory is needed by a processor, the contents of some of the physical memory are written out onto the storage device, and the contents of some of the addressable memory previously stored on the storage device are read into physical memory. In this way, a computer system can have more total memory that can be used by tasks than it has actual physical memory.
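The exchange of contents between physical memory and the storage device may be illustrated by the following toy model, in which four addressable pages and three physical frames stand in for the four gigabytes and three gigabytes of the example above. The class name, the page granularity, and the choice of evicting the least recently used page are assumptions of this sketch; the description above does not specify which contents are written out.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model: more addressable pages than physical frames."""

    def __init__(self, addressable_pages, physical_frames):
        self.addressable_pages = addressable_pages
        self.physical_frames = physical_frames
        self.resident = OrderedDict()  # pages in physical memory, oldest first
        self.on_disk = set()           # pages written out to the storage device
        self.page_ins = 0              # pages read back from the storage device

    def touch(self, page):
        """Access a page, swapping it into physical memory if necessary."""
        if not 0 <= page < self.addressable_pages:
            raise ValueError("page outside addressable range")
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return
        if len(self.resident) >= self.physical_frames:
            # Physical memory is full: write the oldest page out (assumed LRU).
            victim, _ = self.resident.popitem(last=False)
            self.on_disk.add(victim)
        if page in self.on_disk:
            # The page was written out earlier: read it back in.
            self.on_disk.discard(page)
            self.page_ins += 1
        self.resident[page] = None
```

For instance, after touching pages 0, 1, 2, and 3 in order and then page 0 again, the model holds pages 2, 3, and 0 in physical memory and page 1 on the storage device.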
The amount of actual physical memory in use by a given task or process is referred to as the resident set size of the task. While the amount of total memory assigned to the task may be greater than the resident set size, the resident set size specifies how much physical memory the task is actually using at a particular point in time. The physical memory assigned to the task, in accordance with its resident set size, may be located anywhere within a computer system, depending on where the operating system is able to find available physical memory for the task. In a multiple-processor computer system, this means that the task may be allocated portions of physical memory that are local to different processors. A large portion of the physical memory allocated to a task, for instance, may be local to one processor, whereas the remaining portion of the physical memory allocated to the task may be local to another processor.
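As a minimal sketch of how a resident set may be divided among processors, the following assumes that the location of each resident page of a task is known as a list of processor identifiers, and assumes a page size of 4096 bytes. Both the input format and the function names are hypothetical.

```python
from collections import Counter


def resident_set_size(page_locations, page_size_bytes=4096):
    """Total physical memory in use by the task, in bytes."""
    return len(page_locations) * page_size_bytes


def resident_set_by_processor(page_locations, page_size_bytes=4096):
    """Bytes of the task's resident set that are local to each processor.

    page_locations holds, for each resident page of the task, the id of
    the processor whose local memory contains that page.
    """
    counts = Counter(page_locations)
    return {cpu: pages * page_size_bytes for cpu, pages in counts.items()}
```

A task with three resident pages local to processor 0 and one resident page local to processor 1 would thus have a resident set size of 16384 bytes, of which 12288 bytes are local to processor 0.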
The scheduler of an operating system of a multiple-processor computer system typically does not take into consideration the location of a task's allocated physical memory when selecting a processor to which to assign the task for execution. For example, a multiple-processor computer system may have two processors. A task's allocated physical memory may be local to the first processor, but the task may be assigned to the second processor. Such a situation can detrimentally affect the performance of the task. Each time the second processor requires information stored in the memory allocated to the task that is local to the first processor, it may have to communicate with a sub-system handling memory accesses for the local memory of the first processor. Such communication can slow down execution of the task.
For example, when the second processor determines that it needs information stored in the memory allocated to the task that is local to the first processor, the second processor first determines whether it has locally cached this information, and whether the locally cached version of this information is still valid. If the second processor has not locally cached the information, or the locally cached version of the information is invalid, then the second processor must issue a request that traverses the computer system, to the sub-system that handles memory accesses for the local memory of the first processor. The sub-system retrieves the requested information, and sends it back to the second processor. Until the second processor receives the requested information, it may not be able to continue executing the task, slowing down performance. Furthermore, in some computer systems, the first processor itself may be involved in retrieving the information from its local memory and sending it back to the second processor, which slows down performance of tasks running on the first processor.
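The sequence just described (check the local cache, check validity, and otherwise request the information from the memory's home processor) may be sketched as follows. The data structures are simplified assumptions for illustration: the cache is modeled as a dictionary of entries with a validity flag, and memory as a dictionary keyed by home processor.

```python
def read_for_task(address, home_cpu, running_cpu, cache, memory):
    """Return (value, path), where path names how the value was obtained.

    cache  : the running processor's cache, address -> {"value", "valid"}
    memory : home processor id -> {address: value}
    """
    entry = cache.get(address)
    if entry is not None and entry["valid"]:
        return entry["value"], "cache hit"

    # Not cached, or the cached copy is invalid: go to the home memory.
    value = memory[home_cpu][address]
    cache[address] = {"value": value, "valid": True}
    if home_cpu == running_cpu:
        return value, "local memory"
    # The request had to traverse the system to another processor's memory.
    return value, "remote request"
```

In this sketch only the third path, the remote request, corresponds to the slow case discussed above; it recurs whenever the cached copy is missing or has been invalidated.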
A limited solution to this problem within the prior art is to attempt to assign tasks to the processors that have most recently cached the memory allocated to the tasks. This approach to task assignment can be referred to as a “cache-warmth” approach, because the processor that has the most “warm” cache entries for a given task (that is, the processor that has previously cached the most information relating to the memory of a previous execution of the task) is assigned the task for its next execution. Warm cache entries are cache entries that pertain to a given task that have not yet been flushed out of the cache by newer cache entries, even though the task in question may have already ended its execution. Each processor of a multiple-processor system typically has an associated cache, which can be used to temporarily store information of memory that is remote to the processor, as well as to temporarily store information of memory that is local to the processor, but that has been requested by other processors. The theory behind the cache-warmth approach is that the processor that has the most warm cache entries for a previous execution of a task is likely to have as local memory most of the physical memory allocated to the task, such that this processor should be assigned the next execution of the task.
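The selection rule of the cache-warmth approach may be sketched as follows. The per-processor count of warm entries is assumed to be available as a dictionary, and the function name is hypothetical; how such counts would be obtained in practice is not addressed here.

```python
def pick_by_cache_warmth(warm_entry_counts):
    """Return the processor with the most warm cache entries for a task.

    warm_entry_counts maps a processor id to the (hypothetical) number of
    cache entries from the task's previous execution that are still cached.
    Returns None when no processor has any warm entries left, in which
    case the approach offers no guidance.
    """
    if not warm_entry_counts:
        return None
    best_cpu = max(warm_entry_counts, key=warm_entry_counts.get)
    if warm_entry_counts[best_cpu] == 0:
        return None
    return best_cpu
```

Note that the rule consults only the caches; the location of the physical memory allocated to the task is inferred, not examined.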
However, the cache-warmth approach can fail in at least two situations. First, the warm cache entries may be quickly flushed from a given processor's cache, due to the need by the processor to cache other portions of memory. Therefore, the cache-warmth approach may be unable to select a processor having warm cache entries for a previous execution of a task. This situation is likely to occur precisely where task performance can be critical: within multiple-processor computer systems that are running at full capacity, where the large number of running tasks causes the caches to be flushed quickly.
Second, the processor having the most warm cache entries for a previous execution of a task may not have as local memory most of the physical memory allocated to the task. For example, a first processor may have as local memory none of the physical memory allocated to the task, and the physical memory allocated to the task may be relatively equally divided among the local memory of each of second, third, and fourth processors. In this situation, the first processor is nevertheless likely to have the most warm cache entries for the task, because it will have cached more memory pertaining to the task than any other processor, even though the first processor does not have any local memory that has been allocated to the task. The second, third, and fourth processors, by comparison, will have cached less memory pertaining to the task, since each of these processors has as its local memory only a minority of the physical memory allocated to the task in accordance with the task's resident set size. As a result, the cache-warmth approach would pick the first processor for the next execution of the task, even though this is the worst processor to select for performance reasons.
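The four-processor example above may be worked through with hypothetical figures. The page counts and warm-entry counts below are invented solely for illustration, and the number of remote pages is used as a simple stand-in for the performance cost of running the task on a given processor.

```python
def remote_pages_if_run_on(cpu, local_pages):
    """Count the task's resident pages that would be remote to `cpu`."""
    return sum(count for owner, count in local_pages.items() if owner != cpu)


# Hypothetical figures: processor 1 holds none of the task's memory,
# while processors 2, 3, and 4 each hold an equal share.
local_pages = {1: 0, 2: 100, 3: 100, 4: 100}

# Hypothetical warm-entry counts: processor 1 has cached the most.
warm_entries = {1: 60, 2: 20, 3: 20, 4: 20}

# The cache-warmth approach picks the processor with the most warm entries.
warmth_choice = max(warm_entries, key=warm_entries.get)

# Remote pages the task would face on each candidate processor.
remote_cost = {cpu: remote_pages_if_run_on(cpu, local_pages)
               for cpu in local_pages}
worst_choice = max(remote_cost, key=remote_cost.get)
```

With these figures the cache-warmth choice is processor 1, for which all 300 resident pages are remote, whereas any of the other three processors would leave only 200 pages remote.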
For these and other reasons, therefore, there is a need for the present invention.