Access to computer resources that are accessed by multiple threads, such as program instructions in memory, may be controlled by use of a semaphore. For example, POST and WAIT instructions may be used to allow exclusive access to an area of memory when two threads, thread A and thread B, are both executing. When thread A gains access to the area of memory, a WAIT instruction is executed, which prevents thread B from gaining access to the area of memory until thread A has finished accessing it. When thread A is finished accessing the area of memory, a POST instruction is executed, and thread B is able to access the area of memory.
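The WAIT/POST protocol described above may be sketched, by way of illustration only, using a binary semaphore from Python's standard threading module. In this sketch, acquire() plays the role of WAIT and release() plays the role of POST; the names area, shared, worker, and run_demo are hypothetical and do not appear in the figures.

```python
import threading

# Binary semaphore: an initial count of 1 means the area of memory is free.
area = threading.Semaphore(1)
# Stands in for the protected area of memory (hypothetical).
shared = {"counter": 0}

def worker(iterations):
    for _ in range(iterations):
        area.acquire()                    # WAIT: blocks while the other thread holds the area
        try:
            value = shared["counter"]     # read-modify-write under exclusive access
            shared["counter"] = value + 1
        finally:
            area.release()                # POST: the other thread may now enter

def run_demo(iterations=1000):
    """Run thread A and thread B against the shared area; return the final count."""
    shared["counter"] = 0
    threads = [threading.Thread(target=worker, args=(iterations,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

Because each read-modify-write occurs between a WAIT and a POST, no increment is lost, and run_demo(n) returns 2 * n.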
Referring to FIG. 1, a typical computer system upon which threading may be implemented includes a microprocessor (10) having, among other things, a CPU (12), a memory controller (14), and an on-board cache memory (16). The microprocessor (10) is connected to external cache memory (“e-cache”) (18) and memory (20) that holds data and program instructions to be executed by the microprocessor (10). Internally, the execution of program instructions is carried out by the CPU (12). Data needed by the CPU (12) to carry out an instruction is retrieved by the memory controller (14). Upon command from the CPU (12), the memory controller (14) searches for the data first in the on-board cache memory (16), next in the e-cache (18), and finally in the memory (20). Finding the data in the cache memory is referred to as a “hit.” Not finding the data in the cache memory is referred to as a “miss.”
The rate of hits and misses (“hit rate” and “miss rate”) depends, in no small part, on the caching scheme or policy employed by the computer system, e.g., direct-mapped or set associative. For some computer applications, a direct-mapped policy may provide better system performance due to a better hit rate, and for other computer applications, a set associative caching scheme may prove more beneficial. This performance variation depends on such details as the address sequences used by the application, the allocation of memory pages to an application by the operating system, and whether virtual or physical addresses are used for addressing the cache.
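The dependence of hit rate on the caching scheme and on the address sequence may be illustrated with a small simulation. The following is a minimal sketch only, assuming a 64-byte line, least-recently-used replacement within each set, and a hypothetical function named hit_rate; none of these details is taken from the figures. Setting ways to 1 models a direct-mapped cache, and larger values model a set associative cache.

```python
from collections import OrderedDict

def hit_rate(addresses, num_lines, ways, line_size=64):
    """Fraction of accesses that hit in a cache of num_lines lines.

    ways=1 models direct-mapped; ways>1 models set associative with
    LRU replacement inside each set (an assumption of this sketch).
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // line_size          # which memory line is addressed
        index = block % num_sets           # set selected by the low-order bits
        tag = block // num_sets            # remaining bits identify the line
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return hits / len(addresses)

# Two address sequences for a 4-line cache (addresses chosen for illustration).
ping_pong = [0, 256] * 10      # two lines that collide when direct-mapped
cyclic = [0, 128, 256] * 10    # three lines that fall in one 2-way set
```

Under these assumptions, the ping_pong sequence never hits in the direct-mapped cache but hits on 18 of 20 accesses in the 2-way cache, whereas the cyclic sequence hits on 9 of 30 accesses in the direct-mapped cache and never hits in the 2-way cache. Neither scheme is better for every address sequence.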
Caching schemes are also used in multiple-microprocessor computer architectures. FIG. 2 shows a multiple-processor computer architecture using processor 1 (40) and processor 2 (42). Processor 1 (40) and processor 2 (42) may include standard components, such as a CPU, a memory controller, and an on-board cache memory. Processor 1 (40) and processor 2 (42) are each connected to an associated e-cache memory: processor 1 (40) is associated with e-cache 1 (46), and processor 2 (42) is associated with e-cache 2 (48). Each processor may execute a program in the form of an instruction stream, which is located in memory (20). For example, processor 1 (40) may execute the instruction stream (52), and access a data segment (54) while executing the instruction stream (52).
FIG. 3 shows a flowchart with exemplary operations relevant to the execution of an arbitrary instruction of the instruction stream (52) by the processor 1 (40) using the computer architecture shown in FIG. 2. First, processor 1 makes an attempt to retrieve the instruction from the on-board cache (Step 60). A determination is then made as to whether the attempted retrieval from the on-board cache is successful (Step 62). If the attempted retrieval from the on-board cache is successful, the instruction is executed (Step 64). Otherwise, processor 1 makes an attempt to retrieve the instruction from the e-cache 1 (Step 66).
A determination is then made as to whether the attempted retrieval from the e-cache 1 is successful (Step 68). If the attempted retrieval from the e-cache 1 is successful, the instruction is executed (Step 64). Otherwise, the instruction is retrieved from memory (Step 70). The instruction is then stored in the e-cache 1 and/or the on-board cache (depending on the particular caching implementation and policy) (Step 72). Processor 1 then executes the instruction (Step 64). After execution, processor 1 then retrieves data from the data segment (Step 74) and the data is stored in the e-cache 1 (Step 76). Those skilled in the art will appreciate that the steps shown in FIG. 3 are likely to be performed repeatedly during execution of a software program, which includes numerous instructions.
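The instruction-fetch portion of the flow of FIG. 3 (Steps 60 through 72) may be sketched as follows. This is an illustrative model only: the caches and the memory are represented as Python dictionaries of unbounded size, Step 72 is assumed to fill both caches, and the names fetch_instruction and log are hypothetical. The data retrieval of Steps 74 and 76 is omitted.

```python
def fetch_instruction(address, on_board, e_cache, memory, log):
    """Return the instruction at address, recording in log where it was found."""
    instruction = on_board.get(address)        # Step 60: try the on-board cache
    if instruction is not None:                # Step 62: on-board hit
        log.append("on-board")
        return instruction                     # proceeds to Step 64 (execute)
    instruction = e_cache.get(address)         # Step 66: try the e-cache 1
    if instruction is not None:                # Step 68: e-cache hit
        log.append("e-cache")
        return instruction                     # proceeds to Step 64 (execute)
    instruction = memory[address]              # Step 70: retrieve from memory
    e_cache[address] = instruction             # Step 72: store in the caches
    on_board[address] = instruction            # (assumed: both caches are filled)
    log.append("memory")
    return instruction                         # proceeds to Step 64 (execute)
```

In this sketch, the first fetch of a given address is served from memory, and a repeated fetch of the same address is served from the on-board cache.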
The size of the instruction stream may be larger than the fixed size of the e-cache 1. For example, the e-cache 1 may be 8 megabytes (8 M) in size, and the instruction stream may be 10 M in size. Therefore, after 8 M of instructions from the instruction stream have been stored in the e-cache 1 from the memory, subsequent retrievals of instructions may, depending upon how caching is implemented, result in cache misses, each of which requires retrieval of the instruction from memory and results in slower execution of the program.
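The effect of an instruction stream that exceeds the cache size may be illustrated at reduced scale, with each block standing for 1 M of instructions. The following sketch assumes a fully associative cache with least-recently-used replacement and an instruction stream that is executed repeatedly from start to finish, e.g., a loop; these are assumptions of the sketch, and other caching implementations may behave differently. The function name count_misses is hypothetical.

```python
from collections import OrderedDict

def count_misses(stream_blocks, cache_blocks, passes):
    """Count cache misses when a stream of stream_blocks blocks is executed
    passes times through an LRU cache holding cache_blocks blocks."""
    cache = OrderedDict()
    misses = 0
    for _ in range(passes):
        for block in range(stream_blocks):
            if block in cache:
                cache.move_to_end(block)       # hit: mark as most recently used
            else:
                misses += 1                    # miss: retrieve from memory
                if len(cache) >= cache_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
                cache[block] = True
    return misses
```

Under these assumptions, count_misses(10, 8, 3) returns 30, i.e., every retrieval misses once the stream exceeds the cache, whereas count_misses(8, 8, 3) returns 8, i.e., only the first pass misses when the stream fits.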