1. Field of the Invention
The present invention relates to thread processing and multiprocessor systems.
2. Related Art
At any given time, a computer may be running multiple processes. Metaphorically, a process is a program's breathing air and living space, that is, a running program plus any state needed to continue running it. Each process can involve one or more tasks. Each task can be carried out in one or more threads. Processes can be user application processes and/or operating system processes, such as a kernel process, a supervisor process, or an executive process. Processes can be executed continuously or interrupted repeatedly. See J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach," 2nd Ed. (Morgan Kaufmann Publ.: U.S.A. 1996), pp. 439-483.
Threads for one or more processes can be executed in parallel and/or non-parallel segments. Each thread is typically represented by a context consisting of a program counter, register set, and any required context status words. Multiple threads can be executed on a single processor. Multiple threads for a single task can be run in parallel on different processors in a distributed shared memory multi-processor system. See K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability and Programmability," (McGraw-Hill Inc.: U.S.A. 1993), pp. 491-504.
Each process has its own virtual address space. Virtual memory maps into physical memory. Different processes are then assigned to different regions of physical memory. With virtual memory, a CPU produces virtual addresses for program processes. Virtual memory addresses are translated (or mapped) by hardware and/or software to physical memory addresses. See J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach," 2nd. Ed. (Morgan Kaufmann Publ.: U.S.A. 1996), pp. 439-483.
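The address translation described above can be illustrated with a minimal sketch. It assumes a single-level page table per process, modeled as a dictionary from virtual page number to physical frame number; the names `PAGE_SIZE`, `PageFault`, and `translate`, and the 4096-byte page size, are hypothetical and chosen only for illustration. Real systems typically perform this mapping with multi-level tables and hardware assistance.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed, illustrative value


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped."""


def translate(page_table, virtual_address, page_size=PAGE_SIZE):
    """Map a virtual address to a physical address.

    page_table is a dict from virtual page number to physical frame
    number; each process has its own table (a simplifying assumption).
    """
    page_number, offset = divmod(virtual_address, page_size)
    if page_number not in page_table:
        raise PageFault(f"virtual page {page_number} is not resident")
    return page_table[page_number] * page_size + offset


# Two processes with separate virtual address spaces that map to
# different regions (frames) of physical memory.
process_a = {0: 5, 1: 9}
process_b = {0: 2}

physical_a = translate(process_a, 4100)  # page 1, offset 4, frame 9
physical_b = translate(process_b, 100)   # page 0, offset 100, frame 2
```

Because each process carries its own table, the same virtual address can resolve to different physical locations in different processes.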
Virtual memory allows the execution of large programs, or combinations of programs, whose entire code and data are too large to be stored in main memory at any one time. Only the sections of code and data currently being accessed by threads need to be held in main memory. Physical memory limitations and the underlying memory architecture need not be considered by a programmer. See G. Coulouris et al., "Distributed Systems: Concepts and Design," 2nd Ed. (Addison-Wesley Publ.: U.S.A. 1994), pp. 157-196. In a shared virtual memory (SVM) or virtual shared-memory (VSM) type of distributed shared-memory system, a global virtual address space is shared among different processors clustered at different nodes. See D. Lenoski and W. Weber, "Scalable Shared-Memory Multi-Processing," (Morgan-Kaufmann Publ.: U.S.A. 1995), pp. 1-40, 87-95, 143-203, and 311-316, and Hennessy and Patterson, at Chapter 8, "Multiprocessors," pp. 634-760.
A distributed shared memory (DSM) system, such as a scalable shared-memory system or a non-uniform memory access (NUMA) system, typically includes a plurality of physically distinct and separated processing nodes, each having one or more processors, input/output devices, and main memory that can be accessed by any of the processors. The main memory is physically distributed among the processing nodes. In other words, each processing node includes a portion of the main memory. Thus, each processor has access to "local" main memory (i.e., the portion of main memory that resides in the same processing node as the processor) and "remote" main memory (i.e., the portion of main memory that resides in other processing nodes). For each processor in a distributed shared memory system, the latency associated with accessing local main memory is significantly less than the latency associated with accessing remote main memory.
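The local/remote distinction can be made concrete with a small cost model. The sketch below assumes just two latency classes and uses made-up figures (`LOCAL_LATENCY_NS`, `REMOTE_LATENCY_NS`); the function names are hypothetical, and actual latencies depend on the interconnect and on the distance between nodes.

```python
LOCAL_LATENCY_NS = 100   # assumed cost of a local memory access
REMOTE_LATENCY_NS = 400  # assumed cost of a remote memory access


def access_latency(processor_node, memory_node,
                   local_ns=LOCAL_LATENCY_NS, remote_ns=REMOTE_LATENCY_NS):
    """Return the modeled latency of one access from a processor to memory."""
    return local_ns if processor_node == memory_node else remote_ns


def total_latency(processor_node, memory_nodes):
    """Sum the modeled latency over a sequence of accessed memory nodes."""
    return sum(access_latency(processor_node, node) for node in memory_nodes)


# A processor on node 0 makes two local accesses and one remote access.
cost = total_latency(0, [0, 0, 1])
```

Under these assumed figures, a single remote access costs as much as four local ones, which is why the placement of data relative to the accessing processor matters.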
As multi-programming, parallel processing, and multiprocessor architectures become more widespread, larger numbers of threads must be processed. The number of virtual memory accesses made by threads also increases. To reduce latency in scalable NUMA computer systems, it is desirable to store data in the portion of main memory that exists in the same processing node as the processor that most frequently accesses the data (or as close as possible to that processor). Threads and the data they operate on therefore need to be placed in, or near, the local memory of the node whose processor executes the threads. When virtual address space is partitioned or distributed across different node memories in a DSM system, it is also desirable to place threads and data at or near the node memory that stores the virtual address space (and physical memory address space) accessed most frequently by the threads.
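The placement goal stated above can be sketched as a simple frequency-based heuristic: record which node issues each access to a page, then assign the page to the node that accesses it most often. This is only an illustration of the stated goal under assumed inputs, not the method of the invention; the names `preferred_node` and `place_pages` and the tie-breaking rule (lowest node number wins) are hypothetical.

```python
from collections import Counter


def preferred_node(access_log):
    """Return the node that accessed a page most often.

    access_log is a sequence of node IDs, one entry per access.
    Ties are broken in favor of the lowest node ID (an assumption).
    """
    counts = Counter(access_log)
    if not counts:
        raise ValueError("no accesses recorded for this page")
    return min(counts, key=lambda node: (-counts[node], node))


def place_pages(page_accesses):
    """Map each page to the node whose processors access it most frequently."""
    return {page: preferred_node(log) for page, log in page_accesses.items()}


# Page "p0" is accessed mostly from node 1, page "p1" mostly from node 2.
placement = place_pages({"p0": [0, 1, 1], "p1": [2, 2, 0]})
```

A practical system would also weigh migration cost and available memory on each node, which this sketch ignores.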