1. Field of the Present Invention
The present invention generally relates to the field of data processing systems and more particularly to a method and system for improving performance in a Non-Uniform Memory Architecture (NUMA) system.
2. History of Related Art
Symmetric Multiprocessing (SMP) architectures are widely used in the design of computing servers. SMP servers are characterized by multiple processors that communicate with a common system memory across a shared bus. The limited bandwidth of the shared bus constrains the number of processors that can be deployed economically in an SMP machine and suggests the use of alternative technologies for building massively scaleable servers. In addition, standard high-volume, bus-based SMP servers are beginning to appear on the market thereby making it economically attractive to construct larger systems out of multiple standard nodes without fundamentally re-engineering the component machines.
There are alternative technologies for constructing scaleable server machines. One of these is the Cache-Coherent, Non-Uniform Memory (ccNUMA) architecture, in which a special memory controller and a high-speed interconnection switch connect several SMP-based servers, which are referred to as nodes. A processor in this architecture accesses the local memory within its SMP node through its shared memory bus and accesses the remote memory residing on other nodes through the high-speed interconnect. Thus, local memory accesses are faster than remote memory accesses. The special memory controller typically uses a directory structure to ensure that all processors see shared memory accesses in a consistent and coherent manner. The result is a shared-memory architecture that does not have the limitations of a single memory bus, yet maintains the familiar shared-memory programming model.
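The local versus remote distinction described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each node owns an equal, contiguous slice of the physical address space; the class name, field names, and latency figures are hypothetical and are not taken from any particular ccNUMA implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaTopology:
    """Toy model of a ccNUMA machine (all values are illustrative assumptions)."""
    node_count: int
    bytes_per_node: int      # assumed: equal, contiguous memory per node
    local_latency_ns: int    # hypothetical latency over the node's own bus
    remote_latency_ns: int   # hypothetical latency over the interconnect

    def home_node(self, phys_addr: int) -> int:
        """Return the node whose local memory holds this physical address."""
        if not 0 <= phys_addr < self.node_count * self.bytes_per_node:
            raise ValueError("address outside physical memory")
        return phys_addr // self.bytes_per_node

    def access_latency_ns(self, cpu_node: int, phys_addr: int) -> int:
        """Local accesses use the node's bus; all others cross the interconnect."""
        if self.home_node(phys_addr) == cpu_node:
            return self.local_latency_ns
        return self.remote_latency_ns
```

For example, with four nodes of 1 GiB each, a processor on node 1 reading an address in the second gigabyte would see the local latency, whereas a processor on node 0 reading the same address would see the remote latency.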
A ccNUMA system must run, without modification, all software written for SMP machines. Performance-sensitive software running in a ccNUMA system may, however, require special tuning because of the disparity in speed between accessing local and remote memory. It would be desirable if an application could minimize its accesses to remote memory. In addition, most legacy operating systems must be tuned to become aware of the remote memory resources and to manage them effectively. There is a need, therefore, for tools and techniques to realize the performance potential of ccNUMA architectures. Commercial operating systems, however, typically lack resources for enabling application programs to dedicate specified portions of physical memory for storing the application's data.
Performance degrades in a ccNUMA system if threads often stall waiting for the local cache to be loaded with data or instructions from remote memory or, equivalently, if the average memory latency incurred by programs increases due to remote accesses. Typically, a combination of hardware assist and operating system support is required to address the problems associated with frequent accesses to remote memory. Support at the operating system level is necessary so that a thread finds the data that it needs in nearby memory banks. Typically, this support requires non-trivial modification to several operating system modules such as memory management, process management, and I/O support. The resulting changes will typically be complex and increase the cost of maintaining the operating system. Furthermore, because the hardware may be built to any of several design points, it is difficult to modify the operating system and tune it to be highly effective for all possible design implementations.
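The effect of remote accesses on average memory latency can be made concrete with a simple weighted average. The following is a sketch only: it assumes a two-level model with a single local latency and a single remote latency, and the function name and example figures are hypothetical.

```python
def average_latency_ns(remote_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Average memory latency when a given fraction of accesses is remote.

    Assumes a two-level model: every access costs either local_ns or remote_ns.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns
```

Under these assumed figures, a program whose accesses are 25 percent remote, with 100 ns local and 400 ns remote latency, averages 175 ns per access, which is why placing a thread's data in nearby memory matters.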
For these reasons, the operating system supplier may be reluctant to add the necessary modifications to improve the performance of NUMA systems. If, for example, the operating system vendor strives to be portable across different machines, it may not find it economical or feasible to add specialized support for ccNUMA into the operating system kernel. On the other hand, a supplier of operating system technology that is tied to a preferred hardware platform may find it expensive to maintain different versions of the operating system, or to introduce NUMA support in the first place because of the heavy upfront investment required. It would therefore be highly desirable to implement a method by which a NUMA system running an off-the-shelf operating system supports a mechanism that permits application programs to control the physical memory space allocated for the application's data.
The problem identified above is addressed by a method and software for allocating memory in a data processing system. Initially, a configuration table indicative of the system's hardware resources, including the system's physical memory, is generated in response to a boot event. The generated configuration table is then modified to identify only a portion of the system's physical memory, thereby reserving the remaining portion of physical memory such that operating system control of the reserved portion is prevented. Subsequently, a memory allocation request is initiated by an application program executing on the system. A device driver invoked by the application program then maps physical memory from the reserved portion to the application's virtual address space allocated to satisfy the allocation request. Modifying the configuration table may include generating at least one space mapping entry in the configuration table. The space mapping entry parameter values may be derived from information in an initialization file stored on the system. The application program may be executing on a first node of a multi-node system in which each node is associated with its own local memory. In this embodiment, the node on which the allocated physical memory is located may be derived from the allocation request, thereby facilitating application-level allocation of specified portions of physical memory.
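The sequence summarized above can be sketched as a small user-space simulation. This is an illustration under stated assumptions, not the claimed implementation: the names `MemoryRange`, `reserve`, and `ReservedMemoryDriver`, the 4096-byte page size, and the choice to reserve the top of each node's memory are all hypothetical, and the `reserved_per_node` argument stands in for values that would be read from the initialization file.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size in bytes


@dataclass
class MemoryRange:
    node: int     # node that owns this physical memory
    start: int    # first physical address of the range
    length: int   # size of the range in bytes


def reserve(config_table, reserved_per_node):
    """Split each node's range into an OS-visible part and a reserved part.

    The visible list models the modified configuration table handed to the
    operating system; the reserved list is memory the OS never sees.
    """
    visible, reserved = [], []
    for r in config_table:
        keep = r.length - reserved_per_node
        if keep <= 0:
            raise ValueError("reservation exceeds node memory")
        visible.append(MemoryRange(r.node, r.start, keep))
        reserved.append(MemoryRange(r.node, r.start + keep, reserved_per_node))
    return visible, reserved


class ReservedMemoryDriver:
    """Stand-in for the device driver that manages the reserved memory."""

    def __init__(self, reserved):
        # Next free physical address and end address of each node's pool.
        self._next = {r.node: r.start for r in reserved}
        self._end = {r.node: r.start + r.length for r in reserved}
        self.mappings = {}  # virtual address -> (physical address, size)

    def map(self, virt_addr, size, node):
        """Back a virtual address with reserved memory on the requested node."""
        size = -(-size // PAGE) * PAGE  # round up to whole pages
        phys = self._next[node]
        if phys + size > self._end[node]:
            raise MemoryError(f"reserved pool on node {node} exhausted")
        self._next[node] = phys + size
        self.mappings[virt_addr] = (phys, size)
        return phys
```

In this sketch the application names the node in its request, so the driver can satisfy the request from that node's reserved pool without the operating system taking part in the placement decision. A real driver would also have to install page table entries and handle freeing, which the simulation omits.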