1. Technical Field
The present invention relates generally to computer processing systems and, in particular, to a method and system for dynamically changing page types in unified scalable shared-memory architectures.
2. Background Description
Scalable shared-memory multiprocessors offer significant computing power and have the advantages of ease of use and programmability. Such architectures typically consist of a scalable number of workstation-class nodes connected by an interconnection network. Each node consists of one or more computation units and one or more levels of caching and/or memory. A global address space is used for inter-node communication.
In a cache-coherent non-uniform memory architecture (ccNUMA), accessing the physical memory at the local node can be an order of magnitude faster than accessing the remote memory. The relatively long access latencies incurred when accessing the remote memory can prohibitively degrade the performance of such multiprocessors. Local remote access caches (RACs) can be used to address this degradation. An application running on this architecture exhibits the best performance when its working set is contained within the memory hierarchy of the node. Otherwise, repeated remote memory accesses may occur, resulting in potentially prohibitive performance degradation.
While RACs in ccNUMA machines can be used to address the issues associated with long latency remote accesses, these caches are relatively small and, therefore, have limited effectiveness for some applications. These issues can be addressed more effectively by using machines with a simple cache only memory architecture (SCOMA).
SCOMA uses the memory associated with each node as a higher level cache. SCOMA reduces the frequency of long remote memory accesses by migrating and replicating data to the local nodes. SCOMA can quickly adapt to the dynamic memory reference behavior of executing applications, reducing the effective memory access time. In SCOMA, the paging software manages cache space allocation and deallocation. Less hardware is used to maintain data coherence and no cache tag hardware is needed.
SCOMA architectures use the local node memory as the page cache, with the page as the placement granularity. This facilitates improved performance, particularly for applications exhibiting good spatial reference locality. However, SCOMA suffers from increased hardware costs for the coherence controller and from increased memory consumption and low page utilization. Dynamically mixed page types allow ccNUMA and SCOMA pages to exist concurrently, with the type chosen on a per-page basis. This facilitates performance improvements by exploiting the advantages of both ccNUMA and SCOMA architectures in a unified architecture.
Unified architectures can contain mechanisms to facilitate the dynamic typing of pages in local nodes. Unified architectures dynamically adapt between ccNUMA and SCOMA architectures according to the reference patterns of the executing programs. This dynamic adaptation results in better performance because the unified architecture has the ccNUMA advantages of relatively low memory allocation overhead, fine-grain spatial locality, short term temporal locality and minimizing coherence miss traffic. The unified architecture also has the SCOMA advantages of predominantly local data access, coarse-grain spatial locality, long term temporal locality, minimizing conflict and capacity misses, dynamic data migration and fault containment. These advantages are described further by: B. Falsafi and D. Wood, in "Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA", Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 229-50, Denver, Colo., June, 1997; and K. Ekanadham, H.-H. Lim, P. Pattnaik, and M. Snir, in "PRISM: An Integrated Architecture for Scalable Shared Memory", Proceedings of the Fourth Symposium on High Performance Computer Architecture, January, 1998.
FIG. 1 is a block diagram of a node coherence controller 100 for a unified scalable shared memory architecture according to the prior art. The unified coherence controller 100 includes both a RAC(s) 102 and a page cache(s) 104. In addition, the unified coherence controller 100 includes: a protocol dispatcher and finite state machine (FSM) 106; fine grain tags 108 for lines in the page cache; a directory 110; a translation table 112; and a network interface 114. A memory bus 116 is also shown. The controller 100 is an integration of typical ccNUMA and SCOMA controller architectures. The translation table 112 may contain information about static home nodes and other information, in addition to serving as an address translation table.
Memory pressure is the amount of memory required to contain the working set of an application. For example, if the entire working set of an application can be placed in the cache, then the memory pressure for that application is low. However, if the working set cannot fit in the cache, then the memory pressure is high.
If the working set of an application does not fit into the RAC, then the overflow data is placed into the page cache. This minimizes the number of remote access requests processed relative to ccNUMA architectures, and it improves performance if the working set fits into the page cache (low memory pressure). However, when memory pressure is high, performance can be worse than with either ccNUMA or SCOMA on some applications. This behavior is due to the overhead associated with the constant remapping of pages, and is further described in the above-referenced article by B. Falsafi and D. Wood, entitled "Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA". Dynamically switching between ccNUMA and SCOMA-like architectures on a page basis per node can improve performance when addressing these issues. However, prior art methods and systems directed to dynamic switching suffer from significant internal fragmentation of the page cache, underutilization of pages, and a costly page relocation process. This can result in prohibitive performance degradation.
Thus, it would be desirable and highly advantageous to have a method and system for dynamically changing page types in unified scalable shared-memory architectures that overcome the above problems of the prior art methods and systems for achieving the same.
The present invention is directed to a method and system for dynamically changing page types in unified scalable shared-memory architectures. The present invention addresses the issue of under-utilized pages in unified scalable shared memory architectures.
According to a first aspect of the invention, there is provided a method for dynamically changing page types in a unified scalable shared-memory architecture. The method includes the step of assigning a default page type of a given page as simple cache only memory architecture (SCOMA). Upon n memory references, a first parameter of the given page is calculated. A second parameter of the given page is calculated, when the first parameter is greater than a first threshold. The page type of the given page is dynamically changed to cache-coherent non-uniform memory architecture (ccNUMA), when the second parameter is greater than a second threshold. Each of the first and the second parameters is either a page reference probability or one minus a page utilization, the second parameter being different from the first parameter.
According to a second aspect of the invention, the method further includes the step of maintaining the page type of the given page as SCOMA, when the first parameter is less than or equal to the first threshold. According to a third aspect of the invention, the method further includes the step of maintaining the page type of the given page as SCOMA, when the second parameter is less than or equal to the second threshold.
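The decision flow of the first three aspects can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `PageType` and `classify_page`, and the threshold values in the usage lines, are hypothetical, and the text above does not say how the page reference probability or the page utilization is measured. The second parameter is passed as a callable so that, as in the first aspect, it is calculated only when the first parameter exceeds the first threshold.

```python
from enum import Enum


class PageType(Enum):
    SCOMA = "SCOMA"
    CCNUMA = "ccNUMA"


def classify_page(first_param, compute_second_param,
                  first_threshold, second_threshold):
    """Return the page type after one evaluation of a page that is SCOMA.

    first_param and the value returned by compute_second_param are the
    page reference probability and (1 - page utilization), in whichever
    order the caller chooses. Both are assumed to lie in [0, 1].
    """
    # Second aspect: the page stays SCOMA when the first parameter does
    # not exceed the first threshold. The second parameter is not computed.
    if first_param <= first_threshold:
        return PageType.SCOMA

    # First aspect: the second parameter is calculated only at this point.
    second_param = compute_second_param()

    # Third aspect: the page stays SCOMA when the second parameter does
    # not exceed the second threshold.
    if second_param <= second_threshold:
        return PageType.SCOMA

    return PageType.CCNUMA


# Hypothetical usage: reference probability first, then 1 - utilization.
utilization = 0.1
result = classify_page(0.3, lambda: 1.0 - utilization,
                       first_threshold=0.2, second_threshold=0.5)
```

With these illustrative numbers both thresholds are exceeded, so the page would be retyped as ccNUMA; lowering either parameter to its threshold or below keeps the page as SCOMA.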
According to a fourth aspect of the invention, the method further includes the step of adjusting at least the first or the second threshold. According to a fifth aspect of the invention, the method further includes the step of adjusting n corresponding to the n memory references.
According to a sixth aspect of the invention, the n memory references correspond to all pages. According to a seventh aspect of the invention, the n memory references correspond only to the given page.
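The difference between the sixth and seventh aspects is only in how the n memory references are counted. The sketch below is one possible reading, with assumed names: `EvaluationTrigger`, `record`, and the `per_page` flag are hypothetical, as is the choice to reset the count once n is reached.

```python
from collections import Counter


class EvaluationTrigger:
    """Signals when n memory references have elapsed (hypothetical helper)."""

    def __init__(self, n, per_page):
        self.n = n                # window length; adjustable per the fifth aspect
        self.per_page = per_page  # True: seventh aspect; False: sixth aspect
        self.global_count = 0
        self.page_counts = Counter()

    def record(self, page_id):
        """Record one reference to page_id.

        Returns True when n references have elapsed in the chosen scope,
        at which point the count for that scope restarts from zero.
        """
        if self.per_page:
            # Seventh aspect: count only references to the given page.
            self.page_counts[page_id] += 1
            if self.page_counts[page_id] >= self.n:
                self.page_counts[page_id] = 0
                return True
            return False

        # Sixth aspect: count references to all pages together.
        self.global_count += 1
        if self.global_count >= self.n:
            self.global_count = 0
            return True
        return False


# Hypothetical reference stream touching two pages.
stream = ["a", "b", "a", "b"]
all_pages = EvaluationTrigger(n=3, per_page=False)
all_pages_hits = [all_pages.record(p) for p in stream]
given_page = EvaluationTrigger(n=2, per_page=True)
given_page_hits = [given_page.record(p) for p in stream]
```

Counting across all pages makes the evaluation interval independent of how often any one page is touched, while counting per page ties the interval to that page's own reference rate.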
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.