Many current computer systems employ a multi-processor configuration that includes two or more processing units interconnected by a bus system, each of which is capable of independent or cooperative operation. Such a multi-processor configuration increases the total system processing capability and allows the concurrent execution of multiple related or separate tasks by assigning each task to one or more processors. Such systems also typically include a plurality of mass storage units, such as disk drive devices, to provide adequate storage capacity for the number of tasks executing on the systems.
One type of multi-processor computer system embodies a symmetric multiprocessing (SMP) computer architecture, which is well known in the art as overcoming the limitations of single processors (uniprocessors) in terms of processing speed and transaction throughput, among other things. Typical commercially available SMP systems are generally “shared memory” systems, characterized in that multiple processors on a bus, or on a plurality of busses, share a single global or shared memory. In shared memory multiprocessors, all memory is uniformly accessible to each processor, which simplifies the task of dynamic load distribution. Processing of complex tasks can be distributed among the various processors in the multiprocessor system, while the data used in the processing is substantially equally available to each of the processors undertaking any portion of the complex task. Similarly, programmers writing code for typical shared memory SMP systems do not need to be concerned with issues of data partitioning, as each of the processors has access to and shares the same, consistent global memory.
There is shown in FIG. 1 a block diagram of an exemplary multiprocessor system that implements an SMP architecture. For further details regarding this system, reference shall be made to U.S. Ser. No. 09/309,012, filed Sep. 3, 1999, the teachings of which are incorporated herein by reference.
Another computer architecture known in the art for use in a multi-processor environment is the Non-Uniform Memory Access (NUMA) architecture or the Cache Coherent Non-Uniform Memory Access (CCNUMA) architecture, which are known in the art as extensions of SMP that supplant SMP's “shared memory architecture.” NUMA and CCNUMA architectures are typically characterized as having distributed global memory. Generally, NUMA/CCNUMA machines consist of a number of processing nodes connected through a high-bandwidth, low-latency interconnection network. Each processing node comprises one or more high-performance processors, associated cache, and a portion of a global shared memory. Each node or group of processors has near and far memory: near memory is resident on the same physical circuit board and is directly accessible to the node's processors through a local bus, while far memory is resident on other nodes and is accessible over a main system interconnect or backbone. Cache coherence, i.e., the consistency and integrity of shared data stored in multiple caches, is typically maintained by a directory-based, write-invalidate cache coherency protocol, as known in the art. To determine the status of caches, each processing node typically has a directory memory corresponding to its respective portion of the shared physical memory. For each line or discrete addressable block of memory, the directory memory stores an indication of the remote nodes that are caching that same line.
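The directory bookkeeping described above can be illustrated with a toy model. The following is a minimal sketch only, assuming a simple sharer-set directory with no owner or dirty states (real protocols track more state); the class `Directory` and its methods `record_read`, `record_write`, and `remote_sharers` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class Directory:
    """Toy directory for one node (hypothetical model): for each memory line
    homed on this node, the set of node IDs currently caching that line."""

    def __init__(self, home_node: int) -> None:
        self.home_node = home_node
        self.sharers: DefaultDict[int, Set[int]] = defaultdict(set)

    def record_read(self, line: int, node: int) -> None:
        # A read by a node adds that node to the line's sharer set.
        self.sharers[line].add(node)

    def record_write(self, line: int, node: int) -> Set[int]:
        # Write-invalidate: every other caching node must drop its copy
        # before the write proceeds; return the set of nodes to invalidate.
        to_invalidate = self.sharers[line] - {node}
        self.sharers[line] = {node}
        return to_invalidate

    def remote_sharers(self, line: int) -> Set[int]:
        # Nodes other than the home node that are caching this line.
        return self.sharers[line] - {self.home_node}


d = Directory(home_node=0)
d.record_read(0x40, 1)
d.record_read(0x40, 2)
print(sorted(d.record_write(0x40, 3)))  # prints [1, 2]
print(d.remote_sharers(0x40))           # prints {3}
```

In this sketch a write by node 3 yields invalidations for nodes 1 and 2, after which node 3 is the only remote node recorded as caching the line.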
There is shown in FIG. 2 a high-level block diagram of another exemplary multiprocessor system, one that implements a CCNUMA architecture. For further details regarding this system, reference shall be made to U.S. Pat. No. 5,887,146, the teachings of which are incorporated herein by reference.
The operating systems for such multiprocessor systems, or the user application(s) executing on them, can employ a methodology whereby a lock is used to protect multiple data items and/or multiple instances of a data item that are in the memory (e.g., RAM) of the multiprocessor system. An example of such data is a data set that includes the names, addresses, and telephone numbers of one or more users. Each time the operating system or user application accesses the data or data items in memory, such as for a read or write operation, a global lock is acquired over the data/data items (STEP 2, FIG. 3). After the global lock is acquired, the user application or operating system accesses the data/data items, for example in a read or write operation (STEP 4, FIG. 3). After accessing the data/data items, the global lock is released (STEP 6, FIG. 3).
The foregoing process shown in FIG. 3 is intended to ensure that the data or data items are not changed during the time period when the data is being accessed, for example, for a read or write operation. In other words, no two read/write operations on the protected data can be performed at the same time. After the global lock is released, the operating system or user application can perform this process again for the next or another access of the data/data items.
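The acquire/access/release sequence of FIG. 3 can be sketched as follows. This is an illustrative sketch only, assuming a Python dictionary stands in for the data items in memory and a single `threading.Lock` serves as the global lock; the class name `GloballyLockedStore` is hypothetical. Explicit `acquire()`/`release()` calls are used so that each call maps onto one FIG. 3 step; a `with` statement on the lock would behave the same way.

```python
import threading
from typing import Dict, Optional


class GloballyLockedStore:
    """Hypothetical data set protected by a single global lock, following
    the acquire (STEP 2) / access (STEP 4) / release (STEP 6) sequence."""

    def __init__(self, items: Dict[str, str]) -> None:
        self._items = dict(items)
        self._global_lock = threading.Lock()

    def read(self, key: str) -> Optional[str]:
        self._global_lock.acquire()          # STEP 2: acquire the global lock
        try:
            return self._items.get(key)      # STEP 4: access (read) the item
        finally:
            self._global_lock.release()      # STEP 6: release the global lock

    def write(self, key: str, value: str) -> None:
        self._global_lock.acquire()          # STEP 2: acquire the global lock
        try:
            self._items[key] = value         # STEP 4: access (write) the item
        finally:
            self._global_lock.release()      # STEP 6: release the global lock


store = GloballyLockedStore({"alice": "555-0100"})
store.write("bob", "555-0199")
print(store.read("bob"))  # prints 555-0199
```

Because every read and every write passes through the same lock, only one access to any item in the store can be in progress at a time.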
Although this technique is simple, acquiring a global lock each and every time data is to be accessed, for example for a read/write operation, becomes increasingly time consuming as more concurrent access operations are initiated and therefore contend for the lock. As also indicated above, while the data in the memory is being accessed by one given access operation, other access operations trying to access the same data cannot be performed, so the applications programs/operating system/processors involved with those other access operations are unable to proceed (i.e., those operations are pended or delayed). Further, where the global lock is obtained over a data set comprising multiple data items, the other access operations can be for data items not involved in the given access operation being performed, so they are delayed even though they do not actually conflict with it.
For example, let us assume that the data stored in the memory is a phonebook type of listing that includes the name, address, and telephone number of all subscribers, and which is accessed by any one of a number of operators to obtain listing information to give to callers. When one operator accesses the data for one subscriber, a global lock is obtained over the data for all subscribers. Similarly, if a person is updating the data for a given subscriber, a global lock is held over all of the data for all subscribers until the update is completed and the global lock is released. Consequently, the next operator attempting to access the data must await the release of this global lock before the next access of the data can proceed.
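The phonebook scenario can be reproduced with two threads. In this minimal sketch, assuming one `threading.Lock` guards the whole listing, an operator's lookup of one subscriber ("Jones") is pended while an update of a different subscriber ("Smith") holds the global lock. The subscriber records, the function names `update_subscriber` and `operator_lookup`, and the `threading.Event` objects used to make the timing deterministic are all hypothetical illustration, not part of the described system.

```python
import threading
from typing import List

# Hypothetical phonebook: every subscriber record sits under ONE global lock.
phonebook = {
    "Smith": ("12 Elm St", "555-0101"),
    "Jones": ("34 Oak Ave", "555-0102"),
}
global_lock = threading.Lock()
events: List[str] = []
update_holds_lock = threading.Event()
finish_update = threading.Event()


def update_subscriber(name, address, phone):
    with global_lock:                  # lock covers ALL subscribers
        update_holds_lock.set()        # signal: the update is in progress
        finish_update.wait()           # simulate a slow update
        phonebook[name] = (address, phone)
        events.append(f"updated {name}")


def operator_lookup(name):
    with global_lock:                  # waits for the update's global lock
        events.append(f"looked up {name}")
        return phonebook[name]


updater = threading.Thread(
    target=update_subscriber, args=("Smith", "56 Pine Rd", "555-0199"))
updater.start()
update_holds_lock.wait()               # the updater now holds the global lock

lookup = threading.Thread(target=operator_lookup, args=("Jones",))
lookup.start()
lookup.join(timeout=0.2)
blocked = lookup.is_alive()            # still waiting: the Jones lookup is pended
finish_update.set()                    # let the update complete
updater.join()
lookup.join()
print(blocked, events)  # prints True ['updated Smith', 'looked up Jones']
```

The lookup for "Jones" never touches the "Smith" record, yet it cannot proceed until the update releases the global lock.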
If one or more applications programs and/or the operating system being run on a multiprocessor system require frequent access to common data in the memory, then the various data access operations compete with each other for the global lock. The competing operations delay one another, which in effect increases the time an applications program and/or operating system needs to perform a task. Although the foregoing is described in connection with specific multiprocessor system implementations, it should be recognized that similar time delays can occur in other multiprocessor system configurations in which an applications program or operating system experiences data access global lock contention as described herein.
It thus would be desirable to provide new methodologies or techniques for optimizing applications programs and/or operating systems so as to reduce such global lock contention while ensuring that the data being accessed from the memory is not corrupted or changed during the time period when it is being accessed. Further, it would be desirable for such methods to reduce the amount of time needed to perform tasks by the applications program or operating systems as compared to prior art methods and techniques.