In computer systems containing multiple processors, or a single processor with multiple processing logic units or “cores”, applications or other tasks may be partitioned among the processors or cores, each of which performs a portion of the application or task and later updates the same data structure with a partial result. For example, a task of counting the number of persons with the last name “Smith” in a phone book may be divided between two processors or cores, such that each processor or core counts the number of “Smiths” in one half of the phone book. The number of “Smiths” in the phone book may then be stored in a data structure by combining (e.g., adding) the partial results from the two processors or cores.
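By way of illustration only, the phone book example may be sketched as follows. This is a minimal sketch, assuming two worker threads stand in for the two processors or cores; the phone book contents and the names `PHONE_BOOK`, `count_name`, and `count_in_halves` are hypothetical and do not appear in the description above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical phone book entries (last names only); not taken from the source.
PHONE_BOOK = ["Smith", "Jones", "Smith", "Lee", "Smith", "Patel", "Smith", "Kim"]


def count_name(entries, name="Smith"):
    """Count how many entries equal `name` in one partition of the phone book."""
    return sum(1 for entry in entries if entry == name)


def count_in_halves(entries, name="Smith"):
    """Split the entries in half, count each half in its own worker, then add."""
    mid = len(entries) // 2
    halves = [entries[:mid], entries[mid:]]
    # Two workers stand in for the two processors or cores (an assumption).
    with ThreadPoolExecutor(max_workers=2) as pool:
        partial_results = list(pool.map(lambda half: count_name(half, name), halves))
    # Combining step: the partial results are added into a single total.
    return partial_results, sum(partial_results)


partial_results, total = count_in_halves(PHONE_BOOK)
```

In this sketch the combining step runs only after both workers have finished, so no two workers ever update the total at the same time.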
Because multiple processors or cores update the same data structure concurrently, however, conflict conditions, such as a “race condition”, may result, degrading performance or even causing incorrect results to be stored in the data structure containing the result. In general, two prior art approaches to solving this problem have been implemented in multi-core or multi-processor computing systems. FIG. 1 illustrates a computer system architecture in which one prior art technique is used for concurrently updating a data structure with partial results from two or more processors or cores. In the computing architecture of FIG. 1, four processing cores or processors (“processing elements”) do not have a local memory, such as a “level 1” (L1) cache, but instead store a partial result of a computational task directly to a shared memory, such as a “level 2” (L2) cache or some other memory, such as “main memory”, which may consist of dynamic random access memory (DRAM) or some other memory type.
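One way a race condition may store an incorrect result can be sketched as follows. This is a minimal sketch, assuming a deliberately fixed interleaving of two processing elements rather than real concurrency; the names `shared`, `read_total`, and `write_total` and the partial results 3 and 5 are hypothetical.

```python
# Shared data structure holding the running total (hypothetical layout).
shared = {"smith_count": 0}


def read_total():
    """Read the current total from the shared data structure."""
    return shared["smith_count"]


def write_total(value):
    """Store a new total into the shared data structure."""
    shared["smith_count"] = value


partial_a, partial_b = 3, 5  # hypothetical partial results of elements A and B

# Both elements read the shared total before either has written back.
seen_by_a = read_total()
seen_by_b = read_total()

write_total(seen_by_a + partial_a)  # A stores 0 + 3
write_total(seen_by_b + partial_b)  # B stores 0 + 5, overwriting A's update

lost_update_total = read_total()
expected_total = partial_a + partial_b
```

Here the stored total reflects only the partial result of the second element, because its write was computed from a value read before the first element's update.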
In the prior art system of FIG. 1, each update from the four processors to a data structure stored in the shared memory must be made in a serial manner in order to avoid conflict conditions, such as a race condition. As a result, the system illustrated in FIG. 1 may suffer from performance degradation, as each data structure update must wait for an earlier data structure update to complete. In some prior art examples, a data structure is updated by first acquiring exclusive ownership of the data structure, or a “lock”. In the meantime, other agents must wait until the lock is released in order to gain exclusive ownership and subsequently update the data structure. This serialized data structure update technique may cause delays in system operation, as agents must wait to update a data structure.
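The lock-based, serialized update described above may be sketched as follows. This is a minimal sketch, assuming software threads and a `threading.Lock` stand in for the processing elements and the exclusive ownership mechanism; the class `SharedCounter`, the function `run_locked_updates`, and the partial results are hypothetical.

```python
import threading


class SharedCounter:
    """Shared data structure whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, partial):
        # Only one agent may hold the lock; all others block here until
        # it is released, so updates complete one after another.
        with self._lock:
            self.value += partial


def run_locked_updates(partials):
    """Have one thread per partial result update the same counter."""
    counter = SharedCounter()
    threads = [threading.Thread(target=counter.add, args=(p,)) for p in partials]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


# Four hypothetical partial results, one per processing element.
locked_total = run_locked_updates([3, 5, 2, 7])
```

The result is correct regardless of thread scheduling, but every update waits for the lock, which is the source of the delay noted above.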
FIG. 2 illustrates a computing system in which at least one other prior art technique for updating a common data structure may be used. In the system of FIG. 2, each of the four processing elements has a local memory, such as an L1 cache (denoted “$L1x” in FIG. 2), to store local copies of data stored in the shared memory, such as an L2 cache or main memory, such as DRAM. In one prior art data structure update technique associated with the system of FIG. 2, only one processing element at a time may manipulate, or “own”, a copy of a data structure stored in the shared memory, by obtaining exclusive ownership, or a “lock”, of the data structure. In this case, a processing element may update its local copy of the data structure with the partial result stored in its local memory, and other processing elements must request ownership of the updated copy of the data structure. Once requested, the updated data structure may be transferred to the requesting processing element, which will have exclusive ownership of the copy of the data structure until it has updated the data structure with its partial result. The above-described technique can continue until the partial result from each processing element has been applied to a local copy of the data structure. The local copy of the data structure may then be written back to the corresponding data structure within the shared memory.
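The ownership-transfer sequence described for FIG. 2 may be sketched as a toy, single-threaded model. This is a minimal sketch, assuming one owned copy that moves between processing elements; the class `OwnershipModel`, its methods, the element names `PE0` through `PE3`, and the partial results are hypothetical and do not model any actual cache coherency protocol.

```python
class OwnershipModel:
    """Toy model in which exactly one processing element owns the copy at a time."""

    def __init__(self, initial_value=0):
        self.shared_value = initial_value  # data structure in shared memory
        self.owner = None                  # element currently holding the copy
        self.local_copy = None             # the single owned local copy
        self.transfers = 0                 # hand-offs between elements

    def request_ownership(self, element):
        if self.owner is None:
            # The first owner fetches the data structure from shared memory.
            self.local_copy = self.shared_value
        elif self.owner != element:
            # The updated copy moves from the previous owner to the requester.
            self.transfers += 1
        self.owner = element

    def update(self, element, partial):
        if self.owner != element:
            raise PermissionError("only the owner may update the copy")
        self.local_copy += partial

    def write_back(self):
        # The final local copy is written back to shared memory.
        self.shared_value = self.local_copy
        self.owner = None


# Four hypothetical processing elements, each with one partial result.
partials = {"PE0": 3, "PE1": 5, "PE2": 2, "PE3": 7}
model = OwnershipModel()
for element, partial in partials.items():
    model.request_ownership(element)
    model.update(element, partial)
model.write_back()
```

With four processing elements the copy is handed off three times before the single write-back, which illustrates why the number of ownership transfers grows with the number of elements contributing partial results.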
The prior art example described above with reference to FIG. 2 may take numerous processing cycles to complete and may require complex logic to handle the requests and ownership transfers among the processing elements. Therefore, the system of FIG. 2 may incur performance degradation and/or increased cost.
Other prior art examples may include ones in which a user, via complex software routines, controls the coherency protocol among the various processing elements as they update local copies of the data structures, which are ultimately stored into the corresponding locations of shared memory. However, these prior art “software solutions” require the user to develop coherency software and to control the entire process, which may result in increased system cost, complexity, and performance degradation.