This invention relates to a method of managing memory within a multiprocessor system formed of a plurality of processing elements, and more particularly, relates to a method of controlling data stored in a shared memory so as to maintain consistency (coherency) thereof based on information acquired by a compiler.
Multiprocessor systems in which a plurality of processing elements are integrated have been released one after another by respective microprocessor manufacturers. In the fields of information home electric appliances and embedded devices (such as cellular phones, game machines, car navigation systems, digital television receivers, and HDD/DVD recorders/players), as well as in the fields of supercomputers, servers, desktop computers, and PC servers, there is a trend toward employment of multi-core microprocessors.
The multiprocessor system includes a plurality of processing elements, an inter-connection network, and a centralized shared memory, and each of the processing elements includes a processor and a cache memory and independently performs arithmetic processing. The multiprocessor system uses the centralized shared memory as a main memory, and the plurality of processing elements are used as main memory sharing processors that access the same data stored in the centralized shared memory.
In order to maintain the coherency among shared data pieces, it is necessary to perform coherency control such that, while one processor is accessing a shared data piece on its cache memory, another processor is inhibited from loading the same shared data piece from the centralized shared memory to its own cache memory for access thereto.
Herein, the coherency means that all the processors see the same value at a given address of the memory at a given time instant, and the coherency control refers to control for guaranteeing that the contents of the memory accessed by the respective processors are the same within a main memory sharing multiprocessor system. Functions for maintaining the coherency include a coherent cache that controls memory access by hardware.
A first problem to be solved in the coherency control is stale data, and a second problem thereof is false sharing.
FIG. 22 is an explanatory diagram illustrating the first problem (stale data) in the coherency control.
First, global variables a, b, and c are declared (2200), and variables a=0, b=0, and c=1 are stored in the shared memory (2201).
After that, in a case where the shared data (a=0, b=0, and c=1) is stored in the cache memory of a processing element (PE0) (2202) and the same shared data is stored in the cache memory of another processing element (PE1) (2203), even if the shared data is updated (a=0→1) by the PE0, the shared data on the cache of the PE1 remains old data that has not been updated (a=0) (2205). In this state, when the shared data is updated (c=a) by the PE1, the variable c is updated to 0 because the PE1 copies the stale value a=0 instead of the correct value a=1 (2206).
Consequently, the variables, which would be a=1, b=0, and c=1 if the coherency control had been performed, become a=0, b=0, and c=0. The data stored in the cache memory of the PE0 thus does not match the data stored in the cache memory of the PE1, and the PE1 operates erroneously.
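The stale data scenario of FIG. 22 can be reproduced with a minimal software sketch. The class `ProcessingElement` and the function `run_stale_data_example` are hypothetical names introduced only for illustration; the sketch assumes each processing element holds a private copy of the shared data and that no coherency control of any kind is performed.

```python
class ProcessingElement:
    """Toy processing element with a private, non-coherent cache (illustrative assumption)."""

    def __init__(self, name, shared_memory):
        self.name = name
        self.shared_memory = shared_memory
        self.cache = {}

    def load(self):
        # Copy the shared data into the private cache (2202, 2203).
        self.cache = dict(self.shared_memory)

    def read(self, var):
        # Reads are served from the private cache only.
        return self.cache[var]

    def write(self, var, value):
        # Writes stay in the private cache; other caches are not notified.
        self.cache[var] = value


def run_stale_data_example():
    shared_memory = {"a": 0, "b": 0, "c": 1}   # (2201)
    pe0 = ProcessingElement("PE0", shared_memory)
    pe1 = ProcessingElement("PE1", shared_memory)
    pe0.load()
    pe1.load()
    pe0.write("a", 1)                # PE0 updates a=0 -> 1 in its own cache
    pe1.write("c", pe1.read("a"))    # PE1 executes c=a using its stale a=0 (2206)
    return pe0.cache, pe1.cache


if __name__ == "__main__":
    pe0_cache, pe1_cache = run_stale_data_example()
    print("PE0 cache:", pe0_cache)
    print("PE1 cache:", pe1_cache)
```

In this model the cache of the PE1 ends with c=0 because it never observes the update of a made by the PE0, which is the mismatch described above.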
FIG. 23 is an explanatory diagram illustrating the second problem (false sharing) in the coherency control.
First, the global variables a and b are declared (2300), and the variables a=0 and b=0 are stored in the shared memory (2301). The variables a and b are stored on the same cache line of the shared memory. Further, the shared memory is accessed in units of lines.
After that, the shared data stored in the cache memory of a processing element (PE0) is updated (a=0→1) (2302), and the shared data stored in the cache memory of another processing element (PE1) is updated (b=0→2) (2303). In other words, the respective processing elements update different variables stored on the same line. In this case, when the PE0 writes back its data to the shared memory first, the data that the PE1 writes back later is what remains stored in the shared memory (2304). On the other hand, when the PE1 writes back its data to the shared memory first, the data that the PE0 writes back later is what remains stored in the shared memory (2305).
If the coherency control is performed, a=1 and b=2 are stored in the shared memory, but if the coherency control is not performed, it is not certain which data is finally stored in the shared memory. In other words, the contents of the memory differ depending on the timing at which each line is destaged, and the processing elements operate erroneously in either case.
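The dependence on write-back order in FIG. 23 can be sketched as follows. The functions `write_back_line` and `run_false_sharing_example` are hypothetical names used only for illustration; the sketch assumes that a write-back always transfers the whole line, so a later write-back overwrites every variable on that line.

```python
def write_back_line(memory, cached_line):
    # A write-back transfers the entire line, including variables
    # that this processing element never modified.
    memory.update(cached_line)


def run_false_sharing_example(first):
    """Return the final shared memory when `first` ("PE0" or "PE1") writes back first."""
    memory = {"a": 0, "b": 0}      # a and b share one line (2301)
    pe0_line = dict(memory)        # PE0 caches the line
    pe1_line = dict(memory)        # PE1 caches the same line
    pe0_line["a"] = 1              # PE0 updates a=0 -> 1 (2302)
    pe1_line["b"] = 2              # PE1 updates b=0 -> 2 (2303)
    order = [pe0_line, pe1_line] if first == "PE0" else [pe1_line, pe0_line]
    for line in order:
        write_back_line(memory, line)
    return memory


if __name__ == "__main__":
    print("PE0 writes back first:", run_false_sharing_example("PE0"))  # (2304)
    print("PE1 writes back first:", run_false_sharing_example("PE1"))  # (2305)
```

In neither order does this model leave a=1 and b=2 in the shared memory, since the later write-back carries a stale copy of the variable updated by the other processing element.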
In order to solve the problem of a mismatch occurring between the shared memory and the cache memory, a coherency control module is provided in each of the processing elements and shared resources (such as the inter-connection network and the shared memory), to thereby maintain the coherency of the data stored in the memory.
Specifically, until a processing element (PE0) reads data x from the shared memory, updates the data x, and discards ownership of the data x, another processing element (PE1) is not permitted to write the data x to the shared memory.
Through such ownership control, it is possible to solve the problems of the stale data and the false sharing which arise in the coherency control.
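The ownership rule described above can be illustrated with a simplified software model. This is only a sketch under stated assumptions, not the hardware coherency control module itself: the names `OwnedSharedMemory`, `OwnershipError`, `acquire`, `write`, `release`, and `run_ownership_example` are hypothetical, ownership is tracked per data item, and a denied write is modeled as an exception rather than a stall.

```python
class OwnershipError(Exception):
    """Raised when a processing element accesses data it does not own."""


class OwnedSharedMemory:
    """Toy shared memory that tracks one owner per data item (illustrative assumption)."""

    def __init__(self, values):
        self.values = dict(values)
        self.owner = {}  # data name -> name of the owning processing element

    def acquire(self, pe, name):
        # Read the data and take ownership; fail if another PE already owns it.
        holder = self.owner.get(name)
        if holder is not None and holder != pe:
            raise OwnershipError(f"{name} is owned by {holder}")
        self.owner[name] = pe
        return self.values[name]

    def write(self, pe, name, value):
        # Only the current owner may write the data to the shared memory.
        if self.owner.get(name) != pe:
            raise OwnershipError(f"{pe} does not own {name}")
        self.values[name] = value

    def release(self, pe, name):
        # Discard ownership so that another PE may acquire the data.
        if self.owner.get(name) == pe:
            del self.owner[name]


def run_ownership_example():
    memory = OwnedSharedMemory({"x": 0})
    x = memory.acquire("PE0", "x")       # PE0 reads x and becomes its owner
    try:
        memory.write("PE1", "x", 99)     # PE1 tries to write while PE0 owns x
        pe1_blocked = False
    except OwnershipError:
        pe1_blocked = True
    memory.write("PE0", "x", x + 1)      # PE0 updates x
    memory.release("PE0", "x")           # PE0 discards ownership
    x = memory.acquire("PE1", "x")       # PE1 now sees the updated value
    memory.write("PE1", "x", x + 1)
    memory.release("PE1", "x")
    return pe1_blocked, memory.values["x"]


if __name__ == "__main__":
    print(run_ownership_example())
```

Because the PE1 can obtain x only after the PE0 has written its update and discarded ownership, the PE1 in this model never operates on a stale copy and no update is overwritten.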