The present invention generally relates to a parallel computing system. More particularly, the present invention relates to processing a store instruction in the parallel computing system.
A store instruction is an instruction, issued by a processor core (e.g., in a parallel computing system), that writes the contents of a register to a memory location. The store instruction specifies the memory location to be written. Under a strong consistency model, each store instruction issued by a processor core becomes visible to the other processor cores, and updates made through store instructions are processed in the order in which they were issued. In other words, the first issued store instruction is processed first, the second issued store instruction is processed second, and so on.
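The ordering guarantee of the strong consistency model can be illustrated with a toy software model, assuming issued stores wait in a single first-in, first-out queue and only the oldest one may be processed next. The class `StrongStoreQueue`, the function `visible_states`, and the addresses `"data"` and `"flag"` are hypothetical names for illustration only; this is not a hardware implementation.

```python
from collections import deque


class StrongStoreQueue:
    """Hypothetical toy model of the strong consistency model."""

    def __init__(self):
        self.memory = {}        # values visible to every processor core
        self.pending = deque()  # issued but unprocessed stores, oldest first

    def issue_store(self, address, value):
        self.pending.append((address, value))

    def process_next(self):
        # Only the oldest issued store may be processed next (FIFO order).
        address, value = self.pending.popleft()
        self.memory[address] = value


def visible_states(stores):
    """Return the memory image other cores can observe after each store is processed."""
    queue = StrongStoreQueue()
    for address, value in stores:
        queue.issue_store(address, value)
    states = []
    while queue.pending:
        queue.process_next()
        states.append(dict(queue.memory))
    return states


# The second store ("flag") is never observed without the first ("data").
print(visible_states([("data", 42), ("flag", 1)]))
# [{'data': 42}, {'data': 42, 'flag': 1}]
```

Because every observable state is a prefix of the issue order, any core that sees the second store is guaranteed to also see the first.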
Under a weak consistency model, processor cores issue store instructions in an arbitrary order: a processor core does not need to wait before issuing a store instruction, and store instructions may be issued out of order. In this weak consistency model, after a processor core issues a store instruction, it sets a flag bit on the data, in a shared main memory device and/or shared cache memory device, that is being updated by the store instruction. Other processor cores can see that this flag bit is set. However, a set flag bit does not guarantee that the data associated with it is valid. Thus, to validate data and/or synchronize issued store instructions, a processor core issues a synchronization instruction, called an msync instruction, which ensures that issued store instructions are processed in their issued order. After the msync instruction completes, other processor cores can access the data updated by the store instruction(s). However, the msync instruction is expensive: it takes more than 100 clock cycles.
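The hazard described above, and how a synchronization instruction removes it, can be sketched with a second toy model. It assumes that pending stores may become visible in any order, that the flag bit is modeled as a separate location `"flag"` associated with `"data"`, and that msync is modeled simply as draining every previously issued store before continuing (comparable in spirit to a memory barrier). The names `WeakStoreBuffer` and `flag_visible_before_data` are hypothetical, and this is not the mechanism of the invention.

```python
class WeakStoreBuffer:
    """Hypothetical toy model of the weak consistency model."""

    def __init__(self):
        self.memory = {}   # values visible to every processor core
        self.pending = []  # issued stores not yet visible, in issue order

    def issue_store(self, address, value):
        self.pending.append((address, value))

    def complete(self, index):
        # Any pending store, not only the oldest, may become visible next.
        address, value = self.pending.pop(index)
        self.memory[address] = value

    def msync(self):
        # Stand-in for msync: all previously issued stores complete (oldest
        # first) before this returns, so later stores cannot overtake them.
        while self.pending:
            self.complete(0)


def flag_visible_before_data(use_msync):
    """True if another core could see the flag set while the data is still stale."""
    buf = WeakStoreBuffer()
    buf.issue_store("data", 42)
    if use_msync:
        buf.msync()
    buf.issue_store("flag", 1)
    # Adversarial order: the most recently issued store completes first.
    buf.complete(len(buf.pending) - 1)
    return buf.memory.get("flag") == 1 and "data" not in buf.memory


print(flag_visible_before_data(use_msync=False))  # True
print(flag_visible_before_data(use_msync=True))   # False
```

Without the barrier, the flag can become visible while the data it guards is not; with it, the data store has completed before the flag store is even issued, so no completion order can expose stale data. The cost of that guarantee is the stall while the buffer drains, which is what makes msync expensive.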
Therefore, it is highly desirable to allow store instructions to be issued out of order and to process them in a parallel computing system without using the msync instruction.