1. Field of the Invention
Embodiments of the present invention provide an anti-prefetch instruction. More specifically, embodiments of the present invention use an anti-prefetch instruction to facilitate parallel execution of code.
2. Related Art
In order to execute code more efficiently, multi-stranded processors have been designed to use two or more hardware strands while executing a single software thread. Some multi-stranded processors also support transactional execution, during which the processor guarantees code and memory atomicity. Transactional execution and multi-stranded processors are both known in the art and hence are not described in more detail.
In some multi-stranded processors, a section of program code can be divided into subsections and the subsections can be executed in parallel using separate strands. For example, code that inserts values from a data array into a hash table can be split into separate subsections of code that perform even and odd index array accesses. These separate subsections can then be executed in parallel using two separate strands. In such a processor, the execution of the subsections may not be independent because the strands may access the same locations in memory. To remedy this problem, the processor can execute a first subsection normally using the first strand while transactionally executing the second subsection using the second strand. Thus, if the second strand makes a memory access during the transaction that interferes with a memory access that is subsequently made by the first strand, the processor can detect the interference and can re-execute one or both of the subsections.
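By way of illustration only, the even/odd split described above can be sketched as follows. The bucket count, the function names, and the modulo hash are assumptions made for this sketch and are not taken from any particular processor or program. The two subsections are run one after the other here, so the sketch models only how the work is divided and which hash buckets both subsections touch, not true parallel execution.

```python
# Illustrative sketch only: NUM_BUCKETS, bucket_of, insert_subsection and
# split_insert are hypothetical names chosen for this example.
NUM_BUCKETS = 4


def bucket_of(value):
    """Toy hash function: map a value to a bucket index."""
    return value % NUM_BUCKETS


def insert_subsection(table, data, start):
    """Insert data[start], data[start + 2], ... into the table.

    Returns the set of bucket indices this subsection wrote to.
    """
    touched = set()
    for i in range(start, len(data), 2):
        b = bucket_of(data[i])
        table[b].append(data[i])
        touched.add(b)
    return touched


def split_insert(data):
    """Divide the insertion into even-index and odd-index subsections.

    Returns the filled table and the buckets written by both subsections,
    i.e. the memory locations where the two strands could interfere.
    """
    table = [[] for _ in range(NUM_BUCKETS)]
    even_touched = insert_subsection(table, data, 0)  # first strand's work
    odd_touched = insert_subsection(table, data, 1)   # second strand's work
    return table, even_touched & odd_touched
```

A non-empty set of shared buckets corresponds to the case in which the subsections are not independent, which is why the second subsection is executed transactionally.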
Because of the potential for interfering accesses, the second strand cannot finish executing the second subsection until the first strand has completed executing the first subsection. Consequently, such processors include mechanisms for ensuring that the first strand completes before the second strand commits the transaction. For example, some processors use a “spin loop” technique, wherein upon completing the first subsection, the first strand stores a predetermined value to a “mailbox” location in memory. Upon completing the second subsection, the second strand transactionally loads the mailbox to ensure that the predetermined value is stored in the mailbox before committing the transaction. Because the second strand may finish the second subsection before the first strand completes the first subsection, the second strand can transactionally load from the mailbox before the first strand stores the predetermined value to the mailbox. Unfortunately, because the second strand load-marks the cache line when performing the transactional load, the first strand's subsequent store of the predetermined value to the mailbox can cause the processor to erroneously detect an interfering access and to unnecessarily fail the second strand's transaction.
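The false interference described above can be illustrated with a simplified software model. The model below is a sketch under stated assumptions: the class and function names, the value chosen for the predetermined value, and the rule that any store to a load-marked line fails the marking transaction are all hypothetical simplifications, not a description of any actual processor's conflict-detection hardware.

```python
# Illustrative model only: all names and the conflict rule are assumptions.
DONE = 1  # hypothetical "predetermined value" stored by the first strand


class CacheLine:
    """A single cache line holding the mailbox."""

    def __init__(self):
        self.value = 0
        self.load_marks = set()  # transactions that have load-marked the line


class Transaction:
    """Tracks whether the second strand's transaction has failed."""

    def __init__(self):
        self.failed = False


def transactional_load(line, txn):
    """Load-mark the line on behalf of txn and return its current value."""
    line.load_marks.add(txn)
    return line.value


def plain_store(line, value):
    """Non-transactional store; treated as interfering with every
    transaction that has load-marked the line."""
    for txn in line.load_marks:
        txn.failed = True
    line.load_marks.clear()
    line.value = value


def run(second_finishes_first):
    """Return (value seen by the second strand, whether its transaction failed)."""
    mailbox = CacheLine()
    txn = Transaction()
    if second_finishes_first:
        seen = transactional_load(mailbox, txn)  # second strand polls early
        plain_store(mailbox, DONE)               # first strand then signals
    else:
        plain_store(mailbox, DONE)               # first strand signals first
        seen = transactional_load(mailbox, txn)  # second strand then polls
    return seen, txn.failed
```

In this model, `run(True)` corresponds to the problematic ordering: the second strand load-marks the mailbox before the first strand's store, so the store fails the transaction even though the two subsections did not actually conflict. `run(False)` corresponds to the ordering in which the transaction can commit.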
Hence, what is needed is a processor that supports transactional execution without the above-described problem.