1. Field of the Invention
The present invention relates generally to the data processing field and, more particularly, to a computer implemented method, system and computer program product for automatically generating SIMD code, particularly in the presence of multi-threading and other false sharing conditions, and in machines having a segmented/virtual page memory protection system.
2. Description of the Related Art
Modern processors increasingly use Single Instruction Multiple Data (SIMD) units to increase processing power without significantly increasing issue bandwidth, since a SIMD unit processes multiple data elements in a single computation. Although SIMD units can be programmed by hand, especially for dedicated libraries and a small number of kernels, their performance impact will likely remain limited until compiler technology permits the automatic generation of SIMD code, referred to hereinafter as “simdization”, for a wide range of applications.
One salient feature of modern processors that has had a strong impact on SIMD code generation is support for multi-threading and parallelism, referred to herein as “MT”. The characteristic of MT most relevant to simdization is that multiple threads can cooperate to generate results, for example, by working on independent computations whose results are stored in distinct memory locations. This becomes problematic if simdizing the code introduces false sharing, i.e., when two different threads read/modify/write distinct memory locations that happen to be collocated within the same unit of SIMD memory access (e.g., 16 bytes for VMX, SSE-2 and others). Because a SIMD store writes back the entire unit, one thread may overwrite a value that another thread has just written to its own location within that unit. False sharing is therefore a correctness issue: the final outcome of the program depends on the order in which the threads access such a “falsely shared” unit of memory.
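The lost-update hazard can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not a model of any particular machine: memory is a hypothetical `bytearray`, each thread owns one 4-byte integer within the same 16-byte unit, and the helper names (`scalar_writes`, `simd_writes_interleaved`, `read_int`) are illustrative only. The simdized version replaces each scalar store with a 16-byte load, a lane update, and a 16-byte store, and the simulation picks one legal interleaving in which both loads happen before either store.

```python
import struct

SIMD_BYTES = 16  # one unit of SIMD memory access (16 bytes for VMX, SSE-2)


def read_int(mem: bytearray, addr: int) -> int:
    """Read a little-endian 4-byte integer at byte address addr."""
    return struct.unpack_from("<i", mem, addr)[0]


def scalar_writes(mem: bytearray, writes) -> None:
    """Original code: each thread stores only its own 4-byte element."""
    for addr, value in writes:
        struct.pack_into("<i", mem, addr, value)


def simd_writes_interleaved(mem: bytearray, writes) -> None:
    """Hypothetical simdized code: each thread performs a 16-byte
    read-modify-write of the unit containing its element. All loads are
    done before any store, which is one legal thread interleaving."""
    # Phase 1: every thread loads its whole 16-byte unit into a "register".
    regs = []
    for addr, value in writes:
        base = addr - addr % SIMD_BYTES
        reg = bytearray(mem[base:base + SIMD_BYTES])
        regs.append((base, reg, addr - base, value))
    # Phase 2: every thread patches its own lane and stores the whole unit.
    for base, reg, offset, value in regs:
        struct.pack_into("<i", reg, offset, value)
        mem[base:base + SIMD_BYTES] = reg  # also rewrites the other lanes


mem = bytearray(SIMD_BYTES)
simd_writes_interleaved(mem, [(0, 111), (4, 222)])
print(read_int(mem, 0), read_int(mem, 4))  # prints "0 222": thread 0's update is lost
```

With the scalar stores, both values survive in any order; with the interleaved 16-byte read-modify-writes, the second store writes back a stale copy of thread 0's element.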
False sharing may also occur on machines without support for multi-threading, as an artifact of the compiler being insufficiently aware, due to a lack of information flow and/or imprecise information, that multiple distinct data structures reside within the same unit of SIMD memory access. In such a situation, the compiler might conclude that accesses to two distinct data structures can be interchanged, for example, accesses to the last element of a data array A and the first element of a data array B, where arrays A and B are collocated in memory. In fact, they cannot be interchanged, because the two references are in a false sharing situation (for example, the last element of A and the first element of B reside in the same 16-byte unit of memory). Hereinafter, such compiler scheduling issues are referred to as “CS”.
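The distinction is between a dependence test at element granularity and one at SIMD-unit granularity. The Python sketch below contrasts the two; the layout (array A at hypothetical address 0x1000 with three 4-byte elements, array B immediately after it at 0x100C) and the function names are assumptions chosen for illustration, not part of any particular compiler.

```python
SIMD_BYTES = 16  # one unit of SIMD memory access


def unit_of(addr: int) -> int:
    """Index of the 16-byte SIMD unit containing byte address addr."""
    return addr // SIMD_BYTES


def elements_overlap(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """Element-granularity test: do the two byte ranges intersect?"""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def simd_units_overlap(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """SIMD-granularity test: do the accesses touch a common 16-byte unit?"""
    a_first, a_last = unit_of(a_addr), unit_of(a_addr + a_size - 1)
    b_first, b_last = unit_of(b_addr), unit_of(b_addr + b_size - 1)
    return a_first <= b_last and b_first <= a_last


# Hypothetical layout: A[0..2] at 0x1000..0x100B, B[0] at 0x100C.
last_of_a, first_of_b = 0x1008, 0x100C
print(elements_overlap(last_of_a, 4, first_of_b, 4),
      simd_units_overlap(last_of_a, 4, first_of_b, 4))  # prints "False True"
```

The element-level test reports the two references as independent, so a scheduler relying on it could reorder them; once either access is performed as a 16-byte SIMD access, the unit-level test shows that they conflict.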
The label “FS” (for False Sharing) refers to either the CS case or the MT case.
Another salient feature of modern processors that has had a strong impact on SIMD code generation is support for segmented/virtual page memory protection systems, referred to herein as “MPS”. The characteristic of MPS most relevant to simdization is that memory accesses beyond a memory segment are required to generate a memory violation (e.g., for program integrity). This requirement causes a problem if the generated SIMD code accesses memory locations beyond the range of locations touched by the original, non-simdized code. Unlike a false sharing condition, this is not a program correctness issue, because the values of the additional memory locations touched by the simdized code are not used to modify permanent program state. The sole issue in this situation is that such memory locations should not be addressed at all, as they may lie outside their memory segments.
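The MPS hazard can be quantified by comparing the byte range read by the original loop with the range read by its simdized counterpart. The Python sketch below assumes one naive simdization scheme (whole 16-byte unaligned loads starting at the first element, with the trip count rounded up) and a 4096-byte protection granularity; both assumptions and all function names are illustrative, not a description of any specific compiler or memory system.

```python
SIMD_BYTES = 16    # width of one SIMD memory access
PAGE_BYTES = 4096  # assumed protection granularity (page/segment size)


def original_range(start: int, n_elems: int, elem_bytes: int = 4):
    """Byte range [lo, hi) read by the original scalar loop."""
    return start, start + n_elems * elem_bytes


def simd_range(start: int, n_elems: int, elem_bytes: int = 4):
    """Byte range [lo, hi) read by a hypothetical naive simdized loop that
    issues whole 16-byte loads from start, rounding the trip count up."""
    n_bytes = n_elems * elem_bytes
    n_vectors = -(-n_bytes // SIMD_BYTES)  # ceiling division
    return start, start + n_vectors * SIMD_BYTES


def pages_of(lo: int, hi: int) -> set:
    """Set of page numbers covered by the non-empty byte range [lo, hi)."""
    return set(range(lo // PAGE_BYTES, (hi - 1) // PAGE_BYTES + 1))


def extra_pages(start: int, n_elems: int, elem_bytes: int = 4) -> set:
    """Pages touched only by the simdized loop, never by the original."""
    touched = pages_of(*simd_range(start, n_elems, elem_bytes))
    return touched - pages_of(*original_range(start, n_elems, elem_bytes))


# Three 4-byte elements ending exactly at the page boundary 0x1000.
print(extra_pages(0x0FF4, 3))  # prints "{1}": the 16-byte load reaches into page 1
```

If page 1 is unmapped or protected, the simdized load faults even though the loaded bytes beyond 0x1000 would never be used, which is exactly the incorrect-halt scenario described above.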
In general, generating SIMD code without awareness of multi-threading and other false sharing conditions can result in code that incorrectly halts or generates incorrect results. Similarly, generating SIMD code without awareness of MPS may result in code that incorrectly halts.