In cache-coherent multiprocessor systems, the hardware maintains data cache coherency to preserve the validity of data. Coherency is enforced via a coherency protocol, which may include snooping or directory-based techniques. One cache coherency protocol is the MESI protocol (Modified, Exclusive, Shared, Invalid, referring to the states of a cache line). Cache coherency may include writing data changes to multiple caches, and may include mechanisms to prevent access to the same resource (e.g., a particular variable or a database value) by multiple processors, or simultaneous modification of data by multiple processors. Mechanisms that avoid colliding accesses to a resource or data by multiple processors can be referred to generically as synchronization constructs (also referred to as critical sections, locks, semaphores, etc.), which operate to dedicate a particular resource to one processor and to exclude other processors from accessing the resource while it is locked.
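The exclusion behavior of a lock described above can be illustrated with a minimal sketch in C11. The sketch assumes a simple test-and-set spinlock built on the standard `atomic_flag` type; the names `guarded_counter`, `guarded_try_lock`, `guarded_lock`, `guarded_unlock`, and `guarded_increment` are hypothetical and chosen only for illustration. It is one possible form of synchronization construct, not a description of any particular system's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared resource protected by a test-and-set lock. */
typedef struct {
    atomic_flag held;  /* set while one processor owns the resource */
    long value;        /* the protected data */
} guarded_counter;

/* Put the lock in the released state and zero the protected data. */
void guarded_init(guarded_counter *g)
{
    assert(g != NULL);
    atomic_flag_clear(&g->held);
    g->value = 0;
}

/* Try once to acquire the lock. test-and-set returns the previous
 * state, so a previous value of "clear" means this caller now owns it. */
bool guarded_try_lock(guarded_counter *g)
{
    return !atomic_flag_test_and_set_explicit(&g->held, memory_order_acquire);
}

/* Spin until the lock is acquired; other callers are excluded meanwhile. */
void guarded_lock(guarded_counter *g)
{
    while (!guarded_try_lock(g)) {
        /* busy-wait: another processor holds the resource */
    }
}

/* Release the lock so another processor may acquire it. */
void guarded_unlock(guarded_counter *g)
{
    atomic_flag_clear_explicit(&g->held, memory_order_release);
}

/* Critical section: only the lock holder modifies the protected data. */
void guarded_increment(guarded_counter *g)
{
    guarded_lock(g);
    g->value++;
    guarded_unlock(g);
}
```

While one caller holds the lock, `guarded_try_lock` fails for every other caller. Each acquisition and release also writes the cache line that holds the lock, which is one source of the coherency traffic discussed in this section.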
Specific lock avoidance techniques have been developed for multiprocessor networking environments. In general, data cache locality (affinity) improves cache performance because fewer cache misses result when a processor's operations focus on data already stored in the cache. To enhance data cache affinity, some multiprocessor networking systems are programmed to associate a single traffic flow with a single processor. Techniques such as receive side scaling (also sometimes referred to as flow pinning) attempt to keep all traffic associated with a flow at the same processor and its associated cache for improved cache data reuse. Another technique is speculative lock avoidance (also called speculative lock elision), which involves run-time coordination (e.g., hardware and software operating together) to provide faster execution of some routines. The speculative lock elision technique speculatively assumes at run time that parallel operations by multiple processors will succeed without locks, temporarily ignores the locks and performs the operations, and then recovers from a misprediction by undoing the changes made under the misprediction.
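The association of a traffic flow with a single processor can be sketched as a hash over the fields that identify the flow. The following example is a simplified illustration under stated assumptions: the `flow_key` structure, the functions `flow_hash` and `flow_to_cpu`, and the use of a 32-bit FNV-1a hash with a modulo mapping are all choices made for this sketch. Actual receive side scaling implementations typically use a different hash function and an indirection table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow identifier: addresses and ports of a connection. */
typedef struct {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
} flow_key;

/* Fold the low nbytes bytes of value into a 32-bit FNV-1a hash. */
static uint32_t fnv1a_mix(uint32_t hash, uint32_t value, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        hash ^= (value >> (8u * i)) & 0xffu;
        hash *= 16777619u;              /* FNV-1a 32-bit prime */
    }
    return hash;
}

/* Hash all fields of the flow key (FNV-1a is an assumption of this sketch). */
uint32_t flow_hash(const flow_key *k)
{
    uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */
    h = fnv1a_mix(h, k->src_ip, 4);
    h = fnv1a_mix(h, k->dst_ip, 4);
    h = fnv1a_mix(h, k->src_port, 2);
    h = fnv1a_mix(h, k->dst_port, 2);
    return h;
}

/* Map a flow to a processor index in [0, num_cpus). Every packet that
 * carries the same key yields the same index, so the flow's state tends
 * to remain in one processor's cache. */
unsigned flow_to_cpu(const flow_key *k, unsigned num_cpus)
{
    assert(num_cpus > 0);
    return flow_hash(k) % num_cpus;
}
```

Because the mapping depends only on the flow key, packets of one flow are always directed to the same processor, while different flows may or may not share a processor.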
The techniques described above are implemented in source code that is compiled to operate on the processors. To generate executable code (often referred to as binary code) from source code, a compiler is used, which essentially translates source code written in a higher-level language (e.g., C, C++, JAVA™, etc.) into a lower-level format (e.g., machine code). Compilers are often designed to check for code patterns, and a “smart” compiler can produce succinct code (often referred to colloquially as “optimized” code) by recognizing source code patterns or constructs. Compilers often allow special directives (e.g., many C compilers recognize the “#pragma” directive) to be inserted into the source code, which may provide information or processing instructions to the compiler to indicate how the code should be interpreted or compiled. Typically, a compiler ignores a directive that it does not recognize. The result of the above techniques is a combination of software and hardware working together to avoid some critical sections, but one that still includes unnecessary cache coherency overhead when distributing related operations in a multiprocessor environment.
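The use of a compiler directive can be illustrated with a short C sketch. The directive name `example_flow_affinity` below is hypothetical and is not known to be recognized by any compiler; it is included only to show where such a directive would appear in source code. Under the C standard, a compiler that does not recognize a `#pragma` ignores it (some compilers may emit a warning), so the function `sum_lengths`, also a name chosen for illustration, behaves identically either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical directive, used for illustration only. A compiler that
 * does not recognize it ignores it and compiles the code that follows
 * as ordinary C. */
#pragma example_flow_affinity(enable)

/* Sum the packet lengths in an array; the result does not depend on
 * whether the directive above was recognized. */
long sum_lengths(const int *lengths, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += lengths[i];
    }
    return total;
}
```

A directive of this kind carries information to the compiler without changing the meaning of the program for compilers that do not support it.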