Computers have long been employed to process data. In a typical computer system, a plurality of application programs may be executed. Since a computer system has limited resources, the ability of the computer system to optimize memory performance becomes critical as more application programs compete for the same memory resources.
One method for improving memory performance of an application program is to manage the layout of data fields in a data structure. FIG. 1 shows a conceptual block diagram illustrating a data structure. Consider the situation wherein, for example, a user is attempting to access a data structure that has a plurality of data records, such as an employee record. The user may employ an application program that causes a processor 102 to access an employee record 104 in main memory 106. Employee record 104 may include a plurality of data fields, including a name field 108, an employee number field 110, a salary field 112, and an address field 114.
Since the access latency of a cache is lower than that of main memory, memory performance improves if an application program accesses data stored in the cache rather than in main memory. To minimize the number of times the application program may have to access the main memory, time and resources have been spent on optimizing the layout of data structures. In an example, data being processed may be copied as a cache line into a cache 116 for faster access. To increase the possibility that data fields that may be accessed together are brought into cache 116 at the same time, spatial locality optimizations may be performed. As discussed herein, spatial locality refers to the arrangement of data fields in a manner that may increase the likelihood that data fields that are accessed together are brought in on the same cache line. In an example, name field 108, employee number field 110, and salary field 112 are referenced together, such as in the same loop of an application program. To increase the cache hit rate, the three aforementioned data fields may be placed close to one another in order to increase the possibility that the three data fields are brought together into cache 116 in the same cache line, such as a cache line 118.
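By way of illustration only, the effect of field placement on cache line membership may be sketched as follows. The sketch uses the Python ctypes module to lay out two hypothetical versions of an employee record and reports which cache lines each data field occupies. The field sizes, the 64-byte cache line size, the record names, and the helper names cache_lines and share_one_line are assumptions made for the example and are not taken from the figures. The sketch further assumes that each record begins on a cache line boundary.

```python
import ctypes

CACHE_LINE_SIZE = 64  # assumed line size in bytes; real hardware varies


class EmployeeRecord(ctypes.Structure):
    """Hypothetical layout: fields referenced together are adjacent."""
    _fields_ = [
        ("name", ctypes.c_char * 32),          # assumed 32-byte name
        ("employee_number", ctypes.c_uint32),
        ("salary", ctypes.c_uint32),
        ("address", ctypes.c_char * 64),       # assumed 64-byte address
    ]


class ScatteredRecord(ctypes.Structure):
    """Same hypothetical fields, but the address separates the hot fields."""
    _fields_ = [
        ("name", ctypes.c_char * 32),
        ("address", ctypes.c_char * 64),
        ("employee_number", ctypes.c_uint32),
        ("salary", ctypes.c_uint32),
    ]


def cache_lines(struct_type, field_name, line_size=CACHE_LINE_SIZE):
    """Return the set of cache line indices touched by one field,
    assuming the record starts on a cache line boundary."""
    descriptor = getattr(struct_type, field_name)
    first = descriptor.offset // line_size
    last = (descriptor.offset + descriptor.size - 1) // line_size
    return set(range(first, last + 1))


def share_one_line(struct_type, field_names, line_size=CACHE_LINE_SIZE):
    """True if every named field lies entirely within one common cache line."""
    lines = [cache_lines(struct_type, name, line_size) for name in field_names]
    return all(len(l) == 1 for l in lines) and len(set.union(*lines)) == 1


if __name__ == "__main__":
    hot_fields = ["name", "employee_number", "salary"]
    for record in (EmployeeRecord, ScatteredRecord):
        print(record.__name__, share_one_line(record, hot_fields))
```

Under these assumptions, the three fields that are referenced together occupy a single cache line in the first layout, whereas in the second layout the name field falls on a different cache line from the employee number and salary fields.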
For a single-threaded application, spatial locality generally improves memory performance. Unfortunately, optimizing spatial locality in a multi-threaded environment may also cause false sharing to occur, resulting in worse memory performance. False sharing usually occurs in a multi-threaded environment in which two or more processes/threads are attempting to access the same cache line simultaneously.
FIG. 2 shows a simple conceptual diagram illustrating false sharing in a multi-threaded environment. Main memory 202 may include a data structure with a plurality of data records, including employee record 204. Employee record 204 may include a plurality of data fields, such as a name field 206, an employee number field 208, a salary field 210, and an address field 212.
Consider the situation wherein, for example, multiple threads are trying to access the employee record. Each of the threads may be associated with a processor (e.g., processors 214, 216, and 218). During execution of the application program, the multiple processors may be accessing the same data fields, which may be copied as a cache line 226 into cache 220, cache 222, and cache 224, respectively.
Multiple threads may access the same cache line without causing conflict as long as each of the threads is only reading one or more data fields from the same cache line. However, if a processor attempts to write (e.g., add, modify, etc.) to a data field while other processors are accessing the same cache line, then a cache coherency problem may occur. As discussed herein, cache coherency refers to the integrity of the copies of a cache line stored in the different caches. In other words, an update to a cache line needs to be replicated and made visible to the other processors in order to maintain integrity of data and prevent conflict.
In an example, processor 214 is to modify salary field 210, processor 216 is reading name field 206, and processor 218 is reading employee number field 208. If processor 214 modifies salary field 210, the copies of the data fields stored in cache 222 and cache 224 are not updated with the change, and a conflict may arise. As a result, false sharing may occur since the processors are now referencing different versions of the same cache line.
In order to prevent false sharing, a processor may have to gain full ownership of the cache line before a change may be made to the data fields in the cache line. In an example, before processor 214 may modify salary field 210, processor 214 may have to invalidate the other copies of cache line 226, which may reside in cache 222 and cache 224. Accordingly, the efficiency that a multi-threaded environment should provide is diminished due to false sharing. As a result, memory performance in a multi-threaded environment may actually deteriorate since processors are expending resources to gain ownership of cache lines in order to prevent false sharing.
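By way of illustration only, the cost of gaining ownership of a cache line may be sketched with a toy write-invalidate model. The model is a simplification and does not represent any particular coherency protocol or hardware. The class name ToyCoherentCaches, the function name run, and the access pattern (two reads followed by one write per round) are assumptions made for the example. The processor numbers follow FIG. 2.

```python
class ToyCoherentCaches:
    """Toy model of one cache line under a write-invalidate policy.

    Tracks only which processors hold a valid copy of the line.
    """

    def __init__(self, processors):
        self.valid = {p: False for p in processors}
        self.misses = 0         # times a processor had to (re)fetch the line
        self.invalidations = 0  # copies invalidated in other caches

    def _fetch_if_needed(self, proc):
        if not self.valid[proc]:
            self.misses += 1
            self.valid[proc] = True

    def read(self, proc):
        self._fetch_if_needed(proc)

    def write(self, proc):
        self._fetch_if_needed(proc)
        # Gain ownership: invalidate every other valid copy of the line.
        for other in self.valid:
            if other != proc and self.valid[other]:
                self.valid[other] = False
                self.invalidations += 1


def run(rounds, writer_shares_line):
    """Processor 214 writes the salary field while processors 216 and 218
    read the name and employee number fields.

    Returns (total misses, total invalidations) over all modeled lines.
    """
    readers_line = ToyCoherentCaches([214, 216, 218])
    if writer_shares_line:
        writer_line = readers_line          # salary is on the readers' line
        lines = [readers_line]
    else:
        writer_line = ToyCoherentCaches([214])  # salary is on its own line
        lines = [readers_line, writer_line]

    for _ in range(rounds):
        readers_line.read(216)
        readers_line.read(218)
        writer_line.write(214)

    return (sum(l.misses for l in lines),
            sum(l.invalidations for l in lines))


if __name__ == "__main__":
    print("shared line:   ", run(100, writer_shares_line=True))
    print("separate lines:", run(100, writer_shares_line=False))
```

In this toy model, when the written field shares a cache line with the fields being read, every write invalidates the two readers' copies and forces each reader to fetch the line again on its next read. When the written field resides on a separate cache line, each processor fetches its line once and no invalidations occur.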