The present disclosure relates generally to data splitting in a data processing system and more specifically to data splitting for multi-instantiated objects in the presence of copy assignments with pointer offset operation in the data processing system.
Data splitting, as described in P. Zhao, S. Cui, Y. Gao, R. Silvera, and J. N. Amaral, Forma: A Framework for Safe Automatic Array Reshaping, ACM Transactions on Programming Languages and Systems, 30(1), 2007, is a compiler transformation that has proven effective at improving data locality and reducing memory footprint, resulting in better data cache efficiency. The transformation is particularly useful in modern programs, in which dynamic memory allocation is widely used.
To ensure a safe code transformation, different data splitting mechanisms have been discussed for dynamically allocated objects, depending on how the code allocates the objects: whether an object is single-instantiated or multiple-instantiated. For a single-instantiated object, there must be a single allocation point for the particular object type, and this point must be executed no more than once at run time. The base address of the object is therefore constant at run time. Consequently, each field of an aggregated object can be split into a separate new object whose base address is also constant at run time.
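As an illustration of splitting a single-instantiated object, the following minimal C sketch contrasts an interleaved layout with a split layout in which each field lives in its own array with a fixed base address. The type `node_t`, its fields `cost` and `flow`, and all helper names are hypothetical and are not taken from the cited work; the hand-written split only stands in for what a compiler might generate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical aggregate type; the field names are illustrative only. */
typedef struct {
    long cost;   /* assumed hot field, read in the inner loop */
    long flow;   /* assumed cold field, rarely touched */
} node_t;

/* Original layout: fields are interleaved, so a scan over `cost`
   also drags the unused `flow` values through the data cache. */
static long sum_cost_aos(const node_t *nodes, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += nodes[i].cost;
    return sum;
}

/* Split layout: each field becomes a separate object whose base
   address stays constant after the single allocation. */
typedef struct {
    long *cost;
    long *flow;
    size_t n;
} node_split_t;

static int node_split_init(node_split_t *s, size_t n) {
    s->cost = calloc(n, sizeof *s->cost);
    s->flow = calloc(n, sizeof *s->flow);
    s->n = n;
    if (!s->cost || !s->flow) {
        free(s->cost);
        free(s->flow);
        return -1;
    }
    return 0;
}

static void node_split_free(node_split_t *s) {
    free(s->cost);
    free(s->flow);
}

/* The same scan over the split layout touches only the `cost` array. */
static long sum_cost_split(const node_split_t *s) {
    long sum = 0;
    for (size_t i = 0; i < s->n; i++)
        sum += s->cost[i];
    return sum;
}
```

Both scans compute the same result; the split version simply reads a denser stream of memory, which is the source of the locality benefit described above.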
For multiple-instantiated objects, there may be more than one instantiation of a particular object type at program run time. In cases where there is effectively a single instantiation of the object at program run time, but the compiler is unable to statically prove that the memory allocation site is executed no more than once, adopting the multi-instantiation data splitting mechanism for a safe code transformation is difficult.
Two methods have been proposed for data splitting in the presence of multiple object instantiations. The first method is an object descriptor technique for regular array objects, as described in S. Cui and R. Silvera, Efficient Method of Data Reshaping for Multidimensional Dynamic Array Objects in the Presence of Multiple Object Instantiations, U.S. Pat. No. 8,015,556. For each object instantiation, an object descriptor is introduced to record information such as the base address and the current address of the object. The object descriptor method can be applied with data splitting to many programs to improve code performance. However, for some programs, the first method may become inefficient and may significantly increase the run time of the program, particularly when there are pointer assignments with pointer-offset operations in hot spots of the program. The inefficiency may occur because a copy of the object descriptor is created whenever there is a pointer assignment with a pointer-offset operation, which implies that the current address may change for the candidate pointers. For example, a code snippet abstracted from the SPEC CPU2006 benchmark mcf (pbeampp.c) (available from www.spec.org, http://www.spec.org/cpu2006/index.html) is as follows:
for (act_t * arc = net->arcs; arc < net->stop_arcs; arc += 1) {
    ......
    perm[basket_size++]->a = arc;
    ......
}
In the example, a compiler is able to determine a safe split for the object of arc across the whole program, but the runtime performance of the loop suffers when the compiler applies the object descriptor technique, because a copy of the object descriptor is inserted into the loop. Note that the compiler will consider the object of arc to be multiple-instantiated and cannot statically determine the respective sizes at allocation time.
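To make the overhead concrete, the following C sketch models an object descriptor that records base addresses and a current position, and shows a descriptor copy being made for every pointer assignment with an offset inside a loop. The layout of `obj_desc_t`, the two split fields, and the helper names are assumptions made for illustration; the cited patent's actual descriptor format is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one instantiation of an object that has
   been split into two field arrays, `cost` and `flow`. */
typedef struct {
    long *cost_base;   /* base address of the split `cost` array */
    long *flow_base;   /* base address of the split `flow` array */
    size_t current;    /* element index standing in for the current address */
} obj_desc_t;

/* Models `q = p + k` on the original pointer: the whole descriptor is
   copied and only the current position changes. */
static obj_desc_t desc_offset(obj_desc_t p, ptrdiff_t k) {
    obj_desc_t q = p;
    q.current = (size_t)((ptrdiff_t)p.current + k);
    return q;
}

/* Field access goes through the descriptor: base plus current position. */
static long *desc_cost(obj_desc_t d) {
    return d.cost_base + d.current;
}

/* Models a hot loop that stores an offset pointer on every iteration.
   Each store needs its own descriptor copy; the return value counts them. */
static size_t collect(obj_desc_t first, size_t n, obj_desc_t *out) {
    size_t copies = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = desc_offset(first, (ptrdiff_t)i);
        copies++;
    }
    return copies;
}
```

In this sketch, what was a single pointer store in the original loop becomes a multi-word descriptor copy per iteration, which is the kind of cost the text attributes to the first method in hot spots.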
The second method is a memory-pooling-assisted data splitting technique for recursive data structures, as described, for example, in R. Archambault, S. Cui, S. Curial, Y. Gao, R. Silvera, and P. Zhao, Data Splitting for Recursive Data Structures, United States Patent Application Publication 2009/0019425. For each aggregated data type, a memory pool set is provided, which consists of one or more memory pool units; the size of the memory pool units is a constant determined statically at compile time for a given data type. This method reduces the addressing overhead found in other splitting techniques, since the address of each field is easily available at compile time. The drawback of the second method, however, is that only recursive data structures are handled, and not multiple-instantiated objects whose size cannot be determined at compile time.
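The addressing advantage of a fixed-size pool unit can be sketched in C as follows. The sketch assumes a single pool unit holding two `long` fields per element, with all `key` slots stored first and all `val` slots after them; `POOL_ELEMS`, `pool_unit_t`, and the helper names are hypothetical, and a full memory pool set with several units is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Elements per pool unit: assumed fixed at compile time for this type. */
#define POOL_ELEMS 256u
/* Byte distance from an element's `key` slot to its `val` slot.
   Because POOL_ELEMS is constant, this is a compile-time constant too. */
#define VAL_REGION_OFFSET (POOL_ELEMS * sizeof(long))

/* One pool unit: a `key` region followed by a `val` region. */
typedef struct {
    char *mem;
    size_t used;
} pool_unit_t;

static int pool_unit_init(pool_unit_t *u) {
    u->mem = malloc(2 * VAL_REGION_OFFSET);
    u->used = 0;
    return u->mem ? 0 : -1;
}

static void pool_unit_free(pool_unit_t *u) {
    free(u->mem);
}

/* Hands out the next element, identified by the address of its `key`
   slot; returns NULL once the unit is full. */
static long *pool_alloc(pool_unit_t *u) {
    if (u->used == POOL_ELEMS)
        return NULL;
    return (long *)u->mem + u->used++;
}

/* The split `val` field sits at a constant offset from the `key` slot,
   so no descriptor lookup is needed to reach it. */
static long *elem_val(long *key_slot) {
    return (long *)((char *)key_slot + VAL_REGION_OFFSET);
}
```

The constant offset only exists because the unit size is known at compile time, which is also why this approach does not carry over directly to objects whose size is determined at run time.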
Therefore, there is a need for an efficient method that allows compilers to apply data splitting to multi-instantiated objects in the presence of pointer assignments with pointer-offset operations, in order to improve code performance.