1. Technical Field
The present invention relates generally to information processing systems and, more specifically, to dynamically repacking data to facilitate consolidation of multiple memory write instructions.
2. Background Art
As processor designers continue to make strides in the performance speed of processors, the gap between processor speed and the speed of memory hierarchy technology has widened. Accordingly, the performance penalty associated with memory fetches becomes relatively more expensive as processor speeds increase.
Various approaches have been implemented in an attempt to decrease the impact of memory fetches during the execution of a program. One such approach is known as coalescing. Coalescing is an approach that can be implemented to optimize the binary code generated during compilation of a software program. In the coalescing approach, several memory accesses for individual data are combined into one aggregate memory access for one aggregate datum. On most processors, coalescing can be seen as the replacement of two store instructions for x-wide data from contiguous memory zones with one store instruction for 2x-wide data, where the 2x-wide data represents the coalesced data of the two x-wide instructions. For example, the two lines of pseudo-code below:
    store_1byte_starting_at (data1, address)
    store_1byte_starting_at (data2, address + 1)
can be coalesced and replaced by a single line:
    store_2bytes_starting_at (coalesced_data, address).
Accordingly, by storing multiple bytes of coalesced data at once, the cost usually associated with issuing a memory write operation (such as, e.g., a store instruction) may be amortized across several such memory write operations. As used herein, the word “operation” is used to indicate the operation of a processor to process a single instruction, such as a store instruction. One skilled in the art will recognize that a single store instruction may be processed as a plurality of micro-operations, but that processing of the store instruction is nonetheless referred to herein as a single “operation.”
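By way of illustration only, the two 1-byte stores and their 2-byte replacement described above may be sketched in C as follows. The function names store_separately and store_coalesced are hypothetical and do not appear in the pseudo-code. The sketch assumes that a fixed-size 2-byte memcpy is lowered by the compiler to a single 2-byte store, which is common but implementation-dependent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Uncoalesced form: two separate 1-byte stores to contiguous addresses. */
static void store_separately(uint8_t *address, uint8_t data1, uint8_t data2)
{
    address[0] = data1;   /* store_1byte_starting_at (data1, address)     */
    address[1] = data2;   /* store_1byte_starting_at (data2, address + 1) */
}

/* Coalesced form: the two bytes are first packed into one 2-byte datum,
 * which is then written with a single 2-byte copy. Packing through a byte
 * array keeps the memory layout identical regardless of endianness. */
static void store_coalesced(uint8_t *address, uint8_t data1, uint8_t data2)
{
    uint8_t coalesced_data[2] = { data1, data2 };
    /* Assumption: this fixed-size copy becomes one 2-byte store. */
    memcpy(address, coalesced_data, sizeof coalesced_data);
}
```

Both functions leave the same two bytes in memory; only the number of store operations that may be issued differs.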
Coalescing is a useful approach to avoid certain cache penalties associated with successive store operations. A cache penalty is any additional cost incurred in implementation-dependent corner cases, such as the inability of a cache implementation to service two stores to the same cache line within a 1-cycle time budget.
In addition, it has been observed that contiguous store instructions occur quite frequently in software programs. For instance, it has been observed that, for many software programs, a majority of execution time associated with the program is spent executing a small number of loops, referred to as “hot loops.”
Consider the following loop, referred to herein as Loop 1:
    for (i=0; i<N; i++) {
        A[i] = B[i]
    }
On successive iterations, the value of B[i] is loaded from memory and is stored at A[i]. Accordingly, consecutive store instructions modify contiguous memory addresses. In such a straightforward case, the successive load instructions in Loop 1 may be coalesced to load more than one piece of data from B in a single operation. It is also possible, in the straightforward example of Loop 1, to coalesce the data to be stored in successive locations of the array A.
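By way of illustration only, coalescing both the loads and the stores of Loop 1 may be sketched in C as follows. The sketch assumes 1-byte array elements, a coalescing width of four elements, and arrays A and B that do not overlap; none of these details is specified above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loop 1 as written: one 1-byte load and one 1-byte store per iteration. */
static void copy_bytewise(uint8_t *A, const uint8_t *B, size_t N)
{
    for (size_t i = 0; i < N; i++)
        A[i] = B[i];
}

/* Coalesced variant: four contiguous elements of B are loaded as one
 * 4-byte datum and written to A as one 4-byte datum.
 * Assumption: A and B do not overlap. */
static void copy_coalesced(uint8_t *A, const uint8_t *B, size_t N)
{
    size_t i = 0;
    for (; i + 4 <= N; i += 4) {
        uint32_t wide;
        memcpy(&wide, &B[i], sizeof wide);   /* one 4-byte load  */
        memcpy(&A[i], &wide, sizeof wide);   /* one 4-byte store */
    }
    for (; i < N; i++)                       /* leftover elements, if any */
        A[i] = B[i];
}
```

For N = 10, the coalesced variant performs two 4-byte copies followed by two 1-byte copies, in place of ten 1-byte loads and ten 1-byte stores.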
A challenge arises, however, for many prior art coalescing schemes when one side of a statement within a loop is not amenable to coalescing. Many prior art coalescing schemes decline to coalesce data when the right-hand side of the assignment statement within a loop does not guarantee contiguous memory accesses. Another challenge for prior art coalescing schemes arises when the statements within a loop modify contiguous memory locations, but the contiguous memory locations are modified out of order. Embodiments of the method and apparatus disclosed herein address these and other concerns related to coalescing schemes.
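By way of illustration only, the two problem cases described above may be sketched in C as follows. The index array idx, the four-element width, and all function names are hypothetical. The hand-written repacked variants are intended only to show that the stores in each case still target contiguous memory; they are not a description of the embodiments disclosed herein.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Case 1: the right-hand side reads B through an index array, so the
 * loads are not guaranteed to be contiguous, although the stores to A are. */
static void gather_bytewise(uint8_t *A, const uint8_t *B,
                            const size_t *idx, size_t N)
{
    for (size_t i = 0; i < N; i++)
        A[i] = B[idx[i]];
}

/* Hand-written variant of Case 1: the loads stay separate, but four loaded
 * values are packed into a local buffer and written with one 4-byte copy.
 * Assumptions: A and B do not overlap; every idx[i] is in bounds. */
static void gather_repacked(uint8_t *A, const uint8_t *B,
                            const size_t *idx, size_t N)
{
    size_t i = 0;
    for (; i + 4 <= N; i += 4) {
        uint8_t packed[4];
        for (size_t k = 0; k < 4; k++)
            packed[k] = B[idx[i + k]];         /* non-contiguous loads */
        memcpy(&A[i], packed, sizeof packed);  /* one 4-byte store     */
    }
    for (; i < N; i++)                         /* leftover elements */
        A[i] = B[idx[i]];
}

/* Case 2: four contiguous locations are modified, but out of order. */
static void store_out_of_order(uint8_t *A, const uint8_t *v)
{
    A[2] = v[2];
    A[0] = v[0];
    A[3] = v[3];
    A[1] = v[1];
}

/* Hand-written variant of Case 2: the values are placed into a local
 * buffer in the same out-of-order sequence, then written in one copy. */
static void store_repacked(uint8_t *A, const uint8_t *v)
{
    uint8_t packed[4];
    packed[2] = v[2];
    packed[0] = v[0];
    packed[3] = v[3];
    packed[1] = v[1];
    memcpy(A, packed, sizeof packed);          /* one 4-byte store */
}
```

In both cases the final contents of A are the same as in the byte-at-a-time form; a scheme that inspects only the order or the source of the individual accesses would nonetheless decline to coalesce them.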