In compiling a computer program, a compiler attempts to generate machine instructions that carry out the operations of the original source code as efficiently as possible. The efficiency of code execution depends at least in part on how effectively the hardware can execute the machine instructions. One of the biggest bottlenecks to efficient performance is access to main memory, which is required whenever the right data is not in cache memory at the right time. A process will inevitably bog down while it waits for data from main memory. Placing data into cache will improve performance, but only if the cache can hold the data that is needed and the proper data is there at the proper time.
Among the strategies for compiling computer programs to improve performance are blocking strategies for loop-based code sections. Program loops can involve a great number of memory accesses and thus a great deal of overhead. In particular, if a nested loop requests data from a large array, it may not be possible to fit all of the data elements into the cache, or to arrange for them to be present when they are needed, thereby slowing processing. Blocking generally involves dividing a loop's iterations into parts or blocks, with an additional outer loop, or blocking loop, generated to drive the original loop for each part. The use of blocking allows an array of data to be divided into blocks or windows of data for processing, thereby reducing the amount of data required for the iterations within a block. In such a situation, if data are used more than once and if a block of data fits within the cache, then cache exploitation is likely to be improved.
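By way of illustration only, the conventional blocking described above may be sketched in C for a matrix multiplication, a nested loop in which array elements are used more than once. The function names matmul_plain and matmul_blocked, the row-major storage layout, and the block parameter are assumptions made for this sketch and are not taken from any particular compiler. The blocking loops jj and kk drive the original j and k loops over one block at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Original nested loop (hypothetical example): c += a * b for n x n
 * matrices stored in row-major order. For large n, the walk down the
 * columns of b touches more data than the cache can hold. */
void matmul_plain(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Blocked form of the same loop nest. The outer jj and kk loops are the
 * blocking loops; each pass restricts the j and k loops to one
 * block x block window of b, which is reused for every value of i.
 * The block size is an assumed tuning parameter chosen by the caller. */
void matmul_blocked(size_t n, size_t block,
                    const double *a, const double *b, double *c)
{
    assert(block > 0);
    for (size_t jj = 0; jj < n; jj += block) {
        /* Clamp the block end so a partial final block stays in range. */
        size_t j_end = (jj + block < n) ? jj + block : n;
        for (size_t kk = 0; kk < n; kk += block) {
            size_t k_end = (kk + block < n) ? kk + block : n;
            for (size_t i = 0; i < n; i++)
                for (size_t j = jj; j < j_end; j++)
                    for (size_t k = kk; k < k_end; k++)
                        c[i * n + j] += a[i * n + k] * b[k * n + j];
        }
    }
}
```

In this sketch each element of c accumulates its products in the same order of k as in the original loop, so the blocked form computes the same result. It is this kind of regular indexing that makes the reordering safe.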
However, conventional loop blocking strategies are limited in scope. If a nested loop departs from normal indexing schemes, conventional blocking strategies will not work because they cannot maintain semantic correctness and thus would produce incorrect numerical results. As a result, conventional techniques cannot be applied to improve the performance of such nested loops.