A major contributor to poor performance in array-intensive embedded applications and multimedia applications is the large disparity between processor cycle time and main memory access time. Most of the execution time of such applications is spent inside the applications' loops. Conventional tools that provide loop optimizations rely on loop restructuring and array regrouping. However, the scope of these tools is limited to a particular loop, and the tools do not significantly reduce execution time. Thus, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.
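For illustration only, the two conventional techniques named above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the behavior of any particular tool: the dimension `N`, the `struct pair` type, and all function names are hypothetical and do not come from the related art. Loop restructuring is shown as a loop interchange, and array regrouping is shown as merging two arrays that are always accessed together into one array of structures.

```c
#include <stddef.h>

#define N 64  /* hypothetical matrix dimension, chosen only for illustration */

/* Before loop restructuring: the inner loop walks down a column of a
   row-major C array, so consecutive accesses are N elements apart. */
long sum_column_order(int m[N][N]) {
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += m[i][j];
    return total;
}

/* After loop restructuring (interchange): the inner loop walks along a
   row, so consecutive accesses touch adjacent memory locations. */
long sum_row_order(int m[N][N]) {
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Before array regrouping: a[k] and b[k] are used together but live in
   two separate arrays, in different regions of memory. */
long dot_separate(const int *a, const int *b, size_t n) {
    long total = 0;
    for (size_t k = 0; k < n; k++)
        total += (long)a[k] * b[k];
    return total;
}

/* Hypothetical regrouped layout: the paired elements sit side by side. */
struct pair {
    int a;
    int b;
};

/* After array regrouping: each iteration reads one contiguous struct. */
long dot_regrouped(const struct pair *p, size_t n) {
    long total = 0;
    for (size_t k = 0; k < n; k++)
        total += (long)p[k].a * p[k].b;
    return total;
}
```

Each transformed function computes the same result as its original and changes only the order or layout of memory accesses. Both transformations are applied within a single loop nest, which is consistent with the limited scope described above.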