In recent years, many-core computing has been applied not only to high-performance computing but also to mobile devices and personal computers, and many-core technology, represented by the graphics processing unit (GPU), has become widespread. However, using a heterogeneous many-core processor to accelerate an existing program remains challenging. Because loops usually account for a large proportion of a serial program's execution time, loop parallelization, which offloads loops to a many-core accelerator, is a common way to accelerate existing programs. Among loop parallelization approaches, the polyhedral model is a powerful framework that covers loop analysis, loop transformation, and mapping onto many-core processors, and new polyhedral-model-based compilers continue to appear. However, the polyhedral model can analyze only loops whose iteration domain and data domain are described by affine functions, whereas many existing programs contain a large proportion of dynamic loops that are non-affine in the iteration domain or the data domain. Because the data dependencies of such loops cannot be determined at compile time, these loops are difficult to parallelize with the polyhedral model or any other static approach.
Determining inter-iteration data dependencies at runtime, however, may consume considerable extra time and memory. As a result, executing an entire loop on the GPU may yield only a poor speedup, and on a GPU with scarce memory the loop may not be executable at all because the detection requires too much space. Therefore, a light-weight runtime inter-iteration dependency detection technique is key to parallelizing loops whose data dependencies cannot be determined at compile time.
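To make the problem concrete, the following Python sketch shows the kind of check a runtime detector performs for a loop whose subscripts come from runtime index arrays, `a[write_idx[i]] = a[read_idx[i]] + 1`. Such subscripts are non-affine, so a static analysis cannot decide whether iterations conflict. The loop shape and the names `has_cross_iteration_dependency` and `run_loop` are hypothetical and for illustration only; this is a simple baseline, not the detection technique referred to above. It records which iteration writes each location and reports a dependency when two iterations write the same location or when one iteration reads a location written by another. Its hash map costs O(n) extra space, which is exactly the kind of overhead a light-weight technique would aim to reduce.

```python
from typing import List, Sequence


def has_cross_iteration_dependency(read_idx: Sequence[int],
                                   write_idx: Sequence[int]) -> bool:
    """Return True if iterations of the hypothetical loop
        for i in range(n): a[write_idx[i]] = a[read_idx[i]] + 1
    cannot safely be executed in an arbitrary (e.g. parallel) order."""
    writer = {}  # memory location -> iteration that writes it
    for i, w in enumerate(write_idx):
        if w in writer:
            # Two different iterations write the same location: output dependency.
            return True
        writer[w] = i
    for i, r in enumerate(read_idx):
        j = writer.get(r)
        if j is not None and j != i:
            # Iteration i reads a location written by iteration j:
            # flow dependency (j < i) or anti dependency (j > i).
            return True
    return False


def run_loop(a: Sequence[int], read_idx: Sequence[int],
             write_idx: Sequence[int], order: Sequence[int]) -> List[int]:
    """Execute the loop body once per iteration, visiting iterations in `order`."""
    out = list(a)
    for i in order:
        out[write_idx[i]] = out[read_idx[i]] + 1
    return out


if __name__ == "__main__":
    # Independent iterations: reads from a[0..2], writes to a[3..5].
    a = [10, 20, 30, 0, 0, 0]
    r, w = [0, 1, 2], [3, 4, 5]
    print(has_cross_iteration_dependency(r, w))          # False
    print(run_loop(a, r, w, [0, 1, 2]) == run_loop(a, r, w, [2, 1, 0]))  # True

    # Dependent iterations: iteration 1 reads a[1], which iteration 0 writes.
    b = [1, 0, 0, 0]
    r2, w2 = [0, 1, 2], [1, 2, 3]
    print(has_cross_iteration_dependency(r2, w2))        # True
    print(run_loop(b, r2, w2, [0, 1, 2]))                # [1, 2, 3, 4]
    print(run_loop(b, r2, w2, [2, 1, 0]))                # [1, 2, 1, 1]
```

When the check reports no dependency, any execution order gives the same result, so the iterations could be dispatched to GPU threads. When it reports a dependency, reordering changes the result, and the loop must be executed serially or transformed first.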