Stencil computations are, in general, a class of problems characterized by near-neighbor calculations on data structures (also referred to as arrays) that are iterated over several major steps. For example, a sequential iterator (such as a time-domain loop) or a sequence may include one or more statements. In each loop iteration or sequence step, each statement accesses one or more elements of a data structure and uses the respective values of those elements to compute a partial or a final result. The result may be stored in the same or a different data structure or array. The values of the elements used in a current iteration/sequence step are typically computed in a previous iteration/sequence step.
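As a minimal sketch of the pattern described above, the following Python code iterates a one-dimensional, three-point averaging stencil over several steps. The function names `jacobi_step` and `run_stencil`, the averaging formula, and the fixed boundary values are illustrative assumptions and are not taken from any particular stencil discussed here.

```python
def jacobi_step(prev):
    """Apply one sweep of a 3-point averaging stencil (hypothetical example).

    Each interior element of the result is computed from three neighboring
    elements of the previous step; the two boundary elements are held fixed.
    """
    nxt = list(prev)  # results go to a different array than the one being read
    for i in range(1, len(prev) - 1):
        nxt[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return nxt


def run_stencil(initial, steps):
    """Sequential iterator: each step consumes the values produced by the previous one."""
    grid = list(initial)
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid
```

For instance, `run_stencil([0.0, 0.0, 3.0, 0.0, 0.0], 1)` spreads the central value across its two neighbors while leaving the boundary elements unchanged.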
Moreover, the data structure elements that are accessed in an iteration/sequence step may be neighbors, forming a contiguous block of the data structure. The elements can also be near neighbors, where the access is strided. To illustrate, in a data structure Z[100][1000], the elements Z[25][10] . . . Z[25][14] are neighbors and the elements Z[60][0]; Z[60][1]; Z[60][3]; Z[60][6]; and Z[60][10] can be considered near neighbors, with a stride that monotonically increases by one. Another example of near neighbors includes the elements Z[10][2]; Z[15][2]; Z[20][2]; Z[25][2]; and Z[30][2]. Stencils are often used in a variety of applications including linear system solvers, finite-difference time domain simulations, convolutional neural networks, factor graphs (and closely related generalized distributive law problems), reverse time migration seismic data imaging techniques, and various image-processing techniques including blurring, denoising, segmentation, and image registration.
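The three access patterns in the Z[100][1000] illustration can be reproduced with a few index generators. This is only a sketch; the helper names `contiguous`, `increasing_stride`, and `constant_stride_rows` are hypothetical and introduced solely for illustration.

```python
def contiguous(row, start, count):
    """Neighbors: a contiguous block of `count` elements within one row."""
    return [(row, start + k) for k in range(count)]


def increasing_stride(row, start, count):
    """Near neighbors within one row, with a stride that grows by one per access."""
    indices, col = [], start
    for k in range(count):
        indices.append((row, col))
        col += k + 1  # strides are 1, 2, 3, ...
    return indices


def constant_stride_rows(col, start, stride, count):
    """Near neighbors within one column, separated by a constant row stride."""
    return [(start + k * stride, col) for k in range(count)]


# The index sets named in the text:
block = contiguous(25, 10, 5)                 # Z[25][10] through Z[25][14]
growing = increasing_stride(60, 0, 5)         # Z[60][0], [1], [3], [6], [10]
strided = constant_stride_rows(2, 10, 5, 5)   # Z[10][2], Z[15][2], ..., Z[30][2]
```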
Stencils have been studied extensively in recent academic literature. General polyhedral compilation methods have been shown to be effective for stencil optimization. Some optimizations for stencils focus on both parallelism and loop blocking techniques. Loop blocking techniques can increase data locality by performing multiple iterations of the stencil on a small tile of the arrays sized to fit in a high-speed cache memory (e.g., on a CPU), and/or in a scratchpad memory (e.g., on a GPU). Some techniques employ data layout and vectorization optimizations for increased performance.
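One way to perform multiple stencil iterations on a small tile is overlapped tiling, sketched below for a one-dimensional, three-point stencil. This is an illustrative choice among several blocking schemes, not the method of any work referred to above. Each tile is copied together with a halo of `steps` extra elements per side, all `steps` iterations are run on that local copy, and only the tile's own elements are written back. The names `sweep`, `run_untiled`, and `run_tiled`, the averaging formula, and the fixed-boundary convention are assumptions. In Python the locality benefit is not realized; the code only demonstrates that the tiled schedule yields the same values as the untiled one.

```python
def sweep(prev):
    """One iteration of a 3-point averaging stencil; end elements are held fixed."""
    nxt = list(prev)
    for i in range(1, len(prev) - 1):
        nxt[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return nxt


def run_untiled(grid, steps):
    """Reference schedule: sweep the whole array once per step."""
    for _ in range(steps):
        grid = sweep(grid)
    return list(grid)


def run_tiled(grid, steps, tile):
    """Overlapped tiling: run all `steps` iterations on one tile before moving on."""
    n = len(grid)
    out = list(grid)
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        # The halo is `steps` elements wide because stale values at the edge of
        # the local copy move inward by one element per iteration.
        left = max(0, lo - steps)
        right = min(n, hi + steps)
        local = grid[left:right]       # the small working set
        for _ in range(steps):
            local = sweep(local)
        out[lo:hi] = local[lo - left:hi - left]  # keep only the valid interior
    return out
```

The halo elements are recomputed redundantly by adjacent tiles, which is the price this particular scheme pays for letting each tile proceed independently.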
An order of a stencil can be the number of points that extend beyond the write point in a given dimension and direction. For example, in a stencil a[i][j]=b[i+1][j]+b[i−2][j]+b[i][j+3]+b[i][j−4], the order is 1 in the positive direction of dimension i, 2 in the negative direction of dimension i, 3 in the positive direction of dimension j, and 4 in the negative direction of dimension j. In some cases, a first-order stencil may use values in a data structure that were computed in one previous iteration or during the computation of one previous step in a sequence of steps. A second-order stencil may use values that were computed in two previous iterations or sequence steps, etc. In general, the higher the stencil order, the greater the number of data elements accessed during computation of that stencil. The shape of a stencil in general may be described by the set of numbers of elements accessed in each dimension of a data structure.
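The per-dimension, per-direction order in the example above can be derived mechanically from the read offsets relative to the write point. The sketch below assumes that the order in a given direction is the largest offset magnitude in that direction; the function name `stencil_order` and the dictionary layout of the result are hypothetical.

```python
def stencil_order(offsets):
    """Return, for each dimension, the order in the positive and negative directions.

    `offsets` lists the read offsets relative to the write point, one tuple per
    accessed element, e.g. b[i+1][j] is (1, 0) and b[i][j-4] is (0, -4).
    """
    ndim = len(offsets[0])
    order = [{"pos": 0, "neg": 0} for _ in range(ndim)]
    for off in offsets:
        for d, delta in enumerate(off):
            if delta > 0:
                order[d]["pos"] = max(order[d]["pos"], delta)
            elif delta < 0:
                order[d]["neg"] = max(order[d]["neg"], -delta)
    return order


# Offsets for a[i][j] = b[i+1][j] + b[i-2][j] + b[i][j+3] + b[i][j-4]
example = stencil_order([(1, 0), (-2, 0), (0, 3), (0, -4)])
```

Here `example[0]` describes dimension i and `example[1]` describes dimension j, matching the orders 1, 2, 3, and 4 stated in the text.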
Some publications appear to describe techniques that take advantage of the associative and commutative properties of stencil and convolution algorithms for common subexpression elimination, reduction of register pressure, and reduction of communication overhead, and some publications appear to describe optimization of higher-order stencils. Recently published techniques show that the computation of high-order stencils (e.g., second, third, and higher-order stencils) can be accelerated using certain compile-time transformations that may decrease register pressure and communication volume.
The difficulty of hand coding high-performance stencil codes has driven the development of a number of domain-specific languages that allow programmer specification of a stencil computation with automatic generation of optimized code. While such programming languages facilitate the specification of stencils and stencil-based computations, many commonly used stencils are not high-order stencils. In general, it is difficult, if not infeasible, to generate a high-order stencil of an optimized size and/or shape that is necessary to achieve maximum or substantial benefit from various known optimizations that can be applied to high-order stencils.