This invention relates to the field of scientific computation and, in particular, to techniques for using parallel computational hardware, such as graphics processing units (GPUs), in scientific computation.
In scientific computations, such as those in physics and image processing, two-dimensional (2D) matrix representations are commonly implemented as lookup tables (LUTs). The main motivation for this approach is to calculate a finite set of values beforehand and store them in memory, thereby avoiding real-time computation.
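By way of illustration only, the precompute-and-store idea can be sketched as follows. The function `gaussian`, the grid bounds, the step size, and the helper names `build_lut` and `lookup` are hypothetical choices made for this sketch and do not come from the described system; nearest-neighbour indexing is likewise an assumption.

```python
import math


def build_lut(func, xs, ys):
    """Precompute func(x, y) at every grid point; rows follow xs, columns follow ys."""
    return [[func(x, y) for y in ys] for x in xs]


def lookup(lut, x, y, x0, y0, step):
    """Nearest-neighbour lookup: map (x, y) to grid indices, clamped to the table."""
    i = min(max(round((x - x0) / step), 0), len(lut) - 1)
    j = min(max(round((y - y0) / step), 0), len(lut[0]) - 1)
    return lut[i][j]


# Hypothetical function to tabulate (an assumption for this sketch).
def gaussian(x, y):
    return math.exp(-(x * x + y * y))


STEP = 0.25
N = 17  # grid points per axis, covering [-2, 2]
AXIS = [-2.0 + k * STEP for k in range(N)]

# The table is computed once, ahead of time.
LUT = build_lut(gaussian, AXIS, AXIS)

# At run time, the function evaluation is replaced by a table read.
value = lookup(LUT, 0.5, -1.0, AXIS[0], AXIS[0], STEP)
```

In this sketch, each call to `lookup` replaces an evaluation of the function with two index computations and one memory read.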
With the above approach, the computational burden is replaced by increased memory traffic. When these tables cannot fit into fast memories such as an L1 cache, frequent access to them significantly reduces computational speed.
Therefore, an improved approach is needed.