This disclosure relates generally to the field of data processing, and more particularly to the field of dynamic programming.
Dynamic programming solves a complex problem by breaking it down into simpler sub-problems. It is applicable to problems that exhibit overlapping sub-problems and optimal substructure, that is, problems whose solution can be assembled from the solutions of sub-problems that recur many times. The dynamic programming approach seeks to solve each sub-problem only once and to reuse the stored result, thus reducing the total number of computations. Examples of dynamic programming algorithms include the Smith-Waterman algorithm and the Needleman-Wunsch algorithm. However, the complexity and the size of the data sets to be processed using dynamic programming may exceed the available hardware resources.
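By way of illustration only, the following minimal Python sketch computes a Smith-Waterman local alignment score, filling each cell of the score table exactly once from previously computed neighboring cells. The function name `smith_waterman_score`, the scoring values (match, mismatch, gap) and the use of a linear gap penalty are assumptions chosen for the example and are not taken from this disclosure.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    The scoring parameters are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Each sub-problem is solved once, from three stored neighbors.
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

With these assumed parameters, `smith_waterman_score("ACGT", "ACGT")` returns 8 (four matches), and two sequences sharing no symbol score 0. The table holds (len(a) + 1) x (len(b) + 1) cells, which suggests why long sequences may exceed the available memory and compute resources.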
Approaches to executing dynamic programming algorithms may be grouped into three categories. The first category includes central processing unit (CPU) based solutions, which may be relatively easy to implement, because a plurality of high-level programming languages such as C, C#, and Java are available for creating software applications that implement the dynamic programming algorithm. However, executing such an application on a general-purpose CPU may be relatively slow because the instruction set of the general-purpose CPU is not specially adapted to the particular requirements of dynamic programming algorithms. In a second category of solutions, referred to as field-programmable gate array (FPGA) solutions, a special kind of processing unit is used for executing dynamic programming algorithms. The special processing units are operable to execute dynamic programming algorithms faster than general-purpose CPUs. However, only low-level programming languages may exist for such processing units, thereby making the creation and adaptation of dynamic programming algorithms a highly time-consuming and difficult task. A third category may be based on executing dynamic programming algorithms on a graphics processing unit (GPU), or on a plurality of GPUs operating in parallel.
A GPU is a multiprocessor computing device capable of executing a plurality of threads in parallel. A GPU is specialized for computationally intensive, highly parallel computation, and may be used for graphics rendering or other highly parallelizable computation tasks. The GPU may act as a coprocessor to the main CPU in a computing system, thereby allowing data-parallel, compute-intensive portions of applications running on the main CPU to be off-loaded onto the GPU. A processing unit of the GPU may include a stream multiprocessor. A GPU has several memory units, which may have different functions. Some of these memory units may be used as a shared memory of the GPU, herein also referred to as local memory, as said memory is accessible (shared) by a group of threads running on the GPU. A thread as used herein refers to a thread of execution in a program function, and a group of threads or thread block as used herein refers to a batch of threads that can cooperate by sharing data through a shared memory, preferably a fast memory, and that can synchronize their execution to coordinate memory access. In addition, a GPU may comprise one or more global memories that are accessible by each of the stream multiprocessors; however, accessing the global memories may be slower than accessing the shared memory.
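The cooperation pattern of a thread block may be sketched, purely as a CPU-side analogy, with ordinary Python threads. In the sketch below, the list `shared` stands in for the shared (local) memory, the input list stands in for global memory, and `threading.Barrier` stands in for a block-wide synchronization point. The function name `block_sum`, the block size of eight threads and the tree-style summation are assumptions chosen for the example; the sketch does not run on a GPU and does not reflect GPU performance.

```python
import threading

def block_sum(global_data, block_size=8):
    """Sum block_size values with cooperating threads, mimicking one thread block."""
    # block_size must be a power of two for the halving scheme below.
    assert len(global_data) == block_size
    assert block_size > 0 and block_size & (block_size - 1) == 0

    shared = [0] * block_size                 # analogy for fast shared memory
    barrier = threading.Barrier(block_size)   # analogy for block-wide synchronization

    def worker(tid):
        # Each thread copies one value from "global" into "shared" memory.
        shared[tid] = global_data[tid]
        barrier.wait()
        stride = block_size // 2
        while stride > 0:
            # Active threads add a partner's partial sum to their own slot.
            if tid < stride:
                shared[tid] += shared[tid + stride]
            # All threads wait so that no thread reads a slot still being updated.
            barrier.wait()
            stride //= 2

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(block_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]
```

For example, `block_sum([1, 2, 3, 4, 5, 6, 7, 8])` returns 36. The barrier after every step is what coordinates memory access: without it, a thread could read a partner slot before the partner has finished writing it.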