1. Field
The present disclosure relates generally to improving computer processor performance and more particularly to a sequence alignment method of a vector processor.
2. Description of the Related Art
The alignment of a sequence (e.g., the sorting of the elements of the sequence) in descending or ascending order is a basic function used in many systems. Performing alignment via a “fast sort” sorting algorithm with a complexity of O(n log(n)), such as quick sort, includes manipulating individual elements and is thus difficult to apply to (e.g., implement by) vector processors using a single instruction multiple data (SIMD) or single instruction multiple thread (SIMT) architecture.
Thus, most computing systems that include one or more vector processors and/or one or more multicores may use “merge sort” sorting algorithms to implement alignment of a sequence (“sequence alignment”). Since merge sort is relatively fast and efficient for data that is already sorted, computing systems that use merge sort may be required to quickly sort grouped elements before merging them. In general, the complexity of merge sort using a binary tree structure is O(N*log2(N)), where N is the size of the data set.
FIG. 1 is a view for explaining a typical merge sort method that may be performed by a computing system that includes one or more vector processors and/or one or more multicores. FIG. 1 illustrates a method of sorting (“aligning”) a sequence of eight elements (N=8) in ascending order.
Referring to FIG. 1, the typical merge sort method divides a sequence into N elements first. A conventional scalar processor may be required to perform a separate division process in relation to the merge sort method, but a vector processor does not necessarily need to perform such a division process because in a vector processor, elements of a sequence are connected by a vector structure. Accordingly, a division process may be completed, by a vector processor, simply by loading a sequence of N elements.
Thereafter, each pair of adjacent elements in the sequence, among the N elements divided from the sequence, is sorted (“aligned”). This step is referred to as a merge step, particularly, a “conquer” step of a merge step. For example, as shown in FIG. 1, adjacent elements “8” and “3” may be sorted as “38” because 3 is smaller than 8, and adjacent elements “2” and “9” may be sorted as “29” because 2 is smaller than 9. Since each of the N elements of the sequence needs to be put into a buffer and called, up to N calls may be needed. That is, the complexity of the sorting may be a maximum of O(N).
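As a minimal sketch of this first step, the following Python snippet sorts each pair of adjacent elements in an eight-element sequence. Only the elements “8”, “3”, “2”, and “9” are taken from the example above; the order of the remaining four elements and the function name `sort_adjacent_pairs` are assumptions made for illustration. The snippet is ordinary scalar code and is not meant to represent vector processor instructions.

```python
def sort_adjacent_pairs(seq):
    """Return a copy of seq in which each adjacent pair (0-1, 2-3, ...) is ascending."""
    out = list(seq)
    # Visit the first index of every pair; an unpaired last element is left as is.
    for i in range(0, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Assumed input: the text only gives "8", "3", "2", "9"; the last four are illustrative.
sequence = [8, 3, 2, 9, 7, 1, 5, 4]
pairs_sorted = sort_adjacent_pairs(sequence)
print(pairs_sorted)  # [3, 8, 2, 9, 1, 7, 4, 5]
```

Each element is touched once, which is consistent with the maximum of N calls described for this step.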
Thereafter, each pair of adjacent sorted subsequences in the sequence may be combined. This step may also be a part of the merge step. For example, a pair of adjacent subsequences “38” and “29” may be combined as “2389”, and a pair of adjacent subsequences “17” and “45” may be combined as “1457”.
The combining of the subsequences “38” and “29” involves comparing “3” with each of “2” and “9” and comparing “8” with “9” and thus requires three calls. The combining of the subsequences “17” and “45” involves comparing “1” with each of “4” and “5” and comparing “7” with each of “4” and “5” and thus requires four calls. That is, in a worst-case scenario, the combining of each pair of adjacent subsequences requires a maximum of four calls, for a maximum of eight calls in total. Accordingly, the complexity of the combining may become O(N).
Finally, in order to combine “2389” and “1457”, “2” may be compared with each of “1”, “4”, “5”, and “7”, “3” may be compared with each of “4”, “5”, and “7”, and “8” may be compared with each of “4”, “5”, and “7”, but not necessarily with “9”. If the sequence is varied, a maximum of N comparisons may be needed. Even in that case, the complexity may still be O(N). Accordingly, a maximum of N calls are needed in each step of the typical merge sort method, and the complexity in each step of the typical merge sort method may be O(N).
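The combining of two sorted subsequences may be sketched in Python as follows. This is a conventional two-pointer merge, not necessarily the procedure of FIG. 1, and the function name `merge_counting` is hypothetical. Because a two-pointer merge stops comparing an element as soon as its position is known, its comparison counts may be lower than the call counts given above, but they never exceed the combined length of the two subsequences minus one, which is consistent with the O(N) bound for each step.

```python
def merge_counting(left, right):
    """Merge two ascending lists; return (merged list, number of comparisons made)."""
    merged = []
    comparisons = 0
    i = j = 0
    # Compare the current heads of both lists until one list is exhausted.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Whatever remains is already sorted and needs no further comparison.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


print(merge_counting([3, 8], [2, 9]))              # ([2, 3, 8, 9], 3)
print(merge_counting([1, 7], [4, 5]))              # ([1, 4, 5, 7], 3)
print(merge_counting([2, 3, 8, 9], [1, 4, 5, 7]))  # ([1, 2, 3, 4, 5, 7, 8, 9], 6)
```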
The division of a sequence with N elements into two halves may be performed log2(N) times to obtain N subsequences, each subsequence containing one element, and the combining of N subsequences may be performed log2(N) times to obtain a whole aligned sequence. Thus, in a worst-case scenario, N*log2(N) calls are needed, which means that the total complexity of the merge sort method, including the merge sort method shown in FIG. 1, may become O(N*log2(N)). Such complexity of an alignment method performed by a vector processor may represent a suboptimal usage of resources (e.g., processing capacity, memory capacity, power supply, etc.) and a suboptimal operating speed of a computing system that includes the vector processor. Accordingly, operating efficiency (“computer performance”) of the computing system using the merge sort method may be at a suboptimal state.
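The relationship between the number of combining passes and log2(N) can be illustrated with a short bottom-up merge sort in Python. This is a sketch under stated assumptions: the function name `bottom_up_merge_sort` and the eight-element input are illustrative, and the merging of each pair is delegated to the standard library function `heapq.merge` rather than to any procedure described in this disclosure.

```python
import heapq


def bottom_up_merge_sort(seq):
    """Sort seq ascending; return (sorted list, number of combining passes)."""
    # Start from N subsequences of one element each.
    runs = [[x] for x in seq]
    passes = 0
    while len(runs) > 1:
        merged_runs = []
        # Combine each pair of adjacent sorted subsequences.
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]  # the last slice may hold a single run
            merged_runs.append(list(heapq.merge(*pair)))
        runs = merged_runs
        passes += 1
    return (runs[0] if runs else []), passes


# Assumed eight-element input (N = 8), so log2(N) = 3 passes are expected.
result, passes = bottom_up_merge_sort([8, 3, 2, 9, 7, 1, 5, 4])
print(result)  # [1, 2, 3, 4, 5, 7, 8, 9]
print(passes)  # 3
```

With up to N calls in each of the log2(N) passes, the total matches the N*log2(N) worst-case figure stated above.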