The present application relates generally to an improved data processing apparatus and method and, more specifically, to an apparatus and method for performing all-to-all comparisons on data processing system architectures having explicit memory accesses and limited storage space.
An All-to-All comparison operation is defined as an operation in which every pair of data elements is compared to produce an output result. For example, given 3 data elements S1, S2, and S3, an All-to-All comparison operation comprises executing one or more comparison algorithms for performing a comparison between data elements S1 and S2, a comparison between data elements S1 and S3, and finally a comparison between data elements S2 and S3, such that every pair of data elements is compared. As a general rule, given n data elements, n*(n−1)/2 comparisons need to be performed in an All-to-All comparison operation.
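As an illustration of the pair count described above, the following is a minimal Python sketch that visits every unordered pair exactly once. The function name `all_to_all`, the `compare` callback, and the sample values are hypothetical and chosen only for this example; the sketch does not address the memory constraints discussed in this application.

```python
from itertools import combinations


def all_to_all(elements, compare):
    """Apply compare to every unordered pair of elements exactly once.

    Returns a dict keyed by the index pair (i, j) with i < j.
    """
    return {
        (i, j): compare(elements[i], elements[j])
        for i, j in combinations(range(len(elements)), 2)
    }


# Hypothetical example: three data elements compared by absolute difference.
results = all_to_all([10, 4, 7], lambda a, b: abs(a - b))

# Three elements yield 3*(3-1)/2 = 3 comparisons: (0,1), (0,2), (1,2).
n = 3
pair_count = n * (n - 1) // 2
```

Here `results` holds one entry per pair, so its size equals `pair_count`. The comparison callback stands in for whatever workload-specific algorithm is used.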
The data elements subject to such an All-to-All comparison operation may be defined as data sequences, data values (e.g., numbers), or any other data format. Such All-to-All comparison operations are commonly used in many different types of workloads including bioinformatics, image processing, etc. The algorithm used to perform such All-to-All comparison operations varies depending upon the particular workload. For example, in bioinformatics, the comparisons are global or local alignments in which Deoxyribonucleic Acid (DNA) or protein sequences are compared to output an integer result which denotes how closely the sequences are related to each other.
One algorithm used to perform All-to-All comparisons is the Jaccard algorithm, which generates the Jaccard coefficient. The Jaccard coefficient, or Jaccard index as it is also called, is a correlation coefficient for determining the similarity between two binary strings, or bit vectors. Mathematically, given two equal length bit vectors x and y, with entries indexed from 0 to n, the Jaccard index computes:
Jaccard Coefficient (Index) = c/(a+b+c)
where c is the AND product and (a+b) is the XOR product of the two vectors x and y; that is, c counts the positions in which both vectors have a 1, and (a+b) counts the positions in which exactly one of them has a 1. The Jaccard coefficient is used to compute the similarity of objects that are described by feature vectors, where each entry in the bit vector corresponds to the presence or absence of a particular feature. The Jaccard coefficient finds application in a wide variety of areas including drug design, similarity searching on the Internet, financial applications, and social network analysis.
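The Jaccard coefficient described above can be sketched in a few lines of Python. This sketch assumes each bit vector is stored as a non-negative Python integer, and it returns 0.0 when both vectors are all zeros; both choices are assumptions made for illustration and are not taken from this application. The function name `jaccard` is likewise hypothetical.

```python
def jaccard(x: int, y: int) -> float:
    """Jaccard coefficient c/(a+b+c) of two bit vectors held as non-negative ints."""
    c = bin(x & y).count("1")         # positions where both vectors have a 1
    a_plus_b = bin(x ^ y).count("1")  # positions where exactly one vector has a 1
    denominator = a_plus_b + c
    # Assumption: two all-zero vectors are given a coefficient of 0.0.
    return c / denominator if denominator else 0.0


# x = 1101 and y = 1011: AND = 1001 (c = 2), XOR = 0110 (a+b = 2).
similarity = jaccard(0b1101, 0b1011)  # 2 / (2 + 2) = 0.5
```

Identical non-zero vectors give a coefficient of 1.0, and vectors with no set bits in common give 0.0.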