Proteomics is the large-scale study of proteins. Proteins are a key functional element in biological behavior; however, their exact role is still a matter of research. One popular way of studying proteins is to compare protein peptides with similar amino acid sequences. In comparative statistical studies, peptides are typically characterized numerically, for example with the aid of a mass spectrometer, which provides a digital signature for each peptide. The numerical characterizations of different peptides may then be clustered using a statistical clustering technique such as the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), so that peptides whose numerical characterizations are similar are grouped together in the same cluster. These clusters may then be used to identify the peptides.
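To make "numerically characterized" and "similar" concrete, the sketch below builds a pairwise distance matrix from peptide signatures. It assumes, hypothetically, that each mass-spectrometer signature has already been binned into a fixed-length, non-negative intensity vector, and it uses cosine distance, which then falls between 0 and 1. Neither the vector representation nor cosine distance comes from the text; both, along with the helper names `cosine_distance` and `distance_matrix`, are illustrative stand-ins for whatever distance function a real pipeline uses.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity (assumed metric); lies in [0, 1] for non-negative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty signature as maximally distant
    # Clamp to guard against tiny floating-point overshoot outside [0, 1].
    return min(1.0, max(0.0, 1.0 - dot / (norm_a * norm_b)))


def distance_matrix(spectra: list[list[float]]) -> list[list[float]]:
    """Symmetric N x N matrix of pairwise distances with a zero diagonal."""
    n = len(spectra)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = cosine_distance(spectra[i], spectra[j])
    return d


# Hypothetical binned intensity vectors for three peptides.
spectra = [
    [0.0, 5.0, 1.0, 0.0],
    [0.0, 4.0, 1.0, 0.0],
    [3.0, 0.0, 0.0, 2.0],
]
D = distance_matrix(spectra)
```

The first two vectors share their peaks and end up close to distance 0, while the third shares none with the first and sits at distance 1.0.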
In the method shown in FIG. 1, each numerical characterization of a peptide is initially treated as a cluster with a single member. Given a distance function, a distance matrix D may then be constructed giving the distance between each pair of clusters. For a given distance matrix D, where d_ij is the distance between item i and item j, the following iterative procedure is performed:

1. Find d_min = min(d_ij). If more than one d_ij equals d_min, select one of them, typically the one with the smallest index min(i, j).
2. If d_min is greater than a predefined threshold, such as 0.15 when distances are normalized between 0 and 1, stop.
3. Create a new cluster that is the union of clusters i and j.
4. Remove cluster j, and replace cluster i with the new cluster.
5. If D contains only one cluster, stop the iterative procedure.
6. Update the entries in D that are affected by the creation of the new cluster.
7. Go to step 1.
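The steps above can be sketched as follows. This is a minimal illustration, not the method of FIG. 1 itself: it assumes the update in step 6 is the standard UPGMA rule, where the distance from any cluster k to the merged cluster is the size-weighted mean (n_i d_ki + n_j d_kj) / (n_i + n_j), and the function name `upgma_threshold` is hypothetical. Step 1 is implemented as a full rescan of all active pairs on every pass, which is the straightforward reading of the procedure.

```python
def upgma_threshold(d: list[list[float]], threshold: float = 0.15) -> list[list[int]]:
    """Threshold-stopped agglomerative clustering following steps 1-7.

    d: symmetric N x N distance matrix, distances normalized to [0, 1].
    Returns the final clusters, each a sorted list of original item indices.
    Assumes UPGMA (size-weighted average) updates for step 6.
    """
    d = [row[:] for row in d]                     # work on a copy of D
    clusters = {i: [i] for i in range(len(d))}    # active cluster id -> members
    while len(clusters) > 1:                      # step 5: stop at one cluster
        # Step 1: scan every active pair; strict '<' keeps the first minimum
        # found, i.e. the pair with the smallest indices on ties.
        active = sorted(clusters)
        d_min, best = None, None
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                if d_min is None or d[i][j] < d_min:
                    d_min, best = d[i][j], (i, j)
        # Step 2: stop once the closest pair is farther apart than the threshold.
        if d_min > threshold:
            break
        i, j = best
        n_i, n_j = len(clusters[i]), len(clusters[j])
        # Step 6 (assumed UPGMA rule): distance to the merged cluster is the
        # size-weighted mean of the distances to its two parts.
        for k in clusters:
            if k not in (i, j):
                d[i][k] = d[k][i] = (n_i * d[i][k] + n_j * d[j][k]) / (n_i + n_j)
        # Steps 3-4: cluster i becomes the union of i and j; cluster j is removed.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters.values())


# Hypothetical normalized distances between four peptide signatures.
d = [
    [0.00, 0.05, 0.90, 0.80],
    [0.05, 0.00, 0.85, 0.95],
    [0.90, 0.85, 0.00, 0.10],
    [0.80, 0.95, 0.10, 0.00],
]
print(upgma_threshold(d))                 # [[0, 1], [2, 3]]
print(upgma_threshold(d, threshold=1.0))  # [[0, 1, 2, 3]]
```

With the 0.15 threshold, items 0 and 1 merge first (0.05), then 2 and 3 (0.10); the two resulting clusters are 0.875 apart, so step 2 stops the procedure. Raising the threshold to 1.0 lets the procedure run until step 5 leaves a single cluster.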
Unfortunately, determining the minimum entry of a matrix is computationally expensive: for a symmetric N×N distance matrix D, each search typically requires on the order of O(N²) operations, and the search is repeated at every iteration of the procedure. Given the vast number of proteins yet to be studied, a method for preparing peptide spectra for identification that requires fewer operations than existing techniques would therefore be advantageous.
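The O(N²) figure can be checked by counting the pairs a naive search must examine. The sketch below assumes the search scans only the upper triangle of the symmetric matrix (N(N-1)/2 pairs) and, for the total, that no threshold stops the procedure early, so that all N-1 merges occur, each followed by a fresh scan over one fewer active cluster. The function names are illustrative only.

```python
def scan_cost(n: int) -> int:
    """Pairs examined by one full scan of the upper triangle of an n x n symmetric matrix."""
    return n * (n - 1) // 2


def total_scan_cost(n: int) -> int:
    """Pairs examined over a complete run: one scan before each of the n - 1 merges.

    Before merge t (t = 0 .. n - 2) there are n - t active clusters, so the
    scans cover matrices of size n, n - 1, ..., 2.
    """
    return sum(scan_cost(m) for m in range(2, n + 1))


print(scan_cost(10_000))        # 49995000
print(total_scan_cost(10_000))  # 166666665000
```

A single scan over 10,000 signatures already examines about 5×10⁷ pairs, and a complete run examines (N+1)N(N-1)/6, about 1.7×10¹¹, so the repeated minimum search dominates the cost of the procedure.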