The present invention is an improvement in computational processing for determining percolation and cluster distribution. An introduction to percolation and cluster theory is presented by Stauffer and Aharony, Introduction to Percolation Theory (Revised 2d ed. 1994). A "cluster" may be thought of in one sense as a group of features that neighbor one another within a universe, e.g. a lattice. A lattice is a special spatial graph where all the nodes in the graph obey translational symmetry. This means that every node in the lattice has the same surrounding as every other node. In lattice terminology, vertices are known as sites. As an example of a lattice, consider a large array of squares: a given square in the middle of the array has four "nearest neighbors" which each share one side in common with the given square. If a feature is present at a given site and present also at a nearest neighbor site, the two sites are said to be in a cluster. The cluster definition does not have to be restricted to nearest neighbors only. It could include next nearest and other neighbors, also. A cluster may extend throughout the lattice, and it can extend in many dimensions corresponding to the dimensionality of the array. Clusters described in the present invention are also known as connected components and should not be confused with feature space clustering.
Computers lend themselves to cluster analysis. Also, computer-based simulations are often preferable to live experimentation. For example, cluster theory may be useful in determining the consequences of a tree that begins to burn in a forest. If the question is whether a single burning tree will lead to the destruction of the entire forest, and at what rate, depending on how densely packed the other trees are, it would be better from a societal perspective to perform the analysis via a computer simulation rather than by physical experimentation calling for the possible consumption of many forests. Hence, computer analysis is favored. However, as a practical matter, however, computers almost never have sufficient memory space to store the data for an entire lattice or array of sites. It is generally not practical to attempt to load all sites (data points) into memory and analyze the entire universe of data at once.
The Basic Bounded HK ("BBHK") Algorithm
In 1976, my Hoshen-Kopelman (HK) algorithm was published to allow the simulation and study of large lattices without having to store the entire lattice in computer memory. J. Hoshen and R. Kopelman, "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm," Physical Review B, Vol. 14, No. 8 (Oct. 15, 1976), which is incorporated by reference. The "HK Algorithm" presented there is frequently cited in literature and used in industry. E.g., Constintin, Berry, and Vanderzanden, "Parallelization of the Hoshen-Kopelman Algorithm Using a Finite State Machine," J. of Supercomputer Applications and High Performance Computing (submitted in 1996).
The original HK Algorithm is denoted herein as the Basic Bounded HK (BBHK) algorithm because it was designed to perform cluster analysis on bounded, fixed size lattices in two (2-D) and three (3-D) dimensions. It can also be used for higher dimension lattices and general graphs. The BBHK algorithm was designed to analyze clusters comprising of one type of site. Such types are also referred in the literature as classes. If the BBHK algorithm has to analyze M classes, where M is greater than one, it can do so by performing M passes through the lattice. Pixel images can be viewed as 2-D lattices. Voxels (volume elements) are three dimensional (3-D) lattices. The BBHK algorithm examines data from a bounded lattice in two or three dimensions. For example, a given lattice may include different types of sites A, B, C, etc. which could represent anything--a tree or no tree, an oil particle or no oil, gray level data or white level data, color A or color B or color C, a topographic feature, etc. The BBHK algorithm would provide a distribution of clusters by cluster size for one class only. Two sites would be found to belong to the same cluster if they are the same class and are neighbors.
Comprehensive descriptions and implementations of the BBHK algorithm are given by Stauffer and Aharony, Introduction to Percolation Theory, supra, and R. J. Gaylord, P. R. Wellin, Computer Simulations with Mathematica. (Springer-Verlag, 1995).
The following is a simple example of how the BBHK algorithm works. Consider the pixel data of FIG. 1(a), to be analyzed. It will be appreciated that this collection of data represents an array 100A that has been obtained illustratively by scanning an image area using any type of scanning device, including but not limited to a television camera, optical data digitizer, self-scanned array, arrangement of light emitting diodes, or the like, or in other fashions, and obtaining gray level data. This could come from a pattern recognition system. However, the pixels or array elements need not have come from visually-discernable features. Rather, any features suffice--be they magnetic, gravitational, electrical, nuclear, optical, or otherwise. Illustratively, data for the entire six by six array has been stored and is available for examination.
In FIG. 1(a), white and gray pixels are represented. The gray pixels 102 are represented with the letter "G." The white pixels 104 are left blank. The gray pixels will be analyzed for clusters using the BBHK algorithm.
In this example, the BBHK algorithm examines each row of FIG. 1(a) from left to right, and processes the rows sequentially from top to bottom. A site or pixel position is described by (T, L), where T and L denote the position of a pixel by row and column, starting with the top left comer of the lattice. Thus, the top left comer pixel is denoted by (1, 1). The next position to the right is (1, 2) and the position below (1, 1) is (2, 1). In FIGS. 1(a)-1(c), the algorithm considers no more than the four nearest neighbors for cluster definition.
The BBHK processing starts by examining pixel (1,1). The processor using this algorithm will assign labels as shown in FIG. 1(b), which illustrates another array 100B. In general, each distinct cluster (or cluster fragment) will be assigned a label, and the number of pixels within each cluster is to be accumulated as respective population counts. Whether a cluster fragment is deemed a cluster is determined at a later time based on information not yet available. Hence, a cluster "fragment" is a temporary designation. Thus, because pixel (1,1) is gray, the algorithm assigns a label to it--cluster "1" as shown in FIG. 1(b) at a position 106b corresponding to position (1,1) of FIG. 1(a). Also, it increments a software or hardware population counter (not shown) that has been (or will now be) assigned or set up for cluster number 1. Let the population of pixels in cluster number 1 be represented by N(1). Thus, after one pixel N(1)=1.
Proceeding to pixel (1,2), the algorithm finds that this pixel includes the feature of interest (gray level data) and now seeks to establish whether this new site is part of any prior cluster fragment or is a new cluster fragment. By examining the pixel (1,1) to the left, the algorithm determines that pixel (1,2) belongs to cluster fragment 1. Therefore, it labels that second pixel with a "1" (see FIG. 1(b) and increments counter N(1) so that now N(1)=2. The pixel at (1,2) is thus part of the same cluster fragment that includes pixel (1,1).
Continuing to move right within the top row, pixels (1,3) and (1,4) are white value pixels and are skipped; only gray pixels are analyzed for clusters in this example. The next gray pixel is at (1,5). Because pixel (1,4) to its left is a white pixel, the algorithm at this juncture determines that (for now) pixel (1,5) appears to be part of a new cluster fragment. Hence, as shown at 108b, the processor assigns a new label "2" to pixel (1,5) and increments a second counter N(2) to keep track of the number of pixels in cluster fragment number 2. Thus, at this time, N(2)=1. The next pixel is at (1, 6). Because it also is gray, and because pixel (1,5) is gray, the algorithm determines that this sixth pixel belongs to the same fragment cluster number 2 and labels this pixel as "2" and sets N(2)=2. Thus, the top row is completed. Note that no comparisons have been made to any preceding row because this is a border condition and there is no preceding row to the first row. Hence, at the end of scanning the data at the top row, two cluster "fragments" have been identified. It is improper to call them clusters yet, because subsequent processing might show them to be part of the same cluster (and in this case, they are).
The algorithm now proceeds to the second row and skips pixel (2,1) because it is a white pixel. The first gray pixel is at location (2,2). The algorithm now looks to the nearest neighbors above and to the left of pixel (2,2). The pixel immediately above it is (1,2) and it is gray. The pixel to the left is white. Accordingly, the algorithm determines that gray pixel (2,2) belongs to the same cluster fragment as pixel (1,2). Since pixel (1,2) was labeled as part of cluster fragment 1, the algorithm assigns the label "1" to pixel (2,2) also. It now increments the N(1) counter to indicate that there are three pixels in cluster fragment number 1, setting N(1)=3.
The next gray pixel along the second row is at position (2,4). The algorithm looks at the pixels above and to the left and finds that both pixels are white. Hence, the algorithm concludes that this new bit of gray data belongs to a new cluster fragment. Accordingly, as shown at 110b, the processor assigns a new cluster fragment label "3" to this pixel and increments a third counter for cluster fragment number 3, setting N(3)=1.
The next gray pixel 112b is immediately to the right at location (2,5). The algorithm compares it to the pixels above and to the left. In this instance, the pixel above was determined to belong to cluster fragment number 2, as shown at 108b, and the pixel to the left was labeled as belonging to cluster fragment number 3, as shown at 110b. This next pixel (2,5) apparently belongs to "both" cluster fragments. However, it is now seen that "both" cluster fragments are really part of one single cluster ("they" have contiguous edges). By convention, preference is given to the smaller number as the label, and pixel (2,5) is thus labeled as belonging to cluster fragment number 2. The counter N(2) for cluster fragment number 2 is incremented. The gray pixel 110b at position (2,4) is not relabeled from "3" to "2". Cluster fragment number 3 had only one count so that single count also is added to the count for cluster fragment number 2. Hence, at this point, N(2)=4. Before proceeding much further, because the BBHK algorithm has now determined that cluster fragment 3 belongs with cluster fragment 2, a record must be made so that if another label "3" is later encountered, the site should be regarded as part of cluster fragment 2. To do this, a negative number is loaded into the population counter for cluster fragment 3. In this instance, the algorithm sets N(3)=-2, where the -2 denotes that label 3 points to label 2.
The last pixel in row 2 is at location (2,6) and has gray data. Both its neighbor above and its neighbor to the left have been labeled as belonging to cluster fragment 2, and accordingly this pixel also is labeled as belonging to cluster fragment 2. The population counter for cluster fragment number 2 is again incremented, and now N(2)=5.
The examination by the processor using the BBHK algorithm continues until the entire six by six lattice has been examined. A new cluster fragment is apparently encountered at position (3,1) and is assigned label 4. However, the next gray pixel is at position (3,2) and is found to be adjacent to pixels that have been labeled as belonging to cluster fragment number 1 and cluster fragment number 4. Accordingly, by the convention mentioned above, pixel (3,2) is labeled as belonging to cluster fragment number 1, as shown at 114b. The number count from cluster fragment number 4 is added to the number count (population count) for cluster fragment number 1 so that N(1)=5. Also, the algorithm adjusts the contents of the fourth counter so that N(4)=-1, indicating that cluster fragment 4 points to cluster fragment 1. Pixel (3,3) is gray and adjacent to a cluster fragment number 1 site, so it is labeled as belonging to cluster 1.
Pixel (3, 4) is gray and is found to be adjacent to sites of cluster fragment number 1 and cluster fragment number 3. By convention, this pixel is assigned to cluster fragment number 1, and the population count for cluster fragment number 3 is now added to cluster fragment number 1. However, the record at this moment indicates that the population count for cluster fragment number 3 equals minus two (N(3)=-2). This indicates that cluster fragments 3 and 2 have previously merged. Hence, the population count for cluster fragment number 2 (which includes the count for cluster fragment 3) is now added to the population count for cluster fragment number 1. Moreover, giving preference to the lower number as a general rule, the population count for cluster fragment number 3 is reset to equal (-1) and the population count for cluster fragment number 2 is also set to equal (-1). At the end of the third row, it will be seen that although four apparently distinct cluster fragments were encountered, it has now been determined that they all belong to a single cluster. Further, the population of this larger cluster, labeled as number 1, includes 12 pixels. Thus, N(1)=12. It will be seen in FIG. 1(a) that there are no other gray pixels that have an adjacent edge to any pixel that belongs to cluster number 1. During the scanning of the fourth, fifth, and sixth rows, (apparently) new cluster fragments 5, 6, and 7 are encountered and labeled. Hence, array 100B shows labels 1 through 7.
At the end of the examination of the entire six by six lattice 100A, the BBHK algorithm has loaded data into seven (population) counters N(k): N(1)=12, N(2)=-1, N(3)=-1, N(4)=-1, N(5)=10, N(6)=-5, and N(7)=-5. The counters that have positive numbers correspond to cluster (fragment) numbers 1 and 5, and they provide the cluster numbers for the lattice. This six by six array has been found to have just two clusters, and their sizes (populations) are 12 and 10. Labels 1 and 5 are denoted as the "proper" labels of the clusters because they carry the cluster population number information. The other labels are direct or indirect pointers to the proper labels. The fact that a counter fails to give a "proper" label is indicated by the fact that a negative number is stored in the counter. The magnitude of that number is the label being pointed to (which may identify a cluster).
The prior art BBHK algorithm also describes that labels can be reused. In the previous example, the computer memory needed to store the information on all 36 pixels, i.e., all six rows. While this is feasible for small lattices, it becomes much more unwieldy and exhausts the computer memory capabilities when extremely large lattices need to be analyzed. Thus, by the BBHK algorithm technique of reusing or recycling labels, much less memory is required. This technique is illustrated with regard to FIG. 1(c), which shows an array 100C. In this variation, the computer needs to store only two rows, each of size six, at a single time and reuse the labels. For this example, the processor can assign {1, 2, 3} as the label set for the odd lines and the labels {4, 5, 6} for the even lines, realizing that because each row has only six elements, the maximum number of labels that could be needed for any row would result from the alternating pattern 0-1-0-1-0-1, calling for at most only three labels per row. Using the labeling recycling, intermediate cluster results are collected whenever a row is completed. The cluster tally results are taken for the row just prior to the completed row.
The first row in FIG. 1(c) is labeled exactly as the top row in FIG. 1(b). However, the second row is labeled by the label set {4, 5, 6}. In this instance, all clusters found in row 1 will next be found to belong to clusters in row 2. Accordingly, all the labels from row 1 will be updated to point from row 1 to the labels of row 2. Hence, the processor sets N(1)=-4 and accumulates the total number of pixels for cluster number 4, i.e., N(4)=3. Likewise, the computer sets N(2)=-5 and N(5)=5. Now, when the processor finishes with row 2, it can forget row 1 because all the labels from row 1 point to labels in row 2, and it will never encounter row 1 again. Additionally, note that none of the clusters of row 1 are complete because the population counts (N) are all negative and point to row 2.
The third row will reuse labels {1, 2, 3}. The algorithm now sets N(4)=-1, N(5)=1, and N(1)=12. Again, clusters of row 2 are found to be incomplete because the population counts are all negative.
Proceeding to row 4, the two gray pixels will be assigned the (recycled) label for cluster number 4, and N(4)=2. Now, inspecting the labels of row 3, the algorithm notes that the population count for the sole cluster is a positive integer, i.e. N(1)=12, which implies that this cluster number 1 is complete and does not extend to row number 4. Hence, it can be tallied.
The algorithm follows the same procedure for row 5, recycling again the labels from the label set {1, 2, 3}. At the end of row 5, the algorithm has set population counters as follows: N(4)=-2, N(2)=5, and N(1)=1. Note that the cluster label "1" is used again in row 5, even though it was used in rows 1 and 3.
For the sixth row, the algorithm assigns cluster labels from the label set {4, 5, 6}. In this instance, at the end of row 6, the algorithm has determined N(4)=10, N(1)=-4, and N(2)=-4. Hence, there are no complete clusters for row 5. Further, since row 6 is the last row, the processor knows that N(4) is complete. Hence, using the approach of label recycling, in the six by six lattice, the processor using this algorithm identified and tallied two clusters having populations 12 and 10--exactly the same result as obtained with reference to FIG. 1(b). Note that array 100C used only four labels {1, 2, 4, 5,} and that labels 3 and 6 were not required.
The BBHK algorithm has found application in many areas. One technology is X-ray microscopic tomography, in which a sample is divided into vertical cuts. In each plane, the sample is scanned from several angles. For each angle, the fraction of X-ray energy that is absorbed is measured. Using the multiple absorption data, the absorptivity coefficient of each voxel in the cut is determined. The variation in absorptivity between voxels creates the image. To create discrete classes for voxels, the absorptivity is partitioned into ranges, where each range corresponds to a different class. See, e.g. Kinney et al., "The X-ray Tomographic Microscope: Three Dimensional Perspectives Of Evolving Microstructures," Nuclear Instruments and Methods in Physics Research A 347 (North-Holland 1994) pp. 480-486; Kinney et al., "In Vivo, Three Dimensional Microscopy Of Trabecular Bone," J. of Bone and Mineral Research, Vol. 10, No. 2 (Blackwell Science 1995) pp. 264-270; and King et al., "X-ray Tomographic Microscopy Investigation Of The Ductile Rupture Of An Aluminum Foil Bonded Between Sapphire Blocks," Scripta Metallurgica et Materialia, Vol. 33, No. 12 (Elsevier Science/Acta Metallurgica 1995) pp. 1941-1946; all of which are incorporated by reference.
The BBHK algorithm enumerated cluster populations in a bounded lattice or array. While proven to have wide-ranging applications, the prior algorithm fundamentally determined cluster population only. In the 20 years since the HK algorithm was introduced, it has not been extended for the calculations of interest in shape analysis.
Moreover, the BBHK algorithm was inherently constrained by size and was not able to analyze data that is unbounded in at least one dimension.
Furthermore, the BBHK algorithm cannot handle more than one class in a single pass through the lattice. The present invention addresses such issues.