1. Field of the Invention
The present invention relates to digital image processing and more specifically to the problem of instructing a computer or other automated machine how to identify discrete objects in a visual field composed of a large array of pixels.
2. Description of the Prior Art
Visual imagery is often presented in digitized form as a large array of pixels, with each pixel providing information about physical characteristics of a point on a spatial grid. The pixel information could indicate, by way of example, how much light is reflected from a grid point on an illuminated surface, what the chemical composition of the point is or some other distinguishing attribute associated with the point.
A problem with such pixel arrays is that major features of the visual image tend to become obscured by the process of breaking up the whole into minute parts. The forest is lost, so to speak, when one's focus is restricted to looking at a small number of trees. Computers have this problem when they analyze only a small number of pixels at a time in a large pixel array. It has been believed that the best way to recognize major features of the undigitized original image is by displaying the entire pixel array on a viewing screen in an orderly fashion using different colors, tones of gray, or even simple black and white to represent one or more of the characteristics of each point, and using the human eye to identify and categorize the features by reintegrating the plural pixels into a contiguous whole. The process by which such feature identification and categorization takes place is not fully understood. It would be advantageous if computers could be taught to replicate the process and recognize major features in a digitized image with at least the speed of the human eye. Analysis of visual imagery could be automated and this would facilitate image study which would be either too complex or too tedious for humans.
Some steps have been taken towards teaching computers how to recognize features in an image. A definition is first formulated for instructing a computer how to distinguish between a pixel that belongs to a major feature and a pixel that does not belong to any feature at all. Neighboring pixels which share a common trait, as for example a reflected light intensity equal to or less than a predetermined threshold, can be defined as belonging to a common entity which will be termed an "object." In a Cartesian type of array, two pixels can be defined to neighbor one another and thus belong to a common object if they are directly adjacent to one another either vertically, horizontally or diagonally. A pixel which has no immediate neighbors with similar traits can be considered an object by itself. Background portions of a visual image may be considered non-objects and ignored.
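By way of illustration only, the occurrence test and the neighbor rule described above might be sketched as follows. The function names, the (row, column) tuple convention and the sample threshold are assumptions made for this sketch and do not appear in the original text.

```python
def is_occurrence(intensity, threshold):
    """A pixel shows the trait when its intensity is at or below the threshold."""
    return intensity <= threshold


def are_neighbors(p, q):
    """True when two distinct grid points are directly adjacent
    vertically, horizontally or diagonally (points are (row, col) tuples)."""
    (r1, c1), (r2, c2) = p, q
    return p != q and abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


if __name__ == "__main__":
    print(is_occurrence(40, threshold=50))   # dark pixel under a threshold of 50
    print(are_neighbors((1, 2), (2, 3)))     # diagonal adjacency
    print(are_neighbors((1, 2), (1, 4)))     # one column of background between
```

Under this rule each pixel has up to eight neighbors, so two occurrences separated by even one background pixel start out as separate objects.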
The human eye can identify and simultaneously categorize a respectable number of discrete objects in a visual field with simplicity and speed (usually a few seconds) just by "looking." Previous attempts to replicate this function in automated machinery have generally failed either to match or to improve on the recognition time of the human eye, particularly when the digitized image presented is filled with a large number of complex objects. Some analysis methods require hours or even weeks to sort out discrete objects in fields that take the human eye merely one or two seconds. One reason for such long processing times will be explained with reference to FIGS. 1 and 2 of the accompanying drawings.
FIG. 1 illustrates a digitized visual field 10 containing a number of relatively noncomplex, discrete features denoted as objects a, b, c and d. The purpose of FIG. 1 is merely to give an example of what a digitized visual field might look like. The original visual field (prior to digitization) is indicated by dashed outlines. It might have been a plan view of a printed circuit board having a plurality of maze-like conductive paths and perhaps certain defects, or it might have been a telescopic view of geophysical features as seen from a space satellite, or it might again have been a small portion of a semiconductive substrate as seen through a scanning electron microscope. The original nature of the illustrated objects is not important for this part of the discussion. The problem here is to determine by means of automated machinery whether each of the pixels belongs to one or another of a plurality of objects (denoted by dots and dashed outlines in FIG. 1). For the sake of generality objects will be generically identified by object names, each being denoted by an underlined small letter: a, b, c and so forth. Discrete objects will be defined as being composed of one or more pixels sharing some predetermined trait ("attribute") that is distinguishable from a "background" trait. When an object has more than one pixel, each pixel of the object needs to be (under the definition chosen above) directly adjacent to at least one other pixel having the same trait. The presence of the predetermined trait at a grid point will be termed an "occurrence." Areas in the visual field where the trait does not occur will be defined as regions of non-occurrence or simply as background.
In real world applications, a visual field may be broken up and digitized into hundreds or thousands if not millions of pixels. Each of the pixels conveys information only about its respective point on a spatial grid. Even though it is possible to construct a computer that can determine in perhaps ten microseconds or less whether any one occurrence neighbors another, the time for processing a large number of data points in a visual field, particularly when those data points may need to be scanned repeatedly in a recursive manner, can be quite staggering (e.g. on the order of hours, weeks or even longer). More specifically, if an array has N pixels and the method for processing the N pixels requires recursion, the time for such processing will grow roughly as a function of N-factorial for a given increase of N. (N-factorial = N × [N-1] × [N-2] × ... × [3] × [2] × [1].)
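To give a feel for the yardstick just stated, the following sketch multiplies N-factorial by the ten-microsecond neighbor test mentioned above. Treating each term of N-factorial as one such test is an assumption made purely for illustration, as are the names STEP_SECONDS and factorial_time_seconds.

```python
import math

STEP_SECONDS = 10e-6  # the "ten microseconds" per neighbor test cited in the text


def factorial_time_seconds(n, step_seconds=STEP_SECONDS):
    """Time in seconds if processing N pixels cost N-factorial neighbor tests.

    This takes the N-factorial yardstick literally; it is an illustrative
    assumption, not a measured cost model.
    """
    return math.factorial(n) * step_seconds


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, factorial_time_seconds(n))
```

On these assumptions ten pixels cost roughly 36 seconds and fifteen pixels roughly 151 days, which shows how quickly the estimate outruns any practical time budget.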
There is shown in FIG. 2 a small array 10' of pixels including occurrences belonging to a W-shaped object. The small array 10' is obviously smaller and much simpler than the array 10 of FIG. 1. The latter is relatively noncomplex in comparison to image analysis situations found in everyday life (the so-called "real world"). The processing time required for complex image analysis can be appreciated by extrapolating from the soon-to-be-described problem of the simple array 10' to the more complex arrays found in real world situations using the N-factorial based yardstick given above. As N grows into hundreds and thousands of pixels, the associated processing time can leap from hours to weeks, months and even years. Such long periods for obtaining analysis results are unacceptable in most settings.
One reason the human eye can detect the singularity of the W-shaped object shown in array 10' readily and quickly is that the eye can view the entire image at once. A computer is often limited to looking at the pixels of an array through a window that is much smaller than the entire array. The task of recognizing that the scattered occurrences, as viewed through a small window, all belong to one object presents a problem.
Horizontal rows of the array 10' will be denoted by R1-R8. Vertical columns of the array will be denoted by C1-C9. The array 10' is shown split open between rows R3 and R4 in order to emphasize a point and enhance understanding of how a computer might perceive the array when the computer is performing either a top to bottom scan or a bottom to top scan. Since arrays are normally scanned first left to right by individual rows and then top to bottom by stepping from the processing of one row to the processing of a second row below, this standard scanning method will be followed in the explanation below.
The scanning is to be performed by an automated scanning machine. FIG. 2 shows, in addition to the array 10', a block diagram of an object identifying apparatus 20 comprised of an automated scanning machine 22 and a data storage memory 24. The memory 24 is structured to support a single object assignment table 24a. Partitioning among the system components 10', 22 and 24 is done merely for sake of example. The scanning machine 22 could be a contiguous portion of a general purpose computer. It could, on the other hand, be some sort of test instrument capable of repeatedly scanning a physical specimen in response to computer instructions. The array 10' could represent a set of data stored in array-like fashion in a memory of a general purpose computer or it could be a depiction of actual sample points on the surface of a tangible specimen. The problem to be described is generic to various types of system configurations.
As the computer or other image scanning machine 22 begins to scan across row R1 of array 10', it will encounter a first occurrence (e.g., a pixel of a particularly high intensity) at column C2. The occurrence is denoted by the asterisk "*" at row R1, column C2. The computer can assume that this first occurrence (R1,C2) belongs to a first object, a. The computer records this assumption at a first addressed location (ADDR 1) of memory 24.
Scanning further along the first row R1, the computer encounters a second occurrence (again denoted by an asterisk) at column C5. Since the occurrence at column C5 is not directly adjacent to the occurrence at column C2, the computer must initially assume that this second occurrence at column C5 belongs to a second object b which is distinct from the first object a. It records this assumption at a second location (ADDR 2) in the memory 24. Further along row R1 the computer detects yet another isolated occurrence at column C8 and records the same as belonging to a third object c in a third location (ADDR 3) of its memory 24.
In scanning row R2, the computer will find a fourth occurrence at column C2 which is directly adjacent to the previously encountered occurrence at column C2 of row R1. It is assumed here that the computer 22 has a finite scan window 22a having a height of just two rows, and that the computer can identify the object names a, b, c, etc., assigned to occurrences in an upper row of the window 22a as well as object names assigned to already processed occurrences in a currently scanned lower row (also referred to as the focal row of the window). Since an "object" is defined as a continuum of directly adjacent occurrences (asterisks), the computer can safely conclude that the fourth encountered occurrence, at row R2, column C2, belongs to the same object a as that of the first occurrence directly above it and the computer will record this conclusion in memory 24.
At this point, a table 24a has been built in the memory 24 designating the first and fourth encountered occurrences (R1,C2 and R2,C2) as belonging to object a, the second encountered occurrence (R1,C5) as belonging to object b, and the third encountered occurrence (R1,C8) as belonging to object c. This procedure is repeated and the designation table 24a in memory 24 consequently becomes larger and larger while still continuing to erroneously identify occurrences in column C2 as belonging to a first object a, occurrences in column C5 as belonging to a distinct second object b, and occurrences in column C8 as belonging to a third distinct object c. It should be appreciated from the split shown between rows R3 and R4 that the change of pattern shown in rows R4-R8, where two previously unconnected branches of the W-shape meet, could conceivably occur thousands of rows below rows R1-R3 rather than just immediately below and that the designation table 24a of memory 24 could have grown to include a very large number of erroneous entries before the change of pattern in row R5 is detected.
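The provisional assignments described so far can be sketched as a scan through a two-row window. The grid below is a hypothetical stand-in for rows R1-R4 of array 10', built only from the occurrences the text names (columns C2, C5 and C8); the names ROWS and provisional_labels are likewise assumptions of this sketch.

```python
from string import ascii_lowercase

# Hypothetical stand-in for rows R1-R4 ('*' = occurrence, '.' = background).
ROWS = [
    ".*..*..*.",  # R1
    ".*..*..*.",  # R2
    ".*..*..*.",  # R3
    ".*..*..*.",  # R4
]


def provisional_labels(rows):
    """Scan left to right, top to bottom, seeing only two rows at a time.

    Returns a list of ((row, col), name) entries with 1-based coordinates,
    standing in for the object assignment table 24a.
    """
    table = []
    names = iter(ascii_lowercase)
    upper = {}  # column -> object name in the upper row of the window
    for r, row in enumerate(rows, start=1):
        focal = {}  # column -> object name in the focal (current) row
        for c, cell in enumerate(row, start=1):
            if cell != "*":
                continue
            # Already-named neighbors visible through the window: the pixel
            # to the left in the focal row, then the three pixels above.
            candidates = [focal.get(c - 1), upper.get(c - 1),
                          upper.get(c), upper.get(c + 1)]
            known = [n for n in candidates if n is not None]
            # Reuse a neighbor's name if there is one, else start a new object.
            name = known[0] if known else next(names)
            focal[c] = name
            table.append(((r, c), name))
        upper = focal  # slide the window down one row
    return table


if __name__ == "__main__":
    for entry in provisional_labels(ROWS):
        print(entry)
```

Run on these four rows, the sketch names every occurrence in column C2 as a, in column C5 as b and in column C8 as c, and nothing visible through the window indicates that the three names will later prove to be one object.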
When the computer hits row R5, it will encounter a dilemma. The occurrence at row R5, column C3, can safely be designated as belonging to object a because it is directly adjacent to the occurrence in the row above at R4,C2 which had previously been designated as belonging to object a. But when the computer next focuses on the occurrence in row R5, column C4, it will find one occurrence (asterisk at R5,C3) belonging to object a directly to the left of the focal point and another occurrence (asterisk at R4,C5) diagonally adjacent to the focal point, belonging to the second object b. The computer will now realize that it has made a mistake and that all the occurrences (asterisks) linking down from above the scan window 22a to the occurrence of R4,C5 need to be redesignated as belonging to object a rather than object b. But the computer cannot simply look at all the upper rows R1-R4 in one glance, find the connected occurrences and correct their object designations. The scanning window 22a is limited to detecting just two rows at a time. The computer has to recursively shift its scanning window backwards row by row to rescan rows R1 through R4 and further has to track backwards through memory 24 to change all the b designations of table 24a to a designations.
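The dilemma can be reproduced with a short sketch that makes a full backward pass over the designation table each time two differently named branches meet. The grid is a hypothetical W-shaped layout consistent with the occurrences named in the text, not a reproduction of FIG. 2. For brevity the sketch keys the table by coordinates and relabels by name rather than re-tracing connectivity row by row, so it understates the cost of the rescanning described above. All identifiers are assumptions of this sketch.

```python
from string import ascii_lowercase

# Hypothetical W-shaped layout ('*' = occurrence, '.' = background).
GRID = [
    ".*..*..*.",  # R1
    ".*..*..*.",  # R2
    ".*..*..*.",  # R3
    ".*..*..*.",  # R4
    "..**.*.*.",  # R5: branches a and b meet at C4
    ".....*.*.",  # R6
    "......*..",  # R7: branches a and c meet at C7
    ".........",  # R8
]


def label_with_rescans(rows):
    """Name occurrences in scan order, correcting the table on each conflict.

    Returns (table, conflicts, rewritten) where table maps 1-based (row, col)
    to an object name, conflicts lists ((row, col), kept, replaced) and
    rewritten counts the table entries that had to be changed.
    """
    table, conflicts, rewritten = {}, [], 0
    names = iter(ascii_lowercase)
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            if cell != "*":
                continue
            # Names of already-processed neighbors inside the two-row window.
            seen = []
            for pos in ((r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)):
                name = table.get(pos)
                if name is not None and name not in seen:
                    seen.append(name)
            if not seen:
                table[(r, c)] = next(names)
                continue
            keep = min(seen)  # keep the earliest-assigned name
            table[(r, c)] = keep
            for old in seen:
                if old == keep:
                    continue
                conflicts.append(((r, c), keep, old))
                # Backward pass over every earlier entry to flip old -> keep.
                for pos, name in table.items():
                    if name == old:
                        table[pos] = keep
                        rewritten += 1
    return table, conflicts, rewritten


if __name__ == "__main__":
    table, conflicts, rewritten = label_with_rescans(GRID)
    print(conflicts)
    print(rewritten, sorted(set(table.values())))
```

On this layout the first conflict arises at R5,C4, where the four b entries of column C5 must be flipped to a. Each such conflict forces another pass over everything recorded so far.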
The task would be fairly simple if the computer somehow knew, as we know, that all the occurrences connected to the focal point R5,C4 are arranged in a straight vertical line. But this is not the case. The string of occurrences rooted to and emanating upwardly from the focal point R5, C4 could conceivably branch off along an infinite number of continuous patterns other than the simple one used in our example. The illustrated W-shaped object could, for instance, be a small part of a large meandering surface crack that has hundreds of randomly arranged branches and sub-branches. It would be inappropriate for the computer to make any assumption about the nature of the pattern. Recursion is the only solution known for correcting the designation table 24a.
The need for recursion stems from what can be called a "saddle point" or U-shaped connection where one branch of an object meets for the first time during a scan with another branch of the object. While only two saddle points occur in the W-shaped object of FIG. 2, it should be appreciated that saddle points can appear in large numbers and in all sorts of ways when fields far more complex than the one of FIG. 2 are considered. It is possible to construct zig-zag patterns extending laterally that have saddle points at every other column of a scanned row. This type of zig-zagging commonly occurs in the real world when objects having ill defined edges are digitized.
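The zig-zag case can be illustrated with a sketch that counts the points at which two differently named branches first meet during a scan. The two-row pattern generated by zigzag is a hypothetical example constructed for this sketch, and coordinates here are 0-based.

```python
def saddle_points(rows):
    """Return the 0-based (row, col) points where two branches first meet."""
    labels = {}   # (row, col) -> provisional integer label
    next_id = 0
    saddles = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell != "*":
                continue
            # Labels of already-scanned neighbors: left, and the three above.
            near = {labels[p]
                    for p in ((r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1))
                    if p in labels}
            if not near:
                labels[(r, c)] = next_id
                next_id += 1
                continue
            keep = min(near)
            labels[(r, c)] = keep
            if len(near) > 1:
                saddles.append((r, c))
                # Merge the branches so later meetings are counted only once.
                for p, v in labels.items():
                    if v in near:
                        labels[p] = keep
    return saddles


def zigzag(width):
    """Two rows whose occurrences alternate between even and odd columns."""
    top = "".join("*" if c % 2 == 0 else "." for c in range(width))
    bottom = "".join("*" if c % 2 == 1 else "." for c in range(width))
    return [top, bottom]


if __name__ == "__main__":
    for line in zigzag(9):
        print(line)
    print(saddle_points(zigzag(9)))
```

For a nine-column zig-zag the second row yields a saddle point at every other column, each of which would trigger its own correction pass under the scheme described above.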
One way to reduce recursion time would be to record the location of all once-encountered occurrences so the computer doesn't have to hunt for them again while backtracking through memory table 24a. But it would be undesirable to have the location coordinates of each and every occurrence (*) of a large array (e.g., FIG. 1) stored in the computer's memory 24 because such storage would consume massive amounts of memory, replicate the function of the large array 10' and make the computer's use of limited resources inefficient. As such, the computer is preferably instructed to process all previous entries of table 24a (rows R4 back to R1 in our example) in a recursive manner and the scanning machine 22 is simultaneously instructed to rescan every previously scanned pixel of the array 10' in order to correctly identify the improperly assigned occurrence designations of memory 24 and flip those entries in memory table 24a from b's to a's. This means that the scanning machine 22 cannot be a single scan type of device.
Needless to say, the automated scanning machine 22 of our example (FIG. 2) will again need to rescan rows R6 back through R1 when the occurrence at row R7, column C7, is investigated and found to be a saddle point linking the branch that was thought to be object c to a branch of object a. Those skilled in the art will appreciate that the processing time for investigating N points in a visual array will grow at a rate which is approximately a function of N-factorial when such recursion is required. This represents an unacceptably large processing time for many applications.