The visual signs that radiologists search for during mammographic screening for breast cancer have been codified into three basic categories: circumscribed lesions, microcalcifications, and stellate lesions. Of these, stellate lesions are arguably the most important. Most breast carcinomas are first indicated by stellate lesions, and stellate lesions are so often malignant that they require immediate biopsy in all but one rare case. Stellate lesions are also the most subtle and varied in appearance. Of the three classes, they are the most difficult to detect, since they are often indicated only by subtle architectural distortions.
The problem of detecting these radiographic signs (and, indeed, the problem of pattern recognition in images generally) has usually been divided into two parts. The first part concerns the individual features: the image characteristics used as low-level clues to the presence or absence of the pattern in question. The second is the information calculus: the means by which these clues are assembled into a decision about whether the pattern is present.
A common approach in previous detection systems was to put most of the effort into the low-level detection of abnormalities and to use a series of heuristics to make the final decision. Some heuristics were procedural, implemented as various thresholds and tests in the code of the basic algorithm. Others took the form of decision boundaries inserted subjectively into statistical information extracted from the image data.
Traditionally, a difficulty with heuristic methods is that they lack robustness as the pool of cases to be classified grows. Accordingly, prior systems put considerable effort into applying statistical methods to this problem, with techniques ranging from parametric conditional probability classification, to k-nearest neighbors, to a non-parametric quadratic classifier. All of these investigations, while somewhat successful, labored under certain constraints. The parametric approaches necessarily made assumptions that did not actually hold (a Gaussian conditional probability model, or the independence of low-level features). The non-parametric work had been restricted to classifying already-detected lesions or large sub-images of the mammogram.
Although binary decision trees had not been used in prior stellate lesion detection systems, other pattern classification systems, such as those for detecting military targets, did use them. Bayesian hypothesis detection was, and still is, the optimal way to perform pattern recognition from image features. However, Bayesian approaches require extensive knowledge of the probability distribution function of each feature. Such knowledge is usually unavailable in practical problems and is certainly unavailable in the case of mammographic screening. Consequently, such approaches were never used for detecting stellate lesions in digitized mammographic data.
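To make concrete why the Bayesian approach depends on knowing the probability distribution functions, the sketch below applies the maximum a posteriori (MAP) decision rule to a single feature. The Gaussian class-conditional densities, the class names, and all numeric priors and parameters are hypothetical assumptions chosen for illustration. In mammographic screening these densities are exactly what is not known.

```python
import math


def gaussian_pdf(x, mean, std):
    """1-D normal density; the Gaussian form is an illustrative assumption."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_decide(x, priors, params):
    """Return the class maximizing prior * likelihood (the MAP decision).

    priors: dict class -> P(class)
    params: dict class -> (mean, std) of the assumed class-conditional density
    """
    scores = {c: priors[c] * gaussian_pdf(x, *params[c]) for c in priors}
    return max(scores, key=scores.get)


# Hypothetical numbers: one feature, made-up densities for each class.
priors = {"normal": 0.99, "lesion": 0.01}
params = {"normal": (0.0, 1.0), "lesion": (4.0, 1.0)}

print(bayes_decide(0.5, priors, params))
print(bayes_decide(6.0, priors, params))
```

With these made-up numbers the decision boundary falls near x ≈ 3.15. Changing any prior or density parameter moves it, which is why the rule is only optimal when the distributions are actually known.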
Binary decision tree (BDT) classification methods, however, provide a means of approximating the optimal Bayesian classification rule for a given situation, resulting in decision trees such as that shown in FIG. 1. The decision tree included a plurality of nodes, for example nodes 1 in FIG. 1, among which were terminal nodes, for example terminal nodes 2 in FIG. 1. At each node, one of the features in a feature vector was compared to a threshold, and the result determined which branch of the tree the vector moved down. This continued until the vector arrived at a terminal node, which had been assigned a classification. FIG. 1 is an example of a simple known BDT and is shown merely for exemplary purposes; in known BDTs, a practical tree often contained hundreds of nodes.
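The traversal just described can be sketched as follows. The `Node` structure, the convention that a vector goes left when its feature value is at or below the threshold, and the small two-level example tree are all hypothetical. The tree is not the one in FIG. 1.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Internal-node fields: which feature to test and against what threshold.
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken when x[feature] > threshold
    # Terminal-node field: the assigned classification (None for internal nodes).
    label: Optional[str] = None


def classify(node: Node, x: Sequence[float]) -> str:
    """Walk one feature vector from the root down to a terminal node."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Hypothetical two-level tree with Yes/No terminal classifications.
tree = Node(feature=0, threshold=0.5,
            left=Node(label="No"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(label="No"),
                       right=Node(label="Yes")))

print(classify(tree, [0.7, 3.1]))
print(classify(tree, [0.2, 9.0]))
```

Each comparison touches a single feature, so classifying a vector costs one threshold test per level of the tree, even when the tree has hundreds of nodes.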
The control parameters of each node of the known BDTs were chosen simply by determining the feature and threshold that best separated the current data, where the quality of separation was determined by some impurity measure. This process was then repeated, recursively partitioning the remaining training samples, until some stopping criterion was met. This recursive selection of the best possible partition was, and still is, one of the advantages of the BDT approach, namely its capacity for automatic feature selection and data reduction.
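A minimal sketch of this greedy, recursive growth is shown below. It assumes Gini impurity as the impurity measure, an exhaustive search over observed feature values as candidate thresholds, and two hypothetical stopping criteria (a maximum depth and a minimum sample count). The text does not specify which impurity measure or stopping criterion the known BDTs used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity, 1 - sum_k p_k^2 (one common impurity measure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Search every (feature, threshold) pair for the lowest weighted impurity."""
    best = None  # (weighted impurity, feature index, threshold)
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # a useful split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def majority_leaf(y):
    return {"label": Counter(y).most_common(1)[0][0]}


def grow(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively partition the samples until a (hypothetical) stopping criterion holds."""
    if len(y) < min_samples or depth >= max_depth or gini(y) == 0.0:
        return majority_leaf(y)
    split = best_split(X, y)
    if split is None:  # all remaining feature vectors are identical
        return majority_leaf(y)
    _, f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return {"feature": f, "threshold": t,
            "left": grow([X[i] for i in li], [y[i] for i in li],
                         depth + 1, max_depth, min_samples),
            "right": grow([X[i] for i in ri], [y[i] for i in ri],
                          depth + 1, max_depth, min_samples)}


# Tiny made-up training set: two features per vector, Yes/No truth labels.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 0.5]]
y = ["No", "No", "Yes", "Yes"]
print(grow(X, y))
```

Because each split is chosen by scoring all features, features that never reduce impurity are simply never selected. This is the automatic feature selection mentioned above.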
An example of the training phase of a known BDT will now be described with reference to FIG. 2. In the first step (step 4 of FIG. 2), image data of reference images are obtained. Using reference images allowed a binary decision tree to be grown from data for which the truth was already known.
The second step (step 6 of FIG. 2) was to determine regions of interest in each of the plurality of reference images. This amounted to a crude guess, made on each reference image by a crude algorithm.
In the third step (step 8 of FIG. 2), features were then calculated for each of the region of interest windows. However, if the region of interest windows obtained were incorrect, the data was irrecoverably lost.
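The sketch below illustrates steps 6 and 8 and shows why an incorrect region of interest loses data. The window size, the mean-intensity threshold used as the "crude algorithm", the (mean, variance) features, and the toy image are all hypothetical; the text does not say how the known systems selected regions or which features they computed. Any window rejected in step 6 never reaches step 8, so it cannot influence the tree.

```python
from statistics import mean, pvariance


def windows(image, size):
    """Split a 2-D list into non-overlapping size x size windows, keyed by top-left corner."""
    out = {}
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            out[(r, c)] = [v for row in image[r:r + size] for v in row[c:c + size]]
    return out


def crude_roi(wins, thresh):
    """Hypothetical step 6: keep only windows whose mean intensity exceeds a threshold."""
    return {k: v for k, v in wins.items() if mean(v) > thresh}


def features(roi):
    """Hypothetical step 8: compute (mean, variance) for each retained window."""
    return {k: (mean(v), pvariance(v)) for k, v in roi.items()}


# Toy 4x4 "reference image". Suppose the faint textured window at (2, 2)
# is where the true lesion lies.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [1, 1, 2, 3],
    [1, 1, 3, 2],
]
feats = features(crude_roi(windows(image, 2), thresh=5))
print(feats)
```

In this toy case only the bright window at (0, 0) survives the crude selection. The textured window at (2, 2) is discarded before any features are computed, so no later stage can recover it.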
Finally, a binary decision tree, including intermediate and terminal nodes, was calculated. Each terminal node was assigned a Yes/No classification indicating whether or not it contained the desired behavior. To produce the classification at each terminal node, the terminal node population mix was used. For example, if case A came up forty times at a terminal node while the tree was being grown, case B twenty-five times, and case C ten times, then case A would be selected for that terminal node. This completed the known training phase for developing a binary decision tree.
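The population-mix rule amounts to a plurality vote over the training samples that reached a terminal node. The sketch below reproduces the 40/25/10 example from the text. It also returns the winning share, which is an illustrative addition showing how far from unanimous such a node can be.

```python
from collections import Counter


def terminal_label(population):
    """Return (most frequent class, its share) for the samples at a terminal node.

    Ties are broken by first occurrence, which is Counter.most_common's ordering.
    """
    counts = Counter(population)
    label, n = counts.most_common(1)[0]
    return label, n / len(population)


# Population mix from the text: A forty times, B twenty-five times, C ten times.
population = ["A"] * 40 + ["B"] * 25 + ["C"] * 10
print(terminal_label(population))
```

Case A wins with 40 of 75 samples, a share of only about 53%. The node is labeled A even though nearly half of the training samples that reached it belong to other cases.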
Application of the known binary decision tree worked similarly to the training phase. The various factors in the binary decision tree could then be manipulated to determine the optimal vectors that would become terminal nodes. Each terminal node was classified as either Yes or No, for example by being assigned a binary one or zero.
One problem of known binary decision trees was the potential loss of statistical significance. In known binary decision trees, it was important to grow the tree as large as possible and then prune it in order to maintain statistical significance.
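One simple way to prune for statistical significance is sketched below. After the tree is grown, any split that leaves a child with fewer than a minimum number of training samples is collapsed into a single terminal node labeled by its population mix. This minimum-count rule, the dict-based node layout, and the example counts are hypothetical. They are not the cost-complexity pruning commonly used with CART-style trees, and the text does not say which pruning method the known BDTs used.

```python
from collections import Counter


def make_leaf(counts):
    """Terminal node carrying the training-label counts that reached it."""
    c = Counter(counts)
    return {"counts": c, "label": c.most_common(1)[0][0]}


def prune(node, min_count):
    """Bottom-up pruning: collapse a split if either child holds too few training samples.

    Every node carries "counts" (a Counter of training labels); internal nodes
    also carry "left" and "right".
    """
    if "left" not in node:
        return node
    left = prune(node["left"], min_count)
    right = prune(node["right"], min_count)
    if sum(left["counts"].values()) < min_count or sum(right["counts"].values()) < min_count:
        return make_leaf(node["counts"])  # too little data below: make this a terminal node
    return {**node, "left": left, "right": right}


# Hypothetical grown tree with label counts at every node.
tree = {"counts": Counter({"Yes": 12, "No": 30}),
        "left": make_leaf({"No": 28, "Yes": 2}),
        "right": {"counts": Counter({"Yes": 10, "No": 2}),
                  "left": make_leaf({"Yes": 9}),
                  "right": make_leaf({"Yes": 1, "No": 2})}}

pruned = prune(tree, min_count=5)
print(pruned["right"])
```

The right-hand split is collapsed because one of its children rests on only three training samples, too few to be statistically meaningful under this hypothetical threshold.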
A further problem with known binary decision trees was that there was not always enough data to ensure accuracy. Suppose the feature or object to be detected occupied only a very small part of the image data, and only a random sample of the data (for example, one out of every thirty samples) was taken from the region of interest windows used to create the binary decision tree. In that case, the chance of creating a faulty BDT increased.
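A short calculation shows how quickly sparse sampling can miss a small target entirely. The sketch assumes that each window is kept independently with probability 1/30 (a random version of the "one out of every thirty" example) and that the lesion covers a hypothetical number k of windows. The probability that none of the lesion windows is sampled is then (29/30)^k.

```python
def miss_probability(k, rate=1 / 30):
    """Probability that none of k lesion windows is sampled, assuming each
    window is kept independently with probability `rate`."""
    return (1.0 - rate) ** k


for k in (1, 5, 20):
    print(k, round(miss_probability(k), 3))
```

Under these assumptions a lesion spanning 20 windows still goes completely unsampled about half the time, and a lesion spanning 5 windows is missed more than 80% of the time. The tree is then grown with few or no positive examples of the pattern it is meant to detect.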