Digital imagery and other multidimensional digital arrays of intensity are routinely collected using digital sensors and arrays of charge coupled devices (CCDs). The resulting data arrays are analyzed to determine patterns and detect features in the data. For example, color images of a battle scene are analyzed to detect targets, and radiographs and sonograms of human and animal bodies are examined to detect tumors and other indications of injury or disease. As the number and complexity of the data arrays to be analyzed increase, or the time available to analyze them decreases, automated and machine-assisted analysis becomes more critical. Some statistically based automated procedures for detecting features in a multidimensional array are adequate when the feature encompasses many points in the array, i.e., when the feature is large, but perform poorly as the feature to be detected becomes small. Some procedures perform well when tuned to a particular problem through experimental adjustment of many parameters, but such tuning may place an undue burden on personnel with limited time or experience. Typical problems encountered in the automated analysis of small structures in multidimensional arrays are illustrated here for the automatic detection of microcalcification candidates in mammograms.
Breast cancer has the highest incidence among all cancer types in American women; about 1 woman in 8 develops the disease in her lifetime. Every year, about 182,000 new cases of breast cancer are diagnosed and about 46,000 women die of the disease. The 5-year survival rate for women with breast cancer improves significantly with early diagnosis and treatment. To enable early detection, the American Cancer Society (ACS) recommends a baseline mammogram for all women by the age of 40, a mammogram approximately every other year between the ages of 40 and 50, and a mammogram every year after the age of 50. Mammography may become one of the highest-volume clinical X-ray procedures, since more than 30 million women in the U.S. are above the age of 50 and 41% are known to follow the ACS guidelines.
Besides the volume problem, early detection of breast cancer in mammograms is made difficult by the subtlety of the early signal. A microcalcification cluster, an early sign of breast cancer that may warrant biopsy, is commonly defined as three or more microcalcifications present within 1 cm² on a mammogram. These clusters are often difficult to detect because of their small size and their similarity to other tissue structures; an individual microcalcification is less than 2 mm wide. The etiology of microcalcifications includes lobular, ductal or epithelial hyperplasia, secretion of calcium salts by epithelial cells, adenosis, as well as calcification of necrotic debris due to carcinoma. Up to 50% of breast cancer cases exhibit microcalcification clusters, and 20–35% of clusters found in the absence of a mass are associated with malignant growth. In many cases a cluster is the first and only sign that allows timely intervention.
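The cluster definition above can be checked mechanically once microcalcification centroids are known. The sketch below is illustrative only: it assumes that "within 1 cm²" means inside some axis-aligned 10 mm × 10 mm square and that coordinates are given in millimeters. The function `has_cluster` is a hypothetical helper and is not part of any published method.

```python
from itertools import product


def has_cluster(points_mm, side_mm=10.0, min_count=3):
    """Return True if some axis-aligned side_mm x side_mm square contains
    at least min_count microcalcification centroids (coordinates in mm).

    Assumption: the 1 cm^2 region is modeled as a 10 mm x 10 mm square.
    """
    xs = [x for x, _ in points_mm]
    ys = [y for _, y in points_mm]
    # Any square holding a group of points can be slid until its left edge
    # meets their smallest x and its bottom edge their smallest y, so only
    # corners built from the points' own coordinates need to be tried.
    for x0, y0 in product(xs, ys):
        inside = sum(
            1
            for x, y in points_mm
            if x0 <= x <= x0 + side_mm and y0 <= y <= y0 + side_mm
        )
        if inside >= min_count:
            return True
    return False


print(has_cluster([(0.0, 0.0), (3.0, 4.0), (8.0, 9.0)]))   # True
print(has_cluster([(0.0, 0.0), (3.0, 4.0), (15.0, 2.0)]))  # False
```

The exhaustive search costs O(n³) for n centroids. This is acceptable for the handful of candidates in a local neighborhood, but a spatial index would be preferable for whole films.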
The increasing pressure to interpret large numbers of mammograms and the subtlety of many early signs increase the likelihood of missing breast cancer. A reliable automated system that indicates suspicious structures in mammograms can allow the radiologist to focus rapidly on the relevant parts of the mammogram, and it can increase the effectiveness and efficiency of radiology clinics. In the detection of breast cancer, false negatives may delay the diagnosis and treatment of the disease, while false positives cause unwarranted biopsy examinations. Therefore, both sensitivity and specificity need to be maximized, with somewhat higher priority given to sensitivity, because a missed cancer has more serious consequences than an unwarranted biopsy.
A common approach to detecting microcalcifications in mammograms starts by segmenting candidate structures and then applies feature extraction and pattern recognition to distinguish microcalcifications from background tissue among the candidates. In this process, segmentation plays an essential role, since the quantitative features that represent each candidate structure, such as size, contrast, and sharpness, depend on the region indicated by segmentation. Furthermore, because all possible candidate structures must be processed, a large number of background structures must also be segmented, which makes fast segmentation desirable.
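The dependence of the features on the segmented region can be made concrete with a small sketch. It assumes two hypothetical feature definitions: size is the pixel count of the region, and contrast is the mean gray level inside the region minus the mean over a thin ring of pixels just outside it. The helpers `dilate4` and `candidate_features` are illustrative names, not part of the document's method.

```python
import numpy as np


def dilate4(mask, steps):
    """Binary dilation with a cross-shaped (4-connected) element, repeated `steps` times."""
    out = mask.copy()
    for _ in range(steps):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]   # spread downward
        grown[:-1, :] |= out[1:, :]   # spread upward
        grown[:, 1:] |= out[:, :-1]   # spread right
        grown[:, :-1] |= out[:, 1:]   # spread left
        out = grown
    return out


def candidate_features(image, mask, ring_width=2):
    """Hypothetical size and contrast features for one segmented candidate."""
    img = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    size = int(mask.sum())
    # Background reference: a ring of pixels just outside the segmented region.
    ring = dilate4(mask, ring_width) & ~mask
    contrast = float(img[mask].mean() - img[ring].mean())
    return size, contrast


image = np.zeros((9, 9))
image[3:6, 3:6] = 10.0              # a small bright structure
tight = np.zeros((9, 9), dtype=bool)
tight[3:6, 3:6] = True              # boundary on the structure's edge
loose = np.zeros((9, 9), dtype=bool)
loose[2:7, 2:7] = True              # same structure, boundary one pixel too wide
print(candidate_features(image, tight))  # (9, 10.0)
print(candidate_features(image, loose))  # (25, 3.6)
```

A boundary only one pixel too wide nearly triples the size and cuts the contrast by almost two thirds. For this reason, a downstream classifier is only as reliable as the segmentation that feeds it.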
Several techniques have been applied to the segmentation of microcalcifications. One technique is based on local thresholding of individual pixels using the mean pixel value and the root mean square (rms) noise fluctuation in a selected region around the pixel being thresholded. The threshold for a pixel is set to the mean value plus the rms noise value multiplied by a selected coefficient. A structure is segmented by connecting pixels that exceed the threshold. Both parameters that must be selected, the region size and the threshold coefficient, are critical to this method. If a microcalcification is close to another microcalcification or bright structure, the window used to compute the rms noise around the first microcalcification will include the other bright structures; the rms noise may then be overestimated, setting the threshold too high. On the other hand, if the selected region is too small, it will not contain enough background pixels when placed on large microcalcifications.
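A minimal sketch of this local-threshold scheme follows. It assumes a square window of side 2·`half_window`+1 that is clipped at the image border. It takes the rms noise to be the standard deviation of the window's pixels about their mean. Connection is implemented as 4-connected growth from a seed pixel. The names `local_threshold_segment`, `half_window`, and `k` (the threshold coefficient) are hypothetical.

```python
import numpy as np
from collections import deque


def local_threshold_segment(image, seed, half_window=3, k=2.0):
    """Segment the 4-connected structure containing `seed` whose pixels each
    exceed their local threshold: window mean + k * window rms deviation."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape

    def threshold(r, c):
        # Window around (r, c), clipped at the image border.
        win = img[max(r - half_window, 0):r + half_window + 1,
                  max(c - half_window, 0):c + half_window + 1]
        return win.mean() + k * win.std()

    region = np.zeros_like(img, dtype=bool)
    if img[seed] <= threshold(*seed):
        return region  # the seed itself is not above its local threshold
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]
                    and img[nr, nc] > threshold(nr, nc)):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region


yy, xx = np.indices((15, 15))
image = ((yy + xx) % 2).astype(float)   # weak checkerboard "noise"
image[7:9, 7:9] = 20.0                  # 2 x 2 bright structure
print(int(local_threshold_segment(image, (7, 7)).sum()))  # 4
print(int(local_threshold_segment(image, (0, 0)).sum()))  # 0
```

Both `half_window` and `k` visibly shape the result. For example, placing a second bright spot inside the window raises both the mean and the deviation there, which raises the threshold, as described above.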
A second segmentation algorithm also requires a window size to be selected; it applies local thresholding by setting a threshold for each small square sub-image. The threshold is based on an expected bimodal intensity distribution in a window of selected size that contains the sub-image to be segmented. If the distribution is not bimodal, the threshold is set by using 5 different positions of the window, each containing the sub-image to be segmented. The algorithm therefore depends on a bimodal distribution existing in at least one window.
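The sketch below illustrates this scheme under several stated assumptions. None of these details come from the text.

- Bimodality is tested by finding at least two local maxima in a histogram of the window.
- The threshold is placed at the lowest bin between the two tallest peaks.
- The five window positions are taken to be the centered window plus the four placements that put the sub-image in each corner of the window.
- In the fallback case, the thresholds of the bimodal windows are averaged.
- The image must be at least `win` × `win` pixels.

The names `valley_threshold`, `segment_subimage`, `sub`, `win`, and `bins` are hypothetical.

```python
import numpy as np


def valley_threshold(values, bins=16):
    """Threshold at the histogram valley between the two tallest peaks,
    or None if the histogram has fewer than two local maxima."""
    counts, edges = np.histogram(np.ravel(values), bins=bins)
    peaks = [i for i in range(bins)
             if counts[i] > 0
             and (i == 0 or counts[i] > counts[i - 1])
             and (i == bins - 1 or counts[i] >= counts[i + 1])]
    if len(peaks) < 2:
        return None
    lo, hi = sorted(sorted(peaks, key=lambda i: counts[i])[-2:])
    valley = lo + 1 + int(np.argmin(counts[lo + 1:hi]))
    return float((edges[valley] + edges[valley + 1]) / 2)


def segment_subimage(image, r0, c0, sub=4, win=12, bins=16):
    """Binary mask for the sub x sub sub-image at (r0, c0), or None if no
    window position gives a bimodal histogram. Assumes image >= win x win."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    slack = win - sub

    def window_threshold(dr, dc):
        # win x win window containing the sub-image, clamped inside the image.
        wr = min(max(r0 - dr, 0), rows - win)
        wc = min(max(c0 - dc, 0), cols - win)
        return valley_threshold(img[wr:wr + win, wc:wc + win], bins)

    t = window_threshold(slack // 2, slack // 2)  # centered window first
    if t is None:
        # Assumed fallback: the four corner placements (five positions in
        # total with the centered one); average the bimodal thresholds.
        corners = [(0, 0), (0, slack), (slack, 0), (slack, slack)]
        found = [x for x in (window_threshold(dr, dc) for dr, dc in corners)
                 if x is not None]
        if not found:
            return None  # no bimodal window: no threshold can be set
        t = sum(found) / len(found)
    return img[r0:r0 + sub, c0:c0 + sub] > t


image = np.zeros((20, 20))
image[8:11, 8:11] = 10.0
print(int(segment_subimage(image, 8, 8).sum()))     # 9
print(segment_subimage(np.ones((20, 20)), 8, 8))    # None
```

The flat-image case shows the dependence noted above: without a bimodal window the method has no threshold at all.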
Other segmentation methods start with seed pixels and grow a region by adding pixels. They also require selection of a window size and threshold parameters. The localized implementation of region growing depends on the selected window size and the threshold for absolute difference in gray level between the seed pixel and a pixel to be added to the region.
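A localized region-growing step of this kind can be sketched as follows. The sketch assumes 4-connectivity and a square window of side 2·`half_window`+1 centered on the seed. A pixel joins the region when the absolute difference between its gray level and the seed's is at most `tolerance`. The function and parameter names are hypothetical.

```python
import numpy as np
from collections import deque


def grow_region(image, seed, tolerance, half_window=5):
    """Grow a 4-connected region from `seed`, adding pixels whose gray level
    differs from the seed's by at most `tolerance`, within a local window."""
    img = np.asarray(image, dtype=float)
    r_seed, c_seed = seed
    # Window limits (half-open), clipped at the image border.
    r_lo, r_hi = max(r_seed - half_window, 0), min(r_seed + half_window + 1, img.shape[0])
    c_lo, c_hi = max(c_seed - half_window, 0), min(c_seed + half_window + 1, img.shape[1])
    seed_value = img[seed]
    region = np.zeros(img.shape, dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (r_lo <= nr < r_hi and c_lo <= nc < c_hi and not region[nr, nc]
                    and abs(img[nr, nc] - seed_value) <= tolerance):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region


image = np.zeros((11, 11))
image[4:7, 4:7] = 10.0   # 3 x 3 structure
image[5, 7] = 7.0        # dimmer pixel touching it
print(int(grow_region(image, (5, 5), tolerance=2).sum()))                 # 9
print(int(grow_region(image, (5, 5), tolerance=3).sum()))                 # 10
print(int(grow_region(image, (5, 5), tolerance=3, half_window=1).sum()))  # 9
```

Changing either the tolerance or the window size alters the segmented boundary of the same structure.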
One segmentation algorithm uses several steps that include high-pass filtering, difference-of-Gaussians filtering, four computations of the standard deviation of the image, a smoothing, an opening, and an iterative thickening process with two erosions, two intersections, and a union operation in each iteration. More than ten parameters must be selected, including widths of Gaussian distributions, threshold coefficients, and diameters of morphological filtering elements.
A segmentation algorithm that operates without parametric distribution models, local statistics windows, or manually adjustable thresholds is desirable.
A segmentation method that is fast is also important. Up to 400 films per day are routinely screened in busy radiology clinics. The automated analysis does not have to be applied on-line; however, it may be difficult to process large numbers of mammograms overnight if algorithms are not fast enough. Because the segmentation algorithm has to segment all candidate structures that may potentially be microcalcifications, its speed is especially relevant. Each film may have several thousand candidate structures that must be segmented.
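A rough time budget shows why speed matters. The figures below rest on stated assumptions: "several thousand" candidates per film is taken as 5000, "overnight" as a 12-hour window, and all processing runs on a single processor. The helper `per_candidate_budget_ms` is hypothetical.

```python
def per_candidate_budget_ms(films_per_day=400, candidates_per_film=5000,
                            window_hours=12.0):
    """Milliseconds available per candidate structure if a day's films must be
    processed within `window_hours` on one processor (assumed figures)."""
    total_candidates = films_per_day * candidates_per_film
    return window_hours * 3600 * 1000 / total_candidates


print(per_candidate_budget_ms())  # 21.6
```

Under these assumptions, about 2 million candidates share roughly 21.6 ms each. That budget must also cover feature extraction and classification, so the share left for segmentation is smaller still.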
The multi-tolerance segmentation algorithm of Shen et al. (L. Shen, et al. “Detection and Classifications of Mammographic Calcifications,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 7, pp. 1403–1416, 1993) does not rely on statistical models of local statistics, and its threshold is set automatically. This multi-tolerance region-growing approach uses a growth tolerance parameter that varies over a small range in steps whose size depends on the seed pixel. The structure of interest is segmented multiple times with different tolerance values, and a set of three features is computed for each segmentation. The normalized vector differences in the feature set between successive segmentations are calculated, and the segmentation with the minimal difference is selected as the final one.
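The selection logic of a multi-tolerance approach can be sketched as shown below. This is not Shen et al.'s exact procedure, and several details are assumptions:

- A pixel joins the region when it lies within the tolerance of the seed's gray level.
- Each tolerance is a fraction of the seed's gray level, over an illustrative range.
- The three features are area, mean gray level, and boundary pixel count.
- The change between successive segmentations is the Euclidean norm of the feature difference divided by the norm of the earlier feature vector.
- The earlier segmentation of the most stable pair is kept.

All function names and the `fractions` values are hypothetical.

```python
import numpy as np
from collections import deque


def grow(img, seed, tol):
    """4-connected region of pixels whose gray level is within tol of the seed's."""
    seed_value = img[seed]
    region = np.zeros(img.shape, dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and not region[nr, nc]
                    and abs(img[nr, nc] - seed_value) <= tol):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region


def features(img, region):
    """Hypothetical 3-feature vector: area, mean gray level, boundary pixel count."""
    padded = np.pad(region, 1)  # pads with False
    # Interior pixels have all four 4-neighbors inside the region.
    interior = (region & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    area = region.sum()
    return np.array([area, img[region].mean(), area - interior.sum()], dtype=float)


def multi_tolerance_segment(image, seed,
                            fractions=(0.05, 0.10, 0.15, 0.20, 0.25, 0.30)):
    img = np.asarray(image, dtype=float)
    # Tolerances scale with the seed's gray level (assumed rule).
    regions = [grow(img, seed, f * img[seed]) for f in fractions]
    feats = [features(img, reg) for reg in regions]
    # Normalized change of the feature vector between successive tolerances.
    diffs = [np.linalg.norm(b - a) / np.linalg.norm(a)
             for a, b in zip(feats, feats[1:])]
    # Earlier member of the most stable pair; np.argmin takes the first minimum on ties.
    return regions[int(np.argmin(diffs))]


image = np.zeros((13, 13))
image[4:9, 4:9] = 80.0    # dimmer halo
image[5:8, 5:8] = 100.0   # bright core
print(int(multi_tolerance_segment(image, (6, 6)).sum()))  # 9
```

Every tolerance requires a full region growth plus feature computation. This repeated work is the computational cost that a single-pass method would avoid.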
The active contours model of Kass et al. (Kass, M. et al. “Snakes: Active Contour Models,” International Journal of Computer Vision, pp. 321–331, 1988) also provides segmentation without parametric statistical data models or windows for local statistics, but it does rely on several user-selected parameters that place some burden on the user. It has been used successfully to determine the boundaries of tissue structures in data such as ultrasound and MRI images of the heart and MRI images of the brain, but it has not been applied to the segmentation of microcalcifications. The active contours model starts with an initial contour placed near the expected boundary and moves the contour iteratively toward the boundary by minimizing an energy function. The contour is modeled as a flexible physical object with elasticity and rigidity properties. Its dynamics, dictated by the balance between these internal properties and external forces that depend on the image data, satisfy the Euler equations and minimize the corresponding energy function. An active contour that is initialized as a closed curve remains closed during the iterations, and its smoothness can be adjusted by the choice of parameters.
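For reference, the energy minimized by such an active contour is usually written in the standard form below.

- v(s) = (x(s), y(s)), with s in [0, 1], parameterizes the contour.
- α(s) weights elasticity (the first-derivative term) and β(s) weights rigidity (the second-derivative term).
- The external term shown uses the gradient of a Gaussian-smoothed image G_σ * I. It is one common choice rather than the only one.

The second line is the Euler equation for constant α and β, obtained by the calculus of variations. Parameters such as α, β, σ, and the time step of an iterative solver are examples of the user-selected parameters mentioned above.

```latex
E_{\text{snake}} = \int_0^1 \Big[ \tfrac{1}{2}\big( \alpha(s)\,\lvert \mathbf{v}_s(s) \rvert^2 + \beta(s)\,\lvert \mathbf{v}_{ss}(s) \rvert^2 \big) + E_{\text{ext}}\big(\mathbf{v}(s)\big) \Big]\, ds, \qquad E_{\text{ext}}(x, y) = -\lvert \nabla (G_\sigma * I)(x, y) \rvert^2

\alpha\,\mathbf{v}_{ss} - \beta\,\mathbf{v}_{ssss} - \nabla E_{\text{ext}}(\mathbf{v}) = \mathbf{0}
```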
What is needed is a segmentation method and apparatus without statistical models, local statistics, or thresholds to be selected manually, and with significantly lower computational complexity than the multi-tolerance and active contours methods, for greater speed.
In particular, what is needed is a method and apparatus to segment pixels in an image, such as a mammogram, containing a plurality of extra dark or extra bright objects just a few pixels in extent, that gives edges similar to those selected by an expert, but does so with fewer computations and with fewer manually adjustable parameters than conventional segmentation methods and equipment.