The present invention relates to the field of computer aided diagnosis of abnormal lesions in medical images. In particular, the invention relates to a fast algorithm for detecting spiculated or stellar lesions in a digital mammogram to assist in the detection of malignant breast cancer tumors at an early stage in their development.
Breast cancer in women is a serious health problem, the American Cancer Society currently estimating that over 180,000 U.S. women are diagnosed with breast cancer each year. Breast cancer is the second leading cause of cancer death among women, the American Cancer Society also estimating that breast cancer causes the death of over 44,000 U.S. women each year. While at present there is no means for preventing breast cancer, early detection of the disease prolongs life expectancy and decreases the likelihood of the need for a total mastectomy. Mammography using x-rays is currently the most common method of detecting and analyzing breast lesions.
The detection of spiculated, or stellar-shaped, lesions (“spiculations”) in mammograms is of particular importance because a spiculated breast tumor has a relatively high probability of being malignant. While it is important to detect the spiculated lesions as early as possible, i.e. when they are as small as possible, practical considerations can make this difficult. In particular, a typical mammogram may contain myriads of lines corresponding to fibrous breast tissue, and the trained, focused eye of a radiologist is needed to detect small spiculated lesions among these lines. Moreover, a typical radiologist may be required to examine hundreds of mammograms per day, leading to the possibility of a missed diagnosis due to human error.
Accordingly, the need has arisen for a computer-assisted diagnosis (CAD) system for assisting in the detection of abnormal lesions, including spiculations, in medical images. The desired CAD system digitizes x-ray mammograms to produce a digital mammogram, and performs numerical image processing algorithms on the digital mammogram. The output of the CAD system is a highlighted display which directs the attention of the radiologist to suspicious portions of the x-ray mammogram. The desired characteristics of a spiculation-detecting CAD system are high speed (requiring less processing time), high precision (the ability to detect subtle spiculations), and high accuracy (the ability to avoid false positives and missed spiculations). It may also be desired that the spiculation-detecting CAD system be usable as a mass-detecting and mass-classifying CAD system, and that the CAD system be capable of using spiculation information in conjunction with mass information for identifying suspicious masses in the digital mammogram and directing the attention of the radiologist to both the spiculations and the suspicious masses.
One method for detecting spiculations in digital mammograms, proposed by Kegelmeyer et al. and referred to as the “Alignment of Local Oriented Edges” (ALOE) algorithm, is described in Kegelmeyer, “Computer-aided Mammographic Screening for Spiculated Lesions,” Radiology 191:331-337 (1994). The ALOE method first calculates local gradients in a digitized mammogram. For each “candidate point” in the image, a predetermined window around that point is selected, the window size being some fraction of the overall image size. An “ALOE signal” for each candidate point then is calculated based on information in the surrounding window, the ALOE signal being defined as the standard deviation of a histogram of the gradient directions of all pixels in the window. The next candidate point, offset from the previous candidate point by a distance corresponding to the desired resolution of the search, is then considered.
Keeping in mind that a spiculation is a roughly symmetric set of lines radiating from a central point or region, a histogram of gradient directions will tend to be a flat distribution from 0 to 360 degrees if a spiculated region is centered around the candidate point. Thus, because the ALOE signal is the standard deviation of the histogram, the ALOE signal will be lower for those candidate points which are at the centers of spiculations, and will be higher for those candidate points which are not at the centers of spiculations. After the ALOE signal is calculated for all candidate points in the image, local minima in a plot of the ALOE signals are used as a basis for identifying spiculations.
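The ALOE signal for a single candidate point can be sketched as follows. This is a minimal illustration, assuming the gradient directions of the window pixels have already been computed in degrees and assuming a 36-bin histogram (the bin count is not taken from Kegelmeyer); the function name `aloe_signal` is hypothetical. It shows why a flat histogram, as expected at the center of a spiculation, gives a low signal.

```python
import statistics

def aloe_signal(angles_deg, num_bins=36):
    """Standard deviation of the bin counts of a gradient-direction histogram.

    angles_deg: gradient directions (degrees) of the pixels in the window
    around one candidate point. num_bins is an assumed histogram resolution.
    """
    bin_width = 360.0 / num_bins
    counts = [0] * num_bins
    for a in angles_deg:
        # Wrap each angle into [0, 360) and drop it into its bin.
        idx = int((a % 360.0) // bin_width) % num_bins
        counts[idx] += 1
    # Population standard deviation of the counts: 0 for a perfectly flat histogram.
    return statistics.pstdev(counts)

# Candidate at a spiculation center: gradients in every direction.
flat = [i * 10.0 + 5.0 for i in range(36)]  # one angle per 10-degree bin
# Candidate on a single straight fiber: gradients in two opposite directions only.
peaked = [90.0] * 18 + [270.0] * 18

print(aloe_signal(flat))  # 0.0
print(aloe_signal(peaked))
```

The same sketch also illustrates the weakness discussed next: any arrangement that fills all bins evenly, such as the border of a circumscribed mass, yields the same low value.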
The ALOE algorithm has several disadvantages. The primary disadvantage is that, in addition to spiculations, many unwanted background objects can also produce a small ALOE signal. For example, a point that is surrounded by a circle, such as the border of a circumscribed mass, also produces gradients in all directions, and therefore will produce a local minimum ALOE signal. A false positive may result. Furthermore, a typical spiculation in an actual mammogram will not have lines radiating in every direction, but rather will have lines radiating in several discrete directions in rough symmetry about the center. Thus, because every direction may not be present in the histogram of gradient angles around the center of a spiculation, the standard deviation of the histogram may still be quite large, resulting in a larger ALOE signal. This spiculation may then be missed. Thus, the ALOE algorithm presents serious practical problems because it may yield false positives and may also miss certain spiculations.
The ALOE algorithm is representative of a class of “backward direction” spiculation detection algorithms. By “backward direction” it is meant that a “candidate point” is incrementally moved across the image by a distance corresponding to the desired resolution of the spiculation search. At each candidate point, a set of “window computations” for a window of pixels surrounding the candidate point is performed, and a metric corresponding to the presence and/or strength of a spiculation centered on the candidate point is computed. Thus, for example, the ALOE algorithm computes the “ALOE signal” for each candidate point, and then moves on to the next candidate point.
As a general observation, “backward direction” algorithms are computationally intensive. This is because, for an image size of N×N, there will generally need to be on the order of K(bN)² computations, where K is the number of window computations for each candidate point, and where b is the reciprocal of the number of image pixels between successive candidate points. Because the number K is often proportional to the square or cube of the window size, the computational cost of “backward direction” approaches can quickly become prohibitive.
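The K(bN)² estimate can be made concrete with a small calculation. The sketch below assumes hypothetical values (a 1024×1024 image, a candidate point every 4 pixels, and K equal to the pixel count of a 64×64 window); none of these numbers come from the cited works. It also shows that doubling the window size quadruples the cost when K grows with the square of the window size.

```python
def backward_cost(n, b, k):
    """Approximate operation count K * (b*N)^2 for a backward-direction search.

    n: image side length in pixels (the image is n x n)
    b: reciprocal of the pixel spacing between successive candidate points
    k: window computations per candidate point
    """
    candidates_per_side = b * n
    return k * candidates_per_side ** 2

# Hypothetical parameters, for illustration only.
window = 64
print(backward_cost(1024, 1 / 4, window ** 2))  # 268435456.0
print(backward_cost(1024, 1 / 4, (2 * window) ** 2))
```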
A second method for detecting spiculations in digital mammograms, proposed by Karssemeijer et al., is described in Karssemeijer, “Recognition of Stellate Lesions in Digital Mammograms,” Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, England, Jul. 10-12, 1994 (Elsevier Science 1994), and “Detection of Stellate Distortions in Mammograms using Scale Space Operators,” Information Processing in Medical Imaging (Bizais et al., eds., Kluwer Academic Publishers 1995). Like the ALOE algorithm, the Karssemeijer approach is also a “backward direction” spiculation detection algorithm.
In the Karssemeijer algorithm, a “line image” and a “direction image” are first formed from the digital mammogram. As is known in the art, a line image contains line information for each pixel in the digital mammogram, while a direction image contains direction information for each pixel in the line image. The most basic form of line image, used in the Karssemeijer algorithm, contains line information which is a “1” if the pixel is located along a line and a “0” otherwise. The most basic form of direction image, also used in the Karssemeijer algorithm, contains direction information which, for those pixels having a “1” in the line image, equals the approximate angle of a tangent to the line passing through the pixel.
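A basic binary line image and tangent-angle direction image can be sketched as below. This is not Karssemeijer's operator: as a stated assumption, a simple Hessian (second-derivative) ridge detector is swapped in, marking a pixel as a line pixel when the curvature across a bright ridge exceeds a threshold, and taking the tangent as the direction of the Hessian eigenvector with the larger eigenvalue. The function name, the threshold, and the angle convention (x along columns, y along rows, angles in [0, 180) degrees) are all hypothetical.

```python
import math

def line_and_direction(img, threshold):
    """Return (line, direction) images for a 2-D list of intensities.

    line[i][j] is 1 where a bright ridge is detected, else 0.
    direction[i][j] is the tangent angle in degrees, in [0, 180), with x along
    columns and y along rows. Assumption: a Hessian ridge detector stands in
    for the operators used by Karssemeijer. Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    line = [[0] * w for _ in range(h)]
    direction = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Central finite differences for the second derivatives.
            dxx = img[i][j + 1] - 2 * img[i][j] + img[i][j - 1]
            dyy = img[i + 1][j] - 2 * img[i][j] + img[i - 1][j]
            dxy = (img[i + 1][j + 1] - img[i + 1][j - 1]
                   - img[i - 1][j + 1] + img[i - 1][j - 1]) / 4.0
            # Smaller Hessian eigenvalue: strongly negative across a bright ridge.
            lam_min = (dxx + dyy) / 2.0 - math.hypot((dxx - dyy) / 2.0, dxy)
            if -lam_min > threshold:
                line[i][j] = 1
                # Orientation of the larger-eigenvalue eigenvector, i.e. along the ridge.
                tangent = 0.5 * math.atan2(2.0 * dxy, dxx - dyy)
                direction[i][j] = math.degrees(tangent) % 180.0
    return line, direction

# A horizontal bright line along row 2 of a 5 x 5 image.
img = [[0] * 5 for _ in range(5)]
img[2] = [10] * 5
line, direction = line_and_direction(img, threshold=5.0)
print(line[2], direction[2][2])  # [0, 1, 1, 1, 0] 0.0
```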
Consistent with its “backward direction” character, the Karssemeijer algorithm then considers a window of pixels, chosen to be an annulus, around a candidate point in the line image, and then computes a metric associated with that candidate point. This procedure is repeated for each candidate point. The metric for each candidate point is calculated by counting the number of pixels in the annular window which are contained along lines which point approximately to the center of the window. Local maxima in a plot of the metrics are used to identify spiculations in the image.
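The annular counting metric for one candidate point might look like the following minimal sketch. It assumes line and direction images in the basic form described above, with tangent angles in degrees in [0, 180) measured with x along columns and y along rows; the angular tolerance `tol_deg` and the function name are hypothetical. Note that this whole function must be re-run for every candidate point, which is the source of the repeated work discussed below.

```python
import math

def annulus_metric(line, direction, ci, cj, r_in, r_out, tol_deg=10.0):
    """Count line pixels in the annulus r_in <= r <= r_out around (ci, cj)
    whose tangent points approximately at the center (within tol_deg)."""
    h, w = len(line), len(line[0])
    count = 0
    for i in range(max(0, ci - r_out), min(h, ci + r_out + 1)):
        for j in range(max(0, cj - r_out), min(w, cj + r_out + 1)):
            r = math.hypot(i - ci, j - cj)
            if not (r_in <= r <= r_out) or not line[i][j]:
                continue
            # Direction from this pixel toward the center, folded into [0, 180).
            radial = math.degrees(math.atan2(ci - i, cj - j)) % 180.0
            diff = abs(radial - direction[i][j]) % 180.0
            diff = min(diff, 180.0 - diff)  # lines are undirected
            if diff <= tol_deg:
                count += 1
    return count
```

For example, a horizontal and a vertical line crossing at the candidate point contribute every annulus pixel they cover, while a line that merely passes near the candidate point contributes nothing.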
The Karssemeijer algorithm has several disadvantages. The primary disadvantage is computational intensity due to the “backward direction” character of the algorithm. Because each line image pixel actually appears in many sequential windows corresponding to successive candidate points, the calculations are repeated for each pixel many times, and the algorithm is very time consuming. As discussed previously, speed is a key factor in a CAD system for assisting a radiologist. A CAD device which slows down the radiologist, who must often view and analyze hundreds of mammograms per day, is undesirable.
Further, the approach used by Karssemeijer to detect lines when generating the line image is based on Gabor filters applied in the frequency domain: a Fast Fourier Transform (FFT) is performed on the mammogram image, the transformed image is multiplied element by element by the transformed Gabor kernel, and the inverse FFT is then used to obtain the enhanced line image in the spatial domain. Because the FFT used requires input images whose dimensions are powers of two, the Karssemeijer approach requires the digitized mammogram to be “padded out” to the nearest power of two. The cost of this approach is higher memory requirements in the computer and a longer computation time.
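The memory cost of padding can be estimated with a few lines of arithmetic. The sketch assumes a radix-2 FFT that needs each dimension rounded up to the next power of two, and uses a hypothetical 2300×1700 digitized mammogram purely for illustration; `next_pow2` and `padding_overhead` are illustrative helper names.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def padding_overhead(rows, cols):
    """Padded size and fractional extra memory from zero-padding to powers of two."""
    pr, pc = next_pow2(rows), next_pow2(cols)
    return (pr, pc), (pr * pc) / (rows * cols) - 1.0

# Hypothetical image size, for illustration only.
padded, overhead = padding_overhead(2300, 1700)
print(padded, f"{overhead:.0%} extra pixels")
```

In this example the padded array is 4096×2048, more than twice the number of pixels actually digitized, which is the extra memory and computation referred to above.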
Accordingly, it is an object of the present invention to provide a fast computer-assisted diagnosis (CAD) system for assisting in the identification of spiculated lesions in digital mammograms, the CAD system being capable of producing an output which directs attention to spiculated lesions in the x-ray mammogram for increasing the speed and accuracy of x-ray mammogram analysis.
It is a further object of the present invention to provide a fast CAD system for detecting spiculated lesions which produces fewer false positives and fewer missed spiculations, while also being capable of detecting smaller spiculations.
It is still a further object of the present invention to provide a fast CAD system which detects spiculated lesions in a manner fast enough to permit use of the CAD system in a clinical radiology environment.
It is still a further object of the present invention to provide a fast spiculation-detecting CAD system which is also capable of detecting and classifying masses in a digital mammogram, the CAD system being capable of using spiculation information in conjunction with mass information for identifying suspicious masses in the digital mammogram.
These and other objects of the present invention are provided by an improved CAD system for detecting spiculated lesions in a digital mammogram image using a “forward direction” detection algorithm, as opposed to a “backward direction” detection algorithm, for improving the speed, accuracy, and precision of results. A CAD system according to the present invention employs a fast method for detecting spiculations in the digital mammogram image, the method including the steps of determining a region of potential intersection for each of a plurality of pixels using line information and direction information related to that pixel, accumulating the regions of potential intersection to produce a cumulative array, and using information derived from the cumulative array, such as the positions and strengths of local maxima in the cumulative array, for identifying the spiculations in the digital mammogram image. In one embodiment of the invention, the region of potential intersection for every pixel in the digital mammogram image is determined and accumulated into the cumulative array.
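Once the cumulative array has been built, the positions and strengths of its local maxima can be extracted with a simple neighbourhood test. The following is a minimal sketch, assuming a strict 8-neighbour maximum and a hypothetical minimum-strength `threshold`; border pixels are skipped for simplicity, and the function name is illustrative only.

```python
def local_maxima(acc, threshold):
    """Return (strength, i, j) for strict 8-neighbour local maxima of a 2-D
    cumulative array whose value is at least `threshold`, strongest first.
    Border pixels are skipped for simplicity."""
    h, w = len(acc), len(acc[0])
    peaks = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            v = acc[i][j]
            if v < threshold:
                continue
            neighbours = [acc[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if all(v > n for n in neighbours):
                peaks.append((v, i, j))
    peaks.sort(reverse=True)  # strongest candidate spiculation first
    return peaks

acc = [[0] * 6 for _ in range(6)]
acc[1][1], acc[4][4] = 9, 5
print(local_maxima(acc, threshold=3))  # [(9, 1, 1), (5, 4, 4)]
```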
The line information and direction information are obtained by generating a line image and a direction image corresponding to the digital mammogram image. The region of potential intersection corresponding to a pixel is found by determining, according to the line information related to the pixel, whether the pixel is located along a line, and if the pixel is located along a line, selecting a region centered on the pixel corresponding to a predetermined pattern, the predetermined pattern being rotated by an amount related to the direction information related to the pixel. In another embodiment of the invention, the amount by which the predetermined pattern is rotated is equal to the direction information for that pixel. In another embodiment of the invention, the predetermined pattern is a split rectangle or trapezoid centered on the pixel, the rectangle or trapezoid having a large aspect ratio. The spiculations are identified by using information from the cumulative array formed by an accumulation of the regions of potential intersection for the pixels in the image.
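The forward-direction accumulation with a rotated split-rectangle pattern can be sketched as follows. The exact geometry is an assumption made for illustration: each line pixel votes for the cells of a long, narrow rectangle centered on it and rotated to its tangent angle, with a central gap (the "split") of half-length `gap` removed so the pixel does not vote for its immediate surroundings. The parameter names and values are hypothetical. Each image pixel is visited exactly once, which is what distinguishes this from the backward-direction methods above.

```python
import math

def split_rectangle_accumulate(line, direction, half_length, half_width, gap):
    """Accumulate split-rectangle regions of potential intersection.

    direction holds tangent angles in degrees (x along columns, y along rows).
    Each cumulative-array cell inside a line pixel's region receives +1.
    """
    h, w = len(line), len(line[0])
    acc = [[0] * w for _ in range(h)]
    # Bounding radius that contains the rotated rectangle at any angle.
    r = int(math.ceil(math.hypot(half_length, half_width)))
    for i in range(h):
        for j in range(w):
            if not line[i][j]:
                continue  # only pixels lying on a line cast votes
            t = math.radians(direction[i][j])
            c, s = math.cos(t), math.sin(t)
            for p in range(max(0, i - r), min(h, i + r + 1)):
                for q in range(max(0, j - r), min(w, j + r + 1)):
                    dx, dy = q - j, p - i
                    along = dx * c + dy * s     # offset along the tangent
                    across = -dx * s + dy * c   # offset perpendicular to it
                    if gap <= abs(along) <= half_length and abs(across) <= half_width:
                        acc[p][q] += 1
    return acc

# Four short line fragments pointing at (5, 5) from four sides.
line = [[0] * 11 for _ in range(11)]
direction = [[0.0] * 11 for _ in range(11)]
for (i, j, angle) in [(5, 1, 0.0), (5, 9, 0.0), (1, 5, 90.0), (9, 5, 90.0)]:
    line[i][j], direction[i][j] = 1, angle
acc = split_rectangle_accumulate(line, direction, half_length=6, half_width=0.5, gap=1)
print(acc[5][5])  # 4
```

The strongest cell in the cumulative array is where the extensions of the four fragments intersect, which is the behaviour the region of potential intersection is meant to capture.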
Advantageously, a given pixel in the digital mammogram image is considered only once in the process of developing the cumulative array. In this sense, a CAD system according to the present invention operates in the “forward direction,” and is very fast compared to the “backward direction” algorithms presented previously, which use each image pixel many times in deriving spiculation metrics. Moreover, the method is highly amenable to hardware implementation using parallel processors, which can increase the speed of the CAD system even further.
In another embodiment of the present invention, the cumulative array comprises fewer pixels than the digital mammogram image. For example, where the digital mammogram image is M×N pixels, the cumulative array may be 0.25M×0.25N pixels. Using line information and direction information related to each digital mammogram pixel, a region of potential intersection is determined for each pixel with respect to the smaller cumulative array and located proportionally in the smaller 0.25M×0.25N space. The regions of potential intersection are then accumulated into the 0.25M×0.25N cumulative array. Because addition operations are performed on fewer cumulative array pixels (one sixteenth as many in this example), the algorithm is made significantly faster without a significant loss in resolution.
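A reduced-size cumulative array can be sketched by mapping each line pixel to its proportional position and scaling the region dimensions by the same factor. This sketch assumes a split-rectangle region with hypothetical dimensions given in full-resolution pixels; with `scale=0.25` a 40×40 image accumulates into a 10×10 array, so each region covers roughly one sixteenth as many cells.

```python
import math

def accumulate_reduced(line, direction, scale, half_length, half_width, gap):
    """Accumulate regions into a cumulative array `scale` times the image size.

    Region dimensions are in full-resolution pixels and are multiplied by
    `scale`; each region is centred at the pixel's proportional location.
    The split-rectangle shape and its parameters are assumptions.
    """
    h, w = len(line), len(line[0])
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    acc = [[0] * cw for _ in range(ch)]
    L, W, G = half_length * scale, half_width * scale, gap * scale
    r = int(math.ceil(math.hypot(L, W))) + 1
    for i in range(h):
        for j in range(w):
            if not line[i][j]:
                continue
            ci, cj = i * scale, j * scale  # proportional location in the small array
            t = math.radians(direction[i][j])
            c, s = math.cos(t), math.sin(t)
            for p in range(max(0, int(ci) - r), min(ch, int(ci) + r + 1)):
                for q in range(max(0, int(cj) - r), min(cw, int(cj) + r + 1)):
                    dx, dy = q - cj, p - ci
                    along = dx * c + dy * s
                    across = -dx * s + dy * c
                    if G <= abs(along) <= L and abs(across) <= W:
                        acc[p][q] += 1
    return acc

# Four line fragments in a 40 x 40 image pointing at (20, 20).
line = [[0] * 40 for _ in range(40)]
direction = [[0.0] * 40 for _ in range(40)]
for (i, j, angle) in [(20, 4, 0.0), (20, 36, 0.0), (4, 20, 90.0), (36, 20, 90.0)]:
    line[i][j], direction[i][j] = 1, angle
acc = accumulate_reduced(line, direction, scale=0.25, half_length=24, half_width=2, gap=4)
print(len(acc), len(acc[0]), acc[5][5])  # 10 10 4
```

The intersection at full-resolution position (20, 20) appears at (5, 5) in the reduced array.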
In another embodiment of the invention, the CAD system performs the step of computing line information and direction information for each image pixel in the digital mammogram, followed by the step of computing a weighting function WT(theta) based on statistics of the direction information over all image pixels. The direction information for each digital mammogram image pixel having coordinates (i,j) is an angle THETA(i,j), and the weighting function WT(theta) is equal to WT(THETA(i,j)) for that pixel. For each digital mammogram pixel, a region of potential intersection is determined and accumulated into the cumulative array after being weighted by WT(THETA(i,j)) for that pixel. The weighting function WT(theta) is computed by calculating a histogram function H(theta) of the direction information THETA(i,j) for all image pixels, followed by the step of developing the function WT(theta) as having an inverse relationship to the histogram function H(theta). In this manner, lines perpendicular to a predominant line direction in the digital mammogram image are emphasized, whereas lines parallel to the predominant line direction in the digital mammogram are de-emphasized, thus increasing system precision and accuracy.
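One way to build a weighting function with an inverse relationship to H(theta) is sketched below. The specific form is an assumption: 10-degree bins over [0, 180), WT = 1 / (H + eps) to avoid division by zero, and normalisation so the bin weights average to 1. The name `direction_weights` is hypothetical.

```python
def direction_weights(theta, num_bins=18, eps=1.0):
    """Build WT(theta) with an inverse relationship to the histogram H(theta).

    theta: 2-D list of direction angles THETA(i, j) in degrees.
    Assumed form: WT = 1 / (H + eps), normalised to a mean of 1 over the bins.
    """
    bin_width = 180.0 / num_bins

    def bin_of(angle):
        return int((angle % 180.0) // bin_width) % num_bins

    hist = [0] * num_bins  # H(theta)
    for row in theta:
        for angle in row:
            hist[bin_of(angle)] += 1
    raw = [1.0 / (count + eps) for count in hist]
    mean = sum(raw) / num_bins
    table = [value / mean for value in raw]

    def weight(angle):
        return table[bin_of(angle)]

    return weight

# 90 pixels near 5 degrees (predominant direction), 10 pixels near 95 degrees.
theta = [[5.0] * 10 for _ in range(9)] + [[95.0] * 10]
WT = direction_weights(theta)
print(WT(95.0) > WT(5.0))  # True
```

In an accumulation loop, each pixel's region would then add `WT(THETA[i][j])` to the cumulative array instead of 1.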
In another embodiment of the present invention, mass information corresponding to the digital mammogram image, including mass location information, is computed in addition to the cumulative array. Information in the cumulative array is used in conjunction with the mass information for identifying regions of interest in the digital mammogram image, such as by using a linear classifier method based on mass information and cumulative array metrics.
In another embodiment of the invention, local attention is given to the cumulative array near locations having a strong circumscribed mass candidate. The cumulative array is thresholded by a first threshold value in a first region not including the strong circumscribed mass candidate location, whereas the cumulative array is thresholded by a second value less than the first value in a second region which includes said strong circumscribed mass candidate. In this manner, spiculations which otherwise would have fallen below a threshold value in the cumulative array are detected when associated with a strong circumscribed mass candidate, for assigning a value of spiculatedness to said circumscribed mass candidate in mass detection and classification.
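The two-threshold rule can be sketched as follows. As assumptions, the neighbourhood of a strong circumscribed mass candidate is taken to be a disc of hypothetical radius `radius` around its location, and a cumulative-array pixel is reported when it exceeds the threshold that applies at that position.

```python
import math

def dual_threshold(acc, mass_centres, radius, t_high, t_low):
    """Return (i, j) positions where acc exceeds t_high, or exceeds the lower
    t_low within `radius` of any strong circumscribed mass candidate."""
    if not t_low < t_high:
        raise ValueError("t_low must be smaller than t_high")
    h, w = len(acc), len(acc[0])
    detections = []
    for i in range(h):
        for j in range(w):
            near_mass = any(math.hypot(i - mi, j - mj) <= radius
                            for mi, mj in mass_centres)
            threshold = t_low if near_mass else t_high
            if acc[i][j] > threshold:
                detections.append((i, j))
    return detections

acc = [[0] * 6 for _ in range(6)]
acc[0][0], acc[1][1], acc[4][4] = 9, 6, 6
print(dual_threshold(acc, mass_centres=[(4, 4)], radius=1.5, t_high=8, t_low=4))
# [(0, 0), (4, 4)]
```

The two cells with value 6 are equally strong, but only the one next to the mass candidate is reported, so it can contribute a spiculatedness value to that candidate.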
In another embodiment of the invention, mass information for the digital mammogram image is computed by using information in a sphericity array. The sphericity array is calculated by the steps of computing a gradient plane from the digital mammogram image, the gradient plane having pixels, each gradient plane pixel having a gradient intensity value and a gradient direction value, determining a region of potential centroid for each gradient plane pixel using the gradient intensity value and gradient direction value for that pixel, and accumulating the regions of potential centroid to produce a sphericity array. In this way, strong circumscribed mass candidates may be detected by using information derived from an algorithm which is a forward direction algorithm similar to the forward direction algorithm for detecting spiculations.
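The sphericity array follows the same forward-direction pattern as the cumulative array, with gradient pixels voting for possible mass centroids. The region shape below is an assumption for illustration: each gradient pixel above a hypothetical magnitude threshold votes, with weight equal to its gradient intensity, for the pixels lying along its gradient direction at distances `r_min` to `r_max`. The angle convention (degrees, x along columns, y along rows, gradients pointing toward brighter pixels) is also assumed.

```python
import math

def sphericity_array(grad_mag, grad_dir, r_min, r_max, mag_threshold):
    """Accumulate ray-shaped regions of potential centroid into a 2-D array."""
    h, w = len(grad_mag), len(grad_mag[0])
    acc = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            m = grad_mag[i][j]
            if m < mag_threshold:
                continue
            t = math.radians(grad_dir[i][j])
            c, s = math.cos(t), math.sin(t)
            visited = set()  # vote at most once per cell for each gradient pixel
            for d in range(r_min, r_max + 1):
                p, q = round(i + d * s), round(j + d * c)
                if 0 <= p < h and 0 <= q < w and (p, q) not in visited:
                    visited.add((p, q))
                    acc[p][q] += m
    return acc

# Four edge pixels of a bright disc of radius 4 centred at (8, 8),
# each with a unit gradient pointing toward the centre.
grad_mag = [[0.0] * 17 for _ in range(17)]
grad_dir = [[0.0] * 17 for _ in range(17)]
for (i, j, angle) in [(8, 12, 180.0), (8, 4, 0.0), (4, 8, 90.0), (12, 8, -90.0)]:
    grad_mag[i][j], grad_dir[i][j] = 1.0, angle
acc = sphericity_array(grad_mag, grad_dir, r_min=3, r_max=5, mag_threshold=0.5)
print(acc[8][8])  # 4.0
```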
In another embodiment of the present invention, the fast CAD system is capable of locating noneccentric spiculations for increased precision, accuracy, and reduction of false positives. The fast CAD system according to this embodiment performs a method comprising the steps of determining a region of potential intersection for each of a plurality of image pixels using line information and direction information related to that image pixel, computing a plurality of weights corresponding to each of the plurality of image pixels, accumulating for each of the plurality of image pixels a plurality of weights into a plurality of accumulation planes for those pixels located within the region of potential intersection for that image pixel, and processing information contained in said plurality of accumulation planes for identifying the noneccentric spiculations in the image. The plurality of accumulation planes comprises a first accumulation plane ACC1, a second accumulation plane ACC2, and a third accumulation plane ACC3, the first, second, and third accumulation planes ACC1, ACC2, and ACC3 being processed for producing a spiculation activity plane ACT and a spiculation eccentricity plane ECC for use in locating noneccentric spiculations.
The spiculation activity plane ACT and the spiculation eccentricity plane ECC are computed using information in the plurality of accumulation planes such that the spiculation activity plane ACT comprises pixel values related to the presence of spiculations, and such that said spiculation eccentricity plane ECC comprises pixel values related to the presence of eccentric spiculations. A spiculation output plane SO is formed by setting, for each pixel (i,j), SO(i,j) equal to a first constant multiplied by ACT(i,j) added to a second constant multiplied by ECC(i,j), the first constant typically being a positive number and the second constant usually being about −0.5 times the first constant. In this manner, the spiculation output plane SO(i,j) will contain high values near locations having spiculations, and will contain maxima among these high values corresponding to spiculations which are less eccentric and more radially symmetric. As a result, false positives are reduced and accuracy and precision are increased.
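The combination step for SO is a per-pixel linear blend. Because the derivation of ACT and ECC from ACC1, ACC2, and ACC3 is not specified here, the minimal sketch below takes ACT and ECC as given inputs and uses the default second constant of −0.5 times the first. The function name is hypothetical.

```python
def spiculation_output(act, ecc, c1=1.0, c2=None):
    """SO(i,j) = c1 * ACT(i,j) + c2 * ECC(i,j), with c2 defaulting to -0.5 * c1."""
    if c2 is None:
        c2 = -0.5 * c1
    return [[c1 * a + c2 * e for a, e in zip(act_row, ecc_row)]
            for act_row, ecc_row in zip(act, ecc)]

# Two locations with equal spiculation activity; the second is more eccentric.
act = [[10.0, 10.0]]
ecc = [[0.0, 8.0]]
print(spiculation_output(act, ecc))  # [[10.0, 6.0]]
```

With equal activity, the radially symmetric location keeps the higher SO value, so it wins the local-maximum search.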
In another embodiment of the invention, mass information corresponding to the digital mammogram image is computed, the mass information including mass events, each event comprising mass centroid location, mass area, mass elongation, and mass contrast. Information contained in the SO, ACT, and ECC arrays is used in conjunction with the mass information in a linear classifier method for identifying regions of interest in the image.
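A linear classifier over mass features and spiculation-plane values can be sketched as a weighted sum plus bias. The feature layout (area, elongation, and contrast, plus SO, ACT, and ECC sampled at the mass centroid), the `MassEvent` record, and the weights below are all hypothetical; in practice the weights and decision threshold would be obtained by training on labelled cases.

```python
from dataclasses import dataclass

@dataclass
class MassEvent:
    row: int           # mass centroid location
    col: int
    area: float
    elongation: float
    contrast: float

def linear_score(event, so, act, ecc, weights, bias):
    """Linear discriminant on mass features plus SO, ACT, and ECC values
    sampled at the mass centroid. weights and bias are hypothetical."""
    features = [
        event.area,
        event.elongation,
        event.contrast,
        so[event.row][event.col],
        act[event.row][event.col],
        ecc[event.row][event.col],
    ]
    if len(weights) != len(features):
        raise ValueError("need one weight per feature")
    return sum(w * f for w, f in zip(weights, features)) + bias

event = MassEvent(row=0, col=0, area=100.0, elongation=1.2, contrast=0.3)
score = linear_score(event, so=[[10.0]], act=[[10.0]], ecc=[[0.0]],
                     weights=[0.01, -1.0, 2.0, 0.1, 0.0, 0.0], bias=-1.0)
print(score > 0.0)  # True: flagged as a region of interest under these weights
```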