Image segmentation (also known as “autosegmentation”, “contouring”, and “autocontouring”) is a technique of processing a digital image to detect, classify, and enumerate discrete objects depicted therein. Image segmentation presents a difficult problem because digital images generally lack sufficient information to constrain the possible solutions of the segmentation problem to a small set of solutions that includes the correct solution.
Image segmentation finds a popular application in the field of medical images, particularly computed tomography (CT) images, x-ray images, magnetic resonance (MR) images, and the like. It is highly desirable to accurately contour various anatomic objects (such as the prostate, kidneys, the liver, the pancreas, etc.) that are shown in such medical images. By accurately determining the boundary of such anatomic objects, the location of the anatomic object relative to its surroundings can be used to plan and execute medical procedures such as radiotherapy treatment for cancer. Because it is often the case that the healthy tissues and organs surrounding cancerous tissue are highly sensitive to radiation, it is extremely important that the exact location of the anatomic object to be treated be known. That is, accurate placement, shaping, and intensity modulation of external X-ray beams for tumor irradiation (teletherapy) depends absolutely on detailed 3-D maps of the patient's anatomy, as do other forms of radiotherapy treatment such as brachytherapy and radioimmunotherapy.
Treatment planning for medical procedures such as teletherapy and brachytherapy begins by creating a detailed 3-D map of the patient's anatomy (including tumors). These maps are often created as a series of contours drawn on CT slices. In teletherapy procedures, once the map is obtained, a set of external X-ray beam angles, beam cross-section shapes, and beam intensities can be designed to deliver a prescribed dose of radiation to the tumor while minimizing the harm done to nearby healthy tissues. For brachytherapy procedures, once the map is obtained, the appropriate location(s) to deliver radioactive seeds can be determined, also while minimizing the harm done to nearby healthy tissues.
Image segmentation operates on medical images in their digital form. A digital image of a target such as the human body is a data set comprising an array of data elements, each data element having a numerical data value corresponding to a property of the target, wherein the property is measured by an image sensor at regular intervals throughout the image sensor's field of view. The property to which the data values correspond may be the light intensity of black and white photography, the separate RGB components of a color image, or other similar properties. Typically the image data set is an array of pixels, wherein each pixel has a value corresponding to intensity. The leftmost image shown in FIG. 1 is a CT digital image of an interior portion of a patient's body. As can be seen, depicted in the image is the left kidney along with nearby anatomical objects such the spleen, the vertebral body, the spinal cord, and the ribs. The usefulness of digital images derives partly from the capacity of digital images to be transformed and enhanced by computer programs so that meaning can be extracted therefrom.
The data value of each data element can be modeled as a random variable. A random variable is a measured quantity whose repeated observations cluster about some central value. The “mean” is the most likely value and the “mode” is the most-often observed value. Random variables are most completely described by a histogram, resealed as a probability distribution. Typically, the data values of the data elements of the image data set constitute a random variable X which may take its values from a finite set of possible data values, such as the interval −1024 to 3071, typical for CT. In mathematical notation, the data value of any given data element of the image data set may be equal to any xi, wherein xi is a member of the set {x1, x2, . . . xn}, and wherein “n” is 4096 for the CT example above. Random variables whose values are limited to members of finite sets are termed discrete; random variables which can take any real value are continuous random variables. The probability that a particular data element takes the value xi for random variable X, is expressed as p(xi)=Pr{X=xi}. Pr{ } is the probability that the expression in the braces { } is true. The full set of probabilities, written p(X), is the probability distribution. As is well-known, probabilities and their distribution have the following properties:                p(xi)≧0;        0≦p(xi)≦1; and        Σni=1p(xi)=1        
To deal with 2 or more random variables at once, the concepts of joint probability and conditional probability are used. For two discrete random variables (X and Y), the joint distribution is the set of joint probabilities p(x,y)=Pr{x=x, Y=y} for all possible values of x and y. A histogram corresponding to such a joint distribution can be represented by a two-dimensional plot with the x entries along the rows and the y entries along the columns.
The conditional probability of observing any X given that Y=y is written as p(X|Y=y), which corresponds to the column of histogram entries for Y=y. The conditional and joint probabilities are related by the definition:
                              p          ⁡                      (                          X              |              Y                        )                          =                              p            ⁡                          (                              X                ,                Y                            )                                            p            ⁡                          (              Y              )                                                          (        1        )            The conditional probability of Y, given X, is defined as:
                              p          ⁡                      (                          X              |              Y                        )                          =                              p            ⁡                          (                              X                ,                Y                            )                                            p            ⁡                          (              X              )                                                          (        2        )            Referring to FIG. 2, it can be seen that the relative sizes of the peaks shown in the histogram correspond to the probabilities that a pixel selected randomly will belong to one of the regions in the image. In the CT image shown in FIG. 2, two regions are easily discernible; interior region 100, which corresponds to the anatomical portion of the image, and exterior region 102, which corresponds to the non-anatomical portions of the image (the black background and the CT scanner couch and gantry ring depicted at the bottom of the image). The conditional probability (or likelihood) that a pixel with value xi is in a particular region k, is p(xi|regionk).
If the individual intensity probabilities in a region k are summed:
                                          ∑                          i              =              1                        n                    ⁢                                          ⁢                      p            ⁡                          (                                                x                  i                                |                                  region                  k                                            )                                      =                  p          ⁡                      (                          region              k                        )                                              (        3        )            wherein p(regionk) is the probability of hitting region k when a pixel is randomly selected from the image. If there are R regions in the image, the sum of probabilities over all regions is
                                          ∑                          k              =              1                        R                    ⁢                                          ⁢                      p            ⁡                          (                              region                k                            )                                      =        1                            (        4        )            because the regions cover the entire image area. Thus, the total probability over all regions and all pixel values is:
                                          ∑                          k              =              1                        R                    ⁢                                    ∑                              i                =                1                            n                        ⁢                                                  ⁢                          p              ⁢                              (                                                      x                    i                                    |                                      region                    k                                                  )                                                    =        1                            (        5        )            
These conditional probability expressions can be used to make decisions as to which pixels belong to which regions. For example, if two regions, region, and region2, are equally probable, then a pixel with value xi most likely belongs to the region corresponding to the largest conditional probability p(xi|regionk), wherein k is 1 or 2 Referring back to FIG. 2, it can be seen that more pixels with high intensity are found in the interior region than in the exterior region. Similarly, more pixels with a low intensity are found in the exterior region that the interior region. Therefore, if a pixel with value −1024 (the minimum pixel value) is observed, a reasonable decision can be made that such a pixel belongs in the exterior region rather than the interior region. Various decision-making techniques for processing digital images to perform image segmentation have been developed from this basic concept.
For example, one known method of assigning pixels to a particular region is known as the “likelihood ratio test” (LRT). The LRT is defined as:
If
                    p        ⁡                  (                                    x              i                        |                          region              1                                )                            p        ⁡                  (                                    x              i                        |                          region              2                                )                      >    T    ,then decide region1, else decide region2 wherein T is a threshold value determined from training data in which the regions corresponding to each pixel value are known exactly.
The conditional probability of regionk being the true region given that a pixel has a pixel value X=xi is proportional to the product of the conditional likelihood and the prior (region) probability. This conditional probability is called the a posteriori or posterior probability, and the maximum a posteriori (MAP) decision rule is derived therefrom as:
                                          If            ⁢                                          p                ⁡                                  (                                                            region                      1                                        |                                          x                      i                                                        )                                                            p                (                                                      region                    2                                    |                                      x                    i                                                                                =                                                                      p                  ⁡                                      (                                                                  x                        i                                            |                                              region                        1                                                              )                                                  ⁢                                  p                  ⁡                                      (                                          region                      1                                        )                                                                                                p                  ⁡                                      (                                                                  x                        i                                            |                                              region                        2                                                              )                                                  ⁢                                  p                  ⁡                                      (                                          region                      2                                        )                                                                        >            1                          ,                  decide          ⁢                                          ⁢                      region            1                          ,                  else          ⁢                                          ⁢                      region            2                                              (        7        )            
Referring back to FIG. 1, it can be seen that the different anatomic objects (kidney, spleen, ribs, spinal cord, and vertebral body) exhibit varying degrees of differences in pixel intensities. While the kidney, spleen, and spinal cord share similar pixel intensities, the pixel intensity for the ribs and vertebral body are markedly different therefrom. Further, the pixels within each object have more or less constant pixel intensities (regional coherence). Moreover, it can be seen that there are often large steps in intensity as one passes out of one object and into a neighbor (edge step differences). For example, note the dark region between the kidney and spleen. In fact, regional coherence and edge-steps are the two dominant structural features in images.
Five basic image segmentation techniques are known in the art: (1) region-based methods, (2) edge-based methods, (3) image feature clustering methods, (4) global optimization methods, and (5) hybrid methods combining region- and edge-based ideas. See Haralick R M and Shapiro L G, “Image segmentation techniques,” Computer Vision Graphics and Image Processing, 29:100–132, 1985; Pal, N. R. and Pal, S. K., “A review on image segmentation techniques,” Pattern Recognition, 26:1277–1294, 1993; and Tang, M. and Ma, S., “General scheme of region competition based on scale space,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12), 1366–1378, 2001, the entire disclosures of which are hereby incorporated by reference.
Region-based approaches assume that region pixels have almost-constant properties, and that these properties are different from region to region. Pixel properties (also called features) include pixel intensity, and mean, variance, range, moments, etc. computed over a neighborhood of surrounding pixels.
Region-based segmentation assumes that the means and variances of these properties are constant within a region, and differ among regions. See Rosenfeld, A. and Kak, A. C., Digital Picture Processing, Academic Press, 1982; Beveridge, J R, Griffith, J, Kohler, R R, Hanson, A R, and Riseman, E M, “Segmenting images using localized histograms and region merging,” International Journal of Computer Vision, 2, 311–347, 1989; and Adams, R. and Bischof, L., “Seeded region growing,” IEEE Trans. Pattern Analysis and Machine Intelligence, 16(6), 641–647, 1994, the entire disclosures of which are hereby incorporated by reference. Because of this assumption, region-based approaches falter if the pixel properties vary across the region, or if the boundaries between regions with the same properties are blurred (by patient breathing motion in CT, for example), or if neighboring regions' properties are similar or have overlapping distributions. The success of a region-based segmentation operation is highly dependent upon the proper choice of a pixel homogeneity criterion. If the pixel homogeneity criterion is poorly chosen, the image segmentation operation results in ragged boundaries and holes in the segmented region. Moreover, even with a good choice of a pixel homogeneity criterion, the overlap of pixel distributions from adjoining regions constitutes a big problem for this approach.
Edge-based methods assume that all regions are surrounded by a significant step in the pixel property. Using this assumption, edge-based image segmentation extracts a segment by semantic analysis of the edges (see Ballard, D H and Brown, C M, Computer Vision, Prentice-Hall, Englewood Cliffs, N.J., 1982, the entire disclosure of which is hereby incorporated by reference) or by fitting a flexible curve or active contour to the actual boundary in the image (see Kass, M., Witkin, A., Terzopoulos, D., “Snakes: Active contour models,” International Journal of Computer Vision, 1, 321–331, 1988. ; Cohen, L. D., On active contour models and balloons,” CVGIP: Image Understanding, 53(2), 211–218, 1991; and Malladi, R., Sethia, J. A., and Vemuri, B. C., “Shape modeling with front propagation: A level set approach,” IEEE Trans. Pattern Analysis and Machine Intelligence, 17(2), 158–175, 1995, the entire disclosures of which are hereby incorporated by reference). Like the region approach, active contour segmentation may fail if the actual pixel data does not exhibit sharp steps everywhere around the object. Successful segmentation also depends on the kind of flexible model used to match the boundary. Flexible models such as snakes and balloons show troublesome sensitivity to initial placement in the image, and the image segmentation goes awry if that placement is not sufficiently close to the true boundary. Semantic methods can assemble contour fragments discovered in the map of edges by a set of rules to produce the object form, but this is a very complex task, and except for some highly regular applications, has not found wide use in medical image analysis.
Image-feature clustering segmentation identifies objects by looking in the space of image pixel properties for clusters that correspond to the feature combinations of object pixels. Each combination corresponds to a point in feature space, where the space dimension is equal to the number of features. Sets of object pixels with similar properties correspond to sets of points clustered in feature space. Finding such point clusters enables the analyst to go back to the image and identify the pixels inhabiting this concentration, and by implication, the object. Well-established methods exist for identifying point density maxima and classifying the feature space points/image pixels. See Devroye L, Gyorfi L, and Lugosi G, A Probabilistic Theory of Pattern Recognition, Springer, N.Y., Springer, 1996; and Duda R O, Hart P E, and Stork D G, Pattern Classification, 2nd Ed., New York: Wiley-Interscience, 2001, the entire disclosures of which are hereby incorporated by reference. For example, the nearest neighbor rule specifies that each pixel be assigned to the region corresponding to the feature space cluster nearest that pixel's feature space location. Many other clustering rules exist and are used in, for example, commercial data mining applications. See Witten, I. H. and Frank, E., Data Mining, Morgan Kaufmann, San Francisco, Calif., 1999 (the entire disclosure of which is hereby incorporated by reference); and Duda et al. (2001). The problems which can confound density rules for image clustering are the same problems described above in connection with region-based techniques; property densities overlap making classification error inevitable. Estimating cluster point densities requires that feature space be “binned”, which means that a smoothed version of the space must be computed so that areas of low frequency are given some reasonable probability. If the number of features is large, the data is spread out over more dimensions, making the pixel classification more computationally expensive, and exacerbating the data sparsity.
Global optimization segmentation performs non-linear transformations on the image pixel values to maximize an objective function which is assumed to accurately describe the properties of the image, or the properties of the measurement process that formed the image. The two most famous ideas are iterated conditional modes (See Besag, J., “On the statistical analysis of dirty pictures,” Journal of the Royal Statistical Society, 48, 259–279, 1986, the entire disclosure of which is hereby incorporated by reference) and Markov-Gibbs refinement (See Geman, S. and Geman, D., “Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6): 721–741, 1984, the entire disclosure of which is hereby incorporated by reference). In both cases, the image is transformed by small, iterative steps to versions in which the regions have constant features different from the surrounding regions. While built on powerful ideas, these methods assume simple behaviors for the pixel properties, and these assumed behaviors do not seem to correspond well to those of real images. One problem, already mentioned, is that the pixel properties are not stationary-that is, their means and variances change across a region. Non-stationarity is a complex problem not well modeled by any current theories. A second problem is that the pixel properties do not always exhibit normal distributions (Gaussian) or other distributions that such theories often employ. As such, these methods are less well-equipped to operate on images with arbitrary property distributions.
Hybrid segmentation methods operate on several image properties at once. A good example of these methods are those developed by Duncan and Staib. See Staib L H and Duncan J S, “Boundary finding with parametrically deformable models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 1061–1075, 1992, the entire disclosure of which is hereby incorporated by reference. Briefly, Staib and Duncan proposed applying rigorously correct statistical inference to automated image segmentation by matching the properties of a flexible-shape template to image objects. Later, Chakraborty added to the Duncan/Staib method by integrating the flexible contour model with region, edge, and prior shape information to find a region boundary. See Chakraborty A, Staib L H, Duncan J S, “Deformable boundary finding in medical images by integrating gradient and region information,” IEEE Transactions on Medical Imaging, 15, 859–870, 1996, the entire disclosure of which is hereby incorporated by reference. So, in addition to the region and edge properties of the current image, one could add a bias in favor of the shapes of the same organ determined previously on neighboring CT sections. The inference methods used are either the likelihood ratio decision rule or the maximum a posteriori decision rule (when prior boundary shapes are known). These decision rules guarantee minimum average errors given the data.
Because the correct boundary is not computable directly, the above-described Duncan/Staib/Chakraborty (DSC) method iteratively computes the value of the decision rule and adjusts the flexible curve so as to maximize the value of the decision-rule function. The flexible curve is a finite trigonometric series whose coefficients, or parameters, fully define the location, size, and shape of the curve. The actual “footprint” of the curve on the image can be quickly computed, and compared to an independently-segmented region boundary, to object edges in a map of all the image object edges, and to the shape of a similar object determined previously. The extent of these agreements is input to a decision rule function. If the value of the decision rule function is higher (more probable) than any previous value, this curve is saved, and the program searches for another curve that gives even higher values to the decision rule function. After satisfying a stopping criterion, the program reports the curve corresponding to the largest value of the decision rule function.
In addition to the five segmentation approaches described above, there have been a small number of information theoretic image applications. Information theory since its start has been applied to problems of data compression, coding, and model estimation. Kullback and Leibler discovered information theoretic equivalents to the minimum-error maximum likelihood and Bayesian decision rules. These rules use the data distributions directly, giving results that are independent of the origin of the data or any model of the data.
Image segmentation techniques previously developed by the inventor herein are disclosed in U.S. Pat. Nos. 5,859,951 and 6,249,594. The '951 patent describes a region-based approach using Bayesian classification of pixels operating on pixel intensity and variance features. The '594 patent describes a unique combination of Bayesian region classification with the DSC approach to create a useable tool. The initial contour used by the programs described in the '951 and '594 patents are either generated manually by the operator or taken from the contour of a related image. The '951 technique operated on single image regions, and was capable of generating a refined contour faster than the operator could draw it. The '594 technique includes the functionality of the first program, and steps from CT section to CT section, automatically computing refined contours using previous-section contours as starting estimates.
While the above-described segmentation techniques have contributed to the art, a need still exists for a segmentation technique that is more accurate and less sensitive to the effects of modeling errors or the like.