Colon cancer is one of the leading causes of cancer-related death in the U.S. The number of deaths can be greatly reduced if polyps are detected and treated at an early stage of development. Virtual colonoscopy is a new technology being developed to help doctors find polyps in three-dimensional (3D) computed tomography (CT) image data. However, it currently requires that the colon be physically cleansed prior to the CT scan. This is very inconvenient and prevents virtual colonoscopy from becoming a general screening tool for large populations.
The task of automatic segmentation is very challenging. First, the CT data are acquired without bowel cleansing in order to minimize inconvenience to patients. Tagged materials, such as stool, though mostly depicted as bright regions in the image, are a major distraction. Second, the polyps of interest are very small and have neither unique intensity patterns nor distinctive shapes; they are hard to distinguish from the colon wall, especially when surrounded by tagged material. Third, the volumetric data to be processed are massive (e.g., 400×512×512 voxels), which rules out any computationally expensive method.
By tagging residual materials (e.g., stool) so that they appear bright under CT, these materials can be removed electronically. This is essentially a segmentation problem whose task is to delineate and locate the colon wall; the process is also referred to as colon detagging. However, residual materials exhibit large variation in appearance depending on where they are located, what the patient eats, and how well they are tagged. Moreover, the difficulty of segmenting an uncleansed colon stems from the fact that residual materials and colon walls exhibit complex patterns that are hard to separate. This is representative of a large class of problems in medical imaging and vision whose task is foreground/background segmentation.
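The idea of electronic removal can be sketched in its most naive form: since tagged residual material appears bright, voxels above an intensity threshold are marked as tagged and replaced with an air-like value. All numbers below (the threshold, the Hounsfield-style values, the synthetic volume) are illustrative assumptions, not values from this work; real detagging must handle the appearance variation described above, which a single threshold cannot.

```python
import numpy as np

# Synthetic CT-like volume (values are hypothetical, in a Hounsfield-like
# scale): soft-tissue background plus one bright block mimicking tagged stool.
rng = np.random.default_rng(0)
volume = rng.normal(loc=0.0, scale=30.0, size=(8, 16, 16))
volume[2:5, 4:10, 4:10] = 400.0  # bright region mimicking tagged material

# Naive detagging: voxels brighter than an assumed threshold are labeled
# as tagged residual material and replaced with an air-like value.
TAG_THRESHOLD = 200.0   # assumed constant; real systems calibrate or learn this
AIR_VALUE = -1000.0

tagged_mask = volume > TAG_THRESHOLD
cleansed = np.where(tagged_mask, AIR_VALUE, volume)
```

This sketch removes only the interior of bright regions; it says nothing about the thin, ambiguous boundary between tagged material and the colon wall, which is precisely where the segmentation problem becomes hard.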
Existing approaches often define appearance models (mostly independent and identically distributed (i.i.d.)) for the foreground and background, followed by an energy-minimization procedure. However, such models, e.g., Gaussian, work only in very constrained situations, since they cannot capture the large intra-class variability and inter-class similarity. There is a need for a learning-based approach that uses learned discriminative models of the appearance of complex foreground and background regions.
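A minimal sketch of the i.i.d. Gaussian appearance model described above: fit one Gaussian over intensity per class, then label each voxel by maximum likelihood, independently of its neighbors. The class means and standard deviations below are hypothetical and deliberately well separated; with the real intra-class variability and inter-class similarity discussed in the text, these two Gaussians would overlap heavily and per-voxel labeling would fail.

```python
import numpy as np

def fit_gaussian(samples):
    """Fit a 1-D Gaussian by its sample mean and standard deviation."""
    return samples.mean(), samples.std()

def log_likelihood(x, mu, sigma):
    """Per-voxel Gaussian log-likelihood (constant term dropped)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

rng = np.random.default_rng(0)
# Hypothetical training intensities for the two classes.
fg_train = rng.normal(100.0, 15.0, 1000)   # e.g., colon-wall tissue
bg_train = rng.normal(450.0, 25.0, 1000)   # e.g., bright tagged material

mu_f, sd_f = fit_gaussian(fg_train)
mu_b, sd_b = fit_gaussian(bg_train)

# Toy test volume: left half foreground-like, right half background-like.
volume = rng.normal(100.0, 15.0, size=(4, 8, 8))
volume[:, :, 4:] = rng.normal(450.0, 25.0, size=(4, 8, 4))

# i.i.d. labeling: each voxel compared on likelihood alone, no spatial term.
fg_mask = log_likelihood(volume, mu_f, sd_f) > log_likelihood(volume, mu_b, sd_b)
```

In the energy-minimization view, these per-voxel log-likelihoods form the data term; the learning-based alternative argued for here replaces the Gaussians with a discriminative classifier trained on the complex foreground and background appearance.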