This invention relates to object segmentation and tracking within a sequence of image frames, and more particularly to object segmentation using an active contour model having global relaxation to maintain an optimal contour edge of an image object being tracked.
With regard to object segmentation, active contour models, also known as snakes, have been used to adjust to image features, in particular image object boundaries. Conceptually, an active contour model overlays an elastic curve onto an image. The curve (i.e., the snake) deforms from an initial shape to fit the image features. An energy-minimizing function adapts the curve to image features such as lines and edges; the function is guided by external constraint forces and image forces. The best fit is achieved by minimizing the total energy of the curve.
The energy computation is derived from (i) internal energy terms for tension (stretching) and stiffness (bending), and (ii) potential terms derived from image features (edges, corners). A pressure force has also been used to allow closed contours to inflate. The energy of the snake is given by:
$$E_{\text{snake}} = \alpha\,E_{\text{tension}} + \beta\,E_{\text{stiffness}} + \gamma\,E_{\text{potential}} + \rho\,E_{\text{pressure}} \tag{I}$$
The adjustment of the snake to the object feature is strongly influenced by the weighting parameters $\alpha$, $\beta$, $\gamma$, and $\rho$. Difficulties in applying conventional active contour models include (i) attraction to spurious local features (i.e., spurious edge points) and (ii) the difficulty of choosing a reliable set of weighting parameters. Conventionally, iterations are applied to make the entire contour converge to an optimal path. A new approach to contour modelling is an aspect of this invention and is described below in the summary of the invention.
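To make the role of the weights in Equation (I) concrete, the following Python sketch evaluates the snake energy of a closed polygonal contour. The discretizations are assumptions, not taken from this text: tension as the sum of squared distances between consecutive points, stiffness as the sum of squared second differences, potential as the negated edge potential sampled at each point, and pressure as the negated enclosed area, so that a positive $\rho$ favours inflation. The function name `snake_energy` and the `edge_potential` callback are hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def snake_energy(pts: List[Point],
                 edge_potential: Callable[[float, float], float],
                 alpha: float, beta: float, gamma: float, rho: float) -> float:
    """Evaluate alpha*E_tension + beta*E_stiffness + gamma*E_potential
    + rho*E_pressure for a closed polygonal contour.

    The individual term definitions are common textbook discretizations
    (an assumption, not the source's definitions)."""
    n = len(pts)
    tension = stiffness = potential = 0.0
    area2 = 0.0  # twice the signed polygon area (shoelace formula)
    for i in range(n):
        px, py = pts[i - 1]          # previous point (index -1 wraps around)
        x, y = pts[i]
        nx, ny = pts[(i + 1) % n]    # next point (wraps around)
        # Tension: squared first difference penalizes stretching.
        tension += (x - px) ** 2 + (y - py) ** 2
        # Stiffness: squared second difference penalizes bending.
        stiffness += (px - 2 * x + nx) ** 2 + (py - 2 * y + ny) ** 2
        # Potential: strong edges (high edge potential) lower the energy.
        potential -= edge_potential(x, y)
        area2 += x * ny - nx * y
    # Pressure: a larger enclosed area lowers the energy, so the contour inflates.
    pressure = -abs(area2) / 2.0
    return alpha * tension + beta * stiffness + gamma * potential + rho * pressure
```

For the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]` with a zero edge potential and all four weights set to 1, the terms are 4 (tension), 8 (stiffness), 0 (potential) and -1 (pressure), giving 11.0. Changing any single weight shifts the balance between these terms, which illustrates why a reliable set of weights is hard to choose.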
With regard to object tracking, data clustering methods are used, such as those found in pattern learning and recognition systems based on adaptive resonance theory (ART). Adaptive resonance theory, as coined by Grossberg, is a system for self-organizing stable pattern recognition codes in real-time data in response to arbitrary sequences of input patterns. (See "Adaptive Pattern Classification and Universal Recoding: II . . . ," by Stephen Grossberg, Biological Cybernetics 23, pp. 187-202 (1976).) It is based on the problem of discovering, learning and recognizing invariant properties of a data set, and is somewhat analogous to the human processes of perception and cognition. The invariant properties, called recognition codes, emerge in human perception through an individual's interaction with the environment. When these recognition codes emerge spontaneously, as in human perception, the process is said to be self-organizing.
Adaptive Resonance Theory 1 ("ART 1") networks implement a set of differential equations responsive to arbitrary sequences of binary input patterns. Adaptive Resonance Theory 2 ("ART 2") networks self-organize stable recognition categories in response to arbitrary sequences of not only binary but also analog (gray-scale, continuous-valued) input patterns. See "ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns," by Gail A. Carpenter and Stephen Grossberg. A modified ART-2 network is implemented to achieve an inventive clustering and pattern recognition system for image object tracking.
According to the invention, a system for image object tracking and segmentation includes (i) a modified adaptive resonance theory-2 (M-ART2) model for detecting changes of scene, (ii) a two-dimensional correlative auto-predictive search (2D CAPS) method for object tracking, and (iii) an active contour model with global relaxation for defining optimal image object boundaries.
According to one aspect of the invention, a given image frame is processed within an M-ART2 model to determine whether there has been a change in scene in comparison to a prior image frame. The M-ART2 model processes the data points of an image frame. A data point corresponds to a pixel coded using RGB, YUV, or another known or standard color coding scheme. Each data point is an input vector, and the input vectors are grouped into clusters. A given cluster has a corresponding centroid value, referred to herein as a prototype vector, which corresponds to a weighted centroid (color) for that cluster. The input vectors are allocated to clusters based on a minimum distance measure (e.g., minimal distance from the cluster's prototype vector). The final prototype vectors for a given image frame are used as the initial prototype vectors for the next image frame. If the number of vectors (or pixels) in the clusters has changed by more than a predetermined amount, a scene change is considered to have occurred. Specifically, the rate of cluster change is tracked from image frame to image frame; if the rate exceeds a prescribed value, a scene change has occurred.
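The clustering and scene-change test described above can be sketched in Python as follows. This is a simplified stand-in, not the M-ART2 network itself: it makes one nearest-prototype pass per frame, opens a new cluster when no prototype lies within a hypothetical `vigilance` distance, uses a plain (unweighted) centroid as the prototype, and measures the rate of cluster change as the fraction of pixels by which cluster populations shifted. All function names and the default `vigilance` and `threshold` values are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]

def cluster_frame(pixels: Sequence[Color], prototypes: Sequence[Color],
                  vigilance: float) -> Tuple[List[Color], List[int]]:
    """One clustering pass over a frame, seeded with the previous prototypes.

    Each pixel joins the cluster with the nearest prototype; if none lies
    within `vigilance` (a hypothetical ART-style threshold), a new cluster
    is opened with that pixel as its prototype."""
    protos = [tuple(p) for p in prototypes]
    sums = [[0.0, 0.0, 0.0] for _ in protos]
    counts = [0] * len(protos)
    for px in pixels:
        dists = [math.dist(px, p) for p in protos]
        k = min(range(len(protos)), key=dists.__getitem__, default=-1)
        if k < 0 or dists[k] > vigilance:
            protos.append(tuple(px))
            sums.append([0.0, 0.0, 0.0])
            counts.append(0)
            k = len(protos) - 1
        counts[k] += 1
        for c in range(3):
            sums[k][c] += px[c]
    # Prototype = centroid of its members; an empty cluster keeps its old prototype.
    new_protos = [tuple(s / n for s in sm) if n else p
                  for sm, n, p in zip(sums, counts, protos)]
    return new_protos, counts

def cluster_change_rate(prev_counts: Sequence[int], counts: Sequence[int]) -> float:
    """Fraction (0..1) of pixels by which cluster populations shifted."""
    k = max(len(prev_counts), len(counts))
    a = list(prev_counts) + [0] * (k - len(prev_counts))  # new clusters start at 0
    b = list(counts) + [0] * (k - len(counts))
    total = max(sum(b), 1)
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * total)

def detect_scene_changes(frames: Sequence[Sequence[Color]],
                         vigilance: float = 30.0,
                         threshold: float = 0.3) -> List[bool]:
    """Flag each frame after the first whose change rate exceeds `threshold`."""
    protos: List[Color] = []
    prev_counts: List[int] = []
    flags = []
    for t, frame in enumerate(frames):
        # Final prototypes of one frame seed the clustering of the next.
        protos, counts = cluster_frame(frame, protos, vigilance)
        if t > 0:
            flags.append(cluster_change_rate(prev_counts, counts) > threshold)
        prev_counts = counts
    return flags
```

Two identical all-red frames followed by an all-blue frame produce `[False, True]`: the blue pixels fall outside the carried-over red prototype, so they form a new cluster and the whole population moves.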
According to another aspect of the invention, a correlative auto-predictive search model is used to track an image object. The model receives an image object as an input template; this template is the image object from the prior image frame. For an initial frame, the image object is found by detecting and modelling the object boundary using an edge energy model and an active contour model, either automatically or with user intervention. The image object may have moved or disappeared in the transition from the prior image frame to the current frame. To locate the image object in the current image frame, the input template is compared with windows of the current image frame (i.e., the search area). The location or locations where the template has the highest correlation coefficient with the underlying window are selected as a match for the template.
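The template comparison step can be illustrated with an exhaustive search that computes the correlation coefficient (Pearson's r) between the template and every same-size window of a grayscale frame. This sketch omits the auto-predictive part of CAPS, which would restrict which windows are examined; the function names are hypothetical, and the frame is assumed to be a list of rows of intensities.

```python
import math
from typing import List, Tuple

Image = List[List[float]]

def correlation_coefficient(template: Image, window: Image) -> float:
    """Pearson correlation between two patches of equal size."""
    t = [v for row in template for v in row]
    w = [v for row in window for v in row]
    n = len(t)
    mt, mw = sum(t) / n, sum(w) / n
    cov = sum((a - mt) * (b - mw) for a, b in zip(t, w))
    vt = sum((a - mt) ** 2 for a in t)
    vw = sum((b - mw) ** 2 for b in w)
    if vt == 0 or vw == 0:
        return 0.0  # flat patch: correlation undefined, treated as no match
    return cov / math.sqrt(vt * vw)

def best_match(template: Image, frame: Image) -> Tuple[Tuple[int, int], float]:
    """Slide the template over every window of the frame (the search area) and
    return the top-left (row, col) with the highest coefficient, plus that value."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best_pos, best_r = (0, 0), -math.inf
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = [row[c:c + tw] for row in frame[r:r + th]]
            rho = correlation_coefficient(template, window)
            if rho > best_r:
                best_pos, best_r = (r, c), rho
    return best_pos, best_r
```

Because the coefficient is normalized by both patch variances, a window that differs from the template only in brightness offset or contrast scale still scores 1.0, which makes the match tolerant of lighting changes between frames.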
According to another aspect of the invention, the edge potential energy of the image object is derived. At one step, the image is decomposed by filtering it with a quadrature mirror filter (QMF) pair, which brings out the image details while simultaneously smoothing the image. The horizontal and vertical details are the gradients of the image along the x and y axes. The magnitude of the image gradients is taken as the edge potential energy at each level of decomposition; these levels are then summed with a set of weighting parameters to derive the edge potential energy for a color component. A weighted average of the edge potential energies of the color components yields the total edge potential energy for the image. The total edge potential energy is an array holding a potential energy value for each pixel, and it is input to the active contour model for use in object segmentation.
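A minimal sketch of the edge potential energy computation follows. The text does not name the filters, so a Haar low-pass/high-pass pair (scaled by 1/2) is assumed as the QMF pair, applied in undecimated form with taps spread by 2^level so that every level yields a full-resolution array; borders are handled by clamping. The names `edge_energy_component` and `total_edge_energy`, and the per-level and per-color weights, are illustrative.

```python
import math
from typing import List, Sequence

Image = List[List[float]]

def _at(img: Image, y: int, x: int) -> float:
    # Clamp coordinates so the filters can be applied at the borders.
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def edge_energy_component(img: Image, level_weights: Sequence[float]) -> Image:
    """Weighted sum over decomposition levels of the gradient magnitude."""
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    cur = [row[:] for row in img]
    for level, wt in enumerate(level_weights):
        s = 2 ** level  # undecimated: filter taps spread by 2**level
        for y in range(h):
            for x in range(w):
                # Haar high-pass along x and y: horizontal and vertical detail.
                dx = (_at(cur, y, x + s) - _at(cur, y, x)) / 2.0
                dy = (_at(cur, y + s, x) - _at(cur, y, x)) / 2.0
                energy[y][x] += wt * math.hypot(dx, dy)
        # Haar low-pass in x, then in y, smooths the image for the next level.
        sx = [[(_at(cur, y, x) + _at(cur, y, x + s)) / 2.0 for x in range(w)]
              for y in range(h)]
        cur = [[(_at(sx, y, x) + _at(sx, y + s, x)) / 2.0 for x in range(w)]
               for y in range(h)]
    return energy

def total_edge_energy(components: Sequence[Image], color_weights: Sequence[float],
                      level_weights: Sequence[float]) -> Image:
    """Weighted average of the per-color-component edge energies."""
    total_w = sum(color_weights)
    per = [edge_energy_component(c, level_weights) for c in components]
    h, w = len(components[0]), len(components[0][0])
    return [[sum(cw * e[y][x] for cw, e in zip(color_weights, per)) / total_w
             for x in range(w)] for y in range(h)]
```

For a vertical step from 0 to 10 and a single level, the energy is 5 in the column just left of the step and 0 elsewhere, so the array peaks on the edge as intended.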
In an initial frame, the image processed to derive edge potential energy is the entire image frame. For subsequent frames, edge potential energy is derived for the image area that includes the image object found using the CAPS model. Alternatively, the entire image frame is used for subsequent frames as well.
According to an aspect of the invention, the active contour model receives as input a set of data points along the image object being tracked. In some embodiments, the initial edge points for the initial frame are selected manually by an operator. In other embodiments, an automated process derives these edge points and inputs them to the processing models of this invention. For subsequent image frames, the set of edge points is the boundary of the template match areas of the image frame; this boundary is determined during processing within the CAPS model.
The active contour model also receives the derived edge potential energy for the object boundary (i.e., edge). The received set of edge points corresponds to a current edge, which the active contour model adjusts to achieve a more accurate image object boundary. For each data point among the set of current data points, several alternative candidate data points are selected. In one embodiment, M candidate points are selected from the area surrounding the current data point; the number of candidate points and the manner of selecting them, however, may vary. Between the current points and the alternative points there are $(M+1)^N$ potential contours, where N is the number of sampled points along the image object's current boundary. Only one contour is to be selected as the modelled contour that is output as the image object boundary.

Rather than calculate an energy value for each of the $(M+1)^N$ potential contours, a solution is achieved by comparing energy differences along a travel path, and the selected contour is built in steps. The process starts from any of the current data points and its corresponding candidate points. An optimal path is selected starting from the current data point and starting from each one of its candidate points, for a total of M+1 optimal paths. At each step, each of the M+1 paths may advance to one of M+1 points (i.e., the neighboring current edge point or one of its M alternative candidate points), and an energy difference is calculated for each of these points along each path. At each step, for each candidate point, only one optimal path is kept out of the M+1 potential paths. As a result, only (M+1)×(M+1)×N calculations are performed, rather than the $(M+1)^N$ required by brute force. After the last step there are M+1 paths, and the most optimal of them is selected as the image object boundary.
At each such step, the distance of the selected point from its current point is stored for use in deriving the energy difference when processing the next point set along the contour.
One advantage of the invention is that processing time and storage space are substantially reduced compared with a brute-force method, in which an energy calculation is derived for each of the $(M+1)^N$ possible contours.
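The size of the saving can be checked with simple arithmetic. The values M = 8 (for example, the eight neighbours of a pixel) and N = 100 sampled boundary points are illustrative choices, and the function names are hypothetical.

```python
def brute_force_contours(m: int, n: int) -> int:
    """Number of possible contours: each of N points has M+1 options."""
    return (m + 1) ** n

def stepwise_evaluations(m: int, n: int) -> int:
    """At each of N steps, each of the M+1 paths considers M+1 next points."""
    return (m + 1) * (m + 1) * n

print(stepwise_evaluations(8, 100))            # 8100
print(len(str(brute_force_contours(8, 100))))  # 96 (digits in 9**100)
```

The stepwise method needs 8,100 energy-difference evaluations, whereas brute force would have to score 9^100 contours, a 96-digit number (about 2.7 × 10^95).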