Motion information is useful in digital image analysis and processing, and a number of techniques to detect motion in a series or sequence (“set”) of images or video frames (“images”) have been considered. One common technique used to detect motion in images is motion segmentation. Motion segmentation attempts to partition an image into parts having different motions.
Generally, two types of motion segmentation are known. The first assumes that the camera used to capture the images is fixed, and thus that the background content of the images is stationary. Video surveillance commonly employs this technique to detect motion in captured images. The second assumes that the camera itself is in motion, so that both the foreground and the background content move. Many applications, such as object tracking, accurate motion estimation, and computer vision, employ this second technique. It presents challenges, however, in that there are no general definitions of what constitutes foreground and background in captured images.
Other attempts to detect motion in images have been considered. For example, U.S. Pat. No. 5,995,668 to Corset et al. discloses a method and system for coding and decoding segmented image frames. The system comprises a first sub-system defining the time evolution of segmented partitions, and a second sub-system encoding both contours and textures of regions within the partitions. The time evolution definition leads to a partition tree from which regions are extracted during an analysis step in order to form a decision tree. A decision operation allows distinct regions from various levels of the partition tree to be selected in order to construct an optimal final partition and, simultaneously, to choose the coding technique for each region of the final partition. Motion is estimated by dividing an image frame into small blocks of pixels and comparing each block to same-size blocks in a previous image frame. The closest match for each block indicates the motion vector for that block. Motion is compensated for by applying the resultant motion vectors to regions in the image frames. Region boundaries are defined using the watershed algorithm.
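By way of illustration only, the block-based motion estimation summarized above can be sketched as follows. This is a minimal sketch and not the implementation of the cited reference: the sum-of-absolute-differences cost, the exhaustive search window, and the names `block_sad` and `estimate_motion` are assumptions introduced here.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel intensities


def block_sad(cur: Frame, prev: Frame, top: int, left: int,
              dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between a block of `cur` and a shifted block of `prev`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - prev[top + dy + r][left + dx + c])
    return total


def estimate_motion(cur: Frame, prev: Frame, size: int = 2,
                    search: int = 2) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map each block origin (top, left) in `cur` to the offset (dy, dx) of its
    closest-matching same-size block in `prev` (assumed cost: SAD)."""
    height, width = len(cur), len(cur[0])
    vectors = {}
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            best = (0, 0)
            best_cost = block_sad(cur, prev, top, left, 0, 0, size)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidate blocks that fall outside the previous frame.
                    if not (0 <= top + dy <= height - size
                            and 0 <= left + dx <= width - size):
                        continue
                    cost = block_sad(cur, prev, top, left, dy, dx, size)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            vectors[(top, left)] = best
    return vectors


# Example: a bright 2x2 square moves two pixels to the right between frames.
prev_frame = [[0] * 6 for _ in range(6)]
cur_frame = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        prev_frame[r][c] = 200
    for c in (4, 5):
        cur_frame[r][c] = 200
motion = estimate_motion(cur_frame, prev_frame)
```

In this sketch each vector points from a block in the current frame back to its best match in the previous frame, so an object that moved two pixels to the right produces an offset of (0, -2) for the block it now occupies.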
U.S. Pat. No. 6,075,875 to Gu discloses a method for segmenting and tracking arbitrary shapes in a series of video image frames. In an interframe mode of operation, a motion representation of corresponding pixels in the selected video image frame and a preceding video image frame is obtained to form motion-segmented video image features. Video image features are also segmented according to their spatial image characteristics to form spatially-segmented video image features. The video image features are jointly segmented as a weighted combination of the motion-segmented video image features and the spatially-segmented video image features.
U.S. Pat. No. 6,301,385 to Chen et al. discloses a method and apparatus for segmenting images. Multiple segmentation approaches including motion segmentation, focus segmentation and intensity segmentation are employed to provide input to a two-layered neural network. Focus and motion measurements are taken from high frequency data in the images and intensity measurements are taken from low frequency data in the images. The focus and motion measurements are used to segment an object allowing moving foreground to be segmented from stationary foreground as well as from moving and stationary background.
U.S. Pat. No. 6,625,310 to Lipton et al. discloses a method for segmenting video data into foreground and background utilizing statistical modeling of image pixels. A statistical model of the background is built and each pixel of an input video frame is compared with the background statistical model. The results of the comparisons are used to classify each pixel as foreground or background.
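By way of illustration only, a per-pixel background model of the general kind described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the model of the cited reference: the per-pixel mean and standard deviation, the threshold multiplier `k`, the floor `min_std`, and the function names are all hypothetical choices made here.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Frame = List[List[float]]  # grayscale frame stored as rows of pixel intensities


def build_background_model(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Compute a per-pixel mean and standard deviation over background-only frames."""
    height, width = len(frames[0]), len(frames[0][0])
    means = [[mean([f[r][c] for f in frames]) for c in range(width)]
             for r in range(height)]
    stds = [[pstdev([f[r][c] for f in frames]) for c in range(width)]
            for r in range(height)]
    return means, stds


def classify_foreground(frame: Frame, means: Frame, stds: Frame,
                        k: float = 3.0, min_std: float = 2.0) -> List[List[bool]]:
    """Mark a pixel as foreground (True) when it deviates from the background
    mean by more than k standard deviations (assumed decision rule)."""
    mask = []
    for r, row in enumerate(frame):
        mask.append([abs(value - means[r][c]) > k * max(stds[r][c], min_std)
                     for c, value in enumerate(row)])
    return mask


# Example: three nearly static training frames, then a frame with one bright pixel.
training = [
    [[10, 11], [9, 10]],
    [[11, 10], [10, 9]],
    [[9, 9], [11, 11]],
]
bg_means, bg_stds = build_background_model(training)
fg_mask = classify_foreground([[10, 50], [12, 10]], bg_means, bg_stds)
```

The floor on the standard deviation is included in this sketch so that pixels that happened to be perfectly constant during training are not flagged by small sensor noise.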
U.S. Pat. No. 6,711,278 to Gu et al. discloses a method for tracking semantic objects with multiple non-rigid motion, disconnected components and multiple colors. The semantic objects are tracked by spatially segmenting image regions from an input image frame and classifying the image regions based on the semantic object from which they originated in the previous image frame. Region-based motion estimation between each spatially segmented region and the previous image frame is carried out to compute the position of a predicted region in the previous image frame. Each region in the current image frame is then classified as being part of a semantic object based on which semantic object in the previous image frame contains the most overlapping points of the predicted region. In this manner, each region in the current image frame is tracked to one semantic object from the previous image frame, with no gaps or overlaps.
U.S. Patent Application Publication No. 2002/0176625 to Porikli et al. discloses a method for segmenting multi-resolution video objects in a sequence of video frames. Feature vectors are assigned to the pixels of the video frames. Selected pixels in the video frames are then identified as marker pixels and pixels adjacent to each marker pixel are assembled into corresponding volumes of pixels if the distance between the feature vector of the marker pixel and the feature vector of the adjacent pixels is less than a first predetermined threshold. After all pixels have been assembled into the volumes, a first score and descriptors are assigned to each volume. At this point, each volume represents a segmented video object. The volumes are then sorted from highest to lowest order according to the first scores, and further processed in that order. Second scores, dependent on the descriptors of pairs of volumes are then determined. The volumes are iteratively combined if the second score passes a second threshold.
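By way of illustration only, the growing of regions outward from marker pixels by a feature-distance threshold can be sketched as follows. This is a simplified sketch and not the method of the cited reference: it operates on a single two-dimensional frame rather than on volumes spanning several frames, it takes the first unassigned pixel in raster order as each marker, it uses Euclidean distance and 4-connectivity, and it omits the scoring and iterative combining steps. The name `grow_regions` is hypothetical.

```python
from collections import deque
from math import dist
from typing import List, Sequence

FeatureMap = List[List[Sequence[float]]]  # one feature vector per pixel


def grow_regions(features: FeatureMap, threshold: float) -> List[List[int]]:
    """Label each pixel with the index of the region grown from its marker pixel."""
    height, width = len(features), len(features[0])
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for r0 in range(height):
        for c0 in range(width):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a region
            # Assumption: the first unassigned pixel in raster order is the marker.
            marker = features[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and labels[nr][nc] == -1
                            and dist(marker, features[nr][nc]) < threshold):
                        # Adjacent pixel is close enough to the marker's features.
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels


# Example: two clusters of feature vectors in a 2x3 frame.
feature_map = [
    [(0, 0), (1, 0), (10, 10)],
    [(0, 1), (10, 9), (10, 10)],
]
region_labels = grow_regions(feature_map, threshold=3.0)
```

Comparing each candidate pixel against the marker's feature vector, rather than against its immediate neighbor, keeps a region from drifting gradually away from the marker's features.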
U.S. Patent Application Publication No. 2003/0133503 to Paniconi et al. discloses a motion segmentation technique employing multi-frame motion estimation. The motion segmentation operates by determining multiple classification hypotheses and by re-classifying poorly classified regions according to a multi-frame hypothesis tracking algorithm. This involves determining a similarity measure for each hypothesis class, and then assigning a classification to the region with the hypothesis class that is most similar or consistent with past or future data.
U.S. Patent Application Publication No. 2003/0235327 to Srinivasa discloses a method for detecting and tracking objects in video images. A detection module receives images, extracts edges in horizontal and vertical directions, and generates an edge map where object-regions are ranked by their immediacy. Filters remove attached edges and ensure regions have proper proportions. The regions are tested using a geometric constraint to ensure proper shape, and are fit with best-fit rectangles, which are merged or deleted depending on their relationships. Remaining rectangles are objects. A tracking module receives images in which objects are detected and uses a Euclidean distance/edge density criterion to match objects. If objects are not matched, clustering determines whether the object is new. If an object is deemed not to be new, a sum-of-squared-difference in intensity test is used to locate matching objects.
U.S. Patent Application Publication No. 2004/0091047 to Paniconi et al. discloses a method and apparatus for detecting nonlinear and multiple motion in images. When an input image is received, the input image is partitioned into regions, and a motion model is applied to each region to extract the motion and associated moving boundaries.
U.S. Patent Application Publication No. 2004/0165781 to Sun discloses a method for determining motion vectors from one point in an image frame of a scene to corresponding points in at least two other image frames of the scene. Each point in a reference image frame has projected thereon corresponding points from two other image frames in order to provide a system of equations that describe point correspondence across the three images. Pointwise displacement between the three frames is then calculated to detect motion.
U.S. Patent Application Publication No. 2004/0252886 to Pan et al. discloses a method of automatic video object extraction. Color segmentation and motion segmentation are performed on source video. Color segmentation identifies substantially uniform color regions using a seed pixel and a neighborhood method and merges small color regions into larger ones. For each region identified by color, the motion segmentation determines the motion vector for the region by matching the region to a subsequent image frame. Once a motion vector for each region is obtained, a motion mask is made.
Although the aforementioned references disclose various techniques for motion segmentation, there exists a need for a computationally efficient yet accurate motion segmentation technique. It is therefore an object of the present invention to provide a novel method, apparatus and computer readable medium embodying a computer program for determining motion in images.