The present invention relates generally to subband decomposition-based image codecs and, more particularly, to the coding of a moving object over a series of images.
It is generally known that image compression is effective in reducing the amount of image data for transmission or storage. In particular, with the introduction of scalable image coding formats such as JPEG 2000, it has become possible to send and receive only a fraction of the image file and still reconstruct a high-quality image at the receiving end. The part that is dropped from the image usually contains information that describes the high-frequency components present in the image, corresponding to the details to which the human visual system (HVS) is not very sensitive.
JPEG stands for the Joint Photographic Experts Group, a committee on image compression. In 1988 this committee adopted its first standard, known as the JPEG baseline, which is based on the discrete cosine transform (DCT) and on Huffman coding. In 2001, the JPEG committee developed a new compression standard, named JPEG 2000. This new standard provides low bit-rate operation, with rate-distortion and subjective image quality performance superior to existing standards, without sacrificing performance at other points in the rate-distortion spectrum. More importantly, JPEG 2000 allows extraction of different resolutions and pixel fidelities of a compressed image from the same codestream representation. It also offers features such as region-of-interest (ROI) coding and random access to image areas. This allows a user to manipulate, store or transmit only the essential information of an image for any target device from its JPEG 2000 bitstream representation.
JPEG 2000 is a subband decomposition-based bit-plane coder. It uses wavelets at the transform stage. The image is decomposed into multiple resolutions. Each resolution is composed of subbands representing low and/or high frequency components. The samples in the subbands are then coded in bit-planes starting from the most significant bit-plane. The use of the wavelet transform and the bit-plane coding scheme provides the scalability feature of JPEG 2000.
Motion JPEG 2000 is a method of video compression, based on intra-frame coding using JPEG 2000. In Motion JPEG 2000, frames in video sequences are coded as independent images, i.e., there is no motion prediction between the images. This coding scheme offers important functionalities such as scalability in quality and in resolution, robustness to bit errors, and frame editing. However, it is inefficient in compression performance as compared to other standards, such as MPEG-4, where estimates of motion vectors are used to code inter-frames. On the other hand, the compression performance of Motion JPEG 2000 can be improved using the features of JPEG 2000 such as ROI coding.
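The decomposition and bit-plane ordering described above can be illustrated with a minimal sketch. It assumes a one-level integer Haar-like filter in place of the wavelet filters actually used by JPEG 2000, and the function names haar_1d, haar_2d and bit_planes are hypothetical, chosen only for this illustration.

```python
def haar_1d(signal):
    """One level of an integer Haar-like transform: (low band, high band)."""
    low = [(signal[i] + signal[i + 1]) // 2 for i in range(0, len(signal), 2)]
    high = [signal[i] - signal[i + 1] for i in range(0, len(signal), 2)]
    return low, high


def haar_2d(image):
    """One decomposition level: returns the LL, HL, LH and HH subbands."""
    # Filter every row first, then every column of the two row-filtered halves.
    rows = [haar_1d(row) for row in image]
    row_low = [r[0] for r in rows]
    row_high = [r[1] for r in rows]

    def filter_columns(block):
        cols = [haar_1d(list(col)) for col in zip(*block)]
        low = [list(r) for r in zip(*[c[0] for c in cols])]
        high = [list(r) for r in zip(*[c[1] for c in cols])]
        return low, high

    ll, lh = filter_columns(row_low)    # horizontally low-pass
    hl, hh = filter_columns(row_high)   # horizontally high-pass
    return ll, hl, lh, hh


def bit_planes(coeffs):
    """Yield (plane index, bits) of coefficient magnitudes, most significant first."""
    mags = [abs(c) for row in coeffs for c in row]
    top = max(mags).bit_length() - 1 if any(mags) else 0
    for p in range(top, -1, -1):
        yield p, [(m >> p) & 1 for m in mags]


# Hypothetical 4x4 sample values, used only to exercise the sketch.
image = [
    [10, 12, 200, 202],
    [10, 12, 200, 202],
    [50, 50, 50, 50],
    [52, 52, 52, 52],
]
ll, hl, lh, hh = haar_2d(image)
planes = list(bit_planes(ll))
```

Truncating the list of bit-planes from the least significant end lowers the fidelity of every coefficient while keeping the coarse structure, which is the sense in which bit-plane coding is scalable.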
ROI coding is a useful functionality in JPEG 2000. It allows the allocation of more bits in an ROI than in other regions in the same image while coding it. By unequally coding the parts of the images so that important objects can be assigned more bits per pixel than the less important objects, a better visual perception of the sequence is obtained, making this feature very useful especially in low data-rate applications. To code important objects in a video sequence as ROIs, tracking of these objects becomes essential for producing a high-quality video stream.
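One way in which a bit-plane coder can give an ROI priority is to scale the ROI coefficients so that they occupy higher bit-planes than any background coefficient. The following is a minimal sketch in the spirit of such a scaling approach, not a description of the codec's actual implementation; the function names and the sample coefficients are assumptions made for illustration.

```python
def roi_scale(coeffs, roi_mask):
    """Shift ROI coefficients above the largest background magnitude."""
    background = [abs(c) for c, in_roi in zip(coeffs, roi_mask) if not in_roi]
    # Choose shift so that 2**shift exceeds every background magnitude.
    shift = max(background, default=0).bit_length()
    scaled = [c * (1 << shift) if in_roi else c
              for c, in_roi in zip(coeffs, roi_mask)]
    return scaled, shift


def roi_unscale(scaled, shift):
    """Undo the scaling: any magnitude >= 2**shift must belong to the ROI."""
    threshold = 1 << shift
    restored = []
    for c in scaled:
        if abs(c) >= threshold:
            sign = 1 if c >= 0 else -1
            restored.append(sign * (abs(c) >> shift))
        else:
            restored.append(c)
    return restored


# Hypothetical subband coefficients and ROI membership flags.
coeffs = [5, -3, 12, 0, 7, -9]
roi_mask = [False, True, True, False, False, True]
scaled, shift = roi_scale(coeffs, roi_mask)
restored = roi_unscale(scaled, shift)
```

Because the scaled ROI coefficients fill the most significant bit-planes, a codestream truncated at a low data rate retains ROI information before any background information.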
Tracking of ROIs is an important feature in many visual-related applications, such as vision-based control, human-computer interfaces, surveillance, agricultural automation, medical imaging and visual reconstruction. The main difficulty in ROI tracking in video sequences is due to the potential variations in the target region within the frame sequence. These variations are usually due to changes in pose, shape deformations, illumination and partial or full occlusion of the target. The sources of these variations should be taken into account when designing a tracking algorithm in order to guarantee robustness and stability.
In prior art, most of the ROI tracking methods rely on a multi-stage processing strategy, which consists of three consecutive steps. At the first step, color-based foreground segmentation is performed and areas of similar color as the target region are masked. The second step involves localizing the ROI through minimization of estimated vectors, using horizontal and vertical summing projections of the segmented image, or with heuristics based on size, shape, position, aspect ratio and color consistency of the target region. The final stage is refining and smoothing the region boundaries.
Hager et al. ("Efficient Region Tracking with Parametric Models of Geometry and Illumination", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 10, Oct. 1998) make use of linear models to simplify the optical flow equation, and a set of basis vectors to model the variation in illumination by performing singular value decomposition on a training sequence of the target. Kruger et al. ("Wavelet Subspace Method for Real-time Face Tracking", Proc. Pattern Recognition, 23rd DAGM Symposium, Munich, Germany 2002) use a larger number of wavelet functions to construct a wavelet network representing the features of the target, and then track the ROI based on the Euclidean distance between the wavelet coefficients.
Kundu (U.S. Pat. No. 5,974,192) discloses a method for matching blocks in a sequence of images based on textures, dominant edge pixels as well as intensity and color. Ponticos (U.S. Pat. No. 6,035,067) discloses a method where a "potential well" is used for preventing large movements of the foreground object using predefined criteria for classifying image regions. Poggio et al. (U.S. Pat. No. 6,421,463) disclose a method for searching objects in images where the search system must be trained using a wavelet coefficient template.
The major disadvantage of the prior art methods is computational complexity. Moreover, some of them require the tracking algorithm to be trained for ROI of the same type, color or shape.
Thus, it is advantageous and desirable to provide a method and device for tracking an ROI of arbitrary shape and type with low computational complexity and memory cost, suitable for Motion JPEG 2000 codecs.
It is a primary objective of the present invention to provide a method and device for tracking at least a portion of an object in a sequence of images, wherein the images are coded as individual images and no template or pre-trained objects are required for tracking. The objective can be achieved by combining boundary detection in the low-frequency band, by pixel matching in the chrominance space, with refinement of the detected boundary in the high-frequency band, by edge analysis using the luminance component.
Thus, according to the first aspect of the present invention, there is provided a method of tracking a target region in an image frame based on a target region of a previous image frame in a sequence of image frames, each of said sequence of image frames comprising a plurality of pixels. The method is characterized by
determining a search area in said image frame based on at least a part of the target region in said previous frame, said search area comprising a plurality of first pixels, each pixel having at least one corresponding first pixel value; and
for the first pixels in the search area:
determining a further search area in said previous frame, said further search area including a plurality of second pixels among the plurality of pixels in the previous frame, each second pixel having at least one corresponding second pixel value and a region status;
finding a match between the first pixel value of said first pixels among the second pixel values for locating a reference second pixel; and determining the region status of at least one of said first pixels based on the region status of the reference second pixel for determining the target region in said image frame based on the region status of said at least one first pixel.
The region status of the second pixel is indicative of whether said second pixel is located within the target region in said previous frame.
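The steps of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: each pixel carries a single scalar value (standing in for a chrominance component of a low-subband coefficient), the further search area is assumed to be a square window of hypothetical half-width radius centered on the pixel position, and the match is assumed to be the smallest absolute difference. The name track_region and all sample data are hypothetical.

```python
def track_region(prev_values, prev_mask, curr_values, search_area, radius=2):
    """Return the (row, col) pixels of the current frame assigned to the target.

    prev_mask[r][c] is the region status of a previous-frame pixel:
    True if it lies inside the target region, False otherwise.
    """
    height, width = len(prev_values), len(prev_values[0])
    region = set()
    for r, c in search_area:
        best = None
        # Further search area: a window in the previous frame around (r, c).
        for pr in range(max(0, r - radius), min(height, r + radius + 1)):
            for pc in range(max(0, c - radius), min(width, c + radius + 1)):
                diff = abs(curr_values[r][c] - prev_values[pr][pc])
                if best is None or diff < best[0]:
                    best = (diff, pr, pc)
        # The best match is the reference pixel; inherit its region status.
        _, ref_r, ref_c = best
        if prev_mask[ref_r][ref_c]:
            region.add((r, c))
    return region


# Hypothetical data: a two-column object moves one column to the right.
prev_values = [[100, 100, 10, 10] for _ in range(4)]
prev_mask = [[True, True, False, False] for _ in range(4)]
curr_values = [[10, 100, 100, 10] for _ in range(4)]
search_area = [(r, c) for r in range(4) for c in range(4)]

region = track_region(prev_values, prev_mask, curr_values, search_area, radius=2)
```

In this sketch the window must be wide enough to contain both target and non-target pixels of the previous frame; otherwise a background pixel near the old object position can only match object pixels and is misclassified.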
Advantageously, said at least a part of the target region in the previous frame has a contour for defining a corresponding contour in said image frame, and the first pixels include pixels adjacent to the corresponding contour in said image frame.
Preferably, the first and second pixel values are indicative of at least one of the chrominance components of the wavelet coefficients in a low subband.
Advantageously, the target region of said image frame includes a boundary and said plurality of pixels in said image frame include a plurality of third pixels adjacent to the boundary, each third pixel having at least one corresponding third pixel value. The method is further characterized by
determining the edge-type of the third pixels so as to modify the target region in said image frame based on the edge-type of the third pixels.
Preferably, the third pixel values are indicative of the luminance component of wavelet coefficients in a high subband.
According to the second aspect of the present invention, there is provided a computer program for use in an image coder having means for coding a sequence of image frames into a codestream, said sequence of image frames having at least one first image frame and a preceding second image frame, each of the first and second image frames having a plurality of pixels, wherein the second image frame has a target region. The computer program is characterized by
a code for defining a search area in the first image frame based on at least a part of the target region in the second image frame, said search area comprising a plurality of first pixels, each pixel having at least one corresponding first pixel value; and
a code for determining, for the first pixels in the search area:
a further search area in the second image frame, said further search area including a plurality of second pixels among the plurality of pixels of the second image frame, each of the second pixels having at least one corresponding second pixel value and a region status;
a reference second pixel in said further search area based on a match between the first pixel value of said first pixels among the second pixel values; and the region status of at least one of said first pixels based on the region status of the reference second pixel for determining a target region in the first image frame based on the region status of said at least one first pixel.
The region status of the second pixel is indicative of whether said second pixel is located within the target region in the second image frame.
Preferably, the first and second pixel values are indicative of at least one of the chrominance components of wavelet coefficients in a low subband.
Advantageously, the target region of the first image frame includes a boundary and said plurality of pixels in the first image frame include a plurality of third pixels adjacent to the boundary, each third pixel having at least one corresponding third pixel value. The computer program is further characterized by
a code for determining the edge-type of the third pixels so as to modify the target region in the first image frame based on the edge-type of the third pixels.
Preferably, the third pixel values are indicative of the luminance component of wavelet coefficients in a high subband.
According to the third aspect of the present invention, there is provided an image encoder for coding a sequence of image frames comprising at least one first image frame and a preceding second image frame, each of the first and second image frames including a plurality of pixels, the second image frame having a target region, said image encoder having:
means for decomposing each of the image frames into a plurality of subband components; and
means for coding the subband components into a codestream. The image encoder is characterized by
a first algorithm, responsive to the subband components, for defining a search area in the first image frame based on at least a part of the target region in the second image frame, the search area including a plurality of first pixels, each having at least one corresponding first pixel value; and
a second algorithm, responsive to the first pixels, for determining:
a further search area in the second image frame including a plurality of second pixels among the plurality of pixels in the second image frame, each second pixel having at least one corresponding second pixel value and a region status;
a reference second pixel in the further search area based on a match between the first pixel value of the first pixels among the second pixel values; and a region status of at least one of the first pixels based on the region status of the reference second pixel for determining a target region in the first image frame based on the region status of said at least one first pixel.
Advantageously, the image encoder is adapted to code said target region in the first image frame with higher visual quality than another region in said first image frame.
According to the fourth aspect of the present invention, there is provided an image coding system having an encoder for coding a sequence of image frames into a codestream, and a decoder for reconstructing the sequence of image frames based on the codestream, wherein the sequence of image frames comprises at least a first image frame and a preceding second image frame, each of the first and second image frames comprising a plurality of pixels, the second image frame having a target region, said image encoder having:
means for decomposing each of the image frames into a plurality of subband components; and
means for coding the subband components into the codestream. The image encoder is characterized by
a first algorithm, responsive to the subband components, for defining a search area in the first image frame based on at least a part of the target region in the second image frame, the search area including a plurality of first pixels, each having at least one corresponding first pixel value; and
a second algorithm, responsive to the first pixels, for determining:
a further search area in the second image frame including a plurality of second pixels among the plurality of pixels in the second image frame, each second pixel having at least one corresponding second pixel value and a region status;
a reference second pixel in the further search area based on a match between the first pixel value of the first pixels among the second pixel values; and a region status of at least one of the first pixels based on the region status of the reference second pixel for determining a target region in the first image frame based on the region status of said at least one first pixel.
The present invention will become apparent upon reading the description taken in conjunction with FIGS. 1 to 5.