The present invention relates to an object extraction apparatus and, more particularly, to an object extraction apparatus for detecting the position of a target object from input moving picture and tracking/extracting a moving object.
An algorithm for tracking/extracting an object in moving picture has conventionally been proposed. This is a technique of extracting only a given object from a picture including various objects and a background. This technique is useful for processing and editing moving picture. For example, a person extracted from moving picture can be synthesized with another background.
As a method used for object extraction, a region dividing technique using region segmentation of the spatio-temporal image sequence (Echigo and Hansaku, "region segmentation of the spatio-temporal image sequence for video mosaic", THE 1997 IEICE SYSTEM SOCIETY CONFERENCE, D-12-81, p. 273, September, 1997) is known.
In this region dividing method using region segmentation of the spatio-temporal image sequence, moving picture is divided into small regions according to the color texture in one frame of the moving picture, and the regions are integrated in accordance with the relationship between the frames. When a picture in a frame is to be divided, initial division must be performed. This greatly influences the division result. In this region dividing method using region segmentation of the spatio-temporal image sequence, initial division is changed by using this phenomenon in accordance with another frame. As a result, different division results are obtained, and the contradictory divisions are integrated in accordance with the motion between frames.
If, however, this technique is applied to tracking/extracting of an object in moving picture without any change, a motion vector is influenced by an unnecessary motion other than the motion of the moving object as a target. In many cases, therefore, the reliability is not satisfactorily high, and erroneous integration occurs.
A moving object detecting/tracking apparatus using a plurality of moving object detectors is disclosed in Jpn. Pat. Appln. KOKAI Publication No. 8-241414. This conventional moving object detecting/tracking apparatus is used, for example, in a monitoring system using a monitor camera. The apparatus detects a moving object from an input moving picture and tracks it. In this moving object detecting/tracking apparatus, the input moving picture is input to a picture segmenting section, an inter-frame difference type moving object detector section, a background difference type moving object detector section, and a moving object tracking section. The picture segmenting section segments the input moving picture into blocks each having a predetermined size. The division result is sent to the inter-frame difference type moving object detector section and the background difference type moving object detector section. The inter-frame difference type moving object detector section detects the moving object in the input picture by using the inter-frame difference in units of division results. In this case, to detect the moving object without being influenced by the moving speed of the moving object, the frame intervals at which inter-frame differences are obtained are set on the basis of the detection result obtained by the background difference type moving object detector section. The background difference type moving object detector section detects the moving object by obtaining the difference between the moving object and the background picture created by using the moving picture input so far, in units of division results. An integration processor section integrates the detection results obtained by the inter-frame difference type moving object detector section and the background difference type moving object detector section to extract the motion information about the moving object.
After the object is detected from each frame, the moving object tracking section associates the moving objects detected on the respective frames with each other.
In this arrangement, since a moving object is detected by using not only an inter-frame difference but also a background difference, the detection precision is higher than that in a case wherein only the inter-frame difference is used. However, owing to the mechanism of detecting an object in motion from the overall input moving picture by using an inter-frame difference and a background difference, the detection results of the inter-frame difference and background difference are influenced by unnecessary motions other than the motion of the target moving object. For this reason, a target moving object cannot be properly extracted/tracked from a picture with a complicated background motion.
Another object extraction technique is also known, in which a background picture is created by using a plurality of frames, and a region where the difference between the pixel values of the background picture and input picture is large is extracted as an object.
An existing technique of extracting an object by using this background picture is disclosed in "MOVING OBJECT DETECTION APPARATUS, BACKGROUND EXTRACTION APPARATUS, AND UNCONTROLLED OBJECT DETECTION APPARATUS", Jpn. Pat. Appln. KOKAI Publication No. 8-55222.
According to this technique, the moving picture signal of the currently processed frame is input to a frame memory for storing one-frame picture data, a first motion detection section, a second motion detection section, and a switch. A video signal one frame before the current frame is read out from the frame memory and input to the first motion detection section. The background video signals generated up to this time are read out from the frame memory prepared to hold background pictures and are input to the second motion detection section and the switch. Each of the first and second motion detection sections extracts an object region by using, for example, the difference value between the two input video signals. Each extraction result is sent to a logical operation circuit. The logical operation circuit calculates the AND of the two input extraction results, and outputs it as a final object region. The object region is also sent to the switch. The switch selects signals depending on the object region as follows. For a pixel belonging to the object region, the switch selects a background pixel signal. In contrast to this, for a pixel that does not belong to the object region, the switch selects the video signal on the currently processed frame. The selected signal is sent as an overwrite signal to the frame memory holding the background picture. As a result, the corresponding pixel value in the frame memory is overwritten.
According to this technique, as disclosed in Jpn. Pat. Appln. KOKAI Publication No. 8-55222, as the processing proceeds, more accurate background pictures can be obtained. At the end, the object is properly extracted. However, since the background picture is mixed in the object in the initial part of the moving picture sequence, the object extraction precision is low. In addition, if the motion of the object is small, the object picture permanently remains in the background picture, and the extraction precision remains low.
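The update loop of this conventional technique can be sketched as follows. This is only an illustrative sketch: the function name `update_background`, the list-of-lists frame representation, and the use of a single threshold for both motion detection sections are assumptions that do not appear in the cited publication.

```python
def update_background(cur, prev, background, threshold):
    """One update step; returns (object_mask, new_background)."""
    object_mask = []
    new_background = []
    for c_row, p_row, b_row in zip(cur, prev, background):
        mask_row = []
        bg_row = []
        for c, p, b in zip(c_row, p_row, b_row):
            moved = abs(c - p) > threshold     # first motion detection (vs. previous frame)
            differs = abs(c - b) > threshold   # second motion detection (vs. background)
            is_object = moved and differs      # logical AND of both results
            mask_row.append(is_object)
            # Switch: keep the stored background under the object,
            # otherwise overwrite it with the current pixel.
            bg_row.append(b if is_object else c)
        object_mask.append(mask_row)
        new_background.append(bg_row)
    return object_mask, new_background


prev = [[10, 11, 12]]
cur = [[10, 200, 12]]
background = [[9, 11, 12]]
mask, background = update_background(cur, prev, background, threshold=20)
```

In this sketch a pixel of an object that does not move between `prev` and `cur` fails the first test, so it is written into the background, which corresponds to the drawback noted above for objects with small motion.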
As described above, in the conventional object extraction/tracking method, owing to the mechanism of detecting an object in motion from the overall input moving picture, the detection results of the inter-frame difference and background difference are influenced by unnecessary motions other than the motion of the target moving object. For this reason, a target moving object cannot be properly extracted/tracked.
In the object extraction method using background pictures, the extraction precision is poor in the initial part of a moving picture sequence. In addition, if the motion of the object is small, since a background picture remains incomplete, the extraction precision remains low.
It is an object of the present invention to provide an object extraction apparatus for moving picture which can accurately extract/track a target object without being influenced by unnecessary motions around the object.
It is another object to provide an object extraction apparatus which can accurately determine a background picture and obtain a high extraction precision not only in the late period of a moving picture sequence but also in the early period of the moving picture sequence regardless of the magnitude of the motion of an object.
According to the present invention, there is provided an object extraction apparatus comprising a background region determination section for determining a first background region common to a current frame as an object extraction target and a first reference frame that temporally differs from the current frame on the basis of a difference between the current frame and the first reference frame, and determining a second background region common to the current frame and a second reference frame that temporally differs from the current frame on the basis of a difference between the current frame and the second reference frame, an extraction section for extracting a region, in a picture on the current frame, which belongs to neither the first background region nor the second background region as an object region, and a still object detection section for detecting a still object region.
In this object extraction apparatus, two reference frames are prepared for each current frame as an object extraction target, and the first common background region commonly used for the current frame and the first reference frame is determined on the basis of the first difference image between the current frame and the first reference frame. The second common background region commonly used for the current frame and the second reference frame is determined on the basis of the second difference image between the current frame and the second reference frame. Since the object region on the current frame is commonly included in both the first and second difference images, the object region on the current frame can be extracted by detecting a region, of the regions that belong to neither the first common background region nor the second common background region, which is included in the image inside figure of the current frame. If this object region corresponds to a still object, a still object region is detected when there is no difference between the preceding object region and the current object region.
In this manner, a region that does not belong to any of the plurality of common background regions determined on the basis of the temporally different reference frames is determined as an extraction target object to track the object. This allows accurate extraction/tracking of the target object without any influences of unnecessary motions around the target object.
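The determination of the two common background regions and the extraction of the remaining region can be illustrated with a minimal sketch. The names `common_background` and `extract_object`, the fixed threshold, and the list-of-lists frames are assumptions made for illustration; the restriction to the image inside the figure and the still object detection are omitted.

```python
def common_background(cur, ref, threshold):
    """Mark pixels whose absolute difference to the reference frame is small."""
    return [[abs(c - r) <= threshold for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(cur, ref)]


def extract_object(cur, ref1, ref2, threshold=10):
    """Object region = pixels belonging to neither common background region."""
    bg1 = common_background(cur, ref1, threshold)
    bg2 = common_background(cur, ref2, threshold)
    return [[not (b1 or b2) for b1, b2 in zip(row1, row2)]
            for row1, row2 in zip(bg1, bg2)]


# One-row frames: a bright 2-pixel object (value 200) moves right over a dark background.
ref1 = [[0, 200, 200, 0, 0, 0, 0, 0]]
cur = [[0, 0, 0, 200, 200, 0, 0, 0]]
ref2 = [[0, 0, 0, 0, 0, 200, 200, 0]]
mask = extract_object(cur, ref1, ref2)
```

Each difference image alone also contains the object position on its reference frame; only the object position on the current frame survives in both, which is why pixels 3 and 4 are the only ones marked in `mask`.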
It is preferable that this apparatus further comprise a background correction section for correcting motion of a background on the reference frame or the current frame such that the motion of the background between each of the first and second reference frames and the current frame becomes relatively zero. With this background correction section set on the input stage of the figure setting section or background region determination section, even if background moving picture gradually changes between continuous frames as in a case wherein, for example, a camera is panned, the pseudo background moving picture can be made constant between these frames. Therefore, when the difference between the current frame and the first or second reference frame is obtained, the backgrounds of these frames can be canceled out. This allows common background region detection processing and object region extraction processing without any influences of changes in background. The background correction section can be realized by motion compensation processing.
In addition, the background region determination section preferably comprises a detector section for detecting difference values between the respective pixels, in a difference image between the current frame and the first or second reference frame, which are located near a contour of a region belonging to the image inside figure on the current frame or the image inside figure on the first or second reference frame, and a determination section for determining a difference value for determination of the common background region by using the difference values between the respective pixels near the contour, and determines the common background region from the difference image by using the determined difference value as a threshold value for background/object region determination. By paying attention to the difference values between the respective pixels near the contour in this manner, a threshold value can be easily determined without checking the entire difference image.
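One possible way of deriving such a threshold is sketched below. The rule used here is an assumption and is not stated in the text: the figure is taken to be a rectangle that encloses the object, its border pixels are treated as background, and the largest border difference plus a small margin is used as the threshold. The name `contour_threshold` and its parameters are hypothetical.

```python
def contour_threshold(diff, top, left, bottom, right, margin=1.0):
    """Pick a background/object threshold from the pixels on a rectangle's border.

    Assumption (not from the source): the border pixels are background, so the
    largest absolute difference found on the border, plus a margin, separates
    background from object.
    """
    border = []
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if y in (top, bottom) or x in (left, right):
                border.append(abs(diff[y][x]))
    return max(border) + margin


diff = [
    [3, 2, 4, 1, 2],
    [1, 2, 3, 2, 1],
    [2, 90, 85, 3, 2],
    [1, 2, 4, 2, 3],
]
t = contour_threshold(diff, 0, 0, 3, 4)
background = [[abs(v) <= t for v in row] for row in diff]
```

Only the border pixels of the rectangle are inspected, so the cost grows with the contour length rather than with the area of the difference image.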
The figure setting section preferably comprises a segment section for segmenting the image inside figure of the reference frame into a plurality of blocks, a search section for searching for a region on the input frame in which an error between each of the plurality of blocks and the input frame becomes a minimum, and a setting section for setting figures surrounding a plurality of regions searched out on the input frame. With this arrangement, an optimal new figure for an input frame as a target can be set regardless of the shape or size of the initially set figure.
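The segment, search, and setting steps can be sketched as follows, assuming a rectangular figure, square blocks, a full search over a small window, and the sum of absolute differences as the error measure. All of these choices, and the names `sad`, `best_match`, and `track_figure`, are illustrative assumptions.

```python
def sad(ref, frame, ry, rx, fy, fx, size):
    """Sum of absolute differences between a reference block and a frame block."""
    return sum(abs(ref[ry + dy][rx + dx] - frame[fy + dy][fx + dx])
               for dy in range(size) for dx in range(size))


def best_match(ref, frame, ry, rx, size, search):
    """Full search within +/- search pixels; returns the best top-left (y, x)."""
    h, w = len(frame), len(frame[0])
    best = None
    for fy in range(max(0, ry - search), min(h - size, ry + search) + 1):
        for fx in range(max(0, rx - search), min(w - size, rx + search) + 1):
            err = sad(ref, frame, ry, rx, fy, fx, size)
            if best is None or err < best[0]:
                best = (err, fy, fx)
    return best[1], best[2]


def track_figure(ref, frame, figure, size=2, search=2):
    """figure = (top, left, bottom, right), inclusive; returns the new rectangle."""
    top, left, bottom, right = figure
    hits = [best_match(ref, frame, y, x, size, search)
            for y in range(top, bottom - size + 2, size)
            for x in range(left, right - size + 2, size)]
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return min(ys), min(xs), max(ys) + size - 1, max(xs) + size - 1


ref = [[0] * 6 for _ in range(6)]
frame = [[0] * 6 for _ in range(6)]
ref[1][1:3], ref[2][1:3] = [10, 20], [30, 40]      # textured patch on the reference frame
frame[2][3:5], frame[3][3:5] = [10, 20], [30, 40]  # same patch, moved down and right
new_figure = track_figure(ref, frame, (1, 1, 2, 2))
```

Because the new rectangle is the bounding box of the matched block positions, it can grow or shrink relative to the figure on the reference frame.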
The present invention further comprises a prediction section for predicting a position or shape of the object on the current frame from a frame from which an object region has already been extracted, and a selector section for selecting the first and second reference frames to be used by the background region determination section on the basis of the position or shape of the object on the current frame which is predicted by the prediction section.
By selecting proper frames as reference frames to be used in this manner, a good extraction result can always be obtained.
Letting Oi, Oj, and Ocurr be objects on reference frames fi and fj and a current frame fcurr as an extraction target, optimal reference frames fi and fj for the proper extraction of the shape of the object are frames that satisfy
(Oi ∩ Oj) ⊆ Ocurr
That is, frames fi and fj whose objects Oi and Oj have an intersection belonging to the object Ocurr.
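The condition that the intersection of the objects on the two reference frames belongs to the object on the current frame can be checked directly on predicted object regions. In the following sketch, representing each region as a set of pixel coordinates and the name `valid_reference_pairs` are assumptions made for illustration.

```python
from itertools import combinations


def valid_reference_pairs(regions, current_index):
    """regions: dict mapping frame index -> set of (y, x) object pixels.

    Returns every pair (i, j) of other frames whose object regions have an
    intersection that is a subset of the object region on the current frame.
    """
    o_curr = regions[current_index]
    others = sorted(i for i in regions if i != current_index)
    return [(i, j) for i, j in combinations(others, 2)
            if (regions[i] & regions[j]) <= o_curr]


# A 3-pixel-wide object on one row that moves one pixel to the right per frame.
regions = {k: {(0, x) for x in range(k, k + 3)} for k in range(1, 6)}
pairs = valid_reference_pairs(regions, current_index=3)
```

A pair whose object regions do not overlap at all, such as frames 1 and 5 here, satisfies the condition trivially because the empty set is a subset of any region.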
In addition, the present invention is characterized in that a plurality of object extraction sections for performing object extraction by different methods are prepared, and object extraction is performed while these object extraction sections are selectively switched. This apparatus preferably uses a combination of first object extraction sections for performing object extraction by using the deviations between the current frame and at least two reference frames that temporally differ from the current frame and second object extraction sections for performing object extraction by predicting an object region on the current frame from a frame having undergone object extraction using inter-frame prediction. With this arrangement, even if the object is partially still, and no difference between the current frame and each reference frame can be detected, compensation for this situation can be made by the object extraction section using inter-frame prediction.
When a plurality of object extraction sections are prepared, it is preferable that this apparatus further comprise an extraction section for extracting a feature value of a picture in at least a partial region of the current frame as the object extraction target from the current frame, and switch the plurality of object extraction sections on the basis of the extracted feature value.
If, for example, it is known in advance whether a background moves or not, the corresponding property is preferably used. If there is a background motion, background motion compensation is performed. However, perfect compensation is not always ensured. Almost no compensation may be given for a frame exhibiting a complicated motion. Such a frame can be detected in advance in accordance with the compensation error amount in background motion compensation, and hence can be excluded from reference frame candidates. If, however, there is no background motion, this processing is not required. This is because if another object moves, wrong background motion compensation may be performed, or even an optimal frame for reference frame selection conditions may be excluded from reference frame candidates, resulting in a decrease in extraction precision. In addition, one picture may include various properties. The object motions and textures partly differ. For these reasons, the object may not be properly extracted by using the same tracking/extracting method and apparatus and the same parameter. It is therefore preferable that the user designate a portion of a picture which has a special property, or a difference in a picture be automatically detected as a feature value, and tracking/extracting methods be partly switched in units of, e.g., blocks in each frame to perform object extraction or the parameter be changed on the basis of the feature value.
If a plurality of object extraction sections are switched on the basis of the feature value of a picture in this manner, the shapes of objects in various pictures can be accurately extracted.
Assume that the first object extraction section using the deviations between the current frame and at least two reference frames that temporally differ from the current frame and the second object extraction section using inter-frame prediction are used in combination. In this case, the first and second object extraction sections are selectively switched and used on the basis of the prediction error amount in units of blocks in each frame as follows. When the prediction error caused by the second object extraction section falls within a predetermined range, the extraction result obtained by the second object extraction section is used as an object region. When the prediction error exceeds the predetermined range, the extraction result obtained by the first object extraction section is used as an object region.
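The block-wise switching rule can be sketched as follows. The sketch assumes that "within a predetermined range" means "not larger than a single upper bound"; the name `switch_blocks` and the per-block list representation are likewise hypothetical.

```python
def switch_blocks(predicted, differenced, errors, max_error):
    """Select an extraction result for each block.

    predicted:   per-block masks from the inter-frame prediction section
    differenced: per-block masks from the section using two reference frames
    errors:      per-block prediction error of the inter-frame prediction
    """
    chosen = []
    for pred_block, diff_block, err in zip(predicted, differenced, errors):
        # Trust the prediction only where its error stays within the bound.
        chosen.append(pred_block if err <= max_error else diff_block)
    return chosen


predicted = [[True, True], [False, True], [False, False]]
differenced = [[True, False], [True, True], [False, True]]
errors = [3.0, 25.0, 8.0]
result = switch_blocks(predicted, differenced, errors, max_error=10.0)
```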
The second object extraction section is characterized by performing inter-frame prediction in a sequence different from an input frame sequence such that a frame interval between a reference frame and the current frame as the object extraction target is set to a predetermined number of frames or more. With this operation, since the motion amount between frames increases as compared with a case wherein inter-frame prediction is sequentially performed in the input frame sequence, the prediction precision can be increased, resulting in an increase in extraction precision.
In some cases, an object motion is too small or complicated to be coped with by the shape prediction technique using inter-frame prediction depending on the frame intervals. If, for example, a shape prediction error exceeds a threshold value, the prediction precision can be increased by increasing the interval between a target frame and the extracted frame used for prediction. This leads to an increase in extraction precision. In addition, if there is a background motion, reference frame candidates are used to obtain the background motion relative to the extracted frame to perform motion compensation. However, the background motion may be excessively small or complicated depending on the frame intervals, and hence background motion compensation may not be performed with high precision. In this case as well, the motion compensation precision can be increased by increasing the frame intervals. If the sequence of extracted frames is adaptively controlled in this manner, the shape of an object can be extracted more reliably.
In addition, according to the present invention, there is provided an object extraction apparatus for receiving moving picture data and shape data representing an object region on a predetermined frame of a plurality of frames constituting the moving picture data, comprising a readout section for reading out moving picture data from a storage unit in which the moving picture data is stored, and performing motion compensation for the shape data, thereby generating shape data in units of frames constituting the readout moving picture data, a generator section for generating a background picture of the moving picture data by sequentially overwriting picture data in a background region of each frame, determined by the generated shape data, on a background memory, and a readout section for reading out the moving picture data again from the storage unit on which the moving picture data is recorded, obtaining a difference between each pixel of each frame constituting the readout moving picture data and a corresponding pixel of the background picture stored in the background memory, and determining a pixel exhibiting a difference whose absolute value is larger than a predetermined threshold value as a pixel belonging to the object region.
In this object extraction apparatus, in the first scanning processing of reading out the moving picture data from the storage unit, a background picture is generated in the background memory. The second scanning processing is then performed to extract an object region by using the background picture completed by the first scanning. Since the moving picture data is stored in the storage unit, an object region can be extracted with a sufficiently high precision from the start of the moving picture sequence by scanning the moving picture data twice.
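The two scans can be sketched as follows. The function names and the list-of-lists frames are assumptions, motion compensation of the shape data is assumed to have been done already, and treating a pixel that was never exposed as background in the first scan as an object pixel is also an assumption of this sketch.

```python
def build_background(frames, shapes):
    """First scan: overwrite the background memory with non-object pixels."""
    h, w = len(frames[0]), len(frames[0][0])
    background = [[None] * w for _ in range(h)]
    for frame, shape in zip(frames, shapes):
        for y in range(h):
            for x in range(w):
                if not shape[y][x]:
                    background[y][x] = frame[y][x]
    return background


def extract_by_background(frame, background, threshold):
    """Second scan: a large absolute difference from the background marks the object."""
    return [[b is None or abs(p - b) > threshold
             for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


# One-row frames: a 2-pixel object (value 200) moves right; shape data is rough.
frames = [
    [[200, 200, 12, 13, 14, 15]],
    [[10, 11, 200, 200, 14, 15]],
    [[10, 11, 12, 13, 200, 200]],
]
shapes = [
    [[True, True, True, False, False, False]],
    [[False, True, True, True, True, False]],
    [[False, False, False, True, True, True]],
]
background = build_background(frames, shapes)
masks = [extract_by_background(f, background, 20) for f in frames]
```

In this example the background is complete before the second scan starts, so the first frame is already extracted tightly even though its shape data covered one extra pixel.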
The present invention further comprises an output section for selectively outputting one of an object region determined by shape data of each of the frames and an object region determined on the basis of an absolute value of a difference from the background picture as an object extraction result. Depending on the picture, the object region determined by the shape data obtained by the first scanning is higher in extraction precision than the object region obtained by the second scanning using the difference from the background picture. The extraction precision can therefore be further increased by selectively outputting the object region obtained by the first scanning and the object region obtained by the second scanning.
Furthermore, according to the present invention, there is provided an object extraction apparatus for receiving moving picture data and shape data representing an object region on a predetermined frame of a plurality of frames constituting the moving picture data, and sequentially obtaining shape data of the respective frames by using frames for which the shape data have already been provided or from which shape data have already been obtained as reference frames, comprising a division section for segmenting a currently processed frame into a plurality of blocks, a search section for searching for a similar block, for each of the blocks, which is similar in figure represented by picture data to the currently processed block and is larger in area than the currently processed block, from the reference frame, a paste section for pasting shape data obtained by extracting and reducing shape data of each similar block from the reference frame on each block of the currently processed frame, and an output section for outputting the pasted shape data as shape data of the currently processed frame.
This object extraction apparatus performs search processing in units of blocks in the current frame as an object extraction target to search for a similar block that is similar in graphic figure represented by picture data (texture) to the currently processed block and larger in area than the currently processed block. The apparatus also pastes the data obtained by extracting and reducing the shape data of each similar block searched out on the corresponding block of the currently processed frame. Even if the contour of an object region, given by shape data, deviates, the position of the contour can be corrected by reducing and pasting the shape data of each similar block larger than the currently processed block in this manner. If, therefore, the data obtained when the user approximately traces the contour of an object region on the first frame with a mouse or the like is input as shape data, object regions can be accurately extracted from all the subsequent input frames.
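The search, reduce, and paste steps can be sketched as follows. Several simplifications are assumptions of this sketch and are not taken from the text: similar blocks are exactly twice the block size, reduction is done by taking every other pixel, the search is exhaustive over the whole reference frame, and the error measure is the sum of absolute differences. The names `reduce2`, `block_error`, and `paste_shape` are hypothetical.

```python
def reduce2(img, y, x, size):
    """Shrink the (2*size) square at (y, x) to size x size by taking every other pixel."""
    return [[img[y + 2 * dy][x + 2 * dx] for dx in range(size)]
            for dy in range(size)]


def block_error(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def paste_shape(cur, ref, ref_shape, size):
    """Build shape data for cur from reduced shape data of similar larger blocks in ref."""
    h, w = len(cur), len(cur[0])
    out = [[False] * w for _ in range(h)]
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            block = [row[bx:bx + size] for row in cur[by:by + size]]
            best = None
            for y in range(h - 2 * size + 1):
                for x in range(w - 2 * size + 1):
                    err = block_error(block, reduce2(ref, y, x, size))
                    if best is None or err < best[0]:
                        best = (err, y, x)
            shape = reduce2(ref_shape, best[1], best[2], size)
            for dy in range(size):
                out[by + dy][bx:bx + size] = shape[dy]
    return out


# A vertical edge between columns 2 and 3; the shape data follows the edge.
picture = [[0, 0, 0, 100, 100, 100, 100, 100] for _ in range(4)]
shape = [[False, False, False, True, True, True, True, True] for _ in range(4)]
result = paste_shape(picture, picture, shape, size=2)
```

In this toy case the contour of the input shape data already lies on the edge of the picture, so the pasted result reproduces the input shape data.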
Moreover, according to the present invention, there is provided an object extraction apparatus for receiving picture data and shape data representing an object region on the picture, and extracting the object region from the picture data by using the shape data, comprising a setting section for setting blocks on a contour portion of the shape data, and searching for a similar block, for each of the blocks, which is similar in graphic figure represented by the picture data to each block and is larger than the block, from the same picture, a replace section for replacing the shape data of each of the blocks with shape data obtained by reducing the shape data of each of the similar blocks, a repeat section for repeating the replacement by a predetermined number of times, and an output section for outputting shape data obtained by repeating the replacement as corrected shape data.
The position of the contour provided by shape data can be corrected by performing replacement processing using similar blocks based on block matching within a frame. In addition, since the block matching is performed within a frame, a search for similar blocks and replacement can be repeatedly performed for the same blocks. This can further increase the correction precision.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.