1. Field of the Invention
The present invention relates to a method for video object segmentation, and more particularly to a method for video object segmentation that merges static objects into the background by updating the background.
2. Description of the Prior Art
In recent years, computer and communication techniques have improved greatly, as have digital media techniques. Since digital media can be transmitted directly to users at reduced data rates, these improvements deeply affect the users of such technology. The concept of media comprises not only commonly used music and images, but also other media such as spoken words and diagrams. 3C (computer, communication, and consumer electronics) integrated products such as MP3 players, mobile phones with cameras, and electronic clothes have also popularized media techniques. In this broad field, video technology is a popular research topic. Unlike a single static image, video also involves time; a video can be regarded as a single image varying continuously over a period of time. The data amount therefore increases rapidly over time, which has become a serious problem in media signal processing.
Prior art video object segmentation methods can be classified into two kinds: automatic video object segmentation and semi-automatic video object segmentation. Automatic video object segmentation, also called unsupervised video object segmentation, segments the video object automatically through a computer. Semi-automatic video object segmentation, also called supervised video object segmentation, defines the object regions to be segmented prior to segmentation, and frames are then segmented according to the defined object regions. The two kinds of segmentation methods are described below. Typically, automatic video object segmentation comprises methods based on object edge information and methods based on the time and spatial domains, while semi-automatic video object segmentation comprises methods that interact with the user.
The video object segmentation method based on object edge information typically utilizes the Canny edge detection operator to obtain object outline information. In the first step, the method computes the difference between two neighboring frames and performs Canny edge detection to generate a double object edge map. The object edges of the current frame are then extracted. Next, the static and dynamic edges are merged to obtain the object outline. Finally, a more complete outline can be obtained by jointly detecting horizontal and vertical regions and applying morphological operations. Such a method utilizes the Canny operator, which determines the reliability of edges according to the gradient, to obtain a static object. When the textures of the object region and the background region are similar, the difference gradient is near zero, and object edges may be missed. Noise may also cause errors in distinguishing dynamic from static edges. The segmented object outline may therefore differ from the original object edge. An additional disadvantage of this method is that it requires an absolute background, that is, a frame containing no moving objects, to be obtained from the video, a situation that rarely occurs in the real world.
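The frame-differencing and edge-merging steps described above can be sketched as follows. This is a minimal illustration in pure Python on gray-level frames stored as lists of lists; the simple gradient test stands in for the full Canny operator, and all function names and threshold values are assumptions made for illustration, not part of the prior art method itself.

```python
def frame_difference(prev, curr, thresh=15):
    """Absolute difference between two gray-level frames; returns a binary change map."""
    h, w = len(curr), len(curr[0])
    return [[1 if abs(curr[y][x] - prev[y][x]) > thresh else 0
             for x in range(w)] for y in range(h)]

def gradient_edges(frame, thresh=20):
    """Crude edge map from horizontal/vertical gray-level gradients
    (an illustrative stand-in for the Canny operator)."""
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = frame[y][x + 1] - frame[y][x]
            gy = frame[y + 1][x] - frame[y][x]
            if abs(gx) + abs(gy) > thresh:
                edges[y][x] = 1
    return edges

def moving_edges(prev, curr):
    """Intersect the change map with the current frame's edges,
    keeping only edges that belong to moving objects."""
    change = frame_difference(prev, curr)
    edges = gradient_edges(curr)
    return [[c & e for c, e in zip(crow, erow)]
            for crow, erow in zip(change, edges)]
```

As noted above, when object and background textures are similar, the gradient at the object boundary stays below the threshold and the corresponding edge pixels are lost.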
The video object segmentation method based on the time domain and spatial domain is the most popular segmentation algorithm; it utilizes region segmentation in the spatial domain to compensate for segmentation defects and provide accurate outline locations. The most popular methods of this kind typically employ watershed techniques and K-means clustering techniques. Such algorithms generally comprise three parts: the time domain, the spatial domain, and the combination of the two. The time-domain part further comprises three steps: global motion estimation and compensation, scene change detection, and core segmentation. Global motion estimation and compensation compensates for camera movement, and scene change detection detects whether a scene change occurs in the input video frames; these two steps are preprocessing for the time domain. After that, the core segmentation of the time domain is performed, in which a change detection method based on probability estimation determines whether each pixel has changed in the obtained motion information. Spatial-domain segmentation is then performed, which utilizes image content to classify regions as meaningful or non-meaningful. However, the regions defined by a computer differ from those defined by human observation; that is, some regions that human eyes perceive as different may be classified as similar by a computer.
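The pixel-wise change test in the time-domain core segmentation can be sketched as follows, under the assumption that the probability estimation reduces to comparing the frame difference against a multiple of the camera noise standard deviation; the function name and parameter values are illustrative assumptions, not details taken from the prior art.

```python
def significant_change(prev, curr, noise_std=2.0, k=3.0):
    """Pixel-wise change test: a pixel is declared changed when its
    frame difference exceeds k standard deviations of the camera noise
    (a crude stand-in for the probabilistic test described above)."""
    return [[1 if abs(c - p) > k * noise_std else 0
             for c, p in zip(crow, prow)]
            for crow, prow in zip(curr, prev)]
```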
The watershed technique classifies pixels with similar gray levels into the same region. Of all watershed methods, the immersion-based method is the most popular. It starts from the regional minima, that is, the locations with minimum image gradient values. One imagines that a hole is pierced at each minimum and water rises from it; pixels with similar gradient values are classified into the same region, and each region is enlarged until the water reaches the maximum image gradient value, the highest point of the surface. A dam is then built to prevent the water of different regions from merging. Finally, the information in the time domain and the spatial domain is merged to obtain the final object region outline.
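The immersion process described above can be sketched as a simplified flooding order: pixels are processed in order of increasing gradient value, and each pixel either starts a new basin at a minimum, joins its single adjacent basin, or becomes a dam when it touches two different basins. The function name and the 4-neighbor connectivity are illustrative assumptions.

```python
def watershed(gradient):
    """Simplified immersion watershed on a gradient image (list of lists).
    Basins are labeled 1, 2, ...; dam pixels between basins get label 0."""
    h, w = len(gradient), len(gradient[0])
    labels = [[None] * w for _ in range(h)]
    # Flood pixels from the lowest gradient value upward.
    order = sorted((gradient[y][x], y, x) for y in range(h) for x in range(w))
    next_label = 1
    for _, y, x in order:
        neigh = set()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] not in (None, 0):
                neigh.add(labels[ny][nx])
        if not neigh:
            labels[y][x] = next_label   # regional minimum: start a new basin
            next_label += 1
        elif len(neigh) == 1:
            labels[y][x] = neigh.pop()  # grow the single adjacent basin
        else:
            labels[y][x] = 0            # dam between two or more basins
    return labels
```

On a gradient row such as 0, 1, 5, 1, 0, the two zero-valued minima grow into two basins that meet at the central ridge, where a dam pixel is placed.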
However, the watershed technique is sensitive to noise, and the problem of over-segmentation may occur. Although most of these effects can be removed through image preprocessing, visible fragmentation may still appear in regions such as faces; that is, a face region that should be treated as a single region may still be divided. Thus, a region merging method is needed to solve the over-segmentation problem. Such methods increase the load on the system, and the added complexity makes merging them into a real-time system more difficult.
Additionally, the K-means clustering technique is used to partition the image into k similar regions. An AND operation is then performed between each divided region and the corresponding change detection result, and if the result of the AND operation divided by the number of pixels in the region is larger than a predetermined threshold value, the region is determined to be a moving region. Afterwards, a region description operator is used to obtain the moving object. However, such methods do not utilize a threshold value adapted to the video, so the obtained moving region is imperfect. Since the static parts of a foreground object are mostly ignored in the detection result, the complete foreground object can barely be obtained after the region dividing operation. In this case, although the complete object outline can be obtained from the region description operator, more computing time is needed to compare the previous and current frames, which makes real-time operation difficult. Furthermore, determining the number of clusters is itself an important issue for the K-means clustering technique.
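The AND-and-ratio test described above can be sketched as follows, assuming the K-means partition is given as a per-pixel label map and the change detection result as a binary mask; the function name and the threshold value are illustrative assumptions.

```python
def moving_regions(region_labels, change_mask, ratio_thresh=0.5):
    """AND each segmented region with the change-detection mask; a region
    whose changed-pixel ratio exceeds the threshold is declared moving."""
    counts, changed = {}, {}
    for lrow, crow in zip(region_labels, change_mask):
        for lab, ch in zip(lrow, crow):
            counts[lab] = counts.get(lab, 0) + 1
            changed[lab] = changed.get(lab, 0) + ch
    return {lab for lab in counts
            if changed[lab] / counts[lab] > ratio_thresh}
```

Because the fixed `ratio_thresh` is not adapted to the video, a region covering the static part of a foreground object falls below the threshold and is dropped, which is exactly the weakness noted above.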
The method based on change detection detects the variance between two neighboring frames. The most popular detection method is to subtract neighboring frames, thereby obtaining a difference that indicates the degree of object variation between the frames and serves as a reference for frame-to-frame change.
After change detection, the resulting object mask is used to obtain a complete object outline. Since the frame difference is used directly to determine motion information, this method has low resistance to outside interference such as illumination change, shadow, or noise, but requires little computation. Additionally, such methods may lose usable motion information if the moving object suddenly stops or moves slowly after moving for a period of time. Some inventions disclose how to solve this problem, but they still cannot solve the problems of brightness change and shadow effects, and the uncovered background may be incorrectly determined as a foreground region. One prior art method utilizes motion estimation to determine whether the displacement vector of the change region at time k corresponds to the change region at time k+1. If it does, the region is determined to be a foreground object; otherwise it is a background region. Such a method cannot achieve high edge accuracy due to the use of motion estimation. Moreover, the method can only process translational motion and may wrongly judge other motion types, so the computational complexity of the system may increase.
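The displacement-vector check described above relies on block-based motion estimation, which can be sketched as an exhaustive search minimizing the sum of absolute differences (SAD). The block size, search range, and function names are illustrative assumptions; note that such a matcher inherently handles only translation, which is the limitation pointed out above.

```python
def sad(prev, curr, y, x, dy, dx, bs):
    """Sum of absolute differences between a bs x bs block at (y, x)
    in prev and the block displaced by (dy, dx) in curr."""
    total = 0
    for i in range(bs):
        for j in range(bs):
            total += abs(prev[y + i][x + j] - curr[y + dy + i][x + dx + j])
    return total

def best_displacement(prev, curr, y, x, bs=2, search=1):
    """Exhaustive block matching over a small search window; returns the
    displacement (dy, dx) with the lowest SAD."""
    h, w = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny and ny + bs <= h and 0 <= nx and nx + bs <= w:
                cost = sad(prev, curr, y, x, dy, dx, bs)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]
```

The foreground test then checks whether the block displaced by the estimated vector lands inside the change region detected at time k+1.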
Finally, the video object segmentation method that interacts with the user allows a user to first circle the outline of the object to be segmented. The circled region is then tracked in the following frames, the related information is updated, and the image content is combined to obtain the object region. Such methods may produce finer object outlines, but may update the outline incorrectly if the obtained characteristics are not highly correlated. Efficiency is thus decreased, and the application of such methods is limited.
As described above, each method presents different disadvantages. A new invention is therefore needed to solve the above-mentioned problems.