1. Field of the Invention
The present invention relates to a video object segmentation method and, in particular, to a video object segmentation method suited to rainy conditions.
2. Description of the Prior Art
In recent years, computer and communications technologies have developed rapidly and broadly. Digital multimedia, with its variety of user-friendly interface functions, has become one of the key technologies under development because it can deliver focused, meaningful information directly to users. Many related studies are therefore under way, and digital multimedia is profoundly affecting daily life. Digital multimedia technology includes not only the music and images that are frequently encountered but also other media, such as text, sound, video, and graphics. Multimedia technology is ubiquitous in “3C” (computer, communication, and consumer electronics) integrated products. For example, MP3 technology, mobile phones with video recording functionality, and futuristic electronic clothing all combine diverse multimedia technologies in a single product. Within this diverse, multi-functional field, video technology is one of the primary subjects of study. Compared with the processing of a single static image, dynamic video analysis involves not only the techniques used for a single static image but also the analysis of changes over time. In other words, video can be regarded as a sequence of consecutive still images that change over time, and because the amount of video information grows drastically as time passes, video multimedia technology faces particular difficulties in processing the large amount of information it requires.
Modern video multimedia technology has advanced from low-level analysis, that is, the description and analysis of luminance, color, and texture, to high-level description of the meaning and concepts of multimedia content, e.g. the abstract relationships between objects and events. These advances allow users to interact more directly with video content and to understand it better. This concept of interaction is reflected in the MPEG-4 video compression standard and the later MPEG-7 multimedia content description standard. Future video technology will take objects as the elementary units of operation, using the meaningful objects in a video to realize object-oriented compression and then the description of, and interaction with, those objects. Video object segmentation is the critical technology for performing these functions. If video objects can be segmented precisely, the compression ratio achieved by the compression algorithm can be increased. For content-related processing, such as searching the Internet or a database for similar video objects, both the accuracy and the speed of the search depend on the technology used to segment the video objects. In addition, video object segmentation precedes most computer vision recognition applications: the object must first be segmented from the video images before it can be tracked, counted, and recognized, and before its actions can be analyzed.
Video object segmentation differs from static image segmentation because it considers not only changes in spatial content but also information about changes over time. It therefore provides information about object motion that static image segmentation does not, and this added motion information makes video object segmentation more practical than static image segmentation. By using the motion information given by the change of the object along the time axis, together with low-level features of the object, the accuracy of video object segmentation can be increased. However, variable factors in a dynamic video environment cause other problems in video processing. For example, because the background varies constantly in a rainy environment, raindrops can be erroneously determined to be moving objects, and they also reduce the accuracy with which the actual moving object is segmented.
Video object segmentation methods can normally be classified into two categories: automatic and semi-automatic. Automatic video object segmentation methods, also called unsupervised video object segmentation methods, segment video objects automatically by computer. Semi-automatic video object segmentation methods, also called supervised video object segmentation methods, define the region of the object to be segmented before segmentation begins and then segment the following frames according to the characteristics of that defined region. Generally speaking, automatic video object segmentation algorithms can be classified into methods based on object edge information, methods based on the temporal and spatial domains, and methods based on change detection, whereas methods based on interaction with the user belong to the semi-automatic algorithms.
Segmentation algorithms based on object edges mainly use the Canny edge detection operator to obtain information about the object shape. First, the difference between two adjacent frames is calculated, and the Canny operator is applied to it to produce a double edge map of the object. The object edges in the current frame are then extracted, and the object shape is derived by combining the moving edges with the static edges. A more complete object shape is then obtained by intersecting the results of vertical and horizontal detection and by applying morphological operations as post-processing. However, because the object is segmented with the Canny operator, which decides edge reliability from the gradient, the algorithm runs into trouble when the texture of the object region resembles that of the background or when raindrops blur the object edges. In those cases the gradient of the difference is hard to obtain, object edges can be lost, and the error rate in classifying moving and static edges rises, so the segmented object shape differs noticeably from the original object shape.
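The dependence on the gradient of the frame difference can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows and, for brevity, substitutes a plain Sobel gradient magnitude with a fixed threshold for the full Canny operator (no smoothing, non-maximum suppression, or hysteresis). The function names and the threshold are hypothetical, chosen only for illustration.

```python
def frame_difference(prev, curr):
    """Absolute per-pixel difference of two grayscale frames (lists of rows)."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| using 3x3 Sobel kernels.

    Border pixels are left at 0 to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def difference_edge_map(prev, curr, threshold):
    """Mark pixels where the gradient of the frame difference exceeds threshold.

    Simplified stand-in for applying Canny to the difference image
    (hypothetical helper, not the prior-art implementation).
    """
    grad = sobel_magnitude(frame_difference(prev, curr))
    return [[1 if g > threshold else 0 for g in row] for row in grad]


prev = [[0] * 5 for _ in range(5)]
strong = [[100 if x >= 3 else 0 for x in range(5)] for _ in range(5)]
weak = [[10 if x >= 3 else 0 for x in range(5)] for _ in range(5)]
print(difference_edge_map(prev, strong, 200)[2])  # [0, 0, 1, 1, 0]
print(difference_edge_map(prev, weak, 200)[2])    # [0, 0, 0, 0, 0]
```

The low-contrast case mirrors the weakness described above: when object and background textures are similar, or raindrops blur the edge, the difference gradient stays below the threshold and the edge disappears entirely.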
Video object segmentation that combines temporal and spatial information is currently the most popular video object segmentation approach. It uses regions segmented in the spatial domain to compensate for the weaknesses of temporal segmentation and thus locate the object shape more precisely. Commonly used spatial segmentation methods include the watershed technique and K-means clustering. The algorithm has three main parts: the temporal domain, the spatial domain, and the combination of the two. The temporal part includes three steps: global motion (whole-frame shift) estimation and compensation, scene change detection, and core segmentation. Global motion estimation and compensation counteracts camera motion, and scene change detection determines whether the input video frames contain a scene change; both can be regarded as temporal preprocessing. Core segmentation is then performed: motion information is derived by change detection, and a probability estimation method is used to decide whether each pixel has changed. Next, spatial segmentation is performed, which mainly uses low-level image content to classify the meaningful regions of the image. However, the segmented regions still differ from those a human observer would perceive, because some areas that the human eye would separate into different regions are still assigned to the same region by the computer.
The watershed segmentation method classifies pixels of similar gray level into the same region, and among watershed algorithms the immersion-based method is the most commonly used. The immersion-based method starts from the minima of the gradient image. Imagine piercing a hole at each minimum and letting water flood in through the holes as the water level steadily rises: pixels with similar gradient values are absorbed into the same region, and each region grows gradually until the water reaches the maximum of the image gradient, which is the highest point of the surface. Wherever water from different regions would meet, a dam is built, its main purpose being to prevent the regions from flooding into each other. Finally, in the combined algorithm, the temporal and spatial information obtained is merged to produce the final shape of the object region.
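The immersion process can be sketched as a priority flood, a common way to implement watershed. This is a minimal sketch under several assumptions: basins are seeded only at strict local minima (plateau minima are ignored), 4-connectivity is used, and a pixel reached by two different basins becomes a dam labeled `DAM = -1`. All names are hypothetical.

```python
import heapq

DAM = -1  # label for watershed (dam) pixels


def neighbors(y, x, h, w):
    """Yield the 4-connected neighbours of (y, x) inside an h x w image."""
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def watershed(grad):
    """Flood a gradient image from its strict local minima (priority flood)."""
    h, w = len(grad), len(grad[0])
    labels = [[0] * w for _ in range(h)]      # 0 = not yet flooded
    queued = [[False] * w for _ in range(h)]
    heap, order = [], 0

    # Each strict local minimum becomes the source ("hole") of its own basin.
    next_label = 1
    for y in range(h):
        for x in range(w):
            if all(grad[y][x] < grad[ny][nx] for ny, nx in neighbors(y, x, h, w)):
                labels[y][x] = next_label
                queued[y][x] = True
                next_label += 1

    def push_neighbours(y, x):
        nonlocal order
        for ny, nx in neighbors(y, x, h, w):
            if not queued[ny][nx]:
                queued[ny][nx] = True
                heapq.heappush(heap, (grad[ny][nx], order, ny, nx))
                order += 1  # tie-breaker keeps equal levels in FIFO order

    for y in range(h):
        for x in range(w):
            if labels[y][x] > 0:
                push_neighbours(y, x)

    # Raise the water level: always flood the lowest queued pixel next.
    while heap:
        _, _, y, x = heapq.heappop(heap)
        basins = {labels[ny][nx] for ny, nx in neighbors(y, x, h, w)
                  if labels[ny][nx] > 0}
        if len(basins) == 1:
            labels[y][x] = basins.pop()
            push_neighbours(y, x)
        else:
            labels[y][x] = DAM  # two basins meet here: build a dam
    return labels


print(watershed([[1, 2, 3, 9, 3, 2, 1]]))        # [[1, 1, 1, -1, 2, 2, 2]]
print(watershed([[1, 2, 3, 2, 3, 9, 3, 2, 1]]))  # [[1, 1, -1, 2, 2, -1, 3, 3, 3]]
```

The second call shows how a single noise dip in the gradient (the `2` between two `3`s) creates an extra basin, which is the over-segmentation effect that raindrop noise produces on a much larger scale.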
However, the watershed segmentation method is highly sensitive to noise, especially in a rainy environment, where raindrops are an abundant source of noise. Excessive noise leads to over-segmentation. For example, a person's entire face should be determined to be a single region, but because of raindrop noise it is split into many small regions, so a region-merging algorithm must be executed afterward to correct the over-segmentation. Such an algorithm increases the system load and makes integration into a real-time system more difficult.
K-means clustering divides the image into K regions of similar pixels. An AND operation is performed between each K-means cluster region and the corresponding region obtained by change detection. If the number of pixels in the AND result divided by the number of pixels in the region exceeds a preset threshold, the region is classified as a moving region, and a region description operator is then used to obtain the moving object. A shortcoming of this method is that the threshold is not adapted to the video content, so the derived moving region is imperfect: most of the foreground object may belong to a still region, and those parts are missed by change detection. Conversely, the drastic variation caused by raindrops is erroneously determined to be part of the foreground object. As a result, after region segmentation the foreground object is difficult to extract, and regions where raindrops gather are wrongly determined to be foreground, which is undesirable. Although unwanted foreground regions can be eliminated and the complete object shape recovered later with the region description operator, the comparisons and calculations between the current and previous frames take considerable time, so this prior art is difficult to apply in real-time systems. Furthermore, how to choose the number of clusters K is also an important issue for video segmentation.
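The cluster-and-AND rule can be sketched as follows. This is a simplified illustration, not the prior-art implementation: it clusters on gray level alone (so a cluster need not be spatially connected), uses a deterministic initialization spread over the gray range, and omits the region description operator. The function names, `k`, and `ratio_threshold` are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's K-means on scalar gray levels; returns a cluster index per value."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centres spread evenly over the gray range.
    centres = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centres[c]))

    for _ in range(iters):
        assign = [nearest(v) for v in values]
        new = []
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break  # converged
        centres = new
    return [nearest(v) for v in values]


def moving_regions(frame, change_mask, k, ratio_threshold):
    """Keep every cluster whose overlap ratio with the change mask is large."""
    h, w = len(frame), len(frame[0])
    values = [frame[y][x] for y in range(h) for x in range(w)]
    changed = [change_mask[y][x] for y in range(h) for x in range(w)]
    assign = kmeans_1d(values, k)
    moving = set()
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            overlap = sum(1 for i in members if changed[i])  # cluster AND mask
            if overlap / len(members) > ratio_threshold:
                moving.add(c)
    return [[1 if assign[y * w + x] in moving else 0 for x in range(w)]
            for y in range(h)]


frame = [[10, 10, 200, 200], [10, 10, 200, 200]]
mask = [[0, 0, 1, 0], [0, 0, 1, 1]]  # change detection missed one object pixel
print(moving_regions(frame, mask, 2, 0.5))  # [[0, 0, 1, 1], [0, 0, 1, 1]]
print(moving_regions(frame, mask, 2, 0.8))  # [[0, 0, 0, 0], [0, 0, 0, 0]]
```

The two calls illustrate the threshold problem raised above: with a fixed ratio of 0.5 the stationary object pixel is recovered, but at 0.8 the same object is discarded entirely.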
Algorithms based on change detection aim to detect changes between adjacent frames, and the most direct change detection method is to subtract one adjacent frame from the other. The resulting difference represents the degree of change of the object between the two frames and serves as the change reference between them. After change detection, the resulting object mask must be processed further to obtain the complete object shape. Such algorithms require relatively little computation, but because the motion information is decided from the frame difference alone, their resistance to external interference such as lighting changes, shadows, or noise is low, and they therefore cannot be applied in rainy conditions, where many raindrops are present.
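Frame subtraction can be sketched in a few lines. This minimal sketch assumes grayscale frames as lists of rows and a hypothetical fixed threshold; real systems would add the post-processing of the mask mentioned above.

```python
def change_detection(prev, curr, threshold):
    """Binary change mask: 1 where |curr - prev| exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


background = [[50, 50, 50], [50, 50, 50]]
with_raindrop = [[50, 120, 50], [50, 50, 50]]  # one bright rain-streak pixel
print(change_detection(background, with_raindrop, 30))  # [[0, 1, 0], [0, 0, 0]]

# An object that has stopped moving produces identical frames and no change.
still_object = [[50, 200, 200], [50, 200, 200]]
print(change_detection(still_object, still_object, 30))  # [[0, 0, 0], [0, 0, 0]]
```

The first call shows a raindrop being flagged exactly like a moving object; the second shows that a stationary object leaves no trace in the difference, which is the loss of motion information discussed in the next paragraph.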
In addition, if a moving object suddenly stops or moves very slowly after moving for some time, the change detection method loses part of the motion information. Some prior-art inventions propose solutions to this problem but cannot handle lighting changes and shadow effects. For example, in a rainy environment the amount of rain directly changes the lighting of the captured image, and dynamic reflections on rainwater collected on the ground pose another problem. Furthermore, the uncovered background problem is inherent. All of these problems lead to erroneous determinations. One conventional technique uses motion estimation to determine whether the displacement vector of a change region at time k corresponds to a change region at time k+1. If so, the region is determined to be foreground; otherwise, it is determined to be background. However, because this method relies on motion estimation, its edge accuracy is low. Furthermore, it can only process objects undergoing translation; other motions, such as rotation, may be determined erroneously, and handling them increases the computational complexity of the system.
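The motion-estimation test can be sketched with exhaustive block matching. This is a minimal sketch under stated assumptions: square blocks, translation-only search by sum of absolute differences (SAD), and a block counted as foreground when its best match lands mostly inside the change mask at time k+1. The names `best_displacement`, `is_foreground`, `search`, and `max_sad` are hypothetical.

```python
def best_displacement(frame_k, frame_k1, top, left, size, search):
    """Exhaustive translation-only block matching by sum of absolute differences."""
    h, w = len(frame_k1), len(frame_k1[0])
    block = [row[left:left + size] for row in frame_k[top:top + size]]
    best = None  # (sad, dy, dx)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue  # candidate block would leave the frame
            sad = sum(abs(block[i][j] - frame_k1[y0 + i][x0 + j])
                      for i in range(size) for j in range(size))
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best


def is_foreground(frame_k, frame_k1, change_k1, top, left, size, search, max_sad):
    """Foreground if the block's best match lands mostly inside the k+1 change mask."""
    sad, dy, dx = best_displacement(frame_k, frame_k1, top, left, size, search)
    if sad > max_sad:
        return False  # no convincing translated match (e.g. a vanished raindrop)
    covered = sum(change_k1[top + dy + i][left + dx + j]
                  for i in range(size) for j in range(size))
    return 2 * covered > size * size


def frame_with_block(top, left, n=6):
    """Hypothetical test frame: an n x n dark image with a bright 2x2 block."""
    f = [[0] * n for _ in range(n)]
    for i in range(2):
        for j in range(2):
            f[top + i][left + j] = 100
    return f


fk, fk1 = frame_with_block(1, 1), frame_with_block(1, 3)  # block moves right by 2
change = [[1 if a != b else 0 for a, b in zip(ra, rb)] for ra, rb in zip(fk, fk1)]
print(best_displacement(fk, fk1, 1, 1, 2, 2))             # (0, 0, 2)
print(is_foreground(fk, fk1, change, 1, 1, 2, 2, 50))     # True
```

Because the search only tries shifted copies of the block, a rotated object would give no low-SAD match and would be rejected, which reflects the translation-only limitation noted above.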
Finally, in video object segmentation based on interaction with users, the user first manually defines a bounding box around the shape of the object to be segmented. In the following frames, the object is then continuously tracked and updated according to the characteristics of the bounding-box region, incorporating low-level features to obtain the object region. This kind of segmentation method yields more accurate object edges, but when the derived characteristics are only loosely related, the shape update leads to wrong determinations and lowers accuracy. In a rainy environment, for example, raindrops blur the object edges and cause a high error rate in the shape update, which limits the development of this method.
As described above, each conventional algorithm has its own weaknesses, and a rainy environment seriously reduces the accuracy of video analysis and segmentation.