Object detection and tracking is an important computer vision technology and plays a fundamental role in video surveillance. Correct object detection and tracking yields accurate object profiles and object trajectories. This information is the basis for high-level video identification; if it is incorrect, the stability and accuracy of the high-level video surveillance system are compromised.
Rapid progress in computer vision technology and growing public concern for personal safety and property security have driven the development of intelligent video surveillance and related management services. A stable object detection and tracking system is becoming an increasingly important tool that helps administrators carry out remote, long-term surveillance with minimal manpower.
U.S. Patent Publication No. 2005/0286738A1 discloses graphical object models for detection and tracking, which provide an architecture for spatio-temporal object detection systems. Based on the characteristics of the target object, such as a pedestrian or a car, the system decomposes the object into a plurality of smaller parts, such as hands, feet, and torso. Using a combined model of these parts, the system determines the location and size of specific objects appearing in a single image. The detection information is then passed to the next image, forming the spatio-temporal object detection architecture. In this approach, the combined part model must learn the model of each specific object in advance, and the system can track only those specific objects.
U.S. Patent Publication No. 2005/0002572A1 discloses methods and systems for detecting objects of interest in spatio-temporal signals. As shown in FIG. 1, sensor 101 collects spatio-temporal data 101a. After color conversion and quantization, spatio-temporal data 101a is passed to a foreground/background separation module 104 of object detection and tracking architecture 100, which separates the spatio-temporal signals and assigns a foreground/background classification label to each location point. A spatial grouping module 105 then groups neighboring foreground points into the same object. A temporal tracker 103 tracks the movement of the foreground objects over time. An object classification module 106 then classifies the foreground objects into different semantic objects, such as cars and pedestrians. The classification information is fed back to foreground/background separation module 104 to adjust the model parameters, so that the foreground and background regions produced by the updated foreground and background models agree with the classified objects. This document does not disclose information sharing between the object tracking and object detection modules, nor does it disclose the use of whole-object shape labeling information in foreground/background detection.
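The spatial grouping step, in which neighboring foreground points are merged into one object, is commonly realized as connected-component labeling. The following is a minimal sketch assuming a binary foreground mask stored as a list of lists and 4-connectivity; the function name `group_foreground` is hypothetical and is not taken from the cited publication.

```python
from collections import deque


def group_foreground(mask):
    """Label 4-connected foreground regions in a binary mask.

    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count identifying each grouped foreground object.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # An unlabeled foreground point starts a new object.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Breadth-first flood fill over the 4-neighbors.
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

On the mask `[[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]]` the function finds two objects: the L-shaped group at the upper left and the vertical pair at the right edge.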
U.S. Patent Publication No. 2006/0045335A1 discloses background maintenance of an image sequence on multiple spatial scales: a pixel scale, a regional scale, and a frame scale. At the pixel scale, background subtraction yields the initial segmented foreground regions. At the regional scale, neighboring foreground pixels are combined into regions to obtain the complete foreground objects. At the frame scale, the regions of the background model that require updating are determined, and the current background model is refined accordingly. This document discloses the concept of using a background model to detect objects.
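The pixel-scale and frame-scale steps can be illustrated with a generic background-subtraction loop: pixels that differ from the background model by more than a threshold are marked as foreground, and only the remaining pixels are blended into the model. This is a minimal sketch assuming grayscale frames, a hypothetical threshold `tau`, and a running-average learning rate `rate`; the regional step is omitted, and the code is not the specific multi-scale procedure of the cited publication.

```python
def subtract_background(frame, background, tau):
    """Pixel scale (assumed form): mark |frame - background| > tau as foreground (1)."""
    return [[1 if abs(f - b) > tau else 0 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def update_background(frame, background, mask, rate):
    """Frame scale (assumed form): blend only background pixels (mask == 0) into the model.

    Foreground pixels keep their old background value so that moving objects
    are not absorbed into the model.
    """
    return [[b if m else (1 - rate) * b + rate * f
             for f, b, m in zip(f_row, b_row, m_row)]
            for f_row, b_row, m_row in zip(frame, background, mask)]
```

Skipping the update under the foreground mask is the design choice that links detection to model maintenance: the background model is refreshed only where the current frame is believed to show background.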
U.S. Patent Publication No. 2005/0104964A1 discloses a method and apparatus for background segmentation based on motion localization. The document combines image subtraction with background construction to build the detection system, and can perform motion compensation and detection in the presence of small, unexpected camera motion. Image subtraction detects the rough boundary of a moving object, and clustering is used to find the object blocks. The remaining image blocks are classified as background and are used to update and construct the background model. Camera movement is determined by continuously observing whether the background features move. Because this technique directly estimates random camera movement for motion compensation, it is very time-consuming when full pixel-level motion must be estimated.
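The cost of estimating camera motion directly can be seen in a brute-force search over pixel displacements. The sketch below assumes a purely translational, integer-pixel camera shift and a hypothetical search radius `radius`, and returns the shift that minimizes the mean absolute difference between two grayscale frames. Its cost grows with (2 × radius + 1)² times the number of pixels. The function name and the matching criterion are illustrative assumptions, not the method of the cited publication.

```python
def estimate_global_shift(prev, curr, radius):
    """Return (dy, dx) minimizing mean |curr[y][x] - prev[y - dy][x - dx]| over the overlap.

    Assumes the whole scene moved by one integer translation (hypothetical model).
    """
    rows, cols = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total, count = 0.0, 0
            # Visit only pixels whose displaced counterpart lies inside prev.
            for y in range(max(0, dy), min(rows, rows + dy)):
                for x in range(max(0, dx), min(cols, cols + dx)):
                    total += abs(curr[y][x] - prev[y - dy][x - dx])
                    count += 1
            if count and total / count < best_cost:
                best, best_cost = (dy, dx), total / count
    return best
```

Even for a radius of 2 the inner loops visit every overlapping pixel 25 times, which is why exhaustive per-pixel motion estimation becomes slow on full-resolution video.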
In the aforementioned techniques and other prior art, when a new image is input, the observation probability that each image point belongs to the foreground F or the background B may be computed from the newly observed image and the previous foreground and background models using the following equations:
$$p(x \mid \Omega_B) = \frac{1}{N_B} \sum_{Bi=0}^{n} \varphi(x - y_{Bi}) \qquad \text{(eq. 1a)}$$

$$p(x \mid \Omega_F) = \alpha \times \frac{1}{U} + (1 - \alpha) \times \frac{1}{N_F} \sum_{Fi=0}^{m} \varphi(x - y_{Fi}) \qquad \text{(eq. 1b)}$$

where $N_B$ and $N_F$ are positive normalization parameters of the probability, and $\varphi(\cdot)$ is a kernel density function. $y_{Bi}$ and $y_{Fi}$ are training observation points of the background model and the foreground model, respectively, stored in the background model storage sequence and the foreground model storage sequence. $\Omega_B$ and $\Omega_F$ denote the background and foreground classes. $U^{-1}$ is a fixed parameter representing the probability of a uniform distribution. $n$ is the number of background training points in the background model image sequence, $m$ is the number of foreground training points in the foreground model image sequence, and $\alpha$ is a weight parameter.
By treating an image location together with its color value as a multi-dimensional random vector $X$, the probability model can account for variation in both the feature space and the spatial domain simultaneously. The more data are observed, the more reliable the estimated model distribution becomes.
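Equations 1a and 1b can be evaluated directly once a kernel is chosen. The sketch below assumes a Gaussian kernel for φ with a hypothetical bandwidth `sigma`, takes $N_B$ and $N_F$ to be the number of stored samples (a common normalization choice assumed here; the equations leave them as positive parameters), and represents each observation as a joint (row, column, intensity) vector as described above. Function and parameter names are illustrative.

```python
import math


def gaussian_kernel(v, sigma):
    """Isotropic Gaussian kernel phi(v) in len(v) dimensions (assumed kernel choice)."""
    d = len(v)
    sq = sum(c * c for c in v)
    return math.exp(-sq / (2 * sigma ** 2)) / ((2 * math.pi * sigma ** 2) ** (d / 2))


def p_background(x, bg_samples, sigma):
    """Eq. 1a, with N_B taken as the number of background samples (assumption)."""
    total = sum(gaussian_kernel([a - b for a, b in zip(x, y)], sigma)
                for y in bg_samples)
    return total / len(bg_samples)


def p_foreground(x, fg_samples, sigma, alpha, u):
    """Eq. 1b: uniform term alpha / U mixed with a kernel density over foreground samples."""
    kde = sum(gaussian_kernel([a - b for a, b in zip(x, y)], sigma)
              for y in fg_samples)
    kde /= len(fg_samples)
    return alpha / u + (1 - alpha) * kde
```

For instance, `p_background((10, 10, 100), [(10, 10, 100)], 1.0)` returns the kernel peak $(2\pi)^{-3/2}$, since the observation coincides with the single stored sample.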
As these equations show, a simple rule for labeling an image point compares the foreground probability with the background probability and then makes a hard decision. However, this approach leads to inaccurate detection. For example, when the appearance features of an object are similar to those of the background, the observation probability alone causes confusion, because the foreground and background probabilities are then almost equal. If only the current observation probability is used for the decision, as in object detection based on a Gaussian Mixture Model (GMM), obvious detection errors occur and the object cannot be classified accurately.
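The ambiguity of a per-point hard decision can be seen by comparing the two likelihoods when foreground and background samples have similar appearance. This self-contained sketch assumes one-dimensional intensity observations, a Gaussian kernel with a hypothetical bandwidth `sigma`, normalization by sample count, and omits the uniform term of eq. 1b (α = 0) for clarity; the names are hypothetical.

```python
import math


def kde_1d(x, samples, sigma):
    """Average of 1-D Gaussian kernels centered on the stored samples."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return sum(norm * math.exp(-((x - s) ** 2) / (2 * sigma ** 2))
               for s in samples) / len(samples)


def hard_decision(x, fg_samples, bg_samples, sigma):
    """Label x as 'F' or 'B' by comparing likelihoods; also return pf / pb."""
    pf = kde_1d(x, fg_samples, sigma)
    pb = kde_1d(x, bg_samples, sigma)
    ratio = pf / pb if pb > 0 else math.inf
    return ("F" if pf > pb else "B"), ratio


# Object intensities close to the background intensities (hypothetical data).
label, ratio = hard_decision(103, [101, 103, 105], [100, 102, 104], 4.0)
```

Here the point is labeled foreground, but the likelihood ratio is only about 1.03, so a small amount of noise can flip the label. This is the failure mode of deciding from the current observation probability alone.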
Many past object detection and tracking techniques are built on enhancing low-level video signal analysis: the more stable the low-level analysis, the more accurate the high-level analysis. However, conventional object detection and tracking techniques usually focus on improving a single technique, for example, object detection that focuses on maintaining background information, or object tracking that focuses on maintaining tracked-object information. Most conventional techniques do not emphasize overall video analysis, that is, they do not have object detection and object tracking share information. In practical applications, lighting changes, weather changes, and dynamic backgrounds all affect the accuracy of video surveillance and the effectiveness of intelligent video surveillance. Low-level information should provide the basis for high-level decisions, and high-level information should be fed back to improve the reliability of low-level processing.