In recent years, video image detection systems have been proposed in various applications for identifying and tracking moving objects. In particular, wireless video surveillance, which uses automatic detection to track a moving object, has been a key technology in the management of intelligent surveillance systems. Within the field of traffic management, for example, video image detection techniques have been deployed in intelligent transportation systems (ITS) to optimize traffic flow. By accurately distinguishing vehicles from background objects, an intelligent transportation system may obtain the current traffic volume along a road or even detect and track a particular vehicle.
Conventional moving object detection methods may be classified into three main approaches: temporal differencing, optical flow, and background subtraction.
In temporal differencing, regions of motion are detected based on pixel-wise differences between successive frames in a video stream. This technique adapts well to dynamic scene changes, yet it tends to extract the shapes of moving objects incompletely, particularly when the objects are motionless.
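The pixel-wise differencing described above can be sketched as follows. This is a minimal illustration, not any particular published method; the frames are toy grayscale grids, and the threshold value is an illustrative assumption.

```python
# Hedged sketch of temporal differencing: a pixel is flagged as moving
# when the absolute difference between successive frames exceeds a
# chosen threshold. Frame contents and threshold are illustrative.

def temporal_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary motion mask: 1 where |curr - prev| > threshold."""
    return [
        [1 if abs(c - p) > threshold else 0
         for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]

# A stationary background (value 50) with one bright pixel moving right.
frame1 = [[50, 50, 50],
          [50, 200, 50],
          [50, 50, 50]]
frame2 = [[50, 50, 50],
          [50, 50, 200],
          [50, 50, 50]]

mask = temporal_difference(frame1, frame2)
# Both the vacated and the newly occupied pixel register as motion;
# if the object stopped moving, no pixel would register at all,
# illustrating the incomplete-shape weakness noted above.
```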
The optical flow technique estimates the flow vectors of moving objects based on partial derivatives of the brightness values with respect to the temporal and spatial coordinates between successive frames in a video stream. Unfortunately, this technique is sensitive to noise and, due to its computational burden, inefficient for traffic applications.
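The derivative-based principle behind optical flow can be illustrated in one dimension. Under the brightness-constancy assumption, the velocity at a point follows from the spatial and temporal partial derivatives as v ≈ -It/Ix. The signal and its shift below are illustrative assumptions, not data from any real system.

```python
# Hedged 1D illustration of the optical-flow constraint: for a signal
# translating at velocity v, brightness constancy gives Ix*v + It = 0,
# so v = -It / Ix. Derivatives are taken with finite differences.

def estimate_flow_1d(signal_t0, signal_t1, x):
    """Estimate displacement at index x from finite differences."""
    ix = (signal_t0[x + 1] - signal_t0[x - 1]) / 2.0  # spatial derivative
    it = signal_t1[x] - signal_t0[x]                  # temporal derivative
    return -it / ix

# A linear ramp shifted right by one sample between the two frames.
t0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t1 = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]

v = estimate_flow_1d(t0, t1, x=3)  # recovers the displacement of +1
```

In two dimensions the same constraint is solved per pixel or per window (as in classic least-squares formulations), which is where the computational burden noted above arises.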
Background subtraction has been a commonly used technique in video surveillance and target recognition. In this technique, moving foreground objects are segmented from stationary or dynamic background scenes by comparing the pixels of the current image against a reference background model built from previous images. Background subtraction has been the most satisfactory method for motion detection.
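A minimal sketch of this idea uses a running-average background model: the model is blended toward each new frame, and pixels far from the model are labelled foreground. The learning rate and threshold are illustrative assumptions, not parameters from any specific method cited here.

```python
# Hedged sketch of background subtraction with a running-average model:
# B <- (1 - alpha) * B + alpha * I each frame; pixels with
# |I - B| > threshold are labelled foreground.

def update_background(background, frame, alpha=0.05):
    """Blend the background model toward the current frame."""
    return [[(1 - alpha) * b + alpha * i for b, i in zip(brow, irow)]
            for brow, irow in zip(background, frame)]

def subtract(background, frame, threshold=30):
    """Return a binary foreground mask against the background model."""
    return [[1 if abs(i - b) > threshold else 0
             for b, i in zip(brow, irow)]
            for brow, irow in zip(background, frame)]

background = [[50.0, 50.0], [50.0, 50.0]]
frame = [[52.0, 50.0], [50.0, 200.0]]   # small noise plus one bright object

mask = subtract(background, frame)       # only the object pixel fires
background = update_background(background, frame)
```

Note that the background model absorbs the object slightly on every update, which is why a stopped vehicle eventually fades into the background under such simple models.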
Many variations of the background subtraction method have been proposed to detect moving vehicles within video sequences in an ideal-bandwidth network environment. A Σ-Δ filter technique has been used in the Sigma Difference Estimation (SDE) approach to estimate two orders of temporal statistics for each pixel in a sequence, in accordance with a pixel-based decision framework. Unfortunately, the SDE approach may be insufficient for complete object detection in certain complex environments. In an attempt to remedy this problem, the Multiple SDE (MSDE) approach, which combines multiple Σ-Δ estimators to calculate a hybrid background model, has been developed. Besides the Σ-Δ filter technique, the Gaussian Mixture Model (GMM) has been widely used for robustly modeling backgrounds. In the GMM, each pixel value is modeled independently by a mixture of distributions, and each distribution of a pixel is determined to belong either to the background or to the foreground. The Kernel Density Estimation (KDE) method builds a background histogram by aggregating a set of values obtained from the recent past of each pixel; however, this imposes considerable sample-storage requirements as well as computational expense. On the other hand, the Simple Statistical Difference (SSD) method derives a simple background model using the temporal average as the main criterion for detecting moving vehicles. The Multiple Temporal difference (MTD) method retains several previous reference frames and calculates the differences between each of them and the current frame, which in turn shrinks gaps within the moving objects.
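The two orders of temporal statistics in the Σ-Δ estimation mentioned above can be sketched for a single pixel as follows. This is a hedged, simplified rendering of the general Σ-Δ background idea, not the exact SDE or MSDE formulation; the amplification factor n and the input sequence are illustrative assumptions.

```python
# Hedged sketch of per-pixel Σ-Δ background estimation: the background
# estimate b steps toward the current intensity by at most ±1 per frame
# (first-order statistic), and a time-variance estimate v steps toward
# n * |i - b| (second-order statistic); a pixel is labelled foreground
# when its absolute difference exceeds the variance estimate.

def sign(x):
    return (x > 0) - (x < 0)

def sigma_delta(intensities, n=4):
    """Run a Σ-Δ estimator over one pixel's intensity sequence."""
    b = intensities[0]      # background estimate
    v = 1                   # variance estimate (kept >= 1)
    labels = []
    for i in intensities[1:]:
        b += sign(i - b)            # background moves by at most 1
        d = abs(i - b)              # absolute difference
        v += sign(n * d - v)        # variance moves by at most 1
        v = max(v, 1)
        labels.append(1 if d > v else 0)
    return labels

# A steady background near 50 with a sudden bright object at the end.
labels = sigma_delta([50, 50, 51, 50, 50, 200])
```

Because both estimates change by at most one level per frame, the filter is extremely cheap, but it adapts slowly, which is consistent with the incomplete detection in complex scenes noted above.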
Unfortunately, video communication over real-world networks with limited bandwidth frequently suffers from network congestion and bandwidth instability. This is especially problematic when video information is transmitted over wireless video communication systems. When data traffic congestion occurs in a communication network, most users would rather tolerate streaming video at a reduced quality than video that lags or stands still. Therefore, rate control has been introduced as an effective video-coding tool for controlling the bit rate of video streams. To allocate the available network bandwidth and produce variable bit-rate video streams, a rate control scheme may be used together with H.264/AVC video coding. With suitable allocation of the bit rate, variable bit-rate video streams can be produced, allowing robust transmission in wireless communication systems.
Nonetheless, although a rate-control scheme may increase the efficiency of video stream transmission over networks with limited bandwidth, its tendency to continuously change the bit rate makes moving objects harder to detect. Hence, the aforementioned state-of-the-art background subtraction methods generally do not produce satisfactory detection results on variable bit-rate video streams.
For example, FIGS. 1(a) and 1(b) show the same streaming video captured by a camera and transmitted over a wireless network. FIG. 1(a) is a frame numbered 11 with a bit rate of 1,000,000 bits per second, and FIG. 1(b) is a frame numbered 207 with a bit rate of 20,000 bits per second. FIG. 1(a) illustrates a pixel 101 of a tree along a road, and FIG. 1(b) illustrates the same pixel 102 in the subsequent frame, showing a moving vehicle and the tree along the road. FIG. 1(c) shows the intensity variation of the luminance (Y) component of the same pixel as time progresses. In this scenario, after the bit rate is switched from a high-quality signal to a low-quality signal, the pixel value fluctuation often disappears, and the pixel value indicating a moving object 103, such as a moving vehicle, would often be misinterpreted as a background object by a conventional background subtraction technique.
For another example, FIG. 2(a) is a frame numbered 725 with a bit rate of 20,000 bits per second, and FIG. 2(b) is a frame numbered 1328 with a bit rate of 1,000,000 bits per second. FIG. 2(a) illustrates a pixel 201 of a tree along a road, and FIG. 2(b) illustrates the same pixel 202 in a subsequent frame of the tree along the road. FIG. 2(c) illustrates the intensity variation of the luminance (Y) component of the same pixel as time progresses. In this scenario, when the network bandwidth is sufficient, the rate control scheme typically increases a low bit-rate video stream to a high bit-rate video stream in order to match the available network bandwidth. The background pixel value fluctuation 203 would then often be misinterpreted as a moving object by a conventional background subtraction technique.
In response to the aforementioned problem of misidentification resulting from fluctuating video stream quality, a new moving object detection method is proposed in order to enhance the accuracy of detection over variable bit-rate video streams.