The present invention relates to the field of video processing and, in particular, to systems for extracting motion information from an encoded video sequence, including traffic monitoring applications.
Motion information can be of great importance in a number of applications, including traffic monitoring, tracking people, security and surveillance. For example, with the increasing number of vehicles on the road, many cities now face significant problems with traffic congestion, and tackling the problem has become a high-priority task for many countries. Traffic monitoring is therefore one example of an application for motion information extraction. The application of image processing and video sensing in motion information extraction has evolved rapidly in recent years, from the stage of feasibility studies to that of operational use. In such systems, monitoring cameras and sky cameras have been placed along roadways to enable relevant authorities and the general public to obtain motion information about the flow of traffic. In this regard, accurate and up-to-date motion information is a key requirement for all of these users.
With the large amount of data involved, some kind of intelligent system is required to sift through the information and present the user with concise, useful information. At the same time, broadband communication and video transmission have been gaining ground very quickly in recent years. It can therefore be anticipated that such a communication channel will be a key component through which motion information can be disseminated. As such, any tools that are able to extract useful information from such a channel will have a potentially large marketplace.
Compared with other motion detection approaches such as the inductive loop or virtual loop methods, motion detection from video offers a flexible alternative and is therefore becoming widely used. Video cameras for motion detection can be easily added at any place and any time at a comparatively low cost. The cameras can be used to provide data over long viewing stretches. However, in video motion detection, the main problems encountered are related to the development of robust processing tools and to the consequent high computational complexity. Real-time processing of video sequences is a fundamental requirement if computationally efficient motion extraction applications, such as traffic monitoring and surveillance systems, are to be put into use. Visual information, acquired by cameras and digitized at known frame rates by dedicated boards, is usually characterized by high dimensionality, spatial and temporal complexity, as well as noise sensitivity.
Currently, commonly used approaches to video motion detection include optical flow and object tracking. The optical flow method attempts to study object motion by providing an estimation of the optical flow field in terms of spatio-temporal image intensity gradients, which are calculated at every pixel of the image subject to some form of global smoothness constraint. Consequently, the optical flow method is computationally intensive, which makes on-line and real-time traffic monitoring from video sequences difficult. In addition, this method produces inaccurate motion estimates at occlusion boundaries.
The object tracking method attempts to trace the movement of objects from frame to frame in a video sequence. Although a single object or a few objects can be tracked adequately using existing image processing techniques, tracking multiple objects in a complex environment remains an unsolved issue. For example, there may be many vehicles on the roads in high-density traffic situations. Thus, for a vehicle in a frame, finding its corresponding vehicle in the reference frame can be difficult, and the computational cost can be extremely high. In addition, the segmentation of multiple overlapping moving objects in low-resolution images remains an ill-posed problem.
A number of systems for block motion estimation and motion vector calculation have been proposed.
U.S. Pat. No. 5,864,372 describes an apparatus for implementing block matching for motion estimation in video image processing. The apparatus receives pixel data of an original image block and pixel data of a compared image block selected from a number of compared image blocks during video image processing. The selected image blocks are compared to determine a movement vector. The apparatus has a multi-stage pipelined tree-architecture that includes four stages. The first computational stage produces corresponding pairs of difference data and sign data. A second compression stage in the process pipeline includes a compression array that receives all the difference data and sign data, which are added together to produce compressed summation data and compressed sign data. The third summation stage in the pipeline receives the compressed summation and sign data and produces a mean absolute error for the original and compared image block pixels. A last minimization stage receives the mean absolute error for each of the compared image blocks and determines a minimum mean absolute error from among them. The compression array includes a number of full and half adders arranged in a multi-level configuration in which none of the adder operand inputs and the carry-in inputs is left unconnected. However, this is merely an apparatus for block motion estimation and does not extract any motion information.
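By way of illustration only, the matching criterion described above (a mean absolute error computed for each compared block, followed by selection of the minimum) may be sketched in software as follows. This is a minimal sketch and not the pipelined hardware of the cited patent; the function names and the sample pixel values are hypothetical.

```python
def mean_absolute_error(block_a, block_b):
    """Mean absolute pixel difference between two equally sized blocks."""
    total = 0
    count = 0
    for row_a, row_b in zip(block_a, block_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def best_match(original, candidates):
    """Return (index, error) of the candidate block with the minimum error."""
    errors = [mean_absolute_error(original, c) for c in candidates]
    index = min(range(len(errors)), key=errors.__getitem__)
    return index, errors[index]


# Hypothetical 2x2 blocks, used only to exercise the functions.
original = [[10, 20], [30, 40]]
candidates = [
    [[0, 0], [0, 0]],
    [[10, 21], [29, 40]],
    [[40, 30], [20, 10]],
]
index, error = best_match(original, candidates)
```

In a block-matching search, the position of the winning candidate relative to the original block would give the movement vector for that block.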
U.S. Pat. No. 5,872,604 describes a method of detecting motion vectors that detects motion based upon calculation of picture data of a reference block and upon picture data in a search block. The search block is located within a search area and then the search area is variably set. An apparatus for detecting motion vectors includes a motion detection circuit for detecting motion based upon calculation of picture data of a reference block and upon picture data in a search block, located within a search area, and a circuit for variably setting the search area. However, only a method and apparatus for block motion vector calculation are described, and they do not extract any motion information.
U.S. Pat. No. 5,793,985 describes a method of block-based motion estimation used in video compression. The compression process derives change data for a new frame of data (with respect to a reference frame) by first dividing the frame structure into data tiles (or data blocks) of identical size. Each tile in the new frame is compared to a localized window (about the tile's expected position) in the reference frame to search for a best fit, and thereby provide motion data for the particular tile. Once the best fit is determined, motion-compensated difference data is determined and stored with the motion data for each tile to complete the process. To achieve computation efficiency, each tile under analysis is preferably converted to single-bit value data and searching and comparisons are performed based on such transformed single-bit data. The single bit data is computed by convolving the original image data with a low-pass filter to obtain a threshold matrix. The original image data is then compared with the threshold matrix and converted to single-bit values in dependence on whether the values of the data exceed counterparts in the threshold matrix. Comparison is performed using an exclusive-or function and bit-summation of results. However, the patent only describes a block motion vector generating method on low-bit images, and not a method of motion information extraction.
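The single-bit conversion and exclusive-or comparison described above may be illustrated by the following minimal sketch. The 3x3 mean filter with edge clamping is an assumed stand-in for the low-pass filter, whose kernel is not specified here, and all function names and sample values are hypothetical.

```python
def box_filter(image):
    """3x3 mean filter with edge clamping (an assumed low-pass kernel)."""
    height, width = len(image), len(image[0])
    smoothed = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += image[yy][xx]
            row.append(acc / 9)
        smoothed.append(row)
    return smoothed


def to_single_bit(image):
    """Set a bit to 1 where the pixel exceeds its threshold-matrix entry."""
    threshold = box_filter(image)
    return [
        [1 if pixel > limit else 0 for pixel, limit in zip(img_row, thr_row)]
        for img_row, thr_row in zip(image, threshold)
    ]


def xor_mismatch(bits_a, bits_b):
    """Count differing bits: exclusive-or followed by bit summation."""
    return sum(
        a ^ b
        for row_a, row_b in zip(bits_a, bits_b)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical 3x3 tile with a single bright pixel in the centre.
tile = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
tile_bits = to_single_bit(tile)
mismatch = xor_mismatch([[1, 0], [1, 1]], [[1, 1], [0, 1]])
```

A lower mismatch count indicates a better fit, so a search over the localized window would retain the offset with the smallest count.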
U.S. Pat. No. 5,742,710 describes a block-matching method for generating motion vectors. The method performs block matching on successively higher resolution images by refining motion vectors determined in a lower resolution image. At respective higher resolution images, search areas of limited search range are defined via a motion vector associated with corresponding image areas in the immediately lower resolution search. For at least one level of image resolution, the search blocks are overlapped to provide a plurality of search areas of limited search range for performing block matching searches for each block in the next higher resolution level. Again, this method presents a way of calculating block motion vectors, and does not perform any motion information detection.
All of the foregoing systems have placed a heavy emphasis on approaches to obtaining block motion vectors, but have not exploited the advantages of using motion vectors taken directly from encoded video sequences for motion information extraction. Thus, a need clearly exists for such a system with reduced complexity and computational cost for applications such as traffic monitoring.
In accordance with a first aspect of the invention, there is disclosed a method of extracting motion information from an encoded video stream containing interframe motion vectors under fixed camera settings and a well defined environment. The method includes the steps of: separating motion vectors obtained from the encoded video stream; filtering the motion vectors based on predetermined environmental knowledge; and determining predetermined parameters based on the filtered motion vectors. The determining step includes the step of calculating the motion information using motion vector analysis on the filtered motion vectors.
Preferably, the filtering step includes the sub-step of eliminating any motion vectors that: do not coincide with a road direction, intersect with other motion vectors, cross a road border, or do not have appropriate amplitude or size. Still further, the predetermined parameters include speed, density and flow and may be provided at regular time intervals.
Preferably, the encoded video stream is obtained from a sky camera. The video stream is a motion vector presentation of compressed video. More preferably, the compressed video has a format selected from the group of formats consisting of MPEG and H.26x.
The method may include one or more of the steps of detecting speed based on an amplitude calculation of the filtered motion vectors, detecting density based on occupancy computation of microblocks with nonzero motion vectors, and estimating flow based on a combination of speed and density detection. The speed is detected based on an amplitude calculation of the filtered motion vectors and the density is detected by an occupancy computation of microblocks with nonzero motion vectors.
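By way of illustration only, the speed, density and flow determinations may be sketched as follows. The sketch assumes that each motion vector is a per-frame displacement in pixels, that a fixed metres-per-pixel scale is known from the camera settings, and that flow is taken as the product of speed and density; the text above only states that flow is estimated from a combination of the two. All names and sample values are hypothetical.

```python
import math


def estimate_speed(vectors, frame_rate, metres_per_pixel):
    """Mean amplitude of the nonzero vectors, scaled to metres per second."""
    amplitudes = [math.hypot(dx, dy) for dx, dy in vectors if (dx, dy) != (0, 0)]
    if not amplitudes:
        return 0.0
    mean_pixels_per_frame = sum(amplitudes) / len(amplitudes)
    return mean_pixels_per_frame * frame_rate * metres_per_pixel


def estimate_density(vectors):
    """Fraction of blocks in the monitored region with a nonzero vector."""
    if not vectors:
        return 0.0
    occupied = sum(1 for dx, dy in vectors if (dx, dy) != (0, 0))
    return occupied / len(vectors)


def estimate_flow(speed, density):
    """Assumed combination: flow as the product of speed and density."""
    return speed * density


# One (dx, dy) entry per block of the monitored region; hypothetical values.
vectors = [(3, 4), (0, 0), (6, 8), (0, 0)]
speed = estimate_speed(vectors, frame_rate=25.0, metres_per_pixel=0.25)
density = estimate_density(vectors)
flow = estimate_flow(speed, density)
```

Repeating the calculation over successive groups of frames would yield the parameters at regular time intervals.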
Preferably, the interframe motion vectors are generated using electronic encoding hardware.
Preferably, the filtering step includes at least one of the sub-steps of: eliminating any motion vectors that do not coincide with a predetermined direction; eliminating any motion vectors that intersect with other motion vectors; eliminating any motion vectors that cross a predetermined border; and eliminating any motion vectors that do not have an appropriate amplitude or size.
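The four elimination sub-steps above may be sketched as follows, by way of illustration only. The sketch assumes that each vector is given as (x, y, dx, dy) with its origin at a block position, that the predetermined border is an axis-aligned rectangle, and that the predetermined direction is a single angle with a tolerance. All names, thresholds and sample values are hypothetical.

```python
import math


def direction_ok(dx, dy, road_angle, tolerance):
    """True if the vector angle lies within tolerance of the road direction."""
    diff = (math.atan2(dy, dx) - road_angle + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= tolerance


def amplitude_ok(dx, dy, min_len, max_len):
    """True if the vector length lies within the accepted range."""
    return min_len <= math.hypot(dx, dy) <= max_len


def border_ok(vector, region):
    """True if both ends of the vector lie inside the rectangular region."""
    x, y, dx, dy = vector
    x0, y0, x1, y1 = region
    return all(x0 <= px <= x1 and y0 <= py <= y1
               for px, py in ((x, y), (x + dx, y + dy)))


def _orient(p, q, r):
    """Sign gives the side of line p-q on which point r lies."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b):
    """True if the two vectors properly cross (mere touching is not counted)."""
    p1, p2 = (a[0], a[1]), (a[0] + a[2], a[1] + a[3])
    p3, p4 = (b[0], b[1]), (b[0] + b[2], b[1] + b[3])
    return (_orient(p3, p4, p1) * _orient(p3, p4, p2) < 0
            and _orient(p1, p2, p3) * _orient(p1, p2, p4) < 0)


def filter_vectors(vectors, road_angle, tolerance, min_len, max_len, region):
    """Apply the amplitude, direction and border tests, then drop crossing vectors."""
    candidates = [
        v for v in vectors
        if amplitude_ok(v[2], v[3], min_len, max_len)
        and direction_ok(v[2], v[3], road_angle, tolerance)
        and border_ok(v, region)
    ]
    return [
        v for i, v in enumerate(candidates)
        if not any(segments_cross(v, w)
                   for j, w in enumerate(candidates) if i != j)
    ]


vectors = [
    (10, 10, 5, 0),    # along the road: kept
    (10, 30, 0, 5),    # perpendicular to the road: wrong direction
    (98, 50, 5, 0),    # ends outside the region: crosses the border
    (10, 50, 0, 0),    # zero length: amplitude too small
    (50, 60, 30, 0),   # amplitude too large
    (40, 80, 10, 2),   # crosses the next vector
    (42, 83, 10, -3),  # crosses the previous vector
]
kept = filter_vectors(vectors, road_angle=0.0, tolerance=math.pi / 8,
                      min_len=1.0, max_len=20.0, region=(0, 0, 100, 100))
```

In this sketch the intersection test is applied only to vectors that survive the other three tests, which is one possible ordering; the sub-steps may equally be applied individually or in another order.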
More preferably, the method involves monitoring traffic, where the encoded video stream is an encoded traffic video stream obtained from a sky camera in a well defined traffic environment using predetermined traffic knowledge and involving the determination of predetermined traffic parameters.
In accordance with a second aspect of the invention, there is disclosed an apparatus for extracting motion information from an encoded video stream containing interframe motion vectors under fixed camera settings and a well defined environment. The apparatus includes: a device for separating motion vectors obtained from the encoded video stream; a device for filtering the motion vectors based on predetermined environmental knowledge; and a device for determining predetermined parameters based on the filtered motion vectors.
In accordance with a third aspect of the invention, there is disclosed a computer program product having a computer readable medium having a computer program recorded therein for extracting motion information from an encoded video stream containing interframe motion vectors under fixed camera settings and a well defined environment. The computer program product includes: a module for separating motion vectors obtained from the encoded video stream; a module for filtering the motion vectors based on predetermined environmental knowledge; and a module for determining predetermined parameters based on the filtered motion vectors.