Video-based vehicle speed estimation is an area in which significant effort is being expended in order to develop a robust video-based speed law enforcement solution. High spatial and temporal resolutions are required to yield accurate speed estimates and to perform other traffic enforcement tasks of interest such as automatic license plate recognition (ALPR). For example, one conventional video-based speed enforcement solution uses a 4 Mpixel, 60 fps camera. Achieving real-time processing at this high data rate is extremely challenging; it increases both the complexity associated with screening of candidate speeders and the processing burden encountered in off-line video analysis. Using conventional approaches, video compression may be encountered in at least two scenarios. In one, one may have access to video data as it is being compressed in the conventional manner, such as having access to the compression methods in use in a smart camera. In another scenario, one may be the receiver of video data that has been previously compressed. Conventional approaches do not accommodate performing computation within the compression operation, or using features of the compressed data stream, to screen candidate speeding vehicles.
A typical video-based vehicle speed enforcement solution has two main components: a speed estimation module and a vehicle identification module (usually an ALPR module). Conventional approaches are faced with a significant computational burden when attempting to provide accurate speed estimation for violators in addition to accurate ALPR and near-real-time processing performance. All three requirements are interdependent and difficult to satisfy using conventional approaches. For example, speed estimation generally involves detecting a vehicle, tracking the detected vehicle's feature(s) (e.g., corners of its license plate or tire edges), converting the tracked trajectory from image pixel coordinates to real-world coordinates (e.g., in meters or feet) through camera calibration, and estimating the vehicle's speed by computing the ratio of travelled distance to elapsed time. These steps require significant computational resources, even when processing video acquired with traditional, low-resolution, low-frame-rate surveillance cameras. Moreover, in order to yield accurate speed estimates and achieve the required ALPR performance, high spatial and temporal resolutions are required.
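The final two steps of the conventional pipeline described above (pixel-to-world conversion and the distance-to-time ratio) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that camera calibration has already produced a 3x3 planar homography `H` mapping image pixels to road-plane coordinates in meters; the function names, the matrix values, and the sample track are hypothetical and are not taken from any particular enforcement system.

```python
from math import hypot


def to_world(H, u, v):
    """Map pixel (u, v) to road-plane coordinates via a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def estimate_speed(track_px, H, fps):
    """Average speed (meters per second) of one feature tracked in consecutive frames."""
    if len(track_px) < 2:
        raise ValueError("need at least two tracked positions")
    pts = [to_world(H, u, v) for u, v in track_px]
    # Travelled distance: sum of straight-line steps between consecutive positions.
    dist = sum(hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # Elapsed time: number of frame intervals divided by the frame rate.
    elapsed = (len(pts) - 1) / fps
    return dist / elapsed


# Hypothetical calibration: 0.05 m per pixel, no perspective distortion.
H = [[0.05, 0.0, 0.0],
     [0.0, 0.05, 0.0],
     [0.0, 0.0, 1.0]]
# Hypothetical track: the feature moves 10 pixels per frame.
track = [(100, 400), (100, 390), (100, 380), (100, 370)]
speed = estimate_speed(track, H, fps=60)  # roughly 30 m/s for these values
```

Even in this simplified form, the sketch presupposes that detection and feature tracking have already been carried out on every frame, which is where most of the computational burden noted above arises.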
Achieving real-time processing under the high data rate constraints imposed by the performance requirements (e.g., accurate speed measurement and successful ALPR) is extremely challenging under conventional approaches. Near-real-time speed enforcement therefore calls for a system capable of identifying most (if not all) speed violators while still achieving accurate speed estimation.
Video compression is employed in applications where high quality video transmission and/or archival is required. For example, a surveillance system typically includes a set of cameras that relay video data to a central processing and archival facility. While the communication network used to transport the video stream between the cameras and the central facility may be built on top of proprietary technology, traffic management centers have recently started to migrate to Internet Protocol- or IP-compliant networks. In either case, the underlying communication network typically has bandwidth constraints which dictate the use of video compression techniques on the camera end, prior to transmission. In the case of legacy analog cameras, compression is performed at an external encoder attached to the camera, whereas digital or IP cameras typically integrate the encoder within the camera itself. Typical transmission rates over IP networks require the frame rate of multi-megapixel video streams to be limited to fewer than 5 frames per second (fps). The latest video compression standards enable the utilization of the full frame rate camera capabilities for transmitting high definition video at the same network bandwidth. For example, transmission of 1080p HD uncompressed video requires a bandwidth of 1.5 Gbps, while its compressed counterpart requires only 250 Mbps; consequently, transmission of compressed video with at least 6 times the frame rate of the uncompressed version would be possible over the same network infrastructure.
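The frame-rate figure in the preceding example follows directly from the two quoted bandwidths. A short Python check of that arithmetic, assuming a fixed link capacity and a fixed frame size (the function name is illustrative only):

```python
def frame_rate_gain(uncompressed_bps, compressed_bps):
    """Factor by which frame rate can increase at a fixed link capacity."""
    return uncompressed_bps / compressed_bps


# Figures quoted above: 1.5 Gbps uncompressed vs. 250 Mbps compressed 1080p video.
gain = frame_rate_gain(1.5e9, 250e6)  # 6.0
```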
Video compression is achieved by exploiting two types of redundancies within the video stream: spatial redundancies amongst neighboring pixels within a frame, and temporal redundancies between adjacent frames. This modus operandi gives rise to two different types of prediction, namely intra-frame and inter-frame prediction, which in turn result in two different types of encoded frames, reference and non-reference frames. Reference frames, or “I-frames”, are encoded in a standalone manner (intra-frame) using compression methods similar to those used to compress digital images. Compression of non-reference frames (e.g., P-frames and B-frames) entails using inter-frame or motion-compensated prediction methods where the target frame is estimated or predicted from previously encoded frames in a process that typically entails three steps: (i) motion estimation, where motion vectors are estimated using previously encoded frames; (ii) residual calculation, where the error between the predicted and target frame is calculated; and (iii) compression, where the error residual and the extracted motion vectors are compressed and stored. In the motion estimation step, the target frame is segmented into pixel blocks called target blocks, and an estimated or predicted frame is built by stitching together the blocks from previously encoded frames that best match the target blocks. Motion vectors describe the relative displacement between the location of the original blocks in the reference frames and their location in the predicted frame. While motion compensation of P-frames relies only on previous frames, previous and future frames are typically used to predict B-frames. Throughout the teachings herein, we use the terms “motion vector” and “compression-type motion vector” synonymously.
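The motion estimation and residual steps can be illustrated with a minimal block-matching sketch in Python. It is a simplified illustration under stated assumptions, not the procedure of any particular codec: it performs an exhaustive search over whole-pixel offsets, uses the sum of absolute differences (SAD) as the matching cost, and represents frames as lists of rows of grayscale values. The block size, search range, function names, and test frames are all hypothetical.

```python
def sad(target, ref, ty, tx, ry, rx, bs):
    """Sum of absolute differences between a target block and a reference block."""
    return sum(abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
               for i in range(bs) for j in range(bs))


def block_motion_vectors(ref, target, bs=4, search=2):
    """Exhaustive-search block matching.

    Returns {(block_row, block_col): (dy, dx, cost)}, where (dy, dx) is the
    offset from the target block's position to its best match in `ref`,
    and cost is the SAD residual of that match.
    """
    h, w = len(target), len(target[0])
    # Try small displacements first so that ties favor a zero vector.
    offsets = sorted(((dy, dx)
                      for dy in range(-search, search + 1)
                      for dx in range(-search, search + 1)),
                     key=lambda d: abs(d[0]) + abs(d[1]))
    vectors = {}
    for ty in range(0, h - bs + 1, bs):
        for tx in range(0, w - bs + 1, bs):
            best = None
            for dy, dx in offsets:
                ry, rx = ty + dy, tx + dx
                # Skip candidate blocks that fall outside the reference frame.
                if 0 <= ry <= h - bs and 0 <= rx <= w - bs:
                    cost = sad(target, ref, ty, tx, ry, rx, bs)
                    if best is None or cost < best[2]:
                        best = (dy, dx, cost)
            vectors[(ty // bs, tx // bs)] = best
    return vectors


def make_frame(col):
    """8x8 black frame with a textured 4x4 object whose left edge is at `col`."""
    frame = [[0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            frame[4 + i][col + j] = 10 + 4 * i + j
    return frame


ref = make_frame(2)     # object at columns 2..5 in the reference frame
target = make_frame(4)  # object has moved 2 pixels to the right
mv = block_motion_vectors(ref, target)
# mv[(1, 1)] is (0, -2, 0): the block's best match lies 2 pixels to the left
# in the reference frame, i.e., the object moved 2 pixels to the right.
```

In this sketch the vector points from the target block back to its match in the reference frame, so the apparent object motion is its negation. Blocks covering static background yield a zero vector with zero residual.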
For video captured with a stationary camera (the category under which most traffic cameras currently deployed fall), the main cause of changes between adjacent frames is object motion. In this setting, the output of the block-matching algorithm in the motion compensation stage is a set of motion vectors describing the way pixel blocks move between adjacent frames. As such, the encoded set of motion vectors is a good descriptor of the apparent motion of objects within the field of view of the camera.
Since video compression is typically performed at the camera end prior to transmission over the network, real-time hardware implementations of popular compression standards such as H.264 and MPEG-4 are commonplace. Implementing speed estimation modules based on compression motion vector analysis would add only a small amount of computation, which is conducive to real-time performance, because existing, highly optimized hardware implementations are leveraged.
There is a need in the art for systems and methods that facilitate using video compression motion vector information to estimate speed, while overcoming the aforementioned deficiencies.