This invention relates to the field of video systems, and in particular to video input devices that include processes for identifying motion in a video image that is relevant to a given video processing task.
The application of video image processing to a variety of tasks continues to grow. Such applications include, for example, video surveillance, inventory control, traffic management, and so on.
FIG. 1A illustrates a typical image-based system. A video processor 150 receives video image data from a variety of image sources 110, 120. The image source may be, for example, a video camera 110 that provides ‘live’ images, or a video recorder 120 that provides previously captured images. The sources 110, 120 may be connected directly to the processor 150, or via a network 130, or a combination of both.
The video processor 150 provides image information to an application/task 170 that performs a given function based on the image information. The application 170 may be, for example, a surveillance system that processes the image information to identify situations that warrant an alarm; or, it may be an application that counts people or objects that enter and exit portals; or, it may be a ‘passive’ system that merely stores select images for subsequent retrieval. Generally speaking, the application 170 identifies ‘events’ based on image information, and initiates select action based on these events.
As the complexity of video monitoring systems increases, techniques have been developed to facilitate the efficient transmission of video images. U.S. Pat. No. 5,602,585, “METHOD AND SYSTEM FOR CAMERA WITH MOTION DETECTION”, issued 11 Feb. 1997 to Dickinson et al., and incorporated by reference herein, teaches the use of a motion detector within a camera to selectively couple image data to a video processor, specifically, a video recorder. The camera is initially placed in a differential mode, wherein changes to the image are provided to the motion detector; when the amount of changes/motion exceeds a given threshold, the camera is placed in full-video mode, wherein the camera is coupled to the recorder, and full images are provided from the camera. After a predetermined duration, the camera is again placed in the differential mode, and decoupled from the recorder.
FIG. 1B illustrates a block diagram of a Dickinson-like technique in the context of this invention. The camera 110 includes a video capture component 112 that sequentially captures images, and a motion detector 116 that determines whether the amount of change/motion in the sequence exceeds a given threshold. The motion detector 116 controls a switch 114 that selectively couples the video images from the video capture component 112 to the output of the camera, based on whether the threshold is exceeded. In this manner, only images that exhibit at least a minimum amount of change/motion are communicated to the video processor 150. This technique is particularly effective for minimizing traffic on a limited bandwidth video network that may be coupled to a plurality of video sources, such as illustrated in FIG. 1A. That is, if each of the cameras 110 and the video recorder 120 of FIG. 1A is configured to transmit only those images whose change exceeds a given threshold to the processor 150, the bandwidth needed to route the video from each source 110, 120 to the video processor 150 can be substantially reduced, compared to a continuous video stream from each of the sources 110, 120.
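By way of illustration only, the threshold-based gating of FIG. 1B can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation taught by Dickinson: frames are assumed to be small grayscale images held as lists of rows, the measure of change is assumed to be a count of pixels whose intensity differs by more than a fixed amount, and the names `changed_pixel_count`, `gate_frames`, and `pixel_delta` are hypothetical. The predetermined hold duration of the full-video mode is omitted for brevity.

```python
from typing import Iterable, Iterator, List, Optional

# Assumed representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[int]]


def changed_pixel_count(prev: Frame, curr: Frame, pixel_delta: int = 10) -> int:
    """Count pixels whose intensity differs by more than pixel_delta."""
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(p - c) > pixel_delta
    )


def gate_frames(frames: Iterable[Frame], threshold: int) -> Iterator[Frame]:
    """Pass a frame to the output only if its change from the preceding
    frame exceeds threshold (the role of detector 116 and switch 114)."""
    prev: Optional[Frame] = None
    for frame in frames:
        if prev is not None and changed_pixel_count(prev, frame) > threshold:
            yield frame
        prev = frame


# Hypothetical 2x2 frames: a static scene, then a scene with two changed pixels.
still = [[0, 0], [0, 0]]
moved = [[0, 200], [200, 0]]
forwarded = list(gate_frames([still, still, moved, moved], threshold=1))
```

In this sketch, only the single frame at which the scene changes is forwarded; the unchanged frames before and after it are suppressed at the camera.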
As digital processing techniques advance, the need for a Dickinson-like technique to minimize bandwidth requirements is diminished, as illustrated in FIG. 1C. In this example system, the camera 110 includes an MPEG encoder 118, so that the output stream from the camera 110 is an MPEG-encoded stream. As is known in the art, the MPEG format is inherently a differential format, wherein only the changes to regularly communicated reference images are transmitted. As such, if there is no change between images, no additional ‘change frames’ need be communicated. Further, the bandwidth used to communicate each change-frame will be dependent upon the amount of change. That is, minor changes consume minor amounts of bandwidth, whereas fuller or more complex changes consume substantially more bandwidth. One of ordinary skill in the art will recognize that Dickinson's threshold-based gating could also be applied to the system of FIG. 1C, although the relative increase in efficiency, compared to its application to full-stream video, would be substantially decreased.
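The dependence of bandwidth on the amount of change can be illustrated with a simplified differential encoding. The following sketch is not MPEG, which relies on motion-compensated prediction and transform coding; it merely assumes that a change frame lists the pixels that differ from a reference image, and the names `encode_delta` and `apply_delta` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[int]]
# Assumed change-frame format: (row, column, new value) for each changed pixel.
Delta = List[Tuple[int, int, int]]


def encode_delta(reference: Frame, frame: Frame) -> Delta:
    """List only the pixels of frame that differ from the reference image."""
    return [
        (r, c, value)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value != reference[r][c]
    ]


def apply_delta(reference: Frame, delta: Delta) -> Frame:
    """Reconstruct a frame from the reference image and a change frame."""
    frame = [row[:] for row in reference]
    for r, c, value in delta:
        frame[r][c] = value
    return frame


# Hypothetical 2x2 images: an unchanged scene and a scene with one changed pixel.
reference = [[1, 2], [3, 4]]
changed = [[1, 9], [3, 4]]
no_change = encode_delta(reference, reference)  # nothing to transmit
small_change = encode_delta(reference, changed)  # one entry to transmit
```

In this sketch, the size of the change frame grows with the number of changed pixels, and the receiver must apply each change frame to the reference image to recover the full image.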
Returning to FIG. 1A, as video monitoring systems increase in complexity, the ‘scalability’ of the video processor 150 and video application 170 becomes a limiting factor in the expansion of the video monitoring capabilities to include multiple video sources 110, 120. Even with the use of motion-only filtering, as illustrated in FIG. 1B, or differential imaging, as illustrated in FIG. 1C, the video processor 150 and/or the application 170 are still required to process each frame from each source 110, 120 that reports motion, and, in the case of FIG. 1C, this processing necessarily includes decoding the received MPEG frames to produce the image frames.
A further problem with the motion-based filtering approaches of FIGS. 1B and 1C relates to the indiscriminate nature of motion-detection. In an outdoor scene, for example, the random movement of leaves and branches of a tree can produce a measure of perceived motion that equals or exceeds the measure of motion of a person entering or leaving a scene. In an indoor scene, movements in ‘permitted’ areas, such as the area near bank tellers, produce a measure of motion that is indistinguishable from a measure of motion produced by movements in ‘protected’ areas, such as the area near the bank's safe. That is, conventional motion-based filtering techniques are fairly ineffective in environments that are expected to exhibit movements that are irrelevant to the task at hand, and are generally only effective in limited environments, such as systems that monitor the interior of bank safes, or office or factory environments during ‘off-hours’, and so on.
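The indiscriminate nature of a whole-image motion measure can be illustrated as follows. This is a sketch under stated assumptions: frames are small grayscale images held as lists of rows, the measure of motion is a count of changed pixels, and the region format, the frame contents, and the names `motion_measure` and `PROTECTED` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[int]]
# Assumed region format: (top, left, bottom, right), bottom and right exclusive.
Region = Tuple[int, int, int, int]


def motion_measure(
    prev: Frame,
    curr: Frame,
    pixel_delta: int = 10,
    region: Optional[Region] = None,
) -> int:
    """Count changed pixels over the whole frame, or only within region."""
    rows = len(curr)
    cols = len(curr[0]) if rows else 0
    top, left, bottom, right = region if region is not None else (0, 0, rows, cols)
    return sum(
        1
        for r in range(top, bottom)
        for c in range(left, right)
        if abs(curr[r][c] - prev[r][c]) > pixel_delta
    )


# Hypothetical 4x4 scene: tellers occupy the top-left, the safe the bottom-right.
background = [[0] * 4 for _ in range(4)]
teller_activity = [[255, 255, 0, 0], [255, 255, 0, 0], [0] * 4, [0] * 4]
safe_activity = [[0] * 4, [0] * 4, [0, 0, 255, 255], [0, 0, 255, 255]]
PROTECTED: Region = (2, 2, 4, 4)

whole_teller = motion_measure(background, teller_activity)
whole_safe = motion_measure(background, safe_activity)
protected_teller = motion_measure(background, teller_activity, region=PROTECTED)
protected_safe = motion_measure(background, safe_activity, region=PROTECTED)
```

In this sketch, the whole-image measure is identical for activity near the tellers and activity near the safe, so no single threshold can separate the two; only a measure restricted to the protected area distinguishes them.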
An object of this invention is to provide a video monitoring system that is well structured for multiple-camera operations. A further object of this invention is to provide a video monitoring system that is well suited for environments that exhibit activity/motion that is generally unrelated to the video monitoring application. A further object of this invention is to provide a video monitoring system that reduces the amount of video processing or video analysis required to perform a given task. A further object of this invention is to further reduce the bandwidth requirements for video monitoring systems.
These objects, and others, are achieved by distributing the video processing typically performed in a video monitoring system among the components of the system. Specifically, the filtering tasks that are conventionally applied in a video monitoring application, to identify activity in the images that may be relevant to the monitoring task, are distributed to the image source, or near-source devices. Source devices, such as cameras and playback devices, and near-source devices, such as video concentrators and streaming devices, are configured to include video processing tools that can be used to pre-filter the image data to identify frames or segments of frames that include information that is likely to be relevant to the receiving video monitoring application. In this manner, the receiving processor need not spend time and resources processing images that are pre-determined to be irrelevant to the receiving application.
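The distribution of filtering to the sources can be sketched as follows. This is an illustrative sketch only, not a definitive implementation: the class `FilteredSource`, the relevance test `bottom_row_changed`, the function `receive`, and the source names are hypothetical, and the relevance test stands in for whatever task-specific filter the receiving application would otherwise apply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumed representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[int]]
# A task-specific test deciding whether a frame is likely to be relevant.
RelevanceTest = Callable[[Frame, Frame], bool]


@dataclass
class FilteredSource:
    """A source or near-source device that pre-filters its own frames."""

    name: str
    frames: Iterable[Frame]
    is_relevant: RelevanceTest

    def stream(self) -> Iterator[Tuple[str, int, Frame]]:
        """Yield only frames that pass the relevance test against the prior frame."""
        prev = None
        for index, frame in enumerate(self.frames):
            if prev is not None and self.is_relevant(prev, frame):
                yield (self.name, index, frame)
            prev = frame


def bottom_row_changed(prev: Frame, curr: Frame) -> bool:
    """Hypothetical test: only changes in the bottom row matter to the task."""
    return prev[-1] != curr[-1]


def receive(sources: List[FilteredSource]) -> List[Tuple[str, int]]:
    """The receiving processor sees only the frames the sources forwarded."""
    return [(name, index) for source in sources for name, index, _ in source.stream()]


quiet = [[0, 0], [0, 0]]
top_change = [[9, 9], [0, 0]]
bottom_change = [[0, 0], [9, 9]]

lobby = FilteredSource("lobby", [quiet, top_change, quiet, quiet], bottom_row_changed)
vault = FilteredSource("vault", [quiet, quiet, bottom_change, bottom_change], bottom_row_changed)
events = receive([lobby, vault])
```

Of the eight frames captured by the two sources in this sketch, only one reaches the receiving processor; the changes in the lobby images are discarded at the source because they fail the relevance test.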
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.