1. Field of the Invention
The present invention relates to a method and apparatus for detecting motion in video.
2. Background Art
There are many situations in which a motion detector is used to trigger an event when motion is detected or not detected. Some applications involve turning on lights when someone enters a room, or turning off lights when there is no movement in a room. Other uses include security, car theft protection, alarms, automatic doors, and others. Current motion detection systems have a number of disadvantages, including cost, complexity, poor performance, and others.
In the prior art there are two approaches to motion detection: “active” and “passive.” Active techniques emit some form of energy (e.g. sound or electromagnetic radiation) and detect motion based on the returned signals. These techniques tend to require more power, to be more disruptive of the environment, and to be easy to detect and defeat. Passive techniques do not emit signals but instead passively observe the environment being monitored and react to observed motion. Video cameras are used in some passive motion detection techniques.
A number of techniques have been developed to detect motion within the field of view of a video camera. These techniques include analog and digital techniques. Analog techniques typically look at the analog video signal generated by a camera and detect motion by examining changes in the signal. Examples of simple prior art analog techniques include putting photocells on a television monitor and detecting changes in values, using one-shot timers to sample fixed locations in a video signal, and using various circuits to integrate the video signal. These simple techniques generate signals that can be compared against baseline values to detect changes in the video signal that presumably are caused by motion. Other prior art analog techniques filter or integrate the incoming video signal and look for gross changes in the signal's characteristics to detect motion.
These analog approaches tend to be inexpensive, but provide poor results because they utilize adulterated and simplified versions of the video signal. The bulk of the information content of the signal is discarded. Working with a signal with so little information content, the best that can be achieved is a presumption that motion has occurred in the scene when the incoming signal changes in a particular way.
All of these prior art analog techniques tend to be imprecise in what they measure. Accordingly, they have inherent limitations as to their sensitivity to actual motion. They are also susceptible to false triggers.
Digital techniques tend to be better at reducing both false positive (detecting motion when there is none) and false negative (not detecting motion when motion does exist) motion detection outputs. Digital approaches are able to accurately and repeatably associate a numerical value with a physical portion of the video camera's field of view. This ability to accurately quantify the light coming from an area in space makes it possible to determine when motion occurs in the scene being observed more accurately than can be done using analog techniques.
Prior Art Digital Techniques
Digital motion detection techniques are used for two general types of applications: determining inter-video-frame motion so that signal processing can be applied to deal with video interlacing issues, and video-based monitoring for physical security purposes. Techniques developed for video interlace signal processing tend to be much more computationally intensive, and therefore costly, than techniques developed for video security monitoring. In addition, video interlace processing techniques are not suited for detecting small amounts of motion and therefore do not work well in security video applications. Because these two application areas have quite different requirements, the digital processing techniques developed for each are different in nature. For example, in the case of motion detection for the purpose of video monitoring of an area, the ability to successfully detect motion is the key objective. Exact information on which particular objects in the field of view have moved and by how much is of lesser significance. For video interlace processing, however, it is important to know which object has moved and by how much. An example of a video motion detection technique designed for video interlace processing is disclosed in U.S. Pat. No. 4,851,904 issued to Miyazaki et al.
Image understanding techniques have been developed for use in video interlace processing. These image understanding techniques automatically segment a video image into regions of pixels that correspond to objects in a video camera's field of view. The motion of these objects can then be detected and tracked. These techniques are computationally intensive and expensive. They can rarely be made to run in real-time. Accordingly, they typically cannot be used for digital video security applications.
One example of a prior art computationally intensive approach for detecting motion involves taking regions of pixels (typically an N×M rectangle) from an incoming video stream and correlating them with corresponding regions of pixels in a reference image. This approach can be thought of as an approximation of the generalized image understanding approach described above. The incoming image is divided up into rectangles. These rectangles are compared against corresponding rectangles of a reference image. Dividing an image into rectangles in this manner and comparing rectangles is considerably simpler than trying to identify individual objects in an incoming image and attempting to compare the location of those objects with the location of corresponding objects in the reference image. This technique is used as part of the MPEG video compression standard and is known as “motion-compensation.” While this approach can be effective in detecting motion and is less complex than some other image understanding techniques, it is still time consuming and typically requires the use of large and expensive custom integrated circuits. In addition, it tends to be sensitive to the quality of the incoming image. Any noise in the incoming video signal makes it very difficult to locate corresponding regions in a reference image.
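The rectangle-by-rectangle comparison described above can be illustrated with a minimal sketch. It is not taken from any cited system: it substitutes a sum of absolute differences for the correlation step, compares only co-located rectangles (no displacement search), and the names `block_sad`, `changed_blocks`, and `limit` are hypothetical.

```python
def block_sad(frame, reference, top, left, n, m):
    """Sum of absolute differences over one n-row by m-column rectangle."""
    return sum(
        abs(frame[r][c] - reference[r][c])
        for r in range(top, top + n)
        for c in range(left, left + m)
    )


def changed_blocks(frame, reference, n, m, limit):
    """Return the (top, left) corner of every rectangle whose SAD exceeds limit.

    Assumption: frame and reference are equally sized lists of rows, and
    partial rectangles at the right and bottom edges are ignored.
    """
    rows, cols = len(frame), len(frame[0])
    changed = []
    for top in range(0, rows - n + 1, n):
        for left in range(0, cols - m + 1, m):
            if block_sad(frame, reference, top, left, n, m) > limit:
                changed.append((top, left))
    return changed
```

Even in this reduced form, every pixel of every rectangle is visited for each frame, which hints at why the full correlation approach is costly.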
Other digital techniques for motion detection in security video applications are based on the detection of edges in video images, i.e., abrupt transitions in color or brightness that delineate one region from another. Edge detection simplifies the processing of images by requiring the detection and storage of transitions only, as opposed to processing and storing values for large numbers of pixels. Edge detection takes advantage of the fact that there is a high degree of correlation between pixels in a video image (i.e., large regions of pixels tend to share similar values).
Devices that use edge detection tend to be very sensitive to false trigger events caused by changes in lighting. A stationary scene may appear to move as the lighting changes the location of shadows in a scene over the course of a day. An example of an edge detection system is disclosed in U.S. Pat. No. 4,894,716 issued to Aschwanden et al. The system disclosed by Aschwanden et al. looks for changes in the location of edges from frame to frame. This system requires a certain degree of vertical coherence to cause a trigger; i.e., there must be a given amount of phase shift of an edge across multiple lines for motion to be detected. The reference data that is stored comprises a set of counts indicating where edges exist in the vertical scan lines of the previous frame.
Edges are detected by low-pass filtering a scan line of the incoming video, thresholding the signal, then using the filtered and thresholded signal to trigger a one-shot. The one-shot in turn is used to gate a counter whose final value is the location of an edge in the scan line.
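The filter, threshold, and counter sequence just described operates on an analog signal. A rough digital analogue, offered only as an illustration, might look like the following. The moving-average filter, the `window` parameter, and the function name `first_edge_location` are assumptions and do not come from the cited patent.

```python
def first_edge_location(scan_line, threshold, window=3):
    """Return the index of the first rising edge in a scan line, or None.

    Assumption: scan_line is a list of brightness samples; the low-pass
    filter is a simple moving average over the last `window` samples.
    """
    # Low-pass filter the scan line.
    smoothed = []
    for i in range(len(scan_line)):
        lo = max(0, i - window + 1)
        chunk = scan_line[lo:i + 1]
        smoothed.append(sum(chunk) / len(chunk))

    # Threshold the filtered signal into a binary signal.
    binary = [s > threshold for s in smoothed]

    # Count samples until the first low-to-high transition; the count
    # reached at that point stands in for the gated counter's final value.
    for i in range(1, len(binary)):
        if binary[i] and not binary[i - 1]:
            return i
    return None
```

Comparing the returned location for the same scan line in successive frames would then indicate whether an edge has shifted.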
While this edge detection technique provides a simple method for motion detection, it is constrained with respect to the region of the video signal over which it works, and it uses only the previous frame as a reference. As a result, sufficiently slow-moving objects are not detected. Also, this approach does not work well in an environment that does not lend itself well to edge detection, e.g., where there is insufficient contrast to find edges, or where there are sufficiently large amounts of high frequency components in the scene that create too many edges.
Another motion detection scheme using edge detection is described in U.S. Pat. No. 5,272,527 issued to Watanabe. In the system described by Watanabe, a classical signal processing technique is applied to extract edges from an input image, noise reduction techniques are applied, and an averaging mechanism is used to binary threshold the incoming image data. The previous two binary images are retained and a series of logical operations are performed on these images to create a reference against which an incoming binary image is compared. In essence, the previous two frames are used to generate a reference mask (by inverting their union), and then a population count of binary ones is applied to the masked version of the incoming image. The result is an estimate of the difference between the incoming image and the previous two images. The approach of Watanabe is extremely complex and costly, and of questionable effectiveness.
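The masking and population-count step summarized above reduces to a few logical operations. The sketch below covers only that final step, assuming the three images have already been reduced to flat lists of 0/1 values; the function name `masked_change_count` is hypothetical, and the edge extraction and noise reduction stages are omitted.

```python
def masked_change_count(prev1, prev2, incoming):
    """Count incoming ones at positions where both previous images were zero.

    Assumption: each image is a flat list of 0/1 values of equal length.
    """
    # Reference mask: the inverted union (logical NOR) of the two previous frames.
    mask = [0 if (a or b) else 1 for a, b in zip(prev1, prev2)]
    # Population count of ones in the masked incoming image.
    return sum(1 for m, x in zip(mask, incoming) if m and x)
```

For example, with `prev1 = [1, 0, 0, 0]`, `prev2 = [0, 1, 0, 0]`, and `incoming = [1, 1, 1, 0]`, the mask is `[0, 0, 1, 1]` and the count is 1.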
The majority of video motion detection techniques work on the principle of comparing an incoming video signal to a stored reference signal. Some devices are constrained to use only the previous frame as a reference. While using only the previous frame as a reference requires less storage, is less sensitive to false trigger events due to slowly changing lighting, and lends itself to a simpler implementation, it has the drawback of being unable to detect events with a slow rate of change.
In the digital domain, a common method for detecting motion is to subtract the value of each pixel of an incoming frame from the corresponding pixel in the reference frame, accumulate the resulting difference, and generate a motion indication when the accumulated difference signal exceeds some predetermined amount. A problem with this approach is that changes over the whole image field can cancel each other out, thereby giving a false reading. For example, a given pixel could be brighter than its corresponding reference pixel by amount N, and another pixel could be darker than its reference pixel by the same amount (a difference of −N). In such a circumstance, the changes cancel out and significant motion may not be detected.
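The cancellation problem can be shown with a small numeric sketch. The function names and pixel values are illustrative assumptions; the absolute-value variant is included only for contrast and is not part of the prior art method described above.

```python
def signed_sum(frame, reference):
    """Accumulate signed per-pixel differences (the method described above)."""
    return sum(p - r for p, r in zip(frame, reference))


def absolute_sum(frame, reference):
    """Accumulate absolute per-pixel differences (shown for contrast)."""
    return sum(abs(p - r) for p, r in zip(frame, reference))


reference = [100, 100, 100, 100]
frame = [130, 70, 100, 100]  # one pixel brighter by 30, one darker by 30

print(signed_sum(frame, reference))    # 0: the two changes cancel
print(absolute_sum(frame, reference))  # 60: both changes are counted
```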
In addition, the simple differencing of values of corresponding pixels does not provide effective motion detection. Further, the outputs of video cameras typically have some noise imposed upon the video signal. As a result, the value reported for a pixel in an unchanging scene may vary plus and minus some amount from frame to frame simply due to noise. Most existing motion detection methods do not compensate for this noise. Consequently noise on the video signal contributes to false positive responses, requiring motion detectors to be desensitized to the point that additional false negatives are generated.
An example of a motion detection system that suffers from some of these problems is disclosed in U.S. Pat. No. 5,455,561 issued to Brown. In the system disclosed by Brown, a hybrid analog/digital approach is used in which the incoming frame is added to an inverted version of a stored reference frame. The resulting difference value is thresholded and the motion indication is generated when the thresholded value exceeds a preset amount for a given number of clock cycles. The Brown system chooses new reference frames periodically at predetermined increments of time, regardless of the current level of activity being observed by the camera. In the system of Brown, the threshold value being used is a constant (i.e., does not vary according to the input values), and the number of different pixels which are considered to constitute significant motion is also a constant.
Another approach to motion detection involves the digital decimation (by sub-sampling and low pass filtering) of video images in order to get a reduced data set. The reduced data set is compared to a similarly reduced reference image. Because decimation involves low-pass filtering of the original data, pixels of the resulting decimated image contain contributions from other pixels in the original image that were not selected during sub-sampling. Decimation can reduce the number of pixels that need to be compared while still allowing motion within the field of view to be detected. An example of a decimation approach is described in U.S. patent application Ser. No. 08/655,562 filed May 30, 1996, assigned to the assignee of the present invention.
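One simple way to picture decimation is block averaging, in which each output pixel is the mean of a square block of input pixels. The sketch below is an assumption made for illustration (a box filter followed by sub-sampling); it is not the filter used in the referenced application, and the name `decimate` and its `factor` parameter are hypothetical.

```python
def decimate(image, factor):
    """Reduce an image by averaging each factor-by-factor block of pixels.

    Assumption: image is a list of equally long rows; partial blocks at
    the right and bottom edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for top in range(0, rows - factor + 1, factor):
        out_row = []
        for left in range(0, cols - factor + 1, factor):
            block = [
                image[r][c]
                for r in range(top, top + factor)
                for c in range(left, left + factor)
            ]
            # Every pixel in the block contributes to the single output pixel.
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out
```

With `factor=2`, a 4×4 image becomes a 2×2 image, so only a quarter as many pixel comparisons are needed against a similarly reduced reference.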
Some existing motion detection devices allow a region of interest within the field of view of the camera to be specified. For example, a border around an active image area of the video camera can be defined to be excluded from the motion detection mechanism. Likewise, entire regions within the camera's field of view can be masked out so that motion within (or, alternatively, outside) these regions is ignored. The behavior of these masking schemes has heretofore been strictly binary: a pixel is either included in the motion calculation, or not.
U.S. Pat. No. 5,339,104 issued to Hong describes a system that uses a windowing mechanism to restrict the area of interest to a rectangle within the full video frame. The system of Hong digitizes and stores a reference frame and compares it against the incoming video. A per-pixel comparison function is carried out by a table lookup in an EPROM. The (six bit) input pixel values and the reference pixel values are concatenated and presented to the EPROM as an address. The resulting output of the EPROM is the difference value. The difference value is compared with base and limit values provided by the controlling microprocessor and a binary output is generated to indicate whether the difference is within the given range of values. A count of the number of differences that fall within the given range is maintained and a motion indication is generated when the count exceeds a given value.
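The table-lookup comparison can be sketched in software as follows. The address layout (input pixel in the high six bits, reference pixel in the low six bits) and the table contents (absolute difference) are assumptions made for illustration, since the description above does not specify them; the names `DIFF_TABLE` and `count_in_range` are hypothetical.

```python
PIXEL_BITS = 6
LEVELS = 1 << PIXEL_BITS  # 64 possible values for a six-bit pixel

# Stand-in for the EPROM: one entry per (input, reference) pair.
# Assumed contents: the absolute difference of the two six-bit values.
DIFF_TABLE = [
    abs((address >> PIXEL_BITS) - (address & (LEVELS - 1)))
    for address in range(LEVELS * LEVELS)
]


def count_in_range(incoming, reference, base, limit):
    """Count pixels whose looked-up difference lies within [base, limit]."""
    count = 0
    for p, r in zip(incoming, reference):
        address = (p << PIXEL_BITS) | r  # concatenate the two pixel values
        if base <= DIFF_TABLE[address] <= limit:
            count += 1
    return count
```

A motion indication would then be generated when the returned count exceeds a given value.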
The present invention comprises a method and apparatus for detecting motion in video in which frames from an incoming video stream are digitized. The pixels of each incoming digitized frame are compared to the corresponding pixels of a reference frame, and differences between incoming pixels and reference pixels are determined. One or more embodiments of the invention use both a pixel difference threshold (that defines the degree (in absolute value) to which a pixel must vary from its corresponding reference pixel in order to be considered different) and a frame difference threshold (that defines the number of pixels which must be different for a motion detection indication to be given). If the pixel difference for a pixel exceeds the applicable pixel difference threshold, the pixel is considered to be “different”. If the number of “different” pixels for a frame exceeds the applicable frame difference threshold, motion is considered to have occurred, and a motion detection signal is emitted. A simple thresholding mechanism may be used. For example, in one embodiment, motion is declared to have occurred if more than N of the M pixels in the incoming frame are different from the reference frame. In one or more other embodiments, the applicable frame difference threshold is adjusted depending upon the current average motion being exhibited by the most recent frames, thereby taking into account “ambient” motion and minimizing the effects of phase lag.
In one or more embodiments, different pixel difference thresholds may be assigned to different pixels or groups of pixels, thereby making certain regions of a camera's field of view more or less sensitive to motion. In one or more embodiments of the invention, a new reference frame is selected when the first frame that exhibits no motion occurs after one or more frames that exhibit motion.
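The two-threshold test described above can be illustrated with a minimal sketch, assuming frames are flat lists of pixel values. All function and parameter names are hypothetical. In particular, the rule in `adaptive_frame_threshold` (a base value plus the mean count of different pixels in recent frames) is only one assumed way of adjusting the frame difference threshold to average recent motion; the text above does not specify the adjustment.

```python
def count_different(frame, reference, pixel_threshold):
    """Count pixels whose absolute difference exceeds the pixel threshold."""
    return sum(
        1 for p, r in zip(frame, reference) if abs(p - r) > pixel_threshold
    )


def motion_detected(frame, reference, pixel_threshold, frame_threshold):
    """Declare motion when more than frame_threshold pixels are different."""
    return count_different(frame, reference, pixel_threshold) > frame_threshold


def adaptive_frame_threshold(recent_counts, base_threshold):
    """Assumed adjustment rule: base threshold plus mean of recent counts."""
    if not recent_counts:
        return base_threshold
    return base_threshold + sum(recent_counts) / len(recent_counts)
```

Because each pixel is tested in absolute value before anything is accumulated, a brighter pixel and a darker pixel both count as different and cannot cancel each other.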
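A minimal sketch of per-pixel thresholds combined with the reference frame selection rule follows, assuming frames are flat lists of pixel values and `thresholds` holds one pixel difference threshold per pixel. The names `count_different_per_pixel` and `run_detector` are hypothetical.

```python
def count_different_per_pixel(frame, reference, thresholds):
    """Count pixels whose absolute difference exceeds that pixel's threshold."""
    return sum(
        1
        for p, r, t in zip(frame, reference, thresholds)
        if abs(p - r) > t
    )


def run_detector(frames, reference, thresholds, frame_threshold):
    """Return per-frame motion flags and the final reference frame.

    The reference is replaced by the first frame that shows no motion
    after one or more frames that show motion.
    """
    results = []
    previous_motion = False
    for frame in frames:
        count = count_different_per_pixel(frame, reference, thresholds)
        motion = count > frame_threshold
        results.append(motion)
        if previous_motion and not motion:
            reference = frame  # motion has just ended: select a new reference
        previous_motion = motion
    return results, reference
```

Assigning a large threshold to a pixel makes its region less sensitive without excluding it outright, in contrast to the strictly binary masking of the prior art.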
The present invention provides an efficient and reliable motion detection system.