Electronic surveillance images typically contain a large amount of clutter, i.e., unusable background detail. This clutter obscures the activity the surveillance is meant to capture. To reduce clutter, a conventional method calls for subtracting out the non-moving parts of the image. A biological example serves to illustrate the pros and cons of this conventional technique.
Some frogs see moving targets, e.g., flying insects, and eat them when within range. The frog's natural surveillance system, its eye and associated neural circuitry, has a retina so specialized that it registers only moving objects. In fact, experiments have shown that some frogs will starve if provided only motionless insects as a diet. The frog stares in a fixed manner, recognizes targets within the retina's field-of-view (FOV), and attacks when a target is within range. Existing surveillance systems behave much like these frogs.
Conventional unattended surveillance cameras are fixed, much like the frog's stare, such that movement detected within their FOV generates an alarm signal to a remote location for response by appropriate security resources. Motion detection with a fixed camera(s) depends on the camera(s) remaining still. Even the shaking caused by wind is detrimental to proper operation, causing false alarms that, if frequent, may lull a security force into delaying the necessary response.
The technique used with these systems identifies the non-moving scene elements and generates a reference image of them. The reference image is compared to the current live image; in theory, with a fixed camera(s), any difference between the two images represents a moving target or intruder. A common example is the closed-circuit video system used to monitor a car dealership's lot after hours. The image of the lot is "cluttered" with parked cars. An image processor subtracts out the parked cars and highlights moving objects such as intruders. To address some of the problems associated with the simple subtraction of fixed images, more elaborate algorithms have been developed, including those that pair "fish-eye" lenses with cameras to provide broader coverage from a single fixed camera.
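The reference-image subtraction described above can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the array sizes, pixel values, and the threshold of 30 gray levels are assumptions chosen for the example.

```python
import numpy as np

def detect_motion(reference, live, threshold=30):
    """Flag pixels whose brightness differs from the reference image
    by more than `threshold` gray levels (illustrative value)."""
    # Widen to a signed type so the subtraction cannot wrap around.
    diff = np.abs(live.astype(np.int16) - reference.astype(np.int16))
    return diff > threshold  # boolean mask of "moving" pixels

# Toy 8-bit grayscale scene: a uniform background stands in for the
# static clutter (e.g., parked cars) captured in the reference image.
reference = np.full((8, 8), 100, dtype=np.uint8)

live = reference.copy()
live[3:5, 3:5] = 200  # a bright "intruder" enters a 2x2 patch

mask = detect_motion(reference, live)
print(mask.sum())  # -> 4: only the intruder's pixels are flagged
```

With a perfectly still camera the mask isolates the intruder; the rest of this section explains why the same comparison fails once the camera moves.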
Conventional methods used image processing of past and current (live) images together with prior knowledge of the scene of interest. Improvements in image resolution, geometric distortion correction, and processing algorithms improved the ability of these systems to output the desired result. Moving target image discriminators (MTIDs) compensated for changes in lighting, ignored slowly changing shadows, etc. Each improvement still required the camera to be locked down (fixed) or multiple cameras to be used. If a particular requirement called for a large FOV and high-resolution images for discriminating moving targets, the solution was an array of fixed cameras and processors sufficient to cover the volume to be protected. This adds expense and complexity for many applications, particularly those requiring 360° panning and 180° tilt coverage, as is the case in many military tracking applications.
One problem with these systems is that the surveillance camera(s) must still remain fixed and not be permitted to pan, tilt, or even shake in the wind. Panning and tilting shift the entire scene image up, down, right, or left. Comparing consecutive views from a panning or tilting camera by simple pixel-by-pixel differencing yields near-total image noise, i.e., an unusable result. In fact, even slight camera movement due to wind or vibration renders the resulting image unusable. Accordingly, there is a need for a surveillance system that covers a large volume with the fewest resources while also avoiding false alarms caused by wind or vibration (from, for example, heavy truck traffic).
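The failure mode above can be demonstrated directly: shifting a textured scene by a single pixel, as wind shake would, makes the naive pixel-by-pixel comparison flag most of the frame. The random-noise "scene", frame size, and threshold below are illustrative assumptions; a real scene with smooth regions would fare somewhat better, but the comparison still degrades into noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# A highly textured scene; random values stand in for foliage, gravel,
# and other fine detail that dominates real outdoor imagery.
frame = rng.integers(0, 256, size=(100, 100)).astype(np.int16)

# Simulate a one-pixel camera shake: the entire scene shifts sideways.
shaken = np.roll(frame, 1, axis=1)

# The same differencing that isolated the intruder now fires almost
# everywhere, because every pixel is compared against its neighbor.
changed = np.abs(shaken - frame) > 30
print(f"{changed.mean():.0%} of pixels flagged as 'moving'")
```

A single-pixel shift leaves well over half the frame flagged, which is why fixed-camera differencing cannot tolerate panning, tilting, or wind-induced vibration.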