This Invention relates to the acquisition and display of a sequence of images, and particularly to video cameras and recorders used to document criminal activity and other events occurring in a monitored area.
Prior art video surveillance equipment, commonly called Closed Circuit Television (CCTV), acquires and stores a surveillance record in the same format as used in broadcast television. This is an analog signal with a frame rate of 30 images per second, with each image containing 480 lines, and with a bandwidth sufficient to provide approximately 640 resolvable elements per line. As is known in the art, this is comparable to a digital video signal operating with a frame rate of 30 images per second, and with each image containing 640 by 480 pixels. For this reason and to facilitate a comparison between analog and digital video signals, the prior art CCTV format will hereafter be referred to as containing 640 by 480 pixels per image. That is, in the context of analog video signals, the term "pixel" is used to mean a "resolvable element as limited by the signal bandwidth." In the context of digital video signals, the term "pixel" has its usual meaning as known in the art.
While this prior art format is well matched to the needs of broadcast television, it is inefficient for surveillance use. The goal of surveillance video is to document the events that occur in an area. To fully achieve this goal, a video surveillance system must be able to record information that allows such tasks as: (1) identifying previously unknown persons by their facial features and body marks, such as tattoos and scars; (2) identifying automobiles by reading their license plates, recognizing their make and model, and recording distinguishing marks such as body damage; and (3) monitoring the actions of a person's hands, such as the exchange of illicit drugs and money, the brandishing of weapons, and the manipulation or removal of property.
All these tasks require a spatial image resolution of approximately 80 pixels-per-foot, or greater. That is, the pixel size must be equal to, or smaller than, about 0.15 by 0.15 inches. Prior art systems operating with 640 by 480 pixel images can only achieve this minimally acceptable resolution when the field-of-view is set to be 8 by 6 feet, or smaller (i.e., in the horizontal direction: 640 pixels/8 ft.=80 pixels/ft.; in the vertical direction: 480 pixels/6 ft.=80 pixels/ft.). However, this maximum field-of-view for optimal operation is much smaller than typical locations that need to be monitored by surveillance video. For example, the lobby of a building might be 20 to 80 feet across, while a parking lot might be hundreds of feet in size.
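The resolution arithmetic above can be restated as a short calculation. The following Python sketch is illustrative only; the function names are hypothetical and are not part of the Invention. It assumes the 80 pixels-per-foot threshold and the 640 by 480 pixel format given in the text.

```python
def max_field_of_view_ft(width_px, height_px, pixels_per_foot=80.0):
    """Largest (width, height) in feet that still meets the required resolution."""
    return width_px / pixels_per_foot, height_px / pixels_per_foot


def pixel_size_inches(pixels_per_foot=80.0):
    """Edge length of one pixel in inches (12 inches per foot)."""
    return 12.0 / pixels_per_foot


# Prior art CCTV format, treated as 640 by 480 pixels per image.
fov_width_ft, fov_height_ft = max_field_of_view_ft(640, 480)
print(fov_width_ft, fov_height_ft)        # 8.0 6.0
print(round(pixel_size_inches(), 2))      # 0.15
```

Under these assumptions, any field-of-view wider than 8 feet or taller than 6 feet falls below the 80 pixels-per-foot threshold.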
A common approach to this problem is to use multiple cameras with each viewing a different small part of the monitored region. While this provides the needed resolution over the large surveillance area, it requires an excessively large number of cameras. In addition, persons and vehicles moving within the monitored region will be viewed on one camera, then another, then another, and so on. This movement from one video record to another makes it difficult and inconvenient for operators to understand the events in the region as a whole.
Another prior art approach is to pan the image acquisition. This is carried out by mechanically rotating the camera such that a relatively small field-of-view is repositioned to observe different areas within the larger region being monitored. Panning can be automatic, such as a mechanism that moves the camera back-and-forth every ten to twenty seconds, or under the manual control of an operator. However, panning has the disadvantage that only a small part of the monitored region is viewed at any one time. Events that transpire in other portions of the monitored region are completely missed. Manual panning has the additional disadvantage of being an inefficient use of costly and limited manpower.
Another common prior art approach is to simply ignore the need for adequate resolution, and set the camera to view the entire large area being monitored. However, this results in the video record having a spatial resolution that is too poor for identifying faces, vehicle license plates, actions of the hands, and so on. In addition, much of the recorded video is wasted because the fixed 4:3 aspect ratio (640 pixels wide by 480 pixels high) does not match the much larger aspect ratio of typical monitored areas. For example, it might be desired to monitor a 40 foot wide by 10 foot high building lobby, an area with an aspect ratio of 4:1. However, when a single camera with a fixed aspect ratio of 4:3 is adjusted for a 40 foot wide field-of-view, the height of the viewed area will be 30 feet, wasting two-thirds of the image.
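The lobby example can be checked numerically. The Python sketch below is illustrative only, with a hypothetical function name. It assumes the camera is adjusted so that its horizontal field-of-view exactly spans the scene width, as in the example above.

```python
def wasted_image_fraction(scene_width_ft, scene_height_ft, aspect_w=4, aspect_h=3):
    """Fraction of the image falling outside the scene when the camera's
    horizontal field-of-view is set equal to the scene width."""
    viewed_height_ft = scene_width_ft * aspect_h / aspect_w
    if viewed_height_ft <= scene_height_ft:
        # The scene is at least as tall as the view, so no rows are wasted.
        return 0.0
    return 1.0 - scene_height_ft / viewed_height_ft


# 40 foot wide by 10 foot high lobby viewed with a fixed 4:3 camera.
print(40 * 3 / 4)                                  # 30.0 (viewed height in feet)
print(round(wasted_image_fraction(40, 10), 3))     # 0.667
```

Only 10 of the 30 feet of viewed height contain the lobby, so about two-thirds of the image rows carry no useful information.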
Prior art CCTV systems also waste data storage by acquiring video at a frame rate of 30 images per second. This frame rate is needed in broadcast television to give the viewer the impression of smooth motion. However, this smooth motion is not needed to accomplish the key tasks of surveillance video. Recognizing persons, reading license plates, and identifying hand actions are primarily accomplished by inspecting individual frames of the video sequence. Since only small changes can occur in 1/30th of a second, acquiring data at 30 images per second produces redundant data that must be stored and analyzed.
Another disadvantage of the prior art is that the video information remains in analog form throughout its use, from acquisition, to storage on magnetic tape, to being displayed on a television monitor. This makes the recorded information susceptible to degradation from long-term storage, stray magnetic fields, and signal-to-noise deterioration from repeated use of the magnetic tape. In addition, analog signals cannot be compressed by removing the correlation between adjacent pixels of the same image, or pixels at the same location in sequential images. This inefficient data representation results in the need for a large storage capacity. Analog video is also limited because it cannot be transmitted over digital communication channels, such as the internet. In addition, only very simple signal processing techniques can be directly applied to analog signals, such as adjustment of the brightness and contrast. Advanced signal processing techniques, such as convolution and Fourier domain manipulation, cannot be used to improve the image quality of prior art systems because of their analog nature. The playback of analog video is likewise limited to only a few simple techniques, such as normal play, fast forward, and reverse. Advanced playback functions such as image zoom (enlargement of a smaller area) are not available, making it difficult for operators reviewing the recorded video to extract the important information.
The Invention overcomes these limitations of the prior art by acquiring video data with a large number of pixels per image, typically 5120 by 2048 or greater, and a slow frame rate, typically 15 to 240 images per minute. This video acquisition is achieved through the use of a linescan camera viewing a vertical line in the monitored area, in conjunction with a mechanical scanning assembly for repeatedly sweeping the viewed line in the horizontal direction. The resulting video data stream is compressed using MPEG or a similar algorithm, and stored in a large-capacity digital memory. Through the use of an operator interface, images contained in the recorded video can be recalled from memory, uncompressed, and displayed on a video monitor. Also by use of the operator interface, subsections of individual images can be displayed on the video monitor in an enlarged form.
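The typical figures stated above can be compared with the prior art format by a short calculation. The Python sketch below is illustrative only, with hypothetical function names. It assumes the typical 5120 by 2048 pixel image, the stated range of 15 to 240 images per minute, and the 80 pixels-per-foot requirement discussed earlier. The rates are raw pixel counts before compression and do not represent stored data sizes.

```python
def coverage_ft(width_px, height_px, pixels_per_foot=80):
    """Area (width, height) in feet covered at the required resolution."""
    return width_px / pixels_per_foot, height_px / pixels_per_foot


def pixels_per_second(width_px, height_px, images_per_minute):
    """Raw, uncompressed pixel rate for a given image size and frame rate."""
    return width_px * height_px * images_per_minute / 60


# Typical image size of the Invention versus the prior art CCTV format.
print(coverage_ft(5120, 2048))               # (64.0, 25.6)
print(coverage_ft(640, 480))                 # (8.0, 6.0)

# Raw pixel rates: CCTV at 30 images per second (1800 per minute),
# and the Invention at the low and high ends of its typical range.
print(pixels_per_second(640, 480, 1800))     # 9216000.0
print(pixels_per_second(5120, 2048, 15))     # 2621440.0
print(pixels_per_second(5120, 2048, 240))    # 41943040.0
```

Under these assumptions a single image covers roughly 64 by 25.6 feet at 80 pixels-per-foot, compared with 8 by 6 feet for the prior art format. The raw pixel rate is below that of the prior art at the low end of the frame-rate range and above it at the high end.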
It is the goal of the Invention to provide an improved method of electronic video surveillance. Another goal of the Invention is to acquire video data in a spatial and temporal format that is matched to the specific needs of video surveillance, rather than broadcast television. It is an additional goal to store and manipulate the surveillance image data in a digital form. A further goal is to provide a spatial image resolution capable of recognizing faces, automobile license plates, actions of the hands, and similar items, while simultaneously monitoring large areas. An additional goal is to provide surveillance image data with an improved signal-to-noise ratio. Another goal is to provide a high aspect-ratio surveillance image that is matched to the area being monitored. Still another goal is to provide an operator interface that facilitates the extraction of relevant subsections of the surveillance record. Yet an additional goal is to facilitate the use of digital image processing to aid in the extraction of information from the surveillance record.