Surveillance systems employing video cameras and audio input devices are well known. In a typical system, some or all of the video and audio signals are presented on video screens and through speakers for monitoring by security personnel. It is also known to record some or all of the analog video and audio signals on videotape for later retrieval. However, videotape suffers from serious drawbacks as a storage medium, particularly in view of the large quantity of video information generated by a surveillance system. A major concern is the sheer number of tapes to be stored, especially when it is desired to record signals generated by a large number of surveillance cameras. Moreover, a large system may require many video tape recorders, resulting in a large capital expenditure, the need to allocate space for the recorders, and high maintenance costs owing to the mechanical nature of the recorders. Another problem is the need to change tape cassettes frequently. Degradation of recording quality due to wear on reused tapes is yet another problem.
Retrieving information of interest from recorded tapes presents additional challenges. It is the nature of video surveillance that a large part of the tape-recorded video surveillance signals is of no interest whatsoever, since it typically represents a static image of a field of view. Finding a particular sequence representing a significant event can be extremely difficult and time-consuming, requiring tedious human review of hours or days of tape-recorded signals, usually only after the event has occurred.
There have been a number of attempts to overcome these disadvantages, but so far with limited success or at the cost of additional drawbacks. For example, it is known to multiplex and combine signals from multiple video cameras into a single image comprising multiple viewing windows within the image, each window corresponding to one of the cameras. Such multiplexing is based on decimating the frame rate of each of N video sources by a factor of N and combining the N decimated sources into a single video signal of a standard frame rate. This is called time-lapse recording. However, each camera image in the multiplexed image must undergo compression that may reduce the quality of the recorded image. Also, recording multiplexed images does not address the problems involved in finding sequences of interest on the recorded tapes. It is also known to record the surveillance video signals selectively, either in response to input from a human operator who is monitoring the signals or in response to signals generated by sensor devices arranged to detect events such as the opening of doors or windows. This technique reduces the total amount of information to be recorded and avoids storing much uninteresting information, but at the risk of failing to record significant events that cannot be detected readily or in a timely manner by sensors or human operators. Also, the reliance on external input can result in unreliability and increased expense, particularly where human operators are to initiate recording.
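The frame-rate decimation described above can be illustrated with a minimal sketch. It is not taken from any particular prior-art system: the names `multiplex` and `demultiplex`, and the representation of a frame as a (camera index, frame index) pair, are hypothetical and chosen only for illustration. The sketch assumes N sources running at the same standard frame rate and shows that each source contributes only every N-th frame to the combined signal.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for image data: (camera index, source frame index).
Frame = Tuple[int, int]


def multiplex(sources: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Combine N sources into one stream at the standard frame rate.

    Output slot t carries frame t of camera (t mod N), so each source
    is decimated by a factor of N.
    """
    n = len(sources)
    if n == 0:
        return []
    length = min(len(s) for s in sources)  # stop at the shortest source
    return [sources[t % n][t] for t in range(length)]


def demultiplex(stream: Sequence[Frame], n: int) -> List[List[Frame]]:
    """Recover the decimated sequence of each of the n cameras."""
    return [list(stream[cam::n]) for cam in range(n)]


if __name__ == "__main__":
    # Three cameras, six frame periods each.
    cameras = [[(cam, t) for t in range(6)] for cam in range(3)]
    combined = multiplex(cameras)
    print(combined)
    print(demultiplex(combined, 3))
```

In this sketch, two of every three frames from each camera are discarded before recording, which illustrates why an event that falls between the retained frames of a given camera cannot be recovered from the tape.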