Conventional surveillance systems use multiple cameras to monitor a site such as a home, office, or factory. The video from such systems is often archived to tape or other storage media for future reference. Such surveillance applications need continuous operation and tend to generate large amounts of video data. Storing large volumes of full-motion, full-resolution video to tape requires operators, or automated tape (or other media) management systems, to swap tapes frequently as each volume becomes full.
However, using full-motion, full-resolution video permits the use of commodity devices such as VHS format video cassette recorders rather than more specialized, and more expensive, custom devices. In either case, such a system is not only expensive and failure prone, but also requires transmission of uncompressed video from the site to a central monitoring and archiving station. Video compression provides an improvement over uncompressed video for both transmission and storage.
Additionally, such surveillance normally needs attended monitoring of the acquired video and/or storage of that video for subsequent analysis. Because monitoring is labor intensive, a single operator typically views the inputs from many cameras simultaneously. For example, four camera images can be displayed on a single monitor (i.e., each camera image reduced in size by ½ both horizontally and vertically). While reduced spatial resolution video is often adequate for monitoring, full resolution video is preferable for storage and subsequent analysis. A reduced video picture rate can still provide adequate temporal resolution while reducing the volume of material to archive by a proportional amount. Another method of reducing the amount of material to monitor or store is to switch from camera to camera, displaying each for some period of time. Such time division multiplexing can be combined with the spatial decimation to increase the number of inputs simultaneously monitored.
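The spatial and temporal decimation described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical frame representation (a list of rows of grayscale pixel values) and plain sample dropping with no anti-alias filtering; the names `decimate_by_two`, `compose_quad`, and `temporal_decimate` are illustrative and do not come from any conventional system.

```python
from typing import List

# Hypothetical frame representation: a list of rows of grayscale pixel values.
Frame = List[List[int]]


def decimate_by_two(frame: Frame) -> Frame:
    """Halve the size horizontally and vertically by dropping samples."""
    return [row[::2] for row in frame[::2]]


def compose_quad(frames: List[Frame]) -> Frame:
    """Tile four equally sized frames into one frame of the original size."""
    if len(frames) != 4:
        raise ValueError("quad composition needs exactly four frames")
    top_left, top_right, bottom_left, bottom_right = (
        decimate_by_two(f) for f in frames
    )
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def temporal_decimate(pictures: list, keep_every: int) -> list:
    """Keep one picture out of every `keep_every` pictures."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return pictures[::keep_every]


# Four 4x4 test frames, each filled with its camera number.
cameras = [[[cam] * 4 for _ in range(4)] for cam in range(4)]
quad = compose_quad(cameras)
```

Here `quad` has the same dimensions as a single input frame, with each camera occupying one quadrant. Keeping one picture in three with `temporal_decimate(pictures, 3)` reduces the archived volume to roughly one third, which is the proportional reduction noted above.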
Applying a video encoder to each camera, however, is expensive. A standard video encoder system can process one channel of standard resolution, full motion video, or multiple channels of temporally and/or spatially reduced input. In conventional systems, decimation is done outside the encoder system, either in an external video multiplexer box that rotates among the inputs, selecting video fields or frames from each as needed, or by decimation and composition in an external system. The external multiplexer typically also encodes source specific information (such as source camera number and time code) for each field of an output stream. Encoder systems such as DoMiNo™ (a registered trademark of LSI Logic Corporation, headquartered in Milpitas, Calif.) effectively incorporate multiple encoders (i.e., two independent encoders and two independent video inputs in a single integrated circuit and memory subsystem), enabling a further gain in video processing efficiency.
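The rotation performed by an external multiplexer can be sketched as follows. This is a simplified model under stated assumptions, not a description of any particular product: each source is a sequence of placeholder field payloads, the field index stands in for a real time code, and the names `TaggedField`, `round_robin_mux`, and `dwell` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass
class TaggedField:
    camera: int    # source camera number
    timecode: int  # field index at the source; stands in for a real time code
    payload: str   # placeholder for the pixel data of the field


def round_robin_mux(
    sources: Sequence[Sequence[str]], dwell: int = 1
) -> Iterator[TaggedField]:
    """Rotate among the sources, passing `dwell` consecutive fields from each."""
    if not sources:
        raise ValueError("at least one source is required")
    if dwell < 1:
        raise ValueError("dwell must be at least 1")
    n_fields = min(len(s) for s in sources)
    for t in range(n_fields):
        camera = (t // dwell) % len(sources)
        # Only the selected camera's field at time t reaches the output;
        # the fields of the other cameras at time t are discarded.
        yield TaggedField(camera, t, sources[camera][t])


# Three cameras, six fields each; payloads name the camera and field index.
sources = [[f"cam{c}-field{t}" for t in range(6)] for c in range(3)]
output = list(round_robin_mux(sources))
```

With three sources and a dwell of one field, each camera contributes every third field to the output stream, so the rotation also acts as a temporal decimation of each input by the number of sources.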
Video compression methods using both spatial as well as motion compensated temporal compression (e.g., MPEG, H.261, H.263, and H.264) are more efficient than spatial-only schemes such as JPEG or “DV” video cassette recorder format. Some current surveillance multiplexers encode information such as the source identifier and current time in the vertical blanking interval (i.e., non-visible pixels) or by over-writing an area of the active video region itself. Furthermore, U.S. Pat. No. 6,229,850 discloses multiple resolution video compression. The conventional encoding includes I-picture only encoding of a lower resolution and frame rate stream for ‘trick mode’ (i.e., fast forward or reverse) play as in Digital VHS (DVHS) tape decks. However, DVHS is a single program recording system that does not generate a digest stream. It would be desirable to multiplex inputs from several cameras into a single stream without reducing and/or eliminating the compression gains of motion compensation.
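The effect of camera rotation on temporal prediction can be illustrated with a toy residual measure. The sketch below is an assumption-laden simplification: it uses plain previous-picture differencing (zero-motion prediction) rather than true motion compensation, and the function name `residual_energy` and the pixel values are hypothetical.

```python
from typing import List

# Hypothetical toy representation: a picture is a flat list of grayscale pixels.
Picture = List[int]


def residual_energy(stream: List[Picture]) -> int:
    """Sum of absolute differences between each picture and its predecessor."""
    return sum(
        abs(a - b)
        for prev, cur in zip(stream, stream[1:])
        for a, b in zip(prev, cur)
    )


# Two static scenes with different content, four pictures from each camera.
camera_a = [[10, 10, 10, 10] for _ in range(4)]
camera_b = [[200, 200, 200, 200] for _ in range(4)]

# Separate streams: every picture is predicted from the same camera.
separate = residual_energy(camera_a) + residual_energy(camera_b)

# Naively interleaved stream: every picture is predicted from the other camera.
interleaved_stream = [pic for pair in zip(camera_a, camera_b) for pic in pair]
interleaved = residual_energy(interleaved_stream)
```

For these static scenes, `separate` is zero because each picture is identical to its predecessor, while `interleaved` is large because every prediction crosses from one camera to the other. A real encoder behaves less starkly, but the same effect explains why simple rotation among cameras tends to erode the benefit of temporal prediction.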