The present invention relates generally to surveillance systems and, more particularly, to a surveillance system for capturing and storing information concerning security events, and responding to those events using a network.
Video Compression
The Moving Picture Experts Group (MPEG) has produced a family of standards for the efficient, high-quality coding of video and audio information in digital compressed format. Several MPEG standards exist, such as MPEG-1 for coding moving pictures and audio for digital storage media, MPEG-2 for generic coding of moving pictures (video) and associated audio, and MPEG-4 for coding multimedia.
Content Description
The most recent standardization effort taken on by the MPEG committee is that of MPEG-7, formally called "Multimedia Content Description Interface." This standard plans to incorporate a set of descriptors and description schemes (DS) that can be used to describe various types of multimedia content. The descriptors and description schemes allow for fast and efficient searching of content that is of interest to a particular user.
It is important to note that the MPEG-7 standard is not meant to replace previous coding standards; rather, it builds on previous standard representations. Also, the standard is independent of the format in which the content is stored.
The primary application of MPEG-7 is expected to be in search and retrieval applications. In a simple application environment, a user specifies some attributes of a particular object. At this low level of representation, these attributes may include descriptors that describe the texture, motion and shape of the particular object. To obtain a higher level of representation, one may consider more elaborate description schemes that combine several low-level descriptors.
Video Receiver
A prior art receiver 100 is shown in FIG. 1. Receiving and decoding take place in two basic stages. During a first stage, features are extracted from the compressed video, and during a second stage, the extracted features are used to reconstruct the video.
During the first stage of decoding, a demultiplexer (demux) 110 accepts a compressed bitstream 101. The demux synchronizes to packets of the received bitstream, and separates the video, audio and data portions of the bitstream into primary bitstreams 102. The still compressed primary bitstreams are sent to a shared memory unit 120 using a memory controller 130. A front-end parser 140 parses the compressed bitstreams. The parser 140 is responsible for extracting the higher level syntax of the bitstreams, e.g., above the slice-layer in the MPEG-2 standard.
Below this level, bits are transferred to a symbol processor 150, which is mainly responsible for variable-length decoding (VLD) operations. In the MPEG bitstream, for example, the motion vectors and discrete cosine transform (DCT) coefficients are encoded by variable-length codes, along with other information such as macroblock modes, etc.
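The variable-length decoding step performed by the symbol processor can be illustrated with a minimal sketch. The code table below is a toy prefix-free table invented for illustration; it is not one of the tables defined by any MPEG standard, and the function name decode_vlc is hypothetical.

```python
# Toy prefix-free code table (hypothetical; NOT an actual MPEG VLC table).
# Shorter codewords are assigned to the values assumed to be more frequent.
TOY_VLC_TABLE = {
    "1": 0,
    "010": 1,
    "011": -1,
    "0010": 2,
    "0011": -2,
}


def decode_vlc(bits, table=TOY_VLC_TABLE):
    """Decode a string of '0'/'1' characters into a list of symbols.

    Because the table is prefix-free, the first codeword that matches
    the accumulated bits is the only possible match.
    """
    symbols = []
    code = ""
    for bit in bits:
        code += bit
        if code in table:
            symbols.append(table[code])
            code = ""
    if code:
        raise ValueError("bitstream ended inside a codeword: " + code)
    return symbols
```

A real symbol processor would typically use lookup tables indexed by several bits at once rather than matching one bit at a time; the bit-by-bit loop is used here only to keep the idea visible.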
During the second stage of decoding, additional blocks are turned on to reconstruct the video signal. From the symbol processor, extracted macroblock and motion vector information are sent to an address generator 160, and DCT information is sent to an inverse DCT 170.
The address generator 160 is responsible for generating the memory addresses where the video data are to be written and read in the memory unit 120. The address generator depends heavily on information such as the prediction mode, the location of the current block, and the motion vector value. Some of this information is passed on to a motion compensation unit 180, which combines data read from the memory unit with data received from the IDCT 170.
In the case of intra mode prediction, there may be no data read from memory because data read from memory are predictive information. Reconstructed data are written to the memory 120 from the motion compensation unit 180. When it is time for this data to be displayed, a display processor 190 reads the data for any additional processing that may be needed. A user interface 195 interacts with the memory controller 130 so that the limited, positional access can be realized.
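The combination performed by the motion compensation unit can be sketched as follows, under simplifying assumptions: frames are plain 2-D lists of 8-bit samples, motion vectors have integer-pixel precision, and the displaced block lies entirely inside the reference frame. The function name and argument layout are hypothetical and only illustrate the role of the block location, motion vector, and prediction mode described above.

```python
def motion_compensate(reference, residual, top, left, mv, intra=False):
    """Reconstruct one block from a prediction and an IDCT residual.

    reference -- previously decoded frame (2-D list), read from memory
    residual  -- block produced by the inverse DCT (2-D list)
    top, left -- position of the current block in the frame
    mv        -- (dy, dx) integer motion vector
    intra     -- if True, nothing is read from the reference frame
    """
    dy, dx = mv
    block = []
    for y in range(len(residual)):
        row = []
        for x in range(len(residual[0])):
            if intra:
                prediction = 0
            else:
                # Address generation: block location plus displacement.
                prediction = reference[top + dy + y][left + dx + x]
            value = prediction + residual[y][x]
            row.append(max(0, min(255, value)))  # clip to the 8-bit range
        block.append(row)
    return block
```

Half-pixel interpolation, bidirectional prediction, and boundary handling are deliberately omitted from this sketch.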
Networking
Computing technology is now inexpensive enough that it is possible to network many intelligent electronic devices throughout homes and enterprises. It is now also possible to move digital data, in the form of audio, images, and video, between devices, and to share the data with other users using the World Wide Web.
Universal Plug and Play (UPNP) is one initiative to provide easy-to-use, flexible, standards-based connectivity to networked devices. UPNP is an architecture for networking PCs, digital appliances, and wireless devices. UPNP uses TCP/IP and the Web or some other Simple Control Protocol (SCP) to control and transfer data between networked devices in the home, enterprises and everywhere else a Web connection can be made.
UPNP is intended to work in a network without special configuration. A device can dynamically join the network, obtain an Internet Protocol (IP) address, announce itself and its capabilities upon request, and learn about the presence and capabilities of other devices in the network. In addition to joining the network, the device can leave the network without leaving any undesired state behind.
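The join, announce, discover, and leave behavior described above can be modeled with a small in-memory sketch. This is not the UPNP protocol and uses no UPNP library; the class ToyNetwork, its method names, and the address range are all assumptions made for illustration.

```python
class ToyNetwork:
    """In-memory stand-in for a zero-configuration network (not real UPNP)."""

    def __init__(self):
        self.devices = {}  # address -> (name, capabilities)
        self._next_host = 2

    def join(self, name, capabilities):
        """Assign an address and record the device's announced capabilities."""
        address = "192.168.0.%d" % self._next_host
        self._next_host += 1
        self.devices[address] = (name, tuple(capabilities))
        return address

    def discover(self, capability):
        """Return the names of devices that announced the given capability."""
        return sorted(
            name for name, caps in self.devices.values() if capability in caps
        )

    def leave(self, address):
        """Remove the device so that no state is left behind."""
        self.devices.pop(address, None)
```

For example, a camera and a recorder could both join with a "video" capability, and after the camera leaves, a discovery for "video" would return only the recorder.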
Security Systems
Most prior art surveillance systems use closed-circuit television (CCTV) to acquire a video of indoor and outdoor scenes. Security systems typically display the video on monitors for simultaneous viewing by security personnel and/or record the video in a time-lapse mode for later playback.
Serious limitations exist in these approaches. Humans are limited in the amount of visual information they can process in tasks like video monitoring. After a time, significant security events can easily go unnoticed. Monitoring effectiveness is additionally reduced when multiple videos must be monitored. Recording video for later analysis does not provide for real-time intervention. In addition, video recordings have limited capacity, and are subject to failure.
Typically, the video is unstructured and unindexed. Without an efficient means to locate significant security events, it is not cost-effective for security personnel to monitor or record the output from all available cameras. Video motion detection can be used to crudely detect security events. For example, any motion in a secured area can be considered a significant event. However, in complex scenes, most simple motion detection schemes are inadequate.
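A crude detector of the kind referred to above, in which any motion in a secured area counts as an event, might look like the following sketch. It assumes that per-macroblock motion vectors are already available as a dictionary keyed by macroblock position; the function name, the area format, and the default thresholds are hypothetical.

```python
import math


def motion_in_area(motion_vectors, area, min_magnitude=1.0, min_blocks=1):
    """Return True if enough macroblocks inside the area are moving.

    motion_vectors -- dict mapping (row, col) macroblock index to (dy, dx)
    area           -- (row0, col0, row1, col1), inclusive bounds
    """
    row0, col0, row1, col1 = area
    moving = 0
    for (row, col), (dy, dx) in motion_vectors.items():
        inside = row0 <= row <= row1 and col0 <= col <= col1
        if inside and math.hypot(dy, dx) >= min_magnitude:
            moving += 1
    return moving >= min_blocks
```

Such a test fires on swaying foliage, lighting changes that disturb motion estimation, and camera shake just as readily as on an intruder, which illustrates why simple schemes are inadequate in complex scenes.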
U.S. Pat. No. 5,594,842 describes a surveillance system that uses clustered motion vectors to detect events. U.S. Pat. No. 6,031,582 describes a surveillance system that uses signal-strength difference corresponding to motion vectors to detect events.
U.S. Pat. No. 6,064,303 describes a PC-based home security system that monitors the surrounding environment to detect suspicious or uncharacteristic events. When a threshold event is detected, the system conducts close surveillance for additional events. When the accumulated detected events exceed some threshold value, the security system takes an appropriate remedial action. The system detects sound and video events by pattern recognition. Sound events use prerecorded files processed by a fast Fourier transform to provide amplitudes at various discrete characteristic frequencies as a function of time, and detected video events are movement (size and duration), light contrast change, and dark-to-light change. The events have associated severities. Responses are telephone calls to appropriate numbers with prerecorded messages.
U.S. Pat. No. 5,666,157 describes an abnormality detection and surveillance system that has a video camera for translating real images of a zone into electronic video signals at a first level of resolution. The system includes means for sampling movements of individuals located within the zone. The video signals of sampled movements are electronically compared with known characteristics of movements which are indicative of individuals having a criminal intent. The level of criminal intent of the individuals is then determined and an appropriate alarm signal is produced.
The MPEG-7 document ISO/IEC JTC1/SC29/WG11/N2861, "MPEG-7 Applications Document v.9," July 1999, describes a surveillance application, in which a camera monitors sensitive areas and where the system must trigger an action if some event occurs. The system may build its database from no information or limited information, and accumulate a video database and meta-data as time elapses. Meta-content extraction (at an "encoder" site) and meta-data exploitation (at a "decoder" site) should exploit the same database.
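One possible reading of the two-threshold scheme summarized above is sketched below: a first event at or above a trigger level starts close surveillance, and a response is taken once the accumulated severity exceeds a second threshold. This is an interpretation for illustration only, not the method of the cited patent; the function name and both parameters are assumptions.

```python
def should_respond(severities, trigger_level, alarm_total):
    """Return True if a remedial action would be taken.

    severities    -- event severities in order of detection
    trigger_level -- severity that starts close surveillance
    alarm_total   -- accumulated severity that must be exceeded
    """
    watching = False
    total = 0
    for severity in severities:
        if not watching:
            if severity >= trigger_level:
                watching = True  # threshold event: begin accumulating
                total = severity
        else:
            total += severity
        if watching and total > alarm_total:
            return True
    return False
```

In this reading, events seen before the trigger are ignored, so a long run of minor events never raises an alarm on its own.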
However, many security applications require real-time event analysis. The MPEG-7 Application Document does not provide information on how to achieve real-time performance. Furthermore, the actual meta-data that are to be extracted to achieve fast, robust and accurate event detection are not specified. Finally, this document does not say anything about the operation of the extraction unit and other networked devices.
It is desired to provide an improved surveillance system that uses video coding and networking technologies as described above.
The invention provides a surveillance and control system that includes a feature extraction unit to dynamically extract low-level features from a compressed digital video signal, and a description encoder, coupled to the feature extraction unit, to encode the low-level features as content descriptors. The system also includes an event detector coupled to the description encoder to detect security events from the content descriptors, and a control signal processor, coupled to the event detector, to generate control signals in response to detecting the security events.
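The chain of four coupled units can be sketched as a sequence of functions. The feature used (mean motion magnitude), the descriptor layout, the threshold, and the control signal names are all hypothetical placeholders chosen for illustration; the summary above does not specify them.

```python
def extract_features(motion_vectors):
    """Feature extraction unit: low-level features from compressed-domain data."""
    magnitudes = [abs(dy) + abs(dx) for dy, dx in motion_vectors]
    mean = sum(magnitudes) / len(magnitudes) if magnitudes else 0.0
    return {"mean_motion": mean}


def encode_description(features):
    """Description encoder: wrap the features as a content descriptor."""
    return {"descriptor": "motion_activity", "value": features["mean_motion"]}


def detect_event(descriptor, threshold=2.0):
    """Event detector: decide from the descriptor whether an event occurred."""
    return descriptor["value"] > threshold


def control_signals(event_detected):
    """Control signal processor: map a detected event to control signals."""
    return ["start_recording", "notify_user"] if event_detected else []


def process_frame(motion_vectors):
    """Run one frame's motion vectors through the four units in order."""
    descriptor = encode_description(extract_features(motion_vectors))
    return control_signals(detect_event(descriptor))
```

Because the features come from motion vectors that are already present in the compressed bitstream, a pipeline of this shape can operate without fully reconstructing the video.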
The system can also include a telephone, a personal computer, and a video recorder coupled to each other by a network. The network includes a low-bandwidth network for carrying the control signals and a high-bandwidth network for carrying the compressed digital video signal and the content descriptors. A memory unit stores the compressed digital video signal, the content descriptors, the control signals, user input, and configured user preferences.
The surveillance and control system further includes a symbol processor, coupled to the feature extraction unit, to extract motion vectors and macroblocks and DCT coefficients from the compressed digital video signal, and a bitstream processor, connected to the memory unit, to produce an output compressed digital video signal including the compressed digital video signal and the content descriptors.