Monitoring information in large-scale video streams presents several challenging problems. For example, an ideal monitoring system must be able to process a large amount of data while also understanding the semantic content of that data in real time. After processing and understanding the data, the monitoring system uses the results to filter the information.
Situations in which it is desirable to have an information monitoring system that achieves these goals include the monitoring of foreign military or political activities through hundreds of live broadcast video channels; the monitoring of activities and context captured by hundreds of video cameras mounted on cars or carried by soldiers; and the monitoring of Internet traffic to determine whether movies are being illegally distributed. The semantic content to be understood in these examples may include mentions of political leaders' activities in foreign broadcast news, the type of scene a soldier is viewing, and the type of video being played from an Internet source.
Traditional indexing and semantic content detection techniques developed for databases are not easily extensible to the dynamic nature of video streams. Recently, however, real-time classification of stream information has received greater attention in other modalities, such as email activity, chat-room monitoring, and voice over Internet protocol (VoIP) monitoring, owing to the inherent challenges of classification accuracy and information routing speed.
Traditional approaches to large-scale video stream monitoring have relied on store-and-process techniques, which have inherent limitations. For example, once the data volume or the demand on CPU power or memory exceeds a certain threshold, these systems may break down entirely. Therefore, it is desirable to have an improved system that filters transmitted video packets based on their semantic content at higher speed and under various resource constraints.
A semantic routing tree has been used to route signal-level information on a resource-constrained sensor network, see, for example, S. Madden et al., “The Design of an Acquisitional Query Processor for Sensor Networks,” SIGMOD, San Diego, Calif., June 2003. Routing is based on the signal properties and predefined decision trees. However, multimedia streaming data has content that is more difficult to detect and filter. Even in the raw video data domain without any resource constraint, video semantics detection remains an open issue, see, for example, A. Amir, et al., “IBM Research TRECVID-2003 Video Retrieval System,” NIST TREC-2003, November 2003.
In large-scale video stream scenarios, the targeted video content may arrive at rates on the order of tens of gigabytes of multimedia data per second. An ideal system would perform semantic content detection on these video streams in real time. Unfortunately, existing systems can neither provide the streaming video bandwidth needed to route multimedia data to the classifiers nor achieve real-time semantic content detection.
Therefore, a novel semantic filtering system is desired that can be applied to large-scale content monitoring and that reduces transmission load by filtering video content packets based on semantic detection.