For streaming video, new event detection (NED) is the task of capturing the first video clips that present previously unseen events. This task has practical applications in a number of domains, such as intelligence gathering (e.g., for anti-terrorism purposes), financial market analysis, and news analysis, where useful information is typically buried in a large amount of data that grows rapidly with time. Because these applications are often time-critical and require fast turnaround, an online new event detection (ONED) system is highly desirable in practice.
About a decade ago, ONED on document streams began to attract growing interest in the text processing community. As an extension of its text counterpart, ONED on video streams, which leverages both text and visual information, has also attracted growing attention in the video processing community. The basic idea of video ONED systems is to compare a new clip with all the clips that arrived in the past. If the similarity values between the new clip and every past clip, computed from text and visual features, are all below a certain threshold, the new clip is predicted to present a new event.
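The comparison scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: the `Clip` structure, the use of cosine similarity, the equal weighting of text and visual scores (`text_weight`), and the threshold value of 0.5 are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Clip:
    """Hypothetical clip representation used only for this sketch."""
    clip_id: str
    text: Dict[str, float]    # term -> weight (assumed text feature)
    visual: Sequence[float]   # fixed-length vector (assumed visual feature)


def cosine_sparse(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(c1: Clip, c2: Clip, text_weight: float = 0.5) -> float:
    """Assumed combination: weighted sum of text and visual similarity."""
    text_sim = cosine_sparse(c1.text, c2.text)
    visual_sim = cosine_dense(c1.visual, c2.visual)
    return text_weight * text_sim + (1.0 - text_weight) * visual_sim


class BaselineDetector:
    """Compares each arriving clip with every clip seen so far."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.history: List[Clip] = []
        self.comparisons = 0  # total pairwise comparisons performed

    def process(self, clip: Clip) -> bool:
        """Return True if the clip is predicted to present a new event."""
        scores = [similarity(clip, past) for past in self.history]
        self.comparisons += len(scores)
        # New event: similarity to every past clip is below the threshold.
        is_new = all(score < self.threshold for score in scores)
        self.history.append(clip)
        return is_new


stream = [
    Clip("c1", {"earthquake": 1.0, "city": 1.0}, [1.0, 0.0]),
    Clip("c2", {"earthquake": 1.0, "rescue": 1.0}, [0.9, 0.1]),
    Clip("c3", {"election": 1.0}, [0.0, 1.0]),
]
detector = BaselineDetector(threshold=0.5)
flags = [detector.process(clip) for clip in stream]
print(flags, detector.comparisons)
```

In this toy stream, the second clip shares a term and a similar visual vector with the first, so only the first and third clips are flagged as new. The `comparisons` counter makes the cost of the scheme visible: every arriving clip is scored against the entire history.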
Previous work has shown that additional image information plays an important role in identifying the relevant video clips and achieving better topic tracking results. However, these efforts on video ONED focus mainly on optimizing detection accuracy rather than detection efficiency. Because each new clip is compared with every clip that arrived before it, these methods have quadratic time complexity with respect to the number of clips. Thus, they are not efficient enough to detect new video events in a real-time environment, especially for large-scale video collections.
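The quadratic growth can be made explicit with a short count. The derivation below assumes that the $i$-th arriving clip is compared exactly once with each of the $i-1$ earlier clips and that a single comparison takes constant time; the symbols $n$ (number of clips) and $C(n)$ (total comparisons) are introduced here for illustration.

```latex
C(n) \;=\; \sum_{i=1}^{n} (i-1) \;=\; \frac{n(n-1)}{2} \;=\; O(n^{2})
```

Under these assumptions, a stream of $n = 10^{6}$ clips would require roughly $5 \times 10^{11}$ pairwise comparisons.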
For example, in an intelligence gathering system that must monitor tens of thousands of television channels simultaneously, it is very difficult for existing ONED systems to handle such an aggregated, extremely high-bandwidth video stream in real time. Thus, although some existing NED systems are described as usable online, they are not efficient enough for real-time applications.