1. Technical Field
The invention is related to video scene analysis, and in particular, to a system and method for providing interactive browsing of a video recording which has been pre-processed to generate a montage of user-interactive video sprites overlaid on a background of a scene covered by the video recording, with each sprite corresponding to a unique event occurring within all or part of the total period covered by the video recording.
2. Related Art
Many video recordings, such as security videos, record a particular area or region over long periods of time. A manual review of the video to determine whether anything of interest has occurred during the period of the recording is typically a time consuming process, especially for long video recordings. While this process can be sped up to a degree by watching the video faster than real time, playing the video too fast runs the risk of the viewer missing important events in the video that might otherwise be of interest. Consequently, the task of finding relevant information in a long video using only fast forward and rewind operations is often both time consuming and error prone.
In an attempt to address this issue, a number of partially or fully automated schemes based on the concept of video “key frames” have been advanced for retrieving and browsing video. In general, such schemes operate by identifying and extracting particular frames (i.e., key frames) in a video sequence which meet some predefined criteria, such as motion detection, target object detection, color detection, change detection, etc.
For example, conventional change detection methods, including pixel-based and region-based methods, are a common way to detect events of interest in video surveillance and security applications. Typically, conventional “background subtraction” methods are used in combination with these change detection methods in algorithms for identifying video key frames. In general, such change detection methods often use a threshold value to determine whether a region of an image has changed to a sufficient degree with respect to the background. Such change detection techniques have been further improved by applying “classical” morphological filters or statistically based morphological filters to “clean up” the initial pixel level change detection, making detection thresholds more robust.
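By way of illustration only, the thresholded change detection and morphological clean-up described above can be sketched as follows. This is a minimal sketch and not the method of any particular conventional scheme: it assumes grayscale frames stored as nested lists of integer intensities, uses a 3x3 morphological opening as one example of a “classical” morphological filter, and all function names and default values (change_mask, opening, is_key_frame, threshold, min_changed_pixels) are hypothetical.

```python
def change_mask(frame, background, threshold):
    """Pixel-based change detection: True where |frame - background| > threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def _neighborhood(mask, r, c):
    """Yield the 3x3 neighborhood of (r, c); out-of-bounds pixels count as False."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else False


def erode(mask):
    """A pixel survives only if its whole 3x3 neighborhood is set."""
    return [[all(_neighborhood(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighborhood is set."""
    return [[any(_neighborhood(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask):
    """Morphological opening (erosion then dilation) removes isolated noise pixels."""
    return dilate(erode(mask))


def is_key_frame(frame, background, threshold=30, min_changed_pixels=4):
    """Assumed criterion: enough changed pixels remain after the clean-up."""
    cleaned = opening(change_mask(frame, background, threshold))
    return sum(map(sum, cleaned)) >= min_changed_pixels


# Example: a 3x3 moving object plus one isolated noisy pixel on a 6x6 frame.
background = [[0] * 6 for _ in range(6)]
frame = [row[:] for row in background]
for r in range(1, 4):
    for c in range(1, 4):
        frame[r][c] = 200
frame[5][5] = 200  # sensor noise, removed by the opening

noise_only = [row[:] for row in background]
noise_only[3][3] = 200
```

In this sketch the opening discards the isolated pixel while keeping the 3x3 object, so the frame containing the object is flagged and the noise-only frame is not. The threshold and minimum pixel count are illustrative values that would need tuning for real footage.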
Regardless of what methods are used to identify the key frames, once they have been identified, there are a number of schemes that have been adapted to organize the key frames into user selectable indexes back into the original video. For example, one conventional key frame based scheme organizes the key frames into interactive “comic books.” A similar scheme organizes the key frames into “video posters.” In general, both of these schemes use different key frame layout schemes to provide the user with a number of user-selectable key frames that are indexed to the original video. In other words, the extracted key frames are typically presented as a series of individual images to the user. The user will then select a particular key frame as an entry point into the video so as to play back a portion of the video beginning at or near the time index associated with the selected key frame. Unfortunately, one problem with such schemes is that as the length of the video increases, the number of key frames also typically increases. As a result, typical key frame indices can be difficult or time consuming for a user to quickly review.
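As a rough sketch of how a selected key frame can serve as an entry point into the original video, the following maps a key frame to a playback start time at or slightly before its time index. The KeyFrame type, the playback_start_seconds function, and the two-second lead-in are assumptions introduced for illustration and are not taken from any of the schemes described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFrame:
    """One entry in a key frame index (hypothetical structure)."""
    frame_number: int  # position of the key frame in the original video
    label: str         # e.g. a caption shown beside the thumbnail


def playback_start_seconds(key_frame, fps, lead_in_seconds=2.0):
    """Return the time index at which to begin playback for a selected key frame.

    Playback starts slightly before the key frame (an assumed lead-in) and is
    clamped so that it never precedes the start of the video.
    """
    return max(0.0, key_frame.frame_number / fps - lead_in_seconds)


# A small index; in practice it grows with the length of the video.
index = [
    KeyFrame(frame_number=30, label="door opens"),
    KeyFrame(frame_number=900, label="person enters"),
]
start_times = [playback_start_seconds(kf, fps=30.0) for kf in index]
```

Because each detected event adds an entry, an index of this kind grows with the length of the recording, which is the review burden noted above.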
Another scheme provides mosaic representations for representing motion events detected in a video sequence. In general, this scheme generates static mosaic images from particular “scenes” within an overall video recording. These mosaic images are designed to represent motion events by either displaying a static sequence of particular moving objects against a static mosaic image of the underlying background of the video sequence, or by displaying a trajectory line or vector representing the particular path of moving objects within the overall static mosaic image.
However, one problem with the aforementioned mosaicing scheme is that it relies on “scene-cut” or “scene-change” information that is either embedded or identified within the video to segment particular scenes, with each scene then being used as the basis for creating a separate mosaic. These individual scenes are detected as “drastic changes in the frame content.” Consequently, in the case of a security video, which typically covers the same “scene” over very long periods of time, this mosaic representation scheme may tend to treat the entire video sequence as a single scene. Therefore, as the number of motion events increases, the resulting static mosaic can become a confusing patchwork of large numbers of static object sequences or motion vectors overlaid on the static mosaic. Another problem with this mosaicing scheme is that moving objects are represented in the actual positions in which they occurred in the video. Consequently, where more than one moving object was in the same position, those objects may be shown as overlapping or intersecting, even where there is a large temporal difference between the occurrence of the objects or events within the video.
Still other video indexing schemes have attempted to summarize longer videos by generating a shorter video that preserves the frame rate of key elements of certain portions of the original video, while greatly accelerating portions of the video in which nothing of interest is occurring. These schemes are sometimes referred to as “video skimming” techniques. Such schemes often focus on extracting the most “important” aspects of a video into summary clips that are then concatenated to form the video summary or “skim.” However, even such video skimming techniques can result in lengthy representations of an overall video recording, especially where the length of the video increases and the number of events of interest within the video increases.
Therefore, what is needed is a system and method for both summarizing video sequences, and providing an interactive index for allowing user entry into particular points or segments of the overall video. In addition, such a system and method should allow a user to quickly review the contents of the video without the need to review individual key frames. Further, in contrast to conventional mosaicing schemes, such a system and method should avoid the display of a static sequence or trajectory line for each moving object detected within the video.