Sorting through a collection of raw video is time-consuming, since it is necessary to view a video clip in order to determine whether anything of interest has been recorded. While this tedious task may be feasible for personal video collections, it is impossible for the endless video recorded by surveillance cameras and webcams. Millions of webcams around the world capture their fields of view 24 hours a day. It is reported that in the UK alone there are millions of surveillance cameras covering city streets. Many webcams even transmit their video publicly over the internet for everyone to watch, and many security cameras are also available online in stores, airports and other public areas.
One of the problems in utilizing webcams is that they provide raw, unedited data; most surveillance video is therefore never watched or examined. In our earlier WO2007/057893 [25] we proposed a method of video synopsis that creates shortened videos by combining selected portions from multiple original images of a scene. A video clip describes visual activities along time, and compressing the time axis allows a summary of such a clip to be viewed in a shorter time. Fast-forward, where several frames are skipped between selected frames, is the most common tool used for video summarization. A special case of fast-forward, called "time lapse", generates a video of very slow processes such as the growth of flowers. Since fast-forward may lose fast activities that occur during the dropped frames, methods for adaptive fast-forward have been developed [12, 18, 4]. Such methods attempt to skip frames in periods of low interest or lower activity, and to keep frames in periods of higher interest or higher activity. A similar approach extracts from the video a collection of short video sequences that best represent its contents [21].
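The idea behind adaptive fast-forward can be illustrated with a minimal sketch: given a per-frame activity score (for example, the fraction of moving pixels), select a fixed budget of frames so that high-activity periods contribute more frames than quiet ones. The cumulative-threshold scheme below is an illustrative simplification and is not taken from the cited methods [12, 18, 4].

```python
def adaptive_fast_forward(activity, budget):
    """Select up to `budget` frame indices, favoring high-activity periods.

    activity: per-frame activity scores (illustrative measure, e.g. the
    fraction of moving pixels in each frame).
    Frames are emitted each time the accumulated activity crosses an
    evenly spaced threshold, so busy periods keep more frames.
    """
    total = sum(activity)
    if total == 0:
        # No motion at all: fall back to uniform sampling.
        step = max(1, len(activity) // budget)
        return list(range(0, len(activity), step))[:budget]
    threshold = total / budget
    selected = []
    acc = 0.0
    for i, a in enumerate(activity):
        acc += a
        if acc >= threshold and len(selected) < budget:
            selected.append(i)
            acc -= threshold
    return selected
```

With a clip that is idle, then busy, then idle again, all selected frames fall inside the busy interval, which is exactly the behavior plain fast-forward cannot guarantee.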
Many approaches to video summarization completely eliminate the time axis, showing a synopsis of the video as a few selected key frames [8, 24]. These key frames can be selected arbitrarily or according to some importance criteria, but a key-frame representation loses the dynamic aspect of video. Comprehensive surveys on video abstraction appear in [11, 13].
In both approaches above, entire frames are used as the fundamental building blocks. A different methodology uses mosaic images together with some meta-data for video indexing [6, 19, 16]. In this case the static synopsis image includes objects from different times.
Object-based approaches to video synopsis were first presented in [20, 7], where moving objects are represented in the space-time domain. The concatenation of image portions representing an object or activity across successive frames of a video is called a "tube". As objects are represented by tubes in the space-time volume, the terms "objects" and "tubes" are used interchangeably in the following description. These papers [20, 7] introduced a new concept: creating a synopsis video that combines activities from different times (see FIG. 1).
An example of an object-based approach is disclosed in WO2007/057893 [25], assigned to the present applicant, wherein a subset of frames in an input video that show movement of one or more objects is obtained. Selected portions from the subset that show non-spatially overlapping appearances of the objects in the dynamic scene are copied from multiple input frames to a reduced number of frames in the output video sequence, such that multiple locations of the objects as seen at different times in the input video are shown simultaneously in the output video.
The approaches disclosed in references [20, 7] are based on the observation that more activities can be shown in a shorter video if chronological order is not enforced. It would be useful to extend such an approach to the synopsis of endless video sequences, such as those obtained from surveillance cameras, so as to limit the duration of the output video to a desired length while doing so in a controlled manner that reduces the risk of feature loss.
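The effect of relaxing chronological order can be sketched with a toy tube-packing routine. Each tube is reduced here to a list of per-frame bounding boxes, and a greedy search assigns each tube the earliest synopsis start time at which it does not spatially collide with already-placed tubes. This is an illustrative simplification, not the energy-minimization formulation of the cited methods [20, 7]; all data structures are hypothetical.

```python
def pack_tubes(tubes):
    """Greedily assign each tube a start time in the synopsis.

    tubes: list of tubes, each a list of per-frame bounding boxes
    (x0, y0, x1, y1). Two tubes collide if their boxes overlap in
    the same output frame. Returns one start frame per tube.
    """
    def boxes_overlap(a, b):
        # Axis-aligned rectangle intersection test.
        return not (a[2] <= b[0] or b[2] <= a[0] or
                    a[3] <= b[1] or b[3] <= a[1])

    placed = []  # list of (start_frame, tube)
    for tube in tubes:
        start = 0
        while True:
            collision = False
            for s, other in placed:
                for t, box in enumerate(tube):
                    ot = start + t - s  # aligned frame index in `other`
                    if 0 <= ot < len(other) and boxes_overlap(box, other[ot]):
                        collision = True
                        break
                if collision:
                    break
            if not collision:
                placed.append((start, tube))
                break
            start += 1
    return [s for s, _ in placed]
```

Two tubes occupying the same image region are forced apart in time, while a tube in a different region can share synopsis time zero with the first, regardless of when each appeared in the source video.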
Efficient indexing, retrieval and browsing of long video is growing in importance, especially given the rapid increase in the number of surveillance cameras that endlessly collect video. Conventional video indexing uses manual annotation of the video with keywords, but this method is time-consuming and impractical for surveillance cameras. Additional video indexing methods have been proposed, based on selection of representative key frames or representative time intervals from the input video.
Video synopsis can be used for indexing, retrieval and browsing, as many objects from the covered time period are shown in a short synopsis video. However, since many different objects are shown simultaneously, examining a simple synopsis video may be confusing.
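Using the synopsis as an index requires mapping each object appearance back to its original time. A minimal sketch, assuming each tube records both its start frame in the source video and its assigned start frame in the synopsis (field names `orig_start` and `syn_start` are illustrative, not from the source):

```python
def build_time_index(tubes):
    """Map each tube to the offset between synopsis and original time.

    tubes: list of dicts with hypothetical fields 'orig_start' (start
    frame in the source video) and 'syn_start' (start frame assigned
    in the synopsis video).
    """
    index = {}
    for tube_id, tube in enumerate(tubes):
        # orig_frame = syn_frame + offset, for any frame of this tube.
        index[tube_id] = tube['orig_start'] - tube['syn_start']
    return index


def original_time(index, tube_id, syn_frame):
    """Recover the original frame number for a tube clicked in the synopsis."""
    return syn_frame + index[tube_id]
```

A user clicking an object in the synopsis could then be taken directly to the corresponding moment in the source video, which is the indexing capability the prior art discussed below lacks.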
US20060117356 (Microsoft) discloses a video browser that provides interactive browsing of unique events occurring within an overall video recording. In particular, the video browser processes the video to generate a set of video sprites representing unique events occurring within the overall period of the video. These unique events include, for example, motion events, security events, or other predefined event types, occurring within all or part of the total period covered by the video. Once the video has been processed to identify the sprites, the sprites are then arranged over a background image extracted from the video to create an interactive static video montage. The interactive video montage illustrates all events occurring within the video in a single static frame. User selection of sprites within the montage causes either playback of a portion of the video in which the selected sprites were identified, or concurrent playback of the selected sprites within a dynamic video montage.
WO0178050 (Inmotion Technologies Ltd.) discloses a system and method for using standard video footage, even from a single video camera, to obtain in an automated fashion a stroboscope sequence of a sports event, for example. The sequence may be represented as a static image of a photographic nature, or by a video sequence in which camera motion remains present, in which case the video sequence can be rendered as a panning camera movement over a stroboscope picture or as an animated stroboscope sequence in which the moving object leaves a trailing trace of copies along its path. Multiple cameras can be used for an expanded field of view or for comparison of multiple sequences, for example.
JP-2004-336172 discloses a system for shortening a surveillance video which maintains the chronological order of events without separating concurrently moving objects. Maintaining chronological order substantially limits the shortening possibilities. Also, there is no suggestion to index objects so that the original time of an object in the synopsis video can be easily determined.