Video surveillance cameras have become very popular owing to the low cost of video cameras and of the disk storage used to record the video, and to the availability of networked cameras that allow simple video transfer over the network. Costs have become so affordable that people even install surveillance cameras in private homes. The video generated by most surveillance cameras is recorded in huge video archives.
Most installed video cameras record the video in DVRs (Digital Video Recorders) or NVRs (Network Video Recorders). Normally, no one views the recorded video, and finding activities of interest in these video archives presents a significant problem. Automated video analysis approaches for finding activities of interest are making continuous progress, but are still far from giving satisfactory solutions. Summarization methods enable more efficient human browsing of video [8, 11], but the summaries they create are either too long or confusing.
Video analytics systems, which aim at understanding surveillance video, are useful in providing simple alerts. Automatic methods that detect entrance into areas that should be off limits, or crossing from one image region to another, provide alerts with almost no errors. But many cases that a human observer could decide quickly and accurately are still too difficult even for the best video analytics systems. For example, despite much research on the detection of suspicious behavior, human performance is still much better than automatic decisions.
Many different approaches have been proposed for video summarization. Most methods generate a static description, usually as a set of keyframes. Other methods use adaptive fast forward [7, 1], skipping irrelevant periods.
WO 07/057893 (Rav-Acha et al.) discloses a method for creating a short video synopsis of a source video, wherein a subset of video frames in a source sequence is obtained that shows movement of at least one object, the object being a connected subset of pixels from at least three different frames of the source video. At least three source objects are selected from the source sequence, and one or more synopsis objects are temporally sampled from each selected source object. For each synopsis object, a respective display time for starting its display in the synopsis video is determined, and the video synopsis is generated by displaying the selected synopsis objects, each at its respective predetermined display time, without changing the spatial location of the objects in the imaged scene, such that at least three pixels, each derived from a different respective time in the source sequence, are displayed simultaneously in the synopsis video.
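The central step of assigning display times, shifting objects in time but not in space so that activities from different source times share the same synopsis frames, can be illustrated with a minimal Python sketch. The greedy earliest-fit rule, the static bounding boxes, and the names `SourceObject` and `assign_display_times` are illustrative assumptions, not the procedure disclosed in WO 07/057893.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceObject:
    """Hypothetical activity tube: source start frame, length in frames, fixed box."""
    name: str
    source_start: int
    duration: int
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1); the spatial location is never changed


def boxes_overlap(a, b):
    """True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def assign_display_times(objects, synopsis_length):
    """Greedily give each object the earliest synopsis start frame at which it does not
    collide (overlapping frames AND overlapping boxes) with an already placed object."""
    placed = {}  # name -> display start frame in the synopsis
    for obj in sorted(objects, key=lambda o: o.source_start):
        for start in range(synopsis_length - obj.duration + 1):
            end = start + obj.duration
            clash = any(
                start < placed[o.name] + o.duration and placed[o.name] < end
                and boxes_overlap(obj.box, o.box)
                for o in objects if o.name in placed
            )
            if not clash:
                placed[obj.name] = start
                break
        else:  # no break: no collision-free start exists
            raise ValueError(f"{obj.name} does not fit in {synopsis_length} frames")
    return placed


objs = [
    SourceObject("car", 0, 30, (0, 0, 10, 10)),
    SourceObject("person", 500, 20, (20, 0, 30, 10)),
    SourceObject("bike", 900, 25, (5, 5, 15, 15)),
]
times = assign_display_times(objs, synopsis_length=60)
# {'car': 0, 'person': 0, 'bike': 30}
```

Here the car (source frames 0 to 29) and the person (source frames 500 to 519) are shown simultaneously from synopsis frame 0, so pixels from different source times appear together; the bike overlaps the car spatially and is therefore delayed until the car's display ends.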
WO 08/004222 describes an extension to this approach that is adapted to the generation of a video synopsis from a substantially endless source video stream, such as one generated by a video surveillance camera. Object-based descriptions of at least three different source objects in the source video stream are received in real time, each source object being a connected subset of image points from at least three different frames of the source video stream. A queue of received object-based descriptions is continuously maintained and includes, for each respective source object, its duration and location. A subset of at least three source objects is selected from the queue based on given criteria, and one or more synopsis objects are temporally sampled from each selected source object. For each synopsis object, a respective display time for starting its display in the video synopsis is determined, and the video synopsis is generated by displaying the selected synopsis objects, or objects derived therefrom, each at its respective predetermined display time, such that at least three points, each derived from a different respective time in the source video stream, are displayed simultaneously in the synopsis video, and at least two points, both derived from the same time, are displayed at different times in the video synopsis.
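The continuously maintained queue of object-based descriptions can be sketched as a bounded buffer from which a subset is selected. The class `ObjectQueue`, its fixed capacity, and the selection rule (longest objects first, above a minimum duration) are hypothetical choices for illustration only, since the cited application leaves the selection criteria open.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectDescription:
    """Hypothetical object-based description received from the live stream."""
    object_id: int
    start_time: float              # seconds since the stream started
    duration: float                # seconds
    location: tuple[float, float]  # e.g. image centroid (x, y)


class ObjectQueue:
    """Bounded queue: when full, the oldest description is discarded automatically."""

    def __init__(self, max_size):
        self._queue = deque(maxlen=max_size)

    def push(self, desc):
        self._queue.append(desc)

    def __len__(self):
        return len(self._queue)

    def select(self, k, min_duration=0.0):
        """Return up to k queued objects, longest first, skipping very short ones
        (an assumed criterion standing in for the unspecified 'given criteria')."""
        candidates = [d for d in self._queue if d.duration >= min_duration]
        return sorted(candidates, key=lambda d: d.duration, reverse=True)[:k]


q = ObjectQueue(max_size=3)
for desc in [
    ObjectDescription(1, 0.0, 5.0, (10.0, 20.0)),
    ObjectDescription(2, 3.0, 0.5, (40.0, 15.0)),
    ObjectDescription(3, 7.0, 2.0, (60.0, 30.0)),
    ObjectDescription(4, 9.0, 8.0, (25.0, 50.0)),
]:
    q.push(desc)
selected_ids = [d.object_id for d in q.select(2, min_duration=1.0)]
# object 1 was evicted when object 4 arrived; selected_ids == [4, 3]
```

A `deque` with `maxlen` keeps memory bounded for an endless stream; a real system might instead evict by age or by a relevance score.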
WO 08/004222 also discloses indexing the video synopsis by clustering objects into clusters of similar objects. This facilitates browsing of the video synopsis and may be done using any clustering method, for example by building an affinity (similarity) matrix based on some similarity measure between every pair of objects.
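As one concrete instance of the affinity-matrix approach, the sketch below builds a Gaussian affinity matrix from hypothetical per-object feature vectors (for example, summarized appearance descriptors) and groups objects as connected components of the thresholded similarity graph. The feature vectors, the Gaussian kernel, `sigma`, and `tau` are assumptions; spectral clustering or any other method could equally operate on the same matrix.

```python
import math


def affinity_matrix(features, sigma=1.0):
    """Gaussian similarity exp(-d^2 / (2 sigma^2)) between every pair of feature vectors."""
    n = len(features)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            A[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return A


def cluster_by_threshold(A, tau=0.5):
    """Label objects by connected components of the graph with edges where A[i][j] >= tau."""
    n = len(A)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and A[i][j] >= tau:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical 2-D descriptors for four objects: two similar pairs.
features = [(0.9, 0.1), (0.85, 0.15), (0.1, 0.9), (0.12, 0.88)]
labels = cluster_by_threshold(affinity_matrix(features, sigma=0.2), tau=0.5)
# labels == [0, 0, 1, 1]
```

Browsing can then show one synopsis per cluster, so that similar objects are viewed together.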