An obvious search method is based on consecutive viewing of the accumulated entire video data archive at a constant speed, that is, viewing of the entire sequence of frames received from a fixed video camera. This method is disadvantageous because much time is needed to complete the search. On average, finding a needed event takes a time equal to half the time spent to view the accumulated archive.
A known search method comprises viewing the entire sequence of frames received from a fixed video camera, with different fragments of the sequence viewed at different speeds (see: U.S. Pat. No. 7,362,949, 2001, U.S. Class 386/68, “Intelligent Video System”).
This method comprises the following actions for each fragment of a sequence of frames:
    calculating the factor of interest of the fragment to the operator;
    calculating, on the basis of this factor, the speed at which this fragment is shown to the operator on the display; and
    showing the fragment to the operator on the display at the speed calculated for this fragment.
For example, the following characteristics may be used as factors of interest at the operator's discretion:
    presence of any movement or change in the frame;
    presence of objects of interest to the operator in the frame; and
    presence of texts (in particular, news reports) in the frame.
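The per-fragment procedure described above can be sketched as follows. Note that the interest metric (mean inter-frame difference) and the speed mapping used here are illustrative assumptions for the sketch; the referenced patent does not prescribe these particular formulas.

```python
# Sketch of variable-speed playback driven by a per-fragment factor of interest.
# Frames are modeled as flat lists of grayscale pixel values; the interest
# metric and the speed mapping are assumptions made for illustration only.

def interest_factor(fragment):
    """Estimate operator interest as the mean absolute inter-frame change."""
    diffs = [
        sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)
        for f1, f2 in zip(fragment, fragment[1:])
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0

def playback_speed(factor, max_speed=16.0):
    """Map low interest to fast playback and high interest to normal speed."""
    return max(1.0, max_speed / (1.0 + factor))

def schedule_playback(fragments):
    """Return (fragment, playback speed) pairs for display to the operator."""
    return [(frag, playback_speed(interest_factor(frag))) for frag in fragments]
```

A static fragment thus plays at the maximum speed, while a fragment with much change plays at or near normal speed.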
Even though this method speeds up the search as compared to the obvious method of viewing the entire accumulated archive of video data, it still requires much time. A further disadvantage of this method is the inconvenience of visually perceiving fragments of a sequence of frames shown at a variable speed. It will also be noted that this method does not combine, in one frame, images of objects captured at different moments in time.
The method of U.S. Pat. No. 7,362,949 is used for searching video data in an archive that stores the entire sequence of frames received from a video camera at all moments in surveillance time. Also known, however, are methods for recording video data received from a video camera that record to the video data archive only those fragments of a sequence of frames considered essential (see: U.S. Pat. No. 5,455,561, 1994, U.S. Class 340/541, “Automatic Security Monitor Reporter;” U.S. Pat. No. 5,825,413, 1995, U.S. Class 348/155, “Infrared Surveillance System with Controlled Video Recording;” and U.S. Pat. No. 7,864,980, 2003, U.S. Class 382/103, “Video Motion Anomaly Detector”). These recording methods reduce the volume of the video data archive, which in turn reduces the time needed to search for events of interest to the operator. In the method of U.S. Pat. No. 5,455,561, video data are recorded in the archive in “alarm” situations only, for example, in the presence of intruders or fire. In the method of U.S. Pat. No. 5,825,413, video data are recorded in the archive only when the infrared sensor registers and indicates motion. In the method of U.S. Pat. No. 7,864,980, video data are recorded in the archive only when the movement trajectories of generic points, or “point features,” fall outside the pattern of normal behavior that is formed automatically on the basis of previously observed trajectories.
A disadvantage of searching with these recording methods is that they do not sufficiently reduce the time needed to view the accumulated video data archive, and they also lose information that could have been received from the video camera but cannot be used because not all fragments of the sequence of frames are recorded in the video data archive.
For the purpose of reducing the video archive viewing time, synthetic frames, combining one or more objects pictured in different source frames, are created and used. Known are methods for forming a sequence of synthetic frames of images from a sequence of source images received from a video camera (see: U.S. patent application published under number US 2009/0219300 A1, 2006, U.S. Class 345/630, “Method and System for Producing a Video Synopsis,” and U.S. patent application published under number US 2010/0125581 A1, 2009, U.S. Class 707/737, “Method and Systems for Producing a Video Synopsis Using Clustering”).
The methods of these applications comprise:
    computing the static background;
    detecting moving objects;
    compiling a schedule for displaying each of the detected moving objects; and
    displaying several objects simultaneously to the operator against the computed static background, where the images of said objects were captured at different moments in time and, thereby, would have been shown at different moments in time had the source sequence of frames been viewed.
Application US 2009/0219300 A1 discloses two variants of implementation of the method for forming a sequence of synthetic frames. In the first variant, all computations are made at the synthetic frame construction stage, that is, off-line. In the second variant, moving objects and their movement trajectories are first detected on-line, and then the static background is computed off-line and other actions are performed.
In the method of Application US 2010/0125581 A1, for an object displaying schedule to be produced, the objects are combined according to the “similarity” of their external appearance and similarity of their movement trajectories (according to geometric proximity and speeds of movement).
A disadvantage of the method of Application US 2009/0219300 A1 is the large volume of computations needed for constructing the background, which takes much time, or the large memory capacity needed when the “running median” method is used for static background construction. Another disadvantage of the method of the above-referenced application is that the background is computed incorrectly when some parts of the background are occluded by the objects in more than 50% of the frames. Yet another disadvantage of this method is that a large volume of computations is required for compiling a full object displaying schedule (by contrast, the proposed method does not require a full object displaying schedule to be produced).
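The memory cost criticized above can be made concrete with a minimal sketch of per-pixel median background estimation: a history of N recent frames must be buffered for every pixel. The class below is an illustration of the general technique, not the referenced application's implementation; frames are modeled as flat lists of grayscale values.

```python
# Minimal sketch of "running median" static background estimation.
# Memory cost: one deque of `history` recent values per pixel, which is
# exactly the large-memory-capacity disadvantage noted in the text.
from collections import deque
from statistics import median

class MedianBackground:
    def __init__(self, history=50):
        self.history = history
        self.buffers = None  # one deque of recent values per pixel

    def update(self, frame):
        """Buffer the latest value of every pixel."""
        if self.buffers is None:
            self.buffers = [deque(maxlen=self.history) for _ in frame]
        for buf, value in zip(self.buffers, frame):
            buf.append(value)

    def background(self):
        """Per-pixel median of the buffered history."""
        return [median(buf) for buf in self.buffers]
```

The occlusion failure noted above also follows directly from this sketch: if an object covers a pixel in more than half of the buffered frames, the median takes the object's value rather than the background's.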
A disadvantage of the method of Application US 2010/0125581 A1 is the time wasted in combining objects according to the similarity of their external appearance and the similarity of their trajectories (by contrast, the proposed method does not require objects to be combined on this principle).
The prototype of the proposed method is the method for searching for objects in a sequence of images received from a fixed video camera disclosed in the U.S. patent application published under number US 2010/0092037 A1, 2007, U.S. Class 382/103, IPC8 G06K 9/00, “Method and System for Video Indexing and Video Synopsis,” by inventors S. Peleg et al.
Before we proceed to discuss the essence of the prototype of the proposed method, we will examine a concept that is used in the application on the prototype but is named there unsuccessfully, which impedes comparison of the prototype with the proposed method. The prototype constructs, for each object, a sequence of its images recorded at different points in time. In that application, each such sequence is called a “tube” or “activity” (p. 4, [0091]), which appears to be a poor choice of term. In Application US 2009/0219300 A1 (with S. Peleg among the inventors), this sequence is called a “characteristic function” (p. 4, [0080]), again a poor choice because this is an overly broad concept. In Application US 2010/0125581 A1 (also with S. Peleg among the inventors), this sequence is likewise called an “activity” (p. 2, [0037]), which, as noted above, is a poor choice.
Each moment in time of frame registration in this sequence is put in correspondence with a subset of frame pixels representing an image of the object in the frame and characterizing its position in the scene observed. From the mathematical viewpoint, this sequence is a graph mapping the set of moments in time of frame capture into the set of all possible subsets of pixels in the frame. It would be more correct, therefore, to call this sequence “an object movement graph” instead of “tube,” “activity,” or “characteristic function.” For convenience, though, the elements of the set of frame registration moments in this graph should be ordered in ascending order of their values. This sequence could also be called “a spatiotemporal object movement map.”
A more suitable and convenient term for this sequence appears to be an object movement trajectory, in which each point is put in correspondence with a pair consisting of a moment in frame capture time and its corresponding subset of pixels in the frame. The moments in frame capture time in this trajectory concept are assumed to be in ascending order.
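The trajectory concept as just defined maps directly onto a simple data structure: an ordered sequence of (capture time, pixel set) pairs. The class and method names in the sketch below are illustrative assumptions, not terminology from any of the referenced applications.

```python
# Direct representation of an "object movement trajectory": an ordered list of
# (capture_time, pixel_set) pairs, where the pixel set holds the (x, y) frame
# coordinates covered by the object's image at that moment.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    object_id: int
    points: list = field(default_factory=list)  # [(timestamp, frozenset of (x, y))]

    def add(self, timestamp, pixels):
        """Append an observation, enforcing ascending capture times."""
        if self.points and timestamp <= self.points[-1][0]:
            raise ValueError("timestamps must be strictly increasing")
        self.points.append((timestamp, frozenset(pixels)))

    def start_time(self):
        return self.points[0][0]

    def duration(self):
        return self.points[-1][0] - self.points[0][0]
```

The ascending-order requirement stated above is enforced here at insertion time, so the sequence never needs re-sorting.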
This trajectory concept is similar to the concept of spatiotemporal trajectory in which location of an object is defined as a set of points making up a vector (see: article by Y. Cai, R. Ng., “Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials”, Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data SIGMOD 04, pp. 599-610, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.5274; definition of the term “trajectory” appears in the printed version of this article, p. 604, and in the Internet edition of this article, at p. 6).
This trajectory concept is identical to the concept of spatiotemporal trajectory in which the object movement trajectory is defined as a sequence of pairs <object position and point in time>, object position being, in turn, defined as a set of points making up a vector (see: article by P. Bakalov, M. Hadjieleftheriou, V. J. Tsotras, “Time Relaxed Spatiotemporal Trajectory Joins,” Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems, 2005, pp. 182-191, http://www.2.research.att.com/˜marioh/papers/acmgis05.pdf; definition of the “trajectory” concept is given at p. 184 of the printed edition of this article, and at p. 2 of the Internet edition of this article). It is stated at p. 182 of this article that this definition of the trajectory concept is used in various fields, including video surveillance systems.
In the following consideration of the prototype method, and the proposed method later on, we use the term “object movement trajectory” for designating a sequence of object positions.
The prototype method for searching for objects in a sequence of images received from a fixed video camera comprises:
    detecting objects of interest to the operator in a source sequence of frames received from a fixed video camera, each frame representing an image of a scene and having a timestamp specifying the moment in time said frame was captured;
    constructing a movement trajectory for each of the objects detected, in which every point of said trajectory is put in correspondence with the position of the object in the frame and the moment in time when the frame was captured, the position of the object in the frame being represented by a set of frame pixels representing an image of the object;
    forming a queue of movement trajectories of the objects detected;
    compiling a schedule for displaying the detected objects, in which the point in time for starting object display on the screen is given for the trajectory of each object;
    constructing a plan for forming synthetic frames in accordance with the schedule such that several objects can be shown in the frames simultaneously in positions captured, generally, at different moments in time;
    forming a successive synthetic frame in accordance with the plan by including in such synthetic frame the images of objects that must, in accordance with the plan, be shown in the synthetic frame simultaneously, and the background against which they are to be shown; and
    displaying the synthetic frames so formed to the operator on the screen.
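The scheduling and frame-forming steps above can be sketched as follows. The greedy rule used here (stagger display start times in queue order) is a deliberate simplification for illustration and is not the prototype's optimization procedure; trajectories are modeled as lists of (timestamp, pixel dictionary) pairs.

```python
# Hedged sketch of the synthetic-frame pipeline: each detected object's
# trajectory is assigned a display start time, and every synthetic frame
# composites the object images due to be visible at that instant over a
# background image. Pixel data are modeled as {(x, y): value} dictionaries.

def compile_schedule(trajectories, stagger=1.0):
    """Assign each queued trajectory a display start time, staggered in order."""
    return {i: i * stagger for i, _ in enumerate(trajectories)}

def synthesize_frame(t, trajectories, schedule, background):
    """Paint onto a copy of the background every object visible at time t."""
    frame = dict(background)
    for i, traj in enumerate(trajectories):
        local_t = t - schedule[i]      # time within this object's own playback
        for ts, pixels in traj:        # traj: [(timestamp, {(x, y): value})]
            if abs(ts - local_t) < 1e-9:
                frame.update(pixels)   # overlay this object's image
    return frame
```

With a stagger of zero, objects captured at different moments in the source sequence are shown simultaneously in one synthetic frame, which is the essence of the video-synopsis idea.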
In the prototype method, the object displaying schedule is compiled for all the objects whose images are available in the video data archive at once, as is needed to produce an optimal schedule. Compiling an optimal schedule takes a lot of time.
Furthermore, the prototype method does not assure that the time-related order of objects is maintained in the sequence of synthetic frames; that is, an object whose image was recorded later than that of another object may nevertheless be placed earlier in the sequence.
To compute the image of the background for a scene observed, the prototype method uses all the frames of the source sequence of frames, or frames in the vicinity of a single current frame for which the background is computed. A large memory capacity and a large volume of computations are required for these purposes.
A disadvantage of the prototype method is, therefore, that it requires a significant length of time to perform a large number of computations, as well as a large memory capacity, which significantly increases the time period between the operator's request for an object search and the moment the first synthetic frame is shown to him.
A significant disadvantage of the prototype method is that the method does not assure display of all the objects detected, that is, some of the objects detected may not be shown because of the specifics of the optimization procedure used under Application US 2010/0092037 A1 (p. 6, [0111]).
Another disadvantage of the prototype method is that the time-related order in which the objects are shown may not be maintained; that is, an object that appeared in the field of view of the video surveillance system later than another object may be shown first. This causes inconvenience to the operator analyzing the situation in the scene observed. A further disadvantage of the pertinent art method is that the background is computed incorrectly in instances when some of the background points are occluded by objects in more than 50% of the frames.