After years of digital development, video surveillance technology has become network-based and plays a useful role in monitoring production (e.g., production lines) and security (e.g., railway stations, subway stations, airports, and patient rooms). As the monitored objects and the property information related to them grow and change, current interactive video systems need to support convenient management, search, man-machine video interaction, and intelligent reprocessing of large volumes of video monitoring material, so that the advantages of network-based video monitoring can be brought into full play.
FIG. 1 is a schematic view showing a dynamic video object descriptor (OD) that is used in the method for describing videos in the prior art. As shown in FIG. 1, a video OD is created for each object displayed in each frame, and is used for describing properties of the object such as shape, size, layer, duration, activity, activity parameters, and other features.
As described in the preceding solution, each sequence number in a video sequence denotes a frame. A video OD is created for each object displayed in each frame, and is used for describing properties of the object such as contour coordinates, object numbering, size, layer, duration, activity, activity parameters, and other features. The video ODs form a video object description document on a frame-by-frame basis.
For example, if a dynamic object appears in n frames of images, n video ODs need to be created to denote the tracking relation; if m dynamic objects appear in n frames of images, m×n video ODs need to be created to denote the tracking relation. In each video OD, information about the object is recorded, including the contour coordinates of all pixels on the contour of the object, personal identification number (PID), size, layer, duration, activity, activity parameters, personal photos, personal parameters, and other features. To replay the video sequences, the system restores the mapping between the contour coordinates of each object and the video sequences from the video OD, thereby realizing man-machine interaction for video surveillance.
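The bookkeeping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `VideoOD`, its fields, and the helpers `build_description_document` and `contours_for_frame` are hypothetical names chosen to mirror the properties listed in the text, and the contour and property values are placeholders rather than data from any actual prior-art system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VideoOD:
    """Hypothetical per-frame, per-object descriptor (field names are assumed)."""
    frame_no: int                      # sequence number of the frame
    pid: int                           # identification number of the object
    contour: List[Tuple[int, int]]     # contour pixel coordinates
    properties: Dict[str, str] = field(default_factory=dict)


def build_description_document(num_objects: int, num_frames: int) -> List[VideoOD]:
    """Create one descriptor per object per frame, frame by frame."""
    document = []
    for frame_no in range(num_frames):
        for pid in range(num_objects):
            # Placeholder contour; a real descriptor would list every contour pixel.
            contour = [(0, 0), (1, 0), (1, 1), (0, 1)]
            # The same property set is repeated in every descriptor.
            props = {"size": "unknown", "layer": "0"}
            document.append(VideoOD(frame_no, pid, contour, props))
    return document


def contours_for_frame(document: List[VideoOD], frame_no: int) -> Dict[int, List[Tuple[int, int]]]:
    """Restore the object-to-contour mapping for one frame during replay."""
    return {od.pid: od.contour for od in document if od.frame_no == frame_no}


doc = build_description_document(num_objects=3, num_frames=5)
print(len(doc))  # 15, i.e. m x n with m = 3 and n = 5
```

With m = 3 objects and n = 5 frames, the sketch produces 15 descriptors, each carrying its own copy of the contour and property data.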
In the technical solution of the prior art, however, a video OD is created for each object displayed in every frame, and every video OD is required to describe the features of the object. Therefore, if the video sequence is long or contains many video objects, the quantity and size of the video ODs used for describing video objects increase significantly, which slows down searching of the video materials.
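The effect on search can be illustrated with a minimal sketch. It assumes the description document is an unindexed, frame-by-frame list of records, reduced here to hypothetical `(frame_no, pid)` pairs; the function names `make_document` and `frames_containing` are likewise assumptions made for illustration. Under that assumption, finding every frame in which one object appears requires examining all m×n descriptors.

```python
from typing import List, Tuple


def make_document(num_objects: int, num_frames: int) -> List[Tuple[int, int]]:
    """Build a frame-by-frame list with one (frame_no, pid) record per object per frame."""
    return [(frame_no, pid)
            for frame_no in range(num_frames)
            for pid in range(num_objects)]


def frames_containing(document: List[Tuple[int, int]], pid: int) -> Tuple[List[int], int]:
    """Linear scan: return the matching frame numbers and how many records were examined."""
    examined = 0
    frames = []
    for frame_no, od_pid in document:
        examined += 1
        if od_pid == pid:
            frames.append(frame_no)
    return frames, examined


frames, examined = frames_containing(make_document(4, 100), pid=2)
print(len(frames), examined)  # 100 matching frames, 400 records examined
```

Doubling either the number of frames or the number of objects doubles the number of records examined in this sketch, which is the growth the paragraph above describes.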