Manually searching through video data can be very time-consuming. In its simplest form, searching involves viewing video segments and jumping from one segment to the next. It has been proposed to speed up this process by means of automatic detection of images of interest in the video data. In edited video data such as motion pictures, detection of scene or shot changes can be used to select images of interest. Similarly, in continuous video, detection of events such as motion may be used to select images of interest. In video searching, images of interest can be used to identify key frames that allow faster selection of video segments. Similarly, in video surveillance, detection of points of interest can be used to generate alert signals for human supervisors.
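By way of illustration only, selection of images of interest on the basis of motion in continuous video might be sketched as simple frame differencing. The function names, the representation of frames as nested lists of grey values, and the threshold are assumptions made for this sketch; they are not taken from any particular known system.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute grey-value difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def frames_of_interest(frames, threshold):
    """Return indices of frames that differ from their predecessor by more
    than `threshold` (a hypothetical, application-dependent value)."""
    selected = []
    for i in range(1, len(frames)):
        if mean_abs_difference(frames[i - 1], frames[i]) > threshold:
            selected.append(i)  # index of the later frame of the pair
    return selected


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    moved = [[100, 100], [0, 0]]
    print(frames_of_interest([still, still, moved, moved], threshold=10))
```

The selected indices could then serve as candidate key frames or as triggers for an alert signal.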
Better focussed access to video data can be realized by means of pattern recognition techniques. For example, it is known to recognize objects, such as persons, in video images and to retrieve video segments based on a query that identifies an object that should be detected in the video segments. Another pattern recognition technique comprises automated detection and classification of actions that are visible in video data. Use of temporal aspects of the video sequence makes it possible to distinguish between types of action that cannot easily be distinguished on the basis of an individual image. Examples of detectable action types include simple motion, but also more specific actions such as throwing, catching, kicking, digging, etc. Action detection may enable users to retrieve specific segments of video data by means of a database-like query that specifies an action type as search criterion.
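A database-like query of this kind might, purely as an illustration, look as follows. The Segment record, its field names and the find_segments function are hypothetical and serve only to show an action type being used as search criterion over previously annotated segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical index entry: a video segment labelled with an action type."""
    video_id: str
    start_s: float
    end_s: float
    action: str


def find_segments(index, action):
    """Return all indexed segments whose detected action type matches the query."""
    return [segment for segment in index if segment.action == action]


if __name__ == "__main__":
    index = [
        Segment("cam1", 0.0, 4.0, "throwing"),
        Segment("cam1", 4.0, 9.5, "digging"),
        Segment("cam2", 12.0, 15.0, "throwing"),
    ]
    for segment in find_segments(index, "throwing"):
        print(segment.video_id, segment.start_s, segment.end_s)
```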
Automated detection of specific actions such as throwing, catching, kicking, digging, etc., from video sequences is a complex pattern recognition task, which is not easy to program. Known action detection methods make use of a decision function that depends on detected local, low-level, time-dependent video features derived from the video data. By setting parameters of the decision function to different values, decision functions are obtained that distinguish different types of action. For example, when support vector machine recognition is used, the parameters may include weight factors and a threshold used in a decision function, and components of a plurality of support vectors. Each support vector may contain a set of relative frequencies in a reference histogram of different feature values in a video sequence. Automated training is used for the selection of the parameter values, on the basis of detected features derived from training sequences, each in combination with a manually added identification of the type of action that is shown in the training sequence.
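The structure of such a decision function may be sketched as follows, under stated assumptions. The sketch assumes that each detected feature has already been quantized to an integer bin index, and it uses a histogram intersection kernel to compare histograms; the kernel choice, the function names and the parameter values are illustrative assumptions only. In practice the weight factors, threshold and support vectors would be obtained by automated training rather than set by hand.

```python
from collections import Counter


def relative_histogram(feature_bins, num_bins):
    """Relative frequencies of quantized feature values in one video sequence."""
    total = len(feature_bins)
    if total == 0:
        return [0.0] * num_bins
    counts = Counter(feature_bins)
    return [counts.get(b, 0) / total for b in range(num_bins)]


def histogram_intersection(hist_a, hist_b):
    """Kernel assumed for this sketch: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def decision_value(histogram, support_vectors, weights, threshold):
    """Weighted sum of kernel values against the support vectors, minus a threshold."""
    score = sum(
        weight * histogram_intersection(support_vector, histogram)
        for support_vector, weight in zip(support_vectors, weights)
    )
    return score - threshold


def shows_action(histogram, support_vectors, weights, threshold):
    """The action type is taken to be detected when the decision value is positive."""
    return decision_value(histogram, support_vectors, weights, threshold) > 0


if __name__ == "__main__":
    # Hypothetical parameters for one action type; real values come from training.
    support_vectors = [[0.5, 0.25, 0.0, 0.25], [0.0, 0.0, 1.0, 0.0]]
    weights = [1.0, -1.0]
    threshold = 0.5
    histogram = relative_histogram([0, 0, 1, 3], num_bins=4)
    print(shows_action(histogram, support_vectors, weights, threshold))
```

A different set of weights, threshold and support vectors yields a decision function for a different type of action, while the code itself remains unchanged.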
It has been found that such an automated training procedure can provide reasonably reliable determination of the type of actions that have been captured in video sequences. However, the results are limited to determination of the type of action. Training the system to determine additional features of the action would significantly increase the complexity of the training process and the number of exemplary video sequences required, making training virtually unfeasible.