Embodiments in accordance with the present invention relate to an apparatus, a method and a computer program for recognizing gestures in a picture.
Further embodiments in accordance with the invention relate to an apparatus, a method and a computer program for controlling a device on the basis of a gesture.
Some embodiments in accordance with the invention relate to a method and an apparatus for character and gesture recognition.
In many technical applications it is desirable to control computers or other devices without physical contact. Control by means of gestures has proven advantageous in many such cases. Gestures are, for example, symbolic movements of specific body parts, such as the hand or the head, used for non-verbal communication.
For example, a person can place a hand and/or fingers in a multitude of different configurations (or arrangements).
The various configurations of a hand may be used, for example, for controlling a computer or a device. It should also be noted in this context that gestures can in many cases be used for communication even by persons with disabilities who have no other means of expressing information. One example is the sign language used by deaf people. Persons who are prevented, for whatever reason, from using a keyboard, for example, may also convey information to a computer by means of gestures.
Some conventional approaches and concepts regarding gesture recognition shall be described in the following.
The publication “Using deficits of convexity to recognize hand gestures from silhouettes” by E. Lawson and Z. Duric describes a method of recognizing hand gestures on the basis of hand silhouettes. The convex hull of a hand is calculated from its silhouette. Convexity deficits, which describe the differences between the hull and the silhouette, are then extracted. The convexity deficits are normalized by rotating them about the edges they share with the hull. To define a gesture, the deficits of a plurality of examples are extracted and normalized, and the deficits are grouped by similarity. Gestures are represented by strings of symbols, each symbol identifying the group nearest to a deficit. Different symbol sequences that correspond to a given gesture are stored in a dictionary. For an unknown gesture, its convexity deficits are extracted and mapped to a corresponding symbol sequence. This sequence is compared with the dictionary of known gestures, and the gesture is assigned to the class to which the best-matching string belongs.
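The hull, deficit and dictionary steps described above can be illustrated with a minimal sketch in Python. It assumes the silhouette is already available as an ordered, closed list of contour points without duplicates, computes the convex hull with Andrew's monotone chain algorithm, and reports each convexity deficit as a (start index, end index, depth) triple. The rotation-based normalization and the grouping of deficits into symbols are omitted, and the Levenshtein (edit) distance used for the dictionary lookup is an assumption of this sketch, not necessarily the string comparison used in the publication. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity_deficits(contour: List[Point], min_depth: float = 0.0) -> List[Tuple[int, int, float]]:
    """(start, end, depth) for each stretch of the closed contour that leaves the hull."""
    hull = set(convex_hull(contour))
    idx = [i for i, p in enumerate(contour) if p in hull]  # hull vertices in contour order
    n = len(contour)
    deficits = []
    for k, i in enumerate(idx):
        j = idx[(k + 1) % len(idx)]
        gap = (j - i) % n
        if gap <= 1:
            continue  # neighbouring contour points: nothing lies between them
        a, b = contour[i], contour[j]
        edge = math.dist(a, b)
        # depth = largest distance of a skipped contour point from the hull edge a-b
        depth = max(abs(cross(a, b, contour[(i + s) % n])) / edge for s in range(1, gap))
        if depth > min_depth:
            deficits.append((i, j, depth))
    return deficits


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between two symbol strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def classify(symbols: str, dictionary: Dict[str, List[str]]) -> str:
    """Return the gesture class whose stored string is closest to `symbols`."""
    return min(dictionary, key=lambda g: min(edit_distance(symbols, s) for s in dictionary[g]))
```

For a square outline with a notch in its top edge, such as the contour (0,0), (4,0), (4,4), (3,4), (2,2), (1,4), (0,4), `convexity_deficits` reports a single deficit of depth 2 between the hull vertices (4,4) and (0,4); points that lie exactly on a hull edge yield depth 0 and are filtered out by `min_depth`.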
The publication “Object detection by contour segment networks” by V. Ferrari and others describes a method of detecting objects in real pictures on the basis of a single hand-drawn example as the model. The picture edges are subdivided into contour segments and organized into a network-like image representation that encodes their interconnections. The object detection problem is formulated as locating paths through the network that reproduce the outline of the model. A detection technique for this purpose is also described.
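A contour segment network of this kind can be sketched as follows, assuming each contour segment is given as a pair of endpoints. In this hypothetical simplification, two segments are linked when their endpoints lie within a gap tolerance, candidate paths are enumerated by depth-first search, and a path is scored only by how closely the orientations of its segments follow a sequence of model orientations. The publication's actual path-finding and shape-matching criteria are more elaborate, and all names here are illustrative.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def build_network(segments: List[Segment], gap: float) -> Dict[int, List[int]]:
    """Connect two segments when an endpoint of one lies within `gap` of an endpoint of the other."""
    net: Dict[int, List[int]] = {i: [] for i in range(len(segments))}
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if any(math.dist(p, q) <= gap for p in segments[i] for q in segments[j]):
                net[i].append(j)
                net[j].append(i)
    return net


def paths_of_length(net: Dict[int, List[int]], length: int) -> List[List[int]]:
    """Depth-first enumeration of simple paths visiting exactly `length` segments."""
    found: List[List[int]] = []

    def extend(path: List[int]) -> None:
        if len(path) == length:
            found.append(list(path))
            return
        for nxt in net[path[-1]]:
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    for start in net:
        extend([start])
    return found


def orientation(seg: Segment) -> float:
    """Undirected segment orientation in [0, pi)."""
    (x0, y0), (x1, y1) = seg
    return math.atan2(y1 - y0, x1 - x0) % math.pi


def best_path(segments: List[Segment], net: Dict[int, List[int]],
              model_angles: List[float]) -> Optional[List[int]]:
    """Path whose segment orientations best follow the model outline's orientations."""
    def cost(path: List[int]) -> float:
        total = 0.0
        for k, angle in zip(path, model_angles):
            d = abs(orientation(segments[k]) - angle)
            total += min(d, math.pi - d)  # angular difference between undirected lines
        return total

    return min(paths_of_length(net, len(model_angles)), key=cost, default=None)
```

Exhaustive enumeration grows quickly with path length, so this sketch only serves to illustrate the representation: segments become graph nodes, and detection becomes a search for well-matching paths.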
The publication “Robust object recognition with cortex-like mechanisms” by T. Serre and others (published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 3, March 2007) describes a concept for recognizing complex visual scenes. The publication describes a hierarchical system that follows the organization of the visual cortex and builds increasingly complex and invariant feature representations by alternating between template matching and a maximum-pooling operation.
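The alternation between template matching and maximum pooling can be illustrated on a one-dimensional signal. In this minimal sketch, the template-matching (S) layer uses a Gaussian tuning function of the squared distance between each patch and each template, and the pooling (C) layer takes the maximum over non-overlapping windows. The function names, the 1-D setting and the parameter `sigma` are assumptions made for brevity, not the configuration of the published model.

```python
import math
from typing import List, Sequence


def s_layer(signal: Sequence[float], templates: List[Sequence[float]],
            sigma: float = 1.0) -> List[List[float]]:
    """Template matching: Gaussian tuning of every patch to every template.

    Returns one response map per template, of length len(signal) - len(template) + 1.
    """
    maps = []
    for t in templates:
        w = len(t)
        resp = []
        for i in range(len(signal) - w + 1):
            d2 = sum((signal[i + k] - t[k]) ** 2 for k in range(w))
            resp.append(math.exp(-d2 / (2 * sigma ** 2)))
        maps.append(resp)
    return maps


def c_layer(maps: List[List[float]], pool: int) -> List[List[float]]:
    """Maximum pooling over non-overlapping windows of `pool` positions."""
    return [[max(m[i:i + pool]) for i in range(0, len(m), pool)] for m in maps]
```

Because the C layer keeps only the strongest response in each window, shifting the pattern [1, 2, 1] by one position inside the first pooling window leaves the pooled response unchanged (1.0 in both cases). Stacking further S/C pairs on the pooled maps yields the hierarchy.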
The publication “Visual hand tracking using non-parametric belief propagation” by E. B. Sudderth and others (published in: MIT Laboratory for Information and Decision Systems Technical Report, P-2603, May 2004) describes probabilistic methods of visually tracking a three-dimensional geometric hand model on the basis of a picture sequence. A redundant representation is used in which each model component is described by its position and orientation within a world coordinate frame. The document defines a prior model that enforces the kinematic constraints implied by the joints of the model. The redundant representation allows color-based and edge-based likelihood measures, such as the chamfer distance, to be decomposed in a similar manner in cases where there is no self-occlusion. On the basis of this graphical model of hand kinematics, the movement of a hand is tracked by using a non-parametric belief propagation algorithm. Non-parametric belief propagation approximates the posterior distribution over hand configurations as a collection of samples and uses the graphical structure to reduce the dimensionality of these distributions.
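The chamfer distance mentioned above can be written down directly. The following sketch computes, by brute force, the directed chamfer distance between the projected outline points of a model and the edge points detected in a picture. The exponential mapping to a likelihood, its `scale` parameter and both function names are hypothetical illustrations, not the likelihood defined in the report.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def chamfer_distance(template: List[Point], edges: List[Point]) -> float:
    """Directed chamfer distance: mean distance from each projected model point
    to its nearest image edge point (brute force, O(len(template) * len(edges)))."""
    return sum(min(math.dist(p, e) for e in edges) for p in template) / len(template)


def edge_likelihood(template: List[Point], edges: List[Point], scale: float = 1.0) -> float:
    """Hypothetical likelihood exp(-d / scale): higher when the model outline lies on edges."""
    return math.exp(-chamfer_distance(template, edges) / scale)
```

Practical implementations usually precompute a distance transform of the edge map so that each nearest-edge lookup costs constant time instead of a scan over all edge points.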
The publication “Hand gesture extraction by active shape models” by N. Liu and B. C. Lovell (published in: Proceedings of Digital Image Computing: Techniques and Applications, DICTA 2005) describes the application of a statistical model to hand gesture extraction and recognition. Once the hand contours have been found by a real-time segmentation and tracking system, a set of feature points is marked along the contour, either automatically or manually. The resulting feature vectors are normalized and aligned. A model is then trained on the set of feature vectors by means of principal component analysis. The mean shape, eigenvalues and eigenvectors are calculated and together form the active shape model. By continuously adjusting the model parameters, different shape contours are generated and matched against hand edges extracted from the original pictures. Finally, the gesture is recognized.
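The training and shape-generation steps of an active shape model can be sketched in pure Python, assuming the landmark vectors [x1, y1, x2, y2, ...] have already been normalized and aligned. The sketch computes the mean shape and the sample covariance, extracts only the leading mode of variation by power iteration (a full principal component analysis would keep several modes), and generates new shapes as mean + b * mode. Clamping `b` to plus or minus three standard deviations is a common active-shape-model convention and an assumption here; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_shape(shapes: List[Vector]) -> Vector:
    """Mean of aligned shape vectors [x1, y1, x2, y2, ...]."""
    n = len(shapes)
    return [sum(s[k] for s in shapes) / n for k in range(len(shapes[0]))]


def covariance(shapes: List[Vector], mean: Vector) -> List[Vector]:
    """Sample covariance matrix of the shape vectors."""
    n, d = len(shapes), len(mean)
    centred = [[s[k] - mean[k] for k in range(d)] for s in shapes]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]


def leading_mode(cov: List[Vector], iters: int = 200) -> Tuple[float, Vector]:
    """Largest eigenvalue and unit eigenvector of `cov` by power iteration."""
    v = [1.0] * len(cov)
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation, or start vector orthogonal to every mode
        v = [x / norm for x in w]
        lam = norm  # ||C v|| for a unit eigenvector v equals its eigenvalue
    return lam, v


def generate(mean: Vector, mode: Vector, lam: float, b: float) -> Vector:
    """Shape x = mean + b * mode, with b clamped to +/- 3 * sqrt(lam)."""
    limit = 3.0 * math.sqrt(lam)
    b = max(-limit, min(limit, b))
    return [m + b * p for m, p in zip(mean, mode)]
```

Varying `b` within its limits produces the family of plausible contours that is matched against the extracted hand edges; the clamp keeps generated shapes close to the training distribution.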
The publication “MAP-Inference for highly-connected graphs with DC-programming” by J. Kappes and C. Schnörr describes the design of inference algorithms for discrete-valued Markov random fields. It describes a class of algorithms that can be applied to this class of problems and that are guaranteed to converge to a critical point of the objective function. The resulting iterative algorithms can be interpreted as simple message-passing algorithms that converge by construction.
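The convergence property can be illustrated with the generic DC algorithm (also known as the concave-convex procedure) on a scalar toy function rather than on a Markov random field. Writing f(x) = x^4 - 2x^2 as the difference of the convex functions g(x) = x^4 and h(x) = 2x^2, each iteration linearizes h at the current point and minimizes the resulting convex surrogate exactly, which here gives x_{k+1} = cbrt(x_k). This is a hypothetical illustration of the principle, not the algorithm of the publication.

```python
import math
from typing import List


def f(x: float) -> float:
    """Objective f = g - h with convex g(x) = x**4 and convex h(x) = 2 * x**2."""
    return x ** 4 - 2 * x ** 2


def dca_step(x: float) -> float:
    """Minimise the convex surrogate g(y) - h'(x) * y = y**4 - 4 * x * y exactly.

    Setting the derivative 4 * y**3 - 4 * x to zero gives y = cbrt(x).
    """
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def dca(x0: float, iters: int = 60) -> List[float]:
    """Iterates of the DC algorithm started at x0."""
    xs = [x0]
    for _ in range(iters):
        xs.append(dca_step(xs[-1]))
    return xs
```

Starting from 0.5, the iterates increase monotonically toward the minimizer 1 while f never increases. Starting from 0, the iteration stays at 0, which is a critical point but a local maximum; this shows why such guarantees are stated in terms of critical points rather than minima.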
The article “Real-time object detection for ‘smart’ vehicles” by D. M. Gavrila and V. Philomin describes a shape-based object detection method based on distance transforms. The method uses a hierarchy of templates to capture the variety of object shapes. Efficient hierarchies can be generated for given shape distributions by means of stochastic optimization techniques. Matching involves a simultaneous coarse-to-fine approach over both the shape hierarchy and the transformation parameters.
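The distance-transform matching described above can be sketched on a small binary edge grid. The sketch computes an exact city-block (L1) distance transform with two raster passes, scores a template placement by the mean distance-transform value under the template points, and searches translations coarse-to-fine: first on a grid with spacing `step`, then exhaustively around the best coarse position. It covers only the translation parameters; the template hierarchy, the distance metric and the pruning thresholds of the published method are not reproduced, and all names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]
INF = 10 ** 9


def distance_transform(edges: Grid) -> Grid:
    """City-block distance to the nearest edge pixel, via two raster passes."""
    h, w = len(edges), len(edges[0])
    dt = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: propagate from top and left neighbours
        for x in range(w):
            if y > 0:
                dt[y][x] = min(dt[y][x], dt[y - 1][x] + 1)
            if x > 0:
                dt[y][x] = min(dt[y][x], dt[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):  # backward pass: bottom and right neighbours
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dt[y][x] = min(dt[y][x], dt[y + 1][x] + 1)
            if x < w - 1:
                dt[y][x] = min(dt[y][x], dt[y][x + 1] + 1)
    return dt


def match_score(dt: Grid, template: List[Tuple[int, int]], dx: int, dy: int) -> float:
    """Mean distance value under the template shifted by (dx, dy); lower is better."""
    h, w = len(dt), len(dt[0])
    total = 0
    for x, y in template:
        u, v = x + dx, y + dy
        if not (0 <= u < w and 0 <= v < h):
            return float("inf")  # placement leaves the picture
        total += dt[v][u]
    return total / len(template)


def coarse_to_fine(dt: Grid, template: List[Tuple[int, int]], step: int = 4) -> Tuple[int, int]:
    """Scan translations on a coarse grid, then refine around the best one."""
    h, w = len(dt), len(dt[0])
    coarse = [(dx, dy) for dy in range(0, h, step) for dx in range(0, w, step)]
    bx, by = min(coarse, key=lambda t: match_score(dt, template, *t))
    fine = [(bx + i, by + j) for j in range(-step + 1, step) for i in range(-step + 1, step)]
    return min(fine, key=lambda t: match_score(dt, template, *t))
```

The distance transform is computed once per picture, after which every candidate placement costs only one lookup per template point; the step size trades search speed against the risk of the coarse stage missing the correct region.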
The article “Cassandra: Audio-video sensor fusion for aggression detection” by W. Zajdel and others (published at the IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), London, 2007) describes an intelligent surveillance system called Cassandra, which is designed to detect aggressive human behavior in public environments. The system exploits the complementary nature of audio detection and video detection. At a low level, the audio stream and the video stream are analyzed independently. At a higher level, a dynamic Bayesian network serves as the fusion mechanism that yields an indication of aggression for a scene.
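The higher-level fusion can be illustrated with forward filtering in a minimal two-stream dynamic Bayesian network. The sketch assumes a hidden state with two values ('calm', 'aggressive') that evolves as a Markov chain and emits one discrete audio label and one discrete video label per time step, the two being conditionally independent given the state. The state names, observation labels, probabilities and function names are hypothetical; the network used in Cassandra is more elaborate.

```python
from typing import Dict, List

STATES = ("calm", "aggressive")
Table = Dict[str, Dict[str, float]]


def filter_aggression(audio: List[str], video: List[str], transition: Table,
                      p_audio: Table, p_video: Table,
                      prior: Dict[str, float]) -> List[float]:
    """Return P(state_t = 'aggressive' | observations up to t) for every t.

    Audio and video labels are assumed conditionally independent given the
    hidden state, so their likelihoods simply multiply.
    """
    belief = dict(prior)
    out = []
    for a, v in zip(audio, video):
        # predict: propagate the previous belief through the transition model
        predicted = {s: sum(belief[r] * transition[r][s] for r in STATES) for s in STATES}
        # update: weight by both observation likelihoods and renormalise
        unnorm = {s: predicted[s] * p_audio[s][a] * p_video[s][v] for s in STATES}
        z = sum(unnorm.values())
        belief = {s: unnorm[s] / z for s in STATES}
        out.append(belief["aggressive"])
    return out
```

Each step predicts the state distribution through the transition model, multiplies in the audio and video likelihoods and renormalizes, so consistent evidence from both streams (for example, repeated 'shout' and 'fast' labels) raises the aggression probability over successive frames faster than either stream alone.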
The publication “Schürmann-Polynomials—Roots and Offsprings” by U. Miletzki describes the influence of the so-called “Schürmann polynomials” on present-day pattern recognition.
Further details regarding computer-based picture recognition may be found, for example, in the “Handbook of Mathematical Models in Computer Vision”.
In view of the above, there is a need for a concept for recognizing gestures in a picture that enables particularly reliable gesture recognition.