1. Field of the Invention
The invention relates to a hand gesture recognition system and method in which a stream of images depicting a hand is received in real time and processed to represent the hand region in each image as a vector, and the vectors are then processed to recognize hand gestures.
2. Description of Related Art
Hand gesture recognition systems have been described previously. U.S. Pat. No. 4,988,981 (Zimmerman et al.) teaches how a glove-based device using three sensors can be used as a computer data-entry apparatus. This work has since been expanded upon, and several papers, such as Hand Gesture Recognition System Using Multiple Camera (Utsumi et al.), Proceedings ICPR '96, Vienna; Real-Time Self-Calibrating Stereo Person Tracking Using 3-D Shape Estimation from Blob Features (Azarbayejani et al.), Proceedings ICPR '96, Vienna; and Real-Time Hand Shape Recognition Using Pipe-line Image Processor (Ishibuchi et al.), IEEE Int. Workshop on Robot and Human Communication 1992, have shown that multiple sensors, such as cameras, can be used to find the 3D position of an object such as a hand. These systems have the disadvantage that they require multiple sensors, which increases the cost of the system, and that complex processing is needed to integrate the information from these sensors.
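To illustrate how two cameras recover the 3D position that a single camera cannot, the following is a minimal sketch of textbook triangulation for two rectified, parallel cameras. The function name and parameters are hypothetical and are not taken from any of the cited papers, which may use different calibration models.

```python
from typing import Tuple


def triangulate_rectified(x_left: float, x_right: float, y: float,
                          focal_px: float, baseline: float) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) of a point seen by two rectified, parallel cameras.

    x_left, x_right and y are pixel coordinates measured from each image's
    principal point; focal_px is the focal length in pixels; baseline is the
    distance between the camera centres and sets the unit of X, Y and Z.
    """
    disparity = x_left - x_right  # horizontal shift of the point between views
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    z = focal_px * baseline / disparity  # depth is inversely proportional to disparity
    x = x_left * z / focal_px            # back-project through the left camera
    y3 = y * z / focal_px
    return (x, y3, z)
```

Note that the caller must already know which pixel in the right image corresponds to the point in the left image; finding that correspondence is the integration step that adds cost and complexity to multi-sensor systems.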
The paper Visual Servoing Using Eigenspace Method and Dynamic Calculation of Interaction Matrices (K. Deguchi and T. Noguchi), Proceedings ICPR '96, Vienna, teaches how 3D images can be constructed using a single sensor by moving the sensor to different positions and viewing the object from each. However, this approach also requires complex data processing, as well as a mechanism to move the sensor. It has the further disadvantage that tracking fast-moving objects, such as hands, would require the sensor to move even faster.
Colored gloves have been described as a way to help segment hand shapes in real time and under real-world conditions in the papers Video-Based Hand-Shape Recognition Using Hand-Shape Structure Model In Real Time (Grobel et al.), Proceedings ICPR '96, Vienna, and Gesture Recognition Using Coloured Gloves (Iwai et al.), Proceedings ICPR '96, Vienna. However, these systems use multiple colors within the same glove. The main problem with this approach is that, although it makes the positions of the fingers easy to detect, it makes segmenting the glove from a real-world environment more difficult, because there is a greater chance that a background color matches one of the glove colors. A more elaborate analysis process must therefore be used to identify the fingers and hand correctly. The large number of colors also increases processing complexity, since the color analysis must be duplicated for every color.
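The cost argument above can be made concrete with a minimal sketch of color-distance segmentation. It assumes RGB pixels, a Euclidean distance threshold `tol`, and hypothetical function names; it is not the method of Grobel et al. or Iwai et al. Each additional glove color adds one more full classification pass over the image, and one more range of colors that a background pixel might fall into.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]
Mask = List[List[bool]]


def color_mask(image: Image, key: RGB, tol: int) -> Mask:
    """Mark pixels whose RGB value lies within Euclidean distance `tol` of `key`."""
    tol_sq = tol * tol
    return [
        [sum((c - k) ** 2 for c, k in zip(px, key)) <= tol_sq for px in row]
        for row in image
    ]


def segment_glove(image: Image, glove_colors: Dict[str, RGB], tol: int) -> Dict[str, Mask]:
    """Run one full color-classification pass per glove color.

    The work grows linearly with the number of colors, and every extra color
    is another chance for a background pixel to fall inside some tolerance.
    """
    return {name: color_mask(image, key, tol) for name, key in glove_colors.items()}
```

With a single-color glove, `glove_colors` holds one entry and the image is scanned once.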
The use of a single-color glove has been described in Human-Interface by Recognition of Human Gesture with Image Processing: Recognition of Gesture to Specify Moving Direction (Akira et al.), IEEE Int. Workshop on Robot and Human Communication 1992. However, the gloved hand is segmented by edge detection using high-contrast black gloves, and the system uses cameras as input devices. The problem with this approach is that in low-contrast images it becomes difficult to segment the hand accurately, and if two hands are to be segmented, the processes needed to distinguish the left hand from the right become very complicated.
Single-view hand gesture systems have been taught by U.S. Pat. No. 5,423,554 (Davis) and U.S. Pat. No. 5,454,043 (Freeman), and described in A Method of Real-Time Gesture Recognition for Interactive Systems (Watanabe et al.), Proceedings ICPR '96, Vienna. U.S. Pat. No. 5,423,554 (Davis) describes a color segmentation process in which a user wears colored gloves in order to interact with a virtual reality game. This patent describes tracking the movement of the hand, rather than recognizing hand shape or gesture. It employs chroma-key technology to help segment the hands from the background image.
The Watanabe et al. paper discloses a template-based approach to real-time gesture recognition that reduces the image of the arm to the orientation of a line. This approach uses a model of the arm and a process similar to template matching to determine the input gesture by comparing it with a set of pre-stored template models.
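The passage above does not say how the line orientation is obtained. The following minimal sketch assumes a standard technique, the principal-axis angle from second-order image moments of a binary arm mask, followed by nearest-angle matching against stored template orientations. Both function names are hypothetical, and this should not be read as the Watanabe et al. implementation.

```python
import math
from typing import List, Sequence


def principal_axis_angle(mask: List[List[bool]]) -> float:
    """Orientation, in degrees within [0, 180), of a binary region's principal
    axis, computed from its second-order central moments (y axis points down)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty mask")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return math.degrees(theta) % 180.0


def match_template(angle: float, templates: Sequence[float]) -> int:
    """Index of the stored template angle closest to `angle`, treating
    orientations as equivalent modulo 180 degrees."""
    def diff(a: float, b: float) -> float:
        d = abs(a - b) % 180.0
        return min(d, 180.0 - d)
    return min(range(len(templates)), key=lambda i: diff(angle, templates[i]))
```

Because every input must be compared with each stored template, matching cost grows with the size of the template set.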
U.S. Pat. No. 5,291,563 (Maeda) describes a system for detecting objects such as vehicles. This system splits an image in which the objects are represented into sections, calculates the sum of the object pixels in each section, and then performs a principal components analysis to extract the key elements. However, this system appears to suffer from a problem that makes it unsuitable for small images. It processes the image data on a pixel-by-pixel basis, so high noise levels would arise for high-granularity images on inexpensive equipment such as standard PCs, or for small images. For example, if the object being segmented moves very slightly, a centroid at (1.01, 0.02) may move to (1, 0); nearest-integer rounding maps both positions to (1, 0), so the quantized centroid does not change. However, if the centroid moves from (0.49, 0.49) to (0.51, 0.51), nearest-integer rounding moves it from (0, 0) to (1, 1). The problem with small images is that the pixels which cross from one segment into another make up a large percentage of the total pixels within that segment. For small images, less than 128 × 128 pixels, significant noise is therefore generated as the centroid moves across these boundaries, which makes gesture recognition very difficult and reduces accuracy. This problem can be reduced, but not eliminated, by using larger images.
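The rounding argument above can be checked with a few lines of arithmetic. This sketch uses hypothetical helper names and is not Maeda's code; it quantizes a sub-pixel centroid to the nearest integer and reports how many whole pixels the quantized position jumps.

```python
from typing import Tuple

Point = Tuple[float, float]


def quantize(p: Point) -> Tuple[int, int]:
    """Nearest-integer quantization of a sub-pixel centroid.

    Python's round() uses round-half-to-even, which only matters at exact .5
    values; none of the example coordinates fall on such a tie.
    """
    return (round(p[0]), round(p[1]))


def quantized_jump(before: Point, after: Point) -> int:
    """City-block distance, in whole pixels, between the quantized positions."""
    (ax, ay), (bx, by) = quantize(before), quantize(after)
    return abs(ax - bx) + abs(ay - by)
```

For both example movements the true centroid shifts by only about 0.02 to 0.03 pixels, yet `quantized_jump((1.01, 0.02), (1.0, 0.0))` is 0 while `quantized_jump((0.49, 0.49), (0.51, 0.51))` is 2. In a small image, a 2-pixel jump is a large fraction of a segment, which is the source of the noise described above.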
U.S. Pat. No. 5,454,043 (Freeman) describes a hand gesture recognition system operating on a grey-scale image. The hand is segmented using a low-pass filter, and an orientation histogram is constructed from the result to provide a "Signature Vector" for a single image. The patent also teaches how these signature vectors can be used to recognize dynamic hand gestures by converting 3-D space-time-orientation maps into 2-D space-time-orientation histograms. A problem with this approach, however, is that a dynamic gesture can only be recognized once it is complete. For this system to recognize the degree of rotation of a single hand position, a signature vector would need to be constructed for every degree of hand angle. To recognize which hand angle was closest, the input signature vector would then need to be matched against all the stored signature vectors. This is a computationally intensive task, and the accuracy of the system will degrade as the number of stored signatures increases.
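The following minimal sketch shows an orientation-histogram signature vector and the exhaustive matching step discussed above. It assumes central-difference gradients, 36 bins, a small magnitude threshold, and Euclidean distance; these choices and the function names are illustrative assumptions, not details taken from U.S. Pat. No. 5,454,043, and the low-pass filtering step is omitted.

```python
import math
from typing import List, Sequence


def orientation_histogram(img: List[List[float]], bins: int = 36,
                          min_mag: float = 1e-6) -> List[float]:
    """Normalized histogram of local gradient orientations over a grey-scale
    image, using central differences at interior pixels."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if math.hypot(gx, gy) < min_mag:
                continue  # flat region: no reliable orientation
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def nearest_signature(query: Sequence[float], stored: Sequence[Sequence[float]]) -> int:
    """Index of the closest stored signature by a linear scan, so the cost
    grows with the number of stored signatures."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(range(len(stored)), key=lambda i: dist2(query, stored[i]))
```

Under these assumptions, resolving the hand angle to one degree would mean storing 360 signatures and comparing every input frame against all of them.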