1. Field of the Invention
The present invention relates to devices and methods for recognizing hand shape and position, and recording media each having a program for carrying out the methods recorded thereon, and more specifically to a device and a method for recognizing hand shape and position, without the help of an exemplary cable-connected data glove, in an applicable manner to man-machine interfaces and sign language recognition devices, for example, and to a recording medium having a program for carrying out the method recorded thereon.
2. Description of the Background Art
As a new human interface technique, devices that recognize human hand shape and grasp the information conveyed thereby are currently under active research and development. Research on recognizing the hand shape and position observed in sign language is also active, with the aim of supporting communication between the hearing impaired and hearing people.
A general method for capturing human hand shape uses a sensor such as a data glove to measure hand position and finger joint angles, and an exemplary well-known method is found in the document published by The Institute of Electrical Engineers of Japan, Instrumentation and Measurement (pp. 49 to 56, 1994) (hereinafter, referred to as first document). In the first document, the glove is provided with optical fibers along every finger, and finger joint angles are estimated from a change in light intensity.
A method for recognizing hand shape without the glove-type sensor as in the first document but with a camera is found in the document titled "Gesture Recognition Using Colored Gloves" by Watanabe, et al., (Publication of The Electronic Information Communications Society, Vol. J80-D-2, No. 10, pp. 2713 to 2722) (hereinafter, referred to as second document). In the second document, images are captured through a multicolored glove (marker) for hand shape recognition.
An exemplary method for recognizing hand shape and position without such marker but with only a camera is disclosed in the Japanese Patent Laying-Open No. 8-263629 (96-263629) titled "Object Shape/Position Detector" (hereinafter, referred to as third document). In the third document, hand shape recognition and hand position estimation are conducted through images captured by a camera placed in front of a hand. Herein, the method uses at least three cameras to photograph the hand, and the hand is taken in as a plane so as to determine to which camera the hand is facing.
Another method for recognizing hand shape from images captured by a front-facing camera is found in the document titled "Real-Time Vision-Based Hand Gesture Estimation For Human-Computing Interfaces" by Ishibuchi, et al., (Publication of The Electronic Information Communications Society, Vol. J79-D-2, No. 7, pp. 1218 to 1229) (hereinafter, referred to as fourth document). In the fourth document, from hand images captured by a plurality of cameras, a direction from wrist to middle finger (hereinafter, referred to as palm principal axis) is determined. The position of each fingertip is also determined to count the number of extended fingers.
In recent years, to recognize object position and type of face or car, for example, an image recognition method, which is the combination of a dummy image method and an eigenspace method, has been in the spotlight. The dummy image method uses only previously-captured 2D dummy images of a 3D object to recognize the position and type thereof. The eigenspace method is the one conventionally applied, and uses an eigenspace structured by eigenvectors in a covariance matrix (or auto correlation matrix) obtained through an operation performed on a matrix being image data. In the eigenspace method, it is well-known to apply principal component analysis or KL expansion to images.
A technique for applying the principal component analysis to images is briefly described next below.
The principal component analysis is a statistical technique utilizing an eigenspace. It is popular as a technique in multivariate analysis, and is carried out so that feature points in a multidimensional space are represented in a space with a reduced number of dimensions. This is done to make the feature points easier to see and handle. Fundamentally, feature points in a multidimensional space are linearly projected onto a lower-dimensional orthogonal subspace where the variance of the distribution is high.
In a case where the principal component analysis technique is applied to images, first, an image set including p images is expressed by
$$\{U_1, U_2, U_3, \ldots, U_p\},$$
where each $U_i$ denotes a column vector obtained by raster-scanning an image of $n \times m$ pixels.
Second, the average image $c$ obtained from the plurality of images is subtracted from each of the column vectors in the image set. Assuming that the $nm \times p$ matrix structured by such column vectors is $A$, the matrix $A$ is expressed by
$$A = [U_1 - c,\ U_2 - c,\ \ldots,\ U_p - c],$$
and accordingly a covariance matrix $Q$ is calculated by the following equation (1). Note that $A^{T}$ denotes the transpose of the matrix $A$.
$$Q = A A^{T} \qquad (1)$$
Thereafter, the characteristic equation (2) is solved by using the covariance matrix $Q$.
$$\lambda_i e_i = Q e_i \qquad (2)$$
Herein, assuming that the number of dimensions of the subspace to be structured is $k$, the subspace can be structured by using, as a basis, the eigenvectors
$$e_1, e_2, \ldots, e_k \quad (\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k \geq \cdots \geq \lambda_p)$$
which correspond to the $k$ largest eigenvalues.
In this manner, according to the following equation (3), by linearly projecting a certain image $x$ onto the subspace represented by the eigenvectors, the image of $n \times m$ dimensions can be represented by a $k$-dimensional feature vector $y$ in a lower-dimensional space.
$$y = [e_1, e_2, \ldots, e_k]^{T} x \qquad (3)$$
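The procedure of equations (1) to (3) can be illustrated by the following minimal sketch. It is not taken from any of the cited documents: the tiny four-pixel "images", the function names, and the use of power iteration to obtain only the single largest eigenvector (k = 1) are all assumptions made for illustration.

```python
import math

def mean_image(images):
    """Average image c over the p column vectors U_1..U_p."""
    p = len(images)
    return [sum(u[i] for u in images) / p for i in range(len(images[0]))]

def covariance(images, c):
    """Q = A A^T, where column j of A is U_j - c (equation (1))."""
    d = len(c)
    diffs = [[u[i] - c[i] for i in range(d)] for u in images]
    return [[sum(a[r] * a[s] for a in diffs) for s in range(d)] for r in range(d)]

def top_eigenvector(Q, iterations=200):
    """Largest eigenpair of Q by power iteration (an illustrative solver for equation (2))."""
    d = len(Q)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(Q[r][s] * v[s] for s in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Qv = [sum(Q[r][s] * v[s] for s in range(d)) for r in range(d)]
    eigenvalue = sum(v[r] * Qv[r] for r in range(d))
    return eigenvalue, v

def project(e, x):
    """y = e^T x for a single basis vector (equation (3) with k = 1)."""
    return sum(ei * xi for ei, xi in zip(e, x))

# Hypothetical raster-scanned images that differ only in their first pixel.
images = [[0.0, 0.0, 1.0, 1.0], [2.0, 0.0, 1.0, 1.0], [4.0, 0.0, 1.0, 1.0]]
c = mean_image(images)
Q = covariance(images, c)
eigenvalue, e1 = top_eigenvector(Q)
y = [project(e1, u) for u in images]
```

In this toy case all variation lies along the first pixel, so the eigenvector found is the first coordinate axis and each four-pixel image is reduced to a single coordinate.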
An exemplary method for detecting and recognizing any multifeatured entity such as human face under principal component analysis or KL expansion is found in the Japanese Patent Laying-Open No. 8-339445 (96-339445) titled "Detection, Recognition and Coding of Complex Objects Using Probabilistic Eigenspace Analysis" (hereinafter, referred to as fifth document). The feature of the fifth document lies in a respect that the conventionally-known principal component analysis and KL expansion are applied to a multifeatured entity such as face. The fifth document exemplarily applies such techniques to recognize hand shape, and the method in the fifth document is described next below.
First, a plurality of hand images captured through hand movement or gesture are photographed with a black background. Second, the two-dimensional contour of the hand is extracted by using Canny's edge operator. Thereafter, the obtained edge images are subjected to the KL expansion to calculate a subspace. If an edge map in binary is used herein, however, the images may show little correlation with one another, and thus the number of dimensions k of the subspace needs to be increased to a considerable extent. By taking this into consideration, the example described in the fifth document proposes to calculate the subspace after blurring the edge images, on the edge map in binary, through distribution processing. In this manner, the number of dimensions of the subspace can be suppressed. Further, in the fifth document, the images are entirely searched on a predetermined size basis so as to find the hand location from an input image, and then recognition is carried out.
However, for hand shape recognition, wearing a data glove as in the first document may restrict hand movement due to the cords connected thereto, and a user may find the tight glove uncomfortable to wear.
In a case where hand shape recognition is conducted by using a camera together with a marker such as a glove, as in the second document, the hand shape recognition cannot be achieved without the glove, and the problem of discomfort is still left unsolved.
Further, in a case where hand shape and position recognition is conducted without the glove or marker but with a plurality of cameras, as in the third document, the hand is taken in as a plane so as to determine to which camera the hand is facing. In reality, however, the hand can be in a variety of shapes and some shapes cannot be judged as being closely analogous to the plane. Accordingly, the method can be applied to recognize simple shapes formed only by extending or bending fingers, for example, but not to rather complicated shapes (e.g., a circle formed by a thumb and an index finger).
Still further, in the method based on the conventional eigenspace analysis as described in the fourth document, it is not specified how to capture normalized images only of a hand. What matters in the method based on the eigenspace analysis is how an image region of an object is cut out before normalization. When the object is simple, it only needs to be normalized with respect to size and contrast. On the other hand, when the object is complicated, such as a hand or a face, it needs to be subjected to cutting processing before normalization.
For example, when the method is applied to face recognition, popularly, eye and nose regions are first moved to a predetermined position, and then chin and hair regions are deleted. When the method is applied to hand recognition, a wrist region is first deleted in some manner, and then the hand is moved to a predetermined position for normalization. Without such processing, the method based on the eigenspace analysis may result in a low recognition rate for hand shape and position recognition.
Still further, in a case where the eigenspace analysis is applied to a human hand image as in the fifth document, it is required to extract the contour of the hand and blur an edge image. As a result, it is impossible to distinguish between an image of one finger and an image of two fingers abutting each other, and therefore the method cannot be applied to rather complicated shapes.
Therefore, an object of the present invention is to provide a device and a method for recognizing hand shape and position even if a hand image to be provided for recognition is rather complicated in shape, and a recording medium having a program for carrying out the method recorded thereon. This is implemented by, under a method based on the eigenspace analysis, normalizing a plurality of prestored hand images varied in hand shape and position and the to-be-provided hand image after a wrist region is respectively deleted therefrom.
The present invention has the following features to attain the object above.
A first aspect of the present invention is directed to a device for recognizing hand shape and position of a hand image obtained by optical read means (hereinafter, referred to as input hand image). A device in accordance with the first aspect of the present invention comprises: a first hand image normalization part for receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; a hand shape image information storage part for storing the hand shape images together with shape information and position information about each of the hand shape images; an eigenspace calculation part for calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; an eigenvector storage part for storing the eigenvectors; a first eigenspace projection part for calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis, and storing the eigenspace projection coordinates into the hand shape image information storage part; a second hand image normalization part for receiving the input hand image, and after a wrist region is deleted therefrom, normalizing the input hand image to generate an input hand shape image being equivalent to the hand shape images; a second eigenspace projection part for calculating eigenspace projection coordinates for the input hand shape image by projecting the input hand shape image onto the eigenspace having the eigenvectors as the basis; a hand shape image selection part for comparing the eigenspace projection coordinates calculated by the second eigenspace projection part with the eigenspace projection coordinates stored in the hand shape image information storage 
part, and determining which of the hand shape images is closest to the input hand shape image; and a shape/position output part for obtaining, for output, the shape information and the position information on the closest hand shape image from the hand shape image information storage part.
As described above, in the first aspect, a plurality of hand images varied in hand shape and position and an input hand image for recognition are all subjected to wrist region deletion before normalization. Therefore, the hand images can be normalized with higher accuracy compared to a case where the hand images are simply subjected to normalization in size and contrast. Accordingly, under a method based on the eigenspace, the hand shape and position can be recognized with accuracy of a sufficient degree.
Further, by using the method based on the eigenspace, geometric characteristics such as the number of extended fingers can be recognized, whereby rather complicated hand shapes having few geometric characteristics can be correctly recognized.
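The selection carried out in the first aspect can be sketched as a nearest-neighbor search over stored eigenspace projection coordinates. The stored entries, the labels, and the choice of Euclidean distance are assumptions for illustration only; the text above does not fix a particular distance measure.

```python
import math

# Hypothetical stored entries: (projection coordinates, shape information, position information).
stored = [
    ((0.0, 0.0), "fist", "palm-front"),
    ((3.0, 1.0), "open", "palm-front"),
    ((1.0, 4.0), "point", "side"),
]

def closest_entry(input_coords, entries):
    """Return the stored entry whose coordinates are nearest to the input coordinates."""
    return min(entries, key=lambda entry: math.dist(input_coords, entry[0]))

# Projection coordinates of a hypothetical normalized input hand shape image.
coords, shape, position = closest_entry((2.5, 1.5), stored)
```

The shape and position information attached to the closest entry is what a shape/position output part would then report.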
A second aspect of the present invention is directed to a device for recognizing hand shape and position of a hand image obtained by optical read means (hereinafter, referred to as input hand image). A device in accordance with the second aspect of the present invention comprises: a first hand image normalization part for receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; a hand shape image information storage part for storing the hand shape images together with shape information and position information about each of the hand shape images; an eigenspace calculation part for calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; an eigenvector storage part for storing the eigenvectors; a first eigenspace projection part for calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis, and storing the eigenspace projection coordinates into the hand shape image information storage part; a cluster evaluation part for classifying, into clusters, the eigenspace projection coordinates under cluster evaluation, determining which of the hand shape images belongs to which cluster for storage into the hand shape image information storage part, and obtaining statistical information about each cluster; a cluster information storage part for storing each of the statistical information together with the cluster corresponding thereto; a second hand image normalization part for receiving the input hand image, and after a wrist region is deleted therefrom, normalizing the input hand image to generate an input hand shape image being equivalent to the hand shape images; a 
second eigenspace projection part for calculating eigenspace projection coordinates for the input hand shape image by projecting the input hand shape image onto the eigenspace having the eigenvectors as the basis; a maximum likelihood cluster judgement part for comparing the eigenspace projection coordinates calculated by the second eigenspace projection part with each of coordinates included in the statistical information stored in the cluster information storage part, and determining which cluster is the closest; an image comparison part for comparing the hand shape images included in the closest cluster with the input hand shape image, and determining which of the hand shape images is analogous most closely to the input hand shape image; and a shape/position output part for obtaining, for output, the shape information and the position information on the most analogous hand shape image from the hand shape image information storage part.
As described above, in the second aspect, the hand shape images stored in the hand shape image information storage part are classified into clusters under cluster evaluation in the eigenspace. Thereafter, it is decided to which cluster an input hand image belongs, and then it is decided which hand shape image in the cluster is the closest to the input hand image. In this manner, the number of comparisons for matching can be reduced and the processing speed can be improved. Further, it is possible to accurately define each image by hand shape and position even if the images are analogous in hand position from a certain direction but different in hand shape.
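The two-stage search of the second aspect can be sketched as follows. The sketch assumes that the statistical information of each cluster is simply its mean in the eigenspace and that images are compared by the sum of squared pixel differences; both choices, and all names and data, are hypothetical.

```python
import math

# Hypothetical clusters: a mean in the eigenspace plus the member hand shape images.
clusters = {
    "A": {"mean": (0.0, 0.0),
          "members": [("fist", [0, 0, 1, 1]), ("thumb-up", [1, 0, 1, 1])]},
    "B": {"mean": (5.0, 5.0),
          "members": [("open", [1, 1, 1, 1])]},
}

def closest_cluster(coords, clusters):
    """Stage 1: pick the cluster whose mean is nearest to the input coordinates."""
    return min(clusters, key=lambda name: math.dist(coords, clusters[name]["mean"]))

def closest_member(image, members):
    """Stage 2: compare the input image only with the images inside that cluster."""
    def ssd(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(members, key=lambda member: ssd(image, member[1]))

input_coords = (0.5, -0.2)    # projection of the input hand shape image
input_image = [1, 0, 1, 0]    # the normalized input hand shape image itself
cluster_name = closest_cluster(input_coords, clusters)
label, _ = closest_member(input_image, clusters[cluster_name]["members"])
```

Only the members of the selected cluster are compared pixel by pixel, which is where the reduction in matching work comes from when the stored set is large.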
According to a third aspect, in the second aspect, the image comparison part includes: an identical shape classification part for classifying, according to hand shape, the hand shape images included in the cluster determined by the maximum likelihood cluster judgement part into groups before comparing the hand shape images with the input hand shape image generated by the second hand image normalization part; a shape group statistic calculation part for calculating a statistic representing the groups; and a maximum likelihood shape judgement part for calculating a distance between the input hand shape image and the statistic, and outputting a hand shape included in the closest group.
As described above, in the third aspect, in a case where it is sufficient to define the hand shape images only by hand shape, the hand shape can be recognized more accurately than in a case where the hand shape and the hand position are both recognized.
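A minimal sketch of the third aspect follows, assuming that the statistic representing each shape group is the pixel-wise mean image and that the distance is the squared Euclidean distance. These choices and the sample data are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical members of the cluster chosen for the input image: (shape, image).
members = [
    ("fist", [0, 0, 1, 1]),
    ("fist", [0, 0, 1, 0]),
    ("open", [1, 1, 1, 1]),
    ("open", [1, 1, 0, 1]),
]

def group_means(members):
    """Group images by hand shape and compute one mean image per group."""
    groups = defaultdict(list)
    for shape, image in members:
        groups[shape].append(image)
    return {shape: [sum(column) / len(images) for column in zip(*images)]
            for shape, images in groups.items()}

def closest_shape(image, means):
    """Return the shape whose group statistic is nearest to the input image."""
    return min(means, key=lambda s: sum((x - m) ** 2 for x, m in zip(image, means[s])))

means = group_means(members)
shape = closest_shape([1, 1, 1, 0], means)
```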
According to a fourth aspect, in the second aspect, the cluster evaluation part obtains the hand shape images and the shape information for each cluster from the hand shape image information storage part, calculates a partial region respectively for the hand shape images for discrimination, and stores the partial regions into the cluster information storage part; and the image comparison part compares the hand shape images in the cluster determined by the maximum likelihood cluster judgement part with the input hand shape image generated by the second hand image normalization part only in the partial region corresponding to the cluster.
As described above, in the fourth aspect, a partial region is predetermined, and the comparison for matching between the hand shape images and the input hand shape image is done for the parts within the partial region. In this manner, the comparison for matching can be less frequent than the second aspect, and accordingly still higher-speed processing can be achieved with a higher degree of accuracy even if the images are analogous in hand position from a certain direction but different in hand shape.
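The partial-region comparison of the fourth aspect can be sketched as below. How the partial region is calculated is not specified here, so the sketch assumes it is the set of pixel indices at which the images of a cluster differ from one another; that rule and the sample images are hypothetical.

```python
def discriminating_region(members):
    """Assumed rule: keep the pixel indices where the cluster's images disagree."""
    images = [image for _, image in members]
    return [i for i in range(len(images[0]))
            if len({image[i] for image in images}) > 1]

def masked_ssd(a, b, region):
    """Sum of squared differences restricted to the partial region."""
    return sum((a[i] - b[i]) ** 2 for i in region)

# Hypothetical cluster whose two images differ only at pixel index 1.
members = [("one-finger", [1, 0, 0, 1, 1]), ("two-fingers", [1, 1, 0, 1, 1])]
region = discriminating_region(members)

input_image = [0, 1, 1, 0, 1]
label, _ = min(members, key=lambda m: masked_ssd(input_image, m[1], region))
```

Here only one pixel per stored image is compared instead of five, and pixels that carry no discriminating information cannot disturb the decision.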
According to a fifth aspect, in the second aspect, when a plurality of the input hand images are provided by photographing a hand from several directions, the second hand image normalization part generates the input hand shape image for each of the input hand images, the second eigenspace projection part calculates the eigenspace projection coordinates in the eigenspace respectively for the input hand shape images generated by the second hand image normalization part, the maximum likelihood cluster judgement part compares each of the eigenspace projection coordinates calculated by the second eigenspace projection part with the statistical information, and determines which cluster is the closest, and the image comparison part merges the closest clusters determined by the maximum likelihood cluster judgement part, and estimates hand shape and position consistent with the shape information and the position information about the hand shape images in each of the clusters.
As described above, in the fifth aspect, input hand images obtained from a plurality of cameras can be defined by hand shape and position by merging clusters, based on the closeness in distance thereamong, determined for each of the input hand images. In this manner, even a hand image which has been difficult to recognize from one direction (e.g., a hand image from the side) can be defined by hand shape and position with accuracy.
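One simple way to picture the merging in the fifth aspect is a set intersection over the candidates offered by each camera's closest cluster. The sketch assumes that the position information of every camera has already been converted into a common frame, which is not described here; the camera names and labels are hypothetical.

```python
# Hypothetical (shape, position) candidates contained in the closest cluster per camera.
candidates_per_camera = {
    "camera_front": {("open", "palm-forward"), ("fist", "palm-forward")},
    "camera_side": {("open", "palm-forward"), ("point", "palm-down")},
}

def merge_clusters(candidates_per_camera):
    """Keep only the shape/position pairs that every camera's cluster agrees on."""
    candidate_sets = list(candidates_per_camera.values())
    return set.intersection(*candidate_sets)

consistent = merge_clusters(candidates_per_camera)
```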
A sixth aspect of the present invention is directed to a device for recognizing a meaning of successive hand images (hereinafter, referred to as hand movement image) obtained by optical read means. A device in accordance with the sixth aspect of the present invention comprises: a first hand image normalization part for receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; a hand shape image information storage part for storing the hand shape images together with shape information and position information about each of the hand shape images; an eigenspace calculation part for calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; an eigenvector storage part for storing the eigenvectors; a first eigenspace projection part for calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis, and storing the eigenspace projection coordinates into the hand shape image information storage part; a cluster evaluation part for classifying, into clusters, the eigenspace projection coordinates under cluster evaluation, determining which of the hand shape images belongs to which cluster for storage into the hand shape image information storage part, and obtaining statistical information about each cluster; a cluster information storage part for storing each of the statistical information together with the cluster corresponding thereto; a hand region detection part for receiving the hand movement image, and detecting a hand region respectively from the hand images structuring the hand movement image; a hand movement segmentation part for determining how the hand is moved in each of the detected hand regions, and finding any change point in hand movement according thereto; a hand image cutting part for cutting an image corresponding to the detected hand region respectively from the images including the change points; a second hand image normalization part for respectively normalizing one or more hand images (hereinafter, referred to as hand image series) cut from the hand movement image by the hand image cutting part, after a wrist region is each deleted therefrom, and generating input hand shape images being equivalent to the hand shape images; a second eigenspace projection part for calculating eigenspace projection coordinates for each of the input hand shape images by projecting the input hand shape images onto the eigenspace having the eigenvectors as the basis; a maximum likelihood cluster judgement part for comparing each of the eigenspace projection coordinates calculated by the second eigenspace projection part with the statistical information stored in the cluster information storage part, determining which cluster is the closest to each of the eigenspace projection coordinates, and outputting a symbol each specifying the clusters; a series registration part for registering, in a series identification dictionary part, the symbols (hereinafter, referred to as symbol series) corresponding to the hand image series outputted by the maximum likelihood cluster judgement part together with a meaning of the hand movement image; the series identification dictionary part for storing the meaning of the hand movement image and the symbol series corresponding thereto; and an identification operation part for obtaining, for output, one of the meanings corresponding to the symbol series outputted by the maximum likelihood cluster judgement part from the series identification dictionary part.
As described above, in the sixth aspect, the meaning of the hand movement successively made to carry a meaning in gesture or sign language is previously stored together with a cluster series created from some images including the change points. Thereafter, at the time of recognizing the hand movement image, the cluster series is referred to for outputting the stored meaning. In this manner, the hand movement successively made to carry the meaning in gesture or sign language can be recognized with higher accuracy, and accordingly can be correctly caught in meaning.
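The registration and lookup of symbol series in the sixth aspect can be sketched with an ordinary dictionary. The cluster symbols, the meanings, and the use of exact matching are assumptions for illustration; the text above does not say how series are matched.

```python
# Hypothetical series identification dictionary: symbol series -> meaning.
series_dictionary = {}

def register(symbol_series, meaning):
    """Store a meaning together with the cluster symbol series observed at the change points."""
    series_dictionary[tuple(symbol_series)] = meaning

def identify(symbol_series):
    """Return the stored meaning for a symbol series, or None if it is not registered."""
    return series_dictionary.get(tuple(symbol_series))

register(["C3", "C7", "C1"], "hello")
register(["C3", "C2"], "thanks")

meaning = identify(["C3", "C7", "C1"])
unknown = identify(["C9"])
```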
According to a seventh aspect, in the sixth aspect, the device further comprises: a comprehensive movement recognition part for receiving the hand movement image, and outputting a possibility for meaning by judging how the hand is moved and where the hand is located in the hand movement image; and a restriction condition storage part for previously storing a restriction condition for restricting, according to the successive hand movement, the meaning of the provided hand movement image, wherein the identification operation part obtains, for output, while taking the restriction condition into consideration, a meaning corresponding to the symbol series outputted by the maximum likelihood cluster judgement part from the series identification dictionary part.
As described above, in the seventh aspect, the restriction conditions relevant to the comprehensive hand movement are additionally imposed, and the hand movement image is defined by meaning. In this manner, the hand movement image can be recognized with higher accuracy.
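The effect of the restriction condition in the seventh aspect can be sketched as a filter over the dictionary candidates. The movement labels, the form of the restriction table, and the sample meanings are hypothetical; they only illustrate how one symbol series with several stored meanings could be disambiguated.

```python
# Hypothetical dictionary in which one symbol series carries two possible meanings.
series_dictionary = {("C3", "C7"): ["hello", "goodbye"]}

# Hypothetical restriction conditions: comprehensive movement -> meanings it permits.
restriction_conditions = {"raise": {"hello"}, "wave": {"goodbye"}}

def identify(symbol_series, movement):
    """Keep only the dictionary meanings permitted by the recognized overall movement."""
    candidates = series_dictionary.get(tuple(symbol_series), [])
    permitted = restriction_conditions.get(movement, set())
    return [meaning for meaning in candidates if meaning in permitted]

result = identify(["C3", "C7"], "wave")
```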
According to eighth and ninth aspects, in the sixth and the seventh aspects, respectively, the hand region detection part includes: a possible region cutting part for cutting a possible hand region from the hand images structuring the input hand movement image; a masking region storage part for storing a masking region used to extract only the possible hand region from an image of a rectangular region; a hand region image normalization part for superimposing the masking region on each of the possible hand regions cut from the images structuring the hand movement image, and normalizing each thereof to generate an image equivalent to the hand images used to calculate the eigenvectors; a hand region eigenspace projection part for calculating eigenspace projection coordinates for the normalized images by projecting the images onto the eigenspace having the eigenvectors as the basis; a hand region maximum likelihood cluster judgement part for comparing each of the eigenspace projection coordinates calculated by the hand region eigenspace projection part with the statistical information stored in the cluster information storage part, determining which cluster is the closest to each of the eigenspace projection coordinates, and outputting an estimation value indicating closeness between each of the symbols specifying the cluster and a cluster for reference; and a region determination part for outputting, according to the estimation values, position information on the possible hand region whose estimation value is the highest and the cluster thereof.
As described above, in the eighth and ninth aspects, the hand region is detected by projecting the possible hand region onto the eigenspace and then selecting the appropriate cluster. In this manner, the hand region and the cluster therefor can be simultaneously determined. Accordingly, the hand region can be concurrently detected with the hand shape/position, or with the hand movement.
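The region determination described above can be sketched as scoring every possible hand region by its closeness to the nearest cluster and keeping the best one. The sketch assumes cluster means as the statistical information and the negated Euclidean distance as the estimation value; the candidate positions and coordinates are hypothetical.

```python
import math

# Hypothetical cluster means in the eigenspace.
cluster_means = {"C1": (0.0, 0.0), "C2": (4.0, 4.0)}

def closest_cluster(coords):
    """Return the nearest cluster and an estimation value (higher means closer)."""
    name = min(cluster_means, key=lambda n: math.dist(coords, cluster_means[n]))
    return name, -math.dist(coords, cluster_means[name])

def determine_region(candidates):
    """Pick the possible hand region whose estimation value is the highest."""
    best = None
    for position, coords in candidates:
        name, score = closest_cluster(coords)
        if best is None or score > best[2]:
            best = (position, name, score)
    return best

# Hypothetical candidates: (top-left position in the image, eigenspace projection coordinates).
candidates = [((10, 20), (9.0, 9.0)), ((40, 20), (3.5, 4.2)), ((70, 20), (1.5, 1.5))]
position, cluster, score = determine_region(candidates)
```

The position and the cluster are obtained in the same pass, which mirrors the simultaneous determination of hand region and cluster.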
According to tenth to twelfth aspects, in the first, the second, and the sixth aspects, respectively, the first hand image normalization part and the second hand image normalization part respectively include: a color distribution storage part for previously storing a color distribution of the hand region to be extracted from the input hand image; a hand region extraction part for extracting the hand region from an input hand image according to the color distribution; a wrist region deletion part for finding which direction a wrist is oriented, and deleting a wrist region from the hand region according to the direction; a region displacement part for displacing the hand region from which the wrist region is deleted to a predetermined location on the image; a rotation angle calculation part for calculating a rotation angle in such a manner that the hand in the hand region is oriented to a predetermined direction; a region rotation part for rotating, according to the rotation angle, the hand region in such a manner that the hand therein is oriented to the predetermined direction; and a size normalization part for normalizing the rotated hand region to be in a predetermined size.
As described above, in the tenth to twelfth aspects, when normalizing the hand image, in addition to the deletion of the wrist region, the hand region is extracted based on color (beige). In this manner, the hand can be photographed with a non-artificial background, and from the image taken in thereby, the hand region can be extracted, and therefore the hand shape and position can be recognized with higher accuracy.
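Part of this normalization can be sketched as follows: extraction of the hand region by color, calculation of its centroid (for displacement), and calculation of an orientation angle (for rotation). The sketch makes several assumptions not stated above: the color distribution is reduced to one reference color with a tolerance, the orientation is taken from second-order image moments, and wrist deletion and size normalization are omitted. All names and values are hypothetical.

```python
import math

def extract_region(image, reference, tolerance):
    """Keep pixel positions whose RGB color lies within `tolerance` of the reference color."""
    return [xy for xy, rgb in image.items() if math.dist(rgb, reference) <= tolerance]

def centroid(region):
    """Center of the region, usable as the point displaced to a predetermined location."""
    n = len(region)
    return (sum(x for x, _ in region) / n, sum(y for _, y in region) / n)

def principal_axis_angle(region):
    """Orientation of the region from its central second-order moments, in radians."""
    cx, cy = centroid(region)
    mu20 = sum((x - cx) ** 2 for x, _ in region)
    mu02 = sum((y - cy) ** 2 for _, y in region)
    mu11 = sum((x - cx) * (y - cy) for x, y in region)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

skin = (220, 180, 150)      # hypothetical stored hand color
background = (20, 30, 200)  # hypothetical background color
image = {(0, 0): skin, (1, 1): (222, 178, 148), (2, 2): skin,
         (2, 0): background, (0, 2): background}

region = extract_region(image, reference=skin, tolerance=30.0)
center = centroid(region)
angle_degrees = math.degrees(principal_axis_angle(region))
```

Rotating the region by the negative of this angle would align its main axis with the horizontal; any other predetermined direction differs only by a constant offset.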
According to a thirteenth aspect, in the first aspect, the device further comprises: an instruction storage part for storing an instruction corresponding respectively to the shape information and the position information; and an instruction output part for receiving the shape information and the position information provided by the shape/position output part, and obtaining, for output, the instruction respectively corresponding to the shape information and the position information from the instruction storage part.
As described above, in the thirteenth aspect, the device in the first aspect can be used as an interface for other devices according to the hand shape and position.
A fourteenth aspect of the present invention is directed to a method for recognizing hand shape and position of a hand image obtained by optical read means (hereinafter, referred to as input hand image). A method in accordance with the fourteenth aspect of the present invention comprises: a first normalization step of receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; an analysis step of calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; a first projection step of calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis; a second normalization step of receiving the input hand image, and after a wrist region is deleted therefrom, normalizing the input hand image to generate an input hand shape image being equivalent to the hand shape images; a second projection step of calculating eigenspace projection coordinates for the input hand shape image by projecting the input hand shape image onto the eigenspace having the eigenvectors as the basis; a comparison step of comparing the eigenspace projection coordinates calculated for the hand shape images with the eigenspace projection coordinates calculated for the input hand shape image, and determining which of the hand shape images is closest to the input hand shape image; and a step of outputting the shape information and the position information on the closest hand shape image.
As described above, in the fourteenth aspect, a plurality of hand images varied in hand shape and position and an input hand image for recognition are all subjected to wrist region deletion before normalization. Therefore, the hand images can be normalized with higher accuracy compared to a case where the hand images are simply subjected to normalization in size and contrast. Accordingly, under a method based on the eigenspace, the hand shape and position can be recognized with sufficient accuracy.
Further, by using the method based on the eigenspace, geometric characteristics such as the number of extended fingers can be recognized, whereby rather complicated hand shapes having few geometric characteristics can be correctly recognized.
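By way of non-limiting illustration only, the analysis, projection, and comparison steps of the fourteenth aspect may be sketched as follows. The sketch assumes that each normalized hand shape image is already flattened into a list of pixel values, computes the eigenvectors of the covariance matrix of the stored images by power iteration with deflation (a stand-in for whatever eigensolver an actual embodiment would use), and selects the closest stored image by squared Euclidean distance in the eigenspace. All function names and the number of retained dimensions are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_image(images: Sequence[Sequence[float]]) -> Vector:
    return [sum(column) / len(images) for column in zip(*images)]


def covariance(centered: Sequence[Sequence[float]]) -> List[Vector]:
    size = len(centered[0])
    return [[sum(v[i] * v[j] for v in centered) / len(centered)
             for j in range(size)] for i in range(size)]


def top_eigenvectors(matrix: List[Vector], count: int,
                     iterations: int = 200) -> List[Vector]:
    """Leading eigenvectors by power iteration with deflation (assumed solver)."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    basis: List[Vector] = []
    for k in range(count):
        vec = [1.0 + 0.1 * ((i + k) % size) for i in range(size)]  # fixed start
        for _ in range(iterations):
            nxt = [dot(row, vec) for row in work]
            norm = math.sqrt(dot(nxt, nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        norm = math.sqrt(dot(vec, vec))
        vec = [x / norm for x in vec]
        eigenvalue = dot(vec, [dot(row, vec) for row in work])
        basis.append(vec)
        # Deflate: remove the component just found before the next search.
        work = [[work[i][j] - eigenvalue * vec[i] * vec[j]
                 for j in range(size)] for i in range(size)]
    return basis


def project(image: Sequence[float], mean: Vector, basis: List[Vector]) -> Vector:
    """Eigenspace projection coordinates of one image."""
    centered = [p - m for p, m in zip(image, mean)]
    return [dot(centered, axis) for axis in basis]


def recognize(input_image: Sequence[float],
              images: Sequence[Sequence[float]],
              labels: Sequence[Tuple[str, str]],
              dims: int = 2) -> Tuple[str, str]:
    """Return the (shape, position) label of the closest stored image."""
    mean = mean_image(images)
    centered = [[p - m for p, m in zip(img, mean)] for img in images]
    basis = top_eigenvectors(covariance(centered), dims)
    stored = [project(img, mean, basis) for img in images]
    query = project(input_image, mean, basis)
    distances = [sum((q - s) ** 2 for q, s in zip(query, coords))
                 for coords in stored]
    return labels[distances.index(min(distances))]
```

In a practical embodiment the eigenvectors and the stored projection coordinates would be computed once in advance rather than on every call; they are recomputed here only to keep the sketch short.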
A fifteenth aspect of the present invention is directed to a method for recognizing hand shape and position of a hand image obtained by optical read means (hereinafter, referred to as input hand image). A method in accordance with the fifteenth aspect of the present invention comprises: a first normalization step of receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; an analysis step of calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; a first projection step of calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis; an evaluation step of classifying, under cluster evaluation, the eigenspace projection coordinates into clusters, determining which of the hand shape images belongs to which cluster, and obtaining statistical information about each of the clusters; a second normalization step of receiving the input hand image, and after a wrist region is deleted therefrom, normalizing the input hand image to generate an input hand shape image being equivalent to the hand shape images; a second projection step of calculating eigenspace projection coordinates for the input hand shape image by projecting the input hand shape image onto the eigenspace having the eigenvectors as the basis; a judgement step of comparing the eigenspace projection coordinates calculated for the input hand shape image with the statistical information about each of the clusters, and determining the closest cluster; a comparison step of comparing each of the hand shape images included in the closest cluster with the input hand shape image, and determining which of the hand shape images is most analogous to the input hand shape image; and a step of outputting the shape information and the position information on the most analogous hand shape image.
As described above, in the fifteenth aspect, the hand shape images are classified into clusters, under cluster evaluation. Thereafter, it is decided to which cluster an input hand image belongs, and then it is decided which hand shape image in the cluster is the closest to the input hand image. In this manner, the number of comparisons for matching can be reduced and the processing speed can be improved. Further, it is possible to accurately define each image by hand shape and position even if the images are analogous in hand position from a certain direction but different in hand shape.
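By way of non-limiting illustration only, the judgement step and the comparison step of the fifteenth aspect may be sketched as a two-stage search. The sketch assumes that the statistical information about each cluster is simply the cluster centroid, that cluster membership has already been determined, and that the comparison within the closest cluster is made on eigenspace projection coordinates; an actual embodiment may use other statistics or compare the images themselves. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_centroids(coords: Sequence[Sequence[float]],
                      cluster_ids: Sequence[int]) -> Dict[int, List[float]]:
    """Assumed statistical information: one centroid per cluster."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, cid in zip(coords, cluster_ids):
        groups.setdefault(cid, []).append(point)
    return {cid: [sum(col) / len(members) for col in zip(*members)]
            for cid, members in groups.items()}


def two_stage_match(query: Sequence[float],
                    coords: Sequence[Sequence[float]],
                    cluster_ids: Sequence[int],
                    centroids: Dict[int, List[float]]) -> Tuple[int, int]:
    """Return (closest cluster id, index of the most analogous stored image)."""
    # Judgement step: compare the query only with one statistic per cluster.
    best_cluster = min(centroids, key=lambda cid: sq_dist(query, centroids[cid]))
    # Comparison step: compare only with the members of the closest cluster.
    members = [i for i, cid in enumerate(cluster_ids) if cid == best_cluster]
    best_index = min(members, key=lambda i: sq_dist(query, coords[i]))
    return best_cluster, best_index
```

With K clusters of roughly equal size over N stored images, the query is compared with about K + N/K items instead of N, which is the source of the speed improvement noted above.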
According to a sixteenth aspect, in the fifteenth aspect, the comparison step includes: a step of classifying, into clusters, the hand shape images included in the cluster determined in the judgement step before comparing the hand shape images with the input hand shape image generated in the second normalization step; a step of calculating a statistic representing the clusters; and a step of calculating a distance between the input hand shape image and the statistic, and outputting a hand shape included in the closest cluster.
As described above, in the sixteenth aspect, in a case where it suffices to define the hand shape images by hand shape alone, the hand shape can be recognized more accurately than in a case where the hand shape and the hand position are both recognized.
According to a seventeenth aspect, in the fifteenth aspect, in the evaluation step, according to the hand shape images and the shape information, a partial region is calculated respectively for the hand shape images for discrimination, and in the comparison step, the hand shape images in the cluster determined in the judgement step are compared with the input hand shape image generated in the second normalization step only in the partial region corresponding to the cluster.
As described above, in the seventeenth aspect, a partial region is predetermined, and the comparison for matching between the hand shape images and the input hand shape image is done for the parts within the partial region. In this manner, the comparison for matching can be less frequent than in the fifteenth aspect, and accordingly still higher-speed processing can be achieved with a higher degree of accuracy even if the images are analogous in hand position from a certain direction but different in hand shape.
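By way of non-limiting illustration only, the comparison restricted to a partial region may be sketched as follows. The specification does not state here how the partial region is calculated; the sketch assumes, purely for illustration, that the region consists of the pixels whose values differ among the images of one cluster by more than a threshold. The function names and the threshold are hypothetical.

```python
from typing import List, Sequence


def discriminative_region(images: Sequence[Sequence[float]],
                          threshold: float) -> List[int]:
    """Pixel indices whose value range within the cluster exceeds the threshold.

    This selection rule is an assumption made for the sketch only.
    """
    return [i for i, column in enumerate(zip(*images))
            if max(column) - min(column) > threshold]


def masked_sq_dist(a: Sequence[float], b: Sequence[float],
                   region: Sequence[int]) -> float:
    """Squared distance evaluated only on the pixels of the partial region."""
    return sum((a[i] - b[i]) ** 2 for i in region)


def match_in_region(query: Sequence[float],
                    images: Sequence[Sequence[float]],
                    region: Sequence[int]) -> int:
    """Index of the cluster image closest to the query within the region."""
    return min(range(len(images)),
               key=lambda i: masked_sq_dist(query, images[i], region))
```

Because pixels that are identical across the cluster contribute nothing to discrimination, skipping them reduces the work per comparison without changing which image within the cluster is distinguished from the others.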
According to an eighteenth aspect, in the fifteenth aspect, when a plurality of the input hand images are provided by photographing a hand from several directions, in the second normalization step, the input hand shape image is generated for each of the input hand images, in the second projection step, eigenspace projection coordinates in the eigenspace are calculated respectively for the input hand shape images generated in the second normalization step, in the judgement step, each of the eigenspace projection coordinates calculated in the second projection step is compared with the statistical information, and the closest cluster is determined, and in the comparison step, the closest clusters determined in the judgement step are merged, and the hand shape and position consistent with the shape information and the position information about the hand shape images in each of the clusters are estimated.
As described above, in the eighteenth aspect, input hand images obtained from a plurality of cameras can be defined by hand shape and position by merging clusters, based on the closeness in distance thereamong, determined for each of the input hand images. In this manner, even a hand image which has been difficult to recognize from one direction (e.g., a hand image from the side) can be defined by hand shape and position with accuracy.
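By way of non-limiting illustration only, the merging of the closest clusters determined for several cameras may be sketched as an intersection of candidate sets. The sketch assumes that each closest cluster has already been reduced to a set of (shape, position) candidates and that the position labels have been converted into a common frame, so that the same hand pose carries the same label regardless of the camera. These assumptions, the labels, and the function name are hypothetical.

```python
from typing import List, Set, Tuple

Candidate = Tuple[str, str]  # (shape label, position label), illustrative only


def merge_camera_candidates(per_camera: List[Set[Candidate]]) -> List[Candidate]:
    """Keep only the candidates that are consistent with every camera's cluster."""
    if not per_camera:
        return []
    common = set(per_camera[0])
    for candidates in per_camera[1:]:
        common &= candidates
    return sorted(common)
```

For example, a hand seen edge-on by one camera may be ambiguous between a flat hand and a pointing hand, while a second camera narrows the choice; only the candidate present in both sets survives the merge.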
A nineteenth aspect of the present invention is directed to a method for recognizing a meaning of successive hand images (hereinafter, collectively referred to as hand movement image) obtained by optical read means. A method in accordance with the nineteenth aspect of the present invention comprises: a first normalization step of receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; an analysis step of calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; a first projection step of calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis; an evaluation step of classifying, into clusters, the eigenspace projection coordinates under cluster evaluation, determining which of the hand shape images belongs to which cluster, and obtaining statistical information about each cluster; a detection step of receiving the hand movement image, and detecting a hand region respectively from the images constituting the hand movement image; a segmentation step of determining how the hand is moved in each of the detected hand regions, and finding any change point in hand movement according thereto; a cutting step of cutting an image corresponding to the detected hand region respectively from the images including the change points; a second normalization step of respectively normalizing one or more hand images (hereinafter, referred to as hand image series) cut from the hand movement image, after a wrist region is deleted from each thereof, and generating input hand shape images being equivalent to the hand shape images; a second projection step of calculating eigenspace projection coordinates for each of the input hand shape images by projecting the input hand shape images onto the eigenspace having the eigenvectors as the basis; a judgement step of comparing each of the eigenspace projection coordinates calculated for the input hand shape images with the statistical information, determining which cluster is the closest, and outputting symbols each specifying the corresponding cluster; a step of storing the symbols (hereinafter, referred to as symbol series) corresponding to the judged hand image series together with a meaning of the hand movement image; and an identification step of outputting, in order to identify the hand movement image, a meaning corresponding to the judged symbol series based on the stored symbol series and meaning.
As described above, in the nineteenth aspect, the meaning of the hand movement successively made to carry a meaning in gesture or sign language is previously stored together with a cluster series created from some images including the change points. Thereafter, at the time of recognizing the hand movement image, the cluster series is referred to for outputting the stored meaning. In this manner, the hand movement successively made to carry the meaning in gesture or sign language can be recognized with higher accuracy, and accordingly its meaning can be correctly understood.
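By way of non-limiting illustration only, the segmentation step and the storing and identification steps of the nineteenth aspect may be sketched as follows. The sketch assumes that a change point is a frame at which the hand center moves less than a threshold from the previous frame (that is, the hand pauses), and that symbol series are stored as exact-match keys; an actual embodiment may define change points differently and may tolerate inexact series. The names `find_change_points` and `MovementDictionary` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def find_change_points(centers: Sequence[Tuple[float, float]],
                       pause_threshold: float) -> List[int]:
    """Frame indices where the hand center moved less than the threshold."""
    points: List[int] = []
    for i in range(1, len(centers)):
        step = math.hypot(centers[i][0] - centers[i - 1][0],
                          centers[i][1] - centers[i - 1][1])
        if step < pause_threshold:
            points.append(i)
    return points


class MovementDictionary:
    """Stores symbol series (cluster symbols at change points) with meanings."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, ...], str] = {}

    def store(self, symbols: Sequence[int], meaning: str) -> None:
        self._entries[tuple(symbols)] = meaning

    def identify(self, symbols: Sequence[int]) -> Optional[str]:
        """Return the stored meaning for an exactly matching series, if any."""
        return self._entries.get(tuple(symbols))
```

Only the frames at the change points need to be normalized and projected, so a long image sequence is reduced to a short series of cluster symbols before the lookup.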
According to a twentieth aspect, in the nineteenth aspect, the method further comprises: a recognition step of receiving the hand movement image, and outputting a possibility for meaning by judging how the hand is moved and where the hand is located in the hand movement image; and a storage step of previously storing a restriction condition for restricting, according to the successive hand movement, the meaning of the provided hand movement image, wherein the identification step outputs, while taking the restriction condition into consideration, a meaning corresponding to the judged symbol series based on the stored symbol series and meaning.
As described above, in the twentieth aspect, the restriction conditions relevant to the comprehensive hand movement are additionally imposed, and the hand movement image is defined by meaning. In this manner, the hand movement image can be recognized with higher accuracy.
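By way of non-limiting illustration only, the use of the restriction condition may be sketched as a filter over the meanings stored for a symbol series. The sketch assumes that the recognition step yields a coarse movement label and that the restriction condition is a table from movement label to the set of meanings permitted for that movement. The labels, the table layout, and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def identify_with_restriction(symbols: Sequence[int],
                              movement: str,
                              stored: Dict[Tuple[int, ...], List[str]],
                              restrictions: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first stored meaning that the overall movement permits."""
    allowed = restrictions.get(movement, set())
    for meaning in stored.get(tuple(symbols), []):
        if meaning in allowed:
            return meaning
    return None
```

Two gestures that pass through the same hand shapes at their change points can thus still be told apart by how the hand travels as a whole.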
According to twenty-first and twenty-second aspects, in the nineteenth and the twentieth aspects, respectively, the detection step includes: a cutting step of cutting a possible hand region from each hand image constituting the input hand movement image; a storage step of storing a masking region used to extract only the possible hand region from an image of a rectangular region; a normalization step of superimposing the masking region on each of the possible hand regions cut from each hand image constituting the hand movement image, and normalizing each thereof to generate an image equivalent to the hand images used to calculate the eigenvectors; a projection step of calculating eigenspace projection coordinates for the normalized images by projecting the images onto the eigenspace having the eigenvectors as the basis; a judgement step of comparing each of the eigenspace projection coordinates with the statistical information, determining which cluster is the closest, and outputting an estimation value indicating closeness between each of the symbols specifying the cluster and a cluster for reference; and a determination step of outputting, according to the estimation values, position information on the possible hand region whose estimation value is the highest and the cluster thereof.
As described above, in the twenty-first and the twenty-second aspects, the hand region is detected by projecting the possible hand region onto the eigenspace and then selecting the appropriate cluster. In this manner, the hand region and the cluster therefor can be simultaneously determined. Accordingly, the hand region can be concurrently detected with the hand shape/position, or with the hand movement.
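By way of non-limiting illustration only, the judgement and determination steps of the detection may be sketched as scoring each possible hand region against the cluster centroids and keeping the best one. The sketch assumes that each candidate region has already been masked, normalized, and projected, and that the estimation value is 1 / (1 + squared distance to the closest centroid); this formula and the function name are assumptions made for the sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Position = Tuple[int, int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detect_hand_region(
        candidates: Sequence[Tuple[Position, Sequence[float]]],
        centroids: Dict[int, List[float]]) -> Optional[Tuple[Position, int, float]]:
    """Return (position, cluster id, estimation value) of the best candidate."""
    best: Optional[Tuple[Position, int, float]] = None
    for position, coords in candidates:
        cid = min(centroids, key=lambda c: sq_dist(coords, centroids[c]))
        value = 1.0 / (1.0 + sq_dist(coords, centroids[cid]))  # assumed score
        if best is None or value > best[2]:
            best = (position, cid, value)
    return best
```

The position and the cluster are returned together, which reflects the point made above that the hand region and its cluster are determined simultaneously.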
According to twenty-third to twenty-fifth aspects, in the fourteenth, the fifteenth, and the nineteenth aspects, respectively, the first normalization step and the second normalization step respectively include: a color storage step of previously storing a color distribution of the hand region to be extracted from the input hand image; a step of extracting the hand region from an input hand image according to the color distribution; a step of finding which direction a wrist is oriented, and deleting a wrist region from the hand region according to the direction; a step of displacing the hand region from which the wrist region is deleted to a predetermined location on the image; a step of calculating a rotation angle in such a manner that the hand in the hand region is oriented to a predetermined direction; a step of rotating, according to the rotation angle, the hand region in such a manner that the hand therein is oriented to the predetermined direction; and a step of normalizing the rotated hand region to be in a predetermined size.
As described above, in the twenty-third to twenty-fifth aspects, when normalizing the hand image, in addition to the deletion of the wrist region, the hand region is extracted based on color (beige). In this manner, the hand can be photographed with a non-artificial background, and from the image taken in thereby, the hand shape and position can be recognized with higher accuracy.
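By way of non-limiting illustration only, parts of the normalization described above may be sketched as follows: extraction of the hand region by color, calculation of the region center used for displacement, and calculation of an orientation angle from which a rotation angle can be derived. The sketch assumes that the stored color distribution is a simple per-channel range and that the hand orientation is the principal axis given by second-order central moments; wrist deletion, the rotation itself, and resizing are omitted. The function names and the color range are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def extract_hand_mask(image: Sequence[Sequence[Pixel]],
                      lower: Pixel, upper: Pixel) -> List[List[int]]:
    """Mark pixels whose every channel lies inside the assumed color range."""
    return [[1 if all(lo <= ch <= hi for ch, lo, hi in zip(pixel, lower, upper))
             else 0 for pixel in row] for row in image]


def centroid(mask: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Center (x, y) of the marked pixels, used to displace the hand region."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))


def orientation_angle(mask: Sequence[Sequence[int]]) -> float:
    """Principal-axis angle in radians from second-order central moments."""
    cx, cy = centroid(mask)
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                dx, dy = x - cx, y - cy
                mu20 += dx * dx
                mu02 += dy * dy
                mu11 += dx * dy
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def rotation_angle(mask: Sequence[Sequence[int]], target: float) -> float:
    """Angle by which to rotate so the principal axis reaches the target."""
    return target - orientation_angle(mask)
```

The principal axis alone is ambiguous by 180 degrees; in the method described above that ambiguity is resolved by the wrist direction, which this sketch does not model.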
According to a twenty-sixth aspect, in the fourteenth aspect, the method further comprises: an instruction storage step of storing an instruction corresponding respectively to the shape information and the position information; and a step of receiving the shape information and the position information outputted in the output step, and obtaining, for output, the instruction respectively corresponding to the shape information and the position information stored in the instruction storage step.
As described above, in the twenty-sixth aspect, the method in the fourteenth aspect can be used as an interface for other devices according to the hand shape and position.
A twenty-seventh aspect of the present invention is directed to a recording medium storing a program to be executed on a computer device for carrying out a method for recognizing hand shape and position of a hand image obtained by optical read means (hereinafter, referred to as input hand image). A program in accordance with the twenty-seventh aspect of the present invention realizes an operational environment on the computer device including: a first normalization step of receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; an analysis step of calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; a first projection step of calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis; a second normalization step of receiving the input hand image, and after a wrist region is deleted therefrom, normalizing the input hand image to generate an input hand shape image being equivalent to the hand shape images; a second projection step of calculating eigenspace projection coordinates for the input hand shape image by projecting the input hand shape image onto the eigenspace having the eigenvectors as the basis; a comparison step of comparing the eigenspace projection coordinates calculated for the hand shape images with the eigenspace projection coordinates calculated for the input hand shape image, and determining which of the hand shape images is closest to the input hand shape image; and a step of outputting the shape information and the position information on the closest hand shape image.
A twenty-eighth aspect of the present invention is directed to a recording medium storing a program to be executed on a computer device for carrying out a method for recognizing hand shape and position of a hand image obtained by optical read means (hereinafter, referred to as input hand image). A program in accordance with the twenty-eighth aspect of the present invention realizes an operational environment on the computer device including: a first normalization step of receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; an analysis step of calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; a first projection step of calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis; an evaluation step of classifying, into clusters, the eigenspace projection coordinates under cluster evaluation, determining which of the hand shape images belongs to which cluster, and obtaining statistical information about each cluster; a second normalization step of receiving the input hand image, and after a wrist region is deleted therefrom, normalizing the input hand image to generate an input hand shape image being equivalent to the hand shape images; a second projection step of calculating eigenspace projection coordinates for the input hand shape image by projecting the input hand shape image onto the eigenspace having the eigenvectors as the basis; a judgement step of comparing the eigenspace projection coordinates calculated for the input hand shape image with each of the coordinates included in the statistical information, and determining which cluster is the closest; a comparison step of comparing the hand shape images included in the closest cluster with the input hand shape image, and determining which of the hand shape images is most closely analogous to the input hand shape image; and a step of outputting the shape information and the position information on the most analogous hand shape image.
A twenty-ninth aspect of the present invention is directed to the recording medium of the twenty-eighth aspect, wherein the comparison step includes: a step of classifying, into clusters, the hand shape images included in the cluster determined in the judgement step before comparing the hand shape images with the input hand shape image generated in the second normalization step; a step of calculating a statistic representing the clusters; and a step of calculating a distance between the input hand shape image and the statistic, and outputting a hand shape included in the closest cluster.
According to a thirtieth aspect, in the twenty-eighth aspect, in the evaluation step, according to the hand shape images and the shape information, a partial region is calculated respectively for the hand shape images for discrimination, and in the comparison step, the hand shape images in the cluster determined in the judgement step are compared with the input hand shape image generated in the second normalization step only in the partial region corresponding to the cluster.
According to a thirty-first aspect, in the twenty-eighth aspect, when a plurality of the input hand images are provided by photographing a hand from several directions, in the second normalization step, the input hand shape image is generated for each of the input hand images, in the second projection step, eigenspace projection coordinates in the eigenspace are calculated respectively for the input hand shape images generated in the second normalization step, in the judgement step, each of the eigenspace projection coordinates calculated in the second projection step is compared with the statistical information, and the closest cluster is determined, and in the comparison step, the closest clusters determined in the judgement step are merged, and the hand shape and position consistent with the shape information and the position information about the hand shape images in each of the clusters are estimated.
A thirty-second aspect of the present invention is directed to a recording medium storing a program to be executed on a computer device for carrying out a method for recognizing a meaning of successive hand images (hereinafter, collectively referred to as hand movement image) obtained by optical read means. A program in accordance with the thirty-second aspect of the present invention realizes an operational environment on the computer device including: a first normalization step of receiving a plurality of hand images varied in hand shape and position, and after a wrist region is respectively deleted therefrom, subjecting the hand images to normalization in a predetermined manner (in hand orientation, image size, image contrast) to generate hand shape images; an analysis step of calculating an eigenvalue and an eigenvector from each of the hand shape images under analysis based on an eigenspace method; a first projection step of calculating eigenspace projection coordinates respectively for the hand shape images by projecting the hand shape images onto an eigenspace having the eigenvectors as a basis; an evaluation step of classifying, into clusters, the eigenspace projection coordinates under cluster evaluation, determining which of the hand shape images belongs to which cluster, and obtaining statistical information about each cluster; a detection step of receiving the hand movement image, and detecting a hand region respectively from the hand images constituting the hand movement image; a segmentation step of determining how the hand is moved in each of the detected hand regions, and finding any change point in hand movement according thereto; a cutting step of cutting an image corresponding to the detected hand region respectively from the images including the change points; a second normalization step of respectively normalizing one or more hand images (hereinafter, referred to as hand image series) cut from the hand movement image, after a wrist region is deleted from each thereof, and generating input hand shape images being equivalent to the hand shape images; a second projection step of calculating eigenspace projection coordinates for each of the input hand shape images by projecting the input hand shape images onto the eigenspace having the eigenvectors as the basis; a judgement step of comparing each of the eigenspace projection coordinates calculated for the input hand shape images with the statistical information, determining which cluster is the closest, and outputting symbols each specifying the corresponding cluster; a step of storing the symbols (hereinafter, referred to as symbol series) corresponding to the judged hand image series together with a meaning of the hand movement image; and an identification step of outputting, in order to identify the hand movement image, a meaning corresponding to the judged symbol series based on the stored symbol series and meaning.
According to a thirty-third aspect, in the thirty-second aspect, the method further comprises: a recognition step of receiving the hand movement image, and outputting a possibility for meaning by judging how the hand is moved and where the hand is located in the hand movement image; and a storage step of previously storing a restriction condition for restricting, according to the successive hand movement, the meaning of the provided hand movement image, wherein the identification step outputs, while taking the restriction condition into consideration, a meaning corresponding to the judged symbol series based on the stored symbol series and meaning.
According to thirty-fourth and thirty-fifth aspects, in the thirty-second and the thirty-third aspects, respectively, the detection step includes: a cutting step of cutting a possible hand region from the hand images constituting the input hand movement image; a storage step of storing a masking region used to extract only the possible hand region from an image of a rectangular region; a normalization step of superimposing the masking region on each of the possible hand regions cut from each hand image constituting the hand movement image, and normalizing each thereof to generate an image equivalent to the hand images used to calculate the eigenvectors; a projection step of calculating eigenspace projection coordinates for the normalized images by projecting the images onto the eigenspace having the eigenvectors as the basis; a judgement step of comparing each of the eigenspace projection coordinates with the statistical information, determining which cluster is the closest, and outputting an estimation value indicating closeness between each of the symbols specifying the cluster and a cluster for reference; and a determination step of outputting, according to the estimation values, position information on the possible hand region whose estimation value is the highest and the cluster thereof.
According to thirty-sixth to thirty-eighth aspects, in the twenty-seventh, the twenty-eighth, and the thirty-second aspects, respectively, the first normalization step and the second normalization step respectively include: a color storage step of previously storing a color distribution of the hand region to be extracted from the input hand image; a step of extracting the hand region from an input hand image according to the color distribution; a step of finding which direction a wrist is oriented, and deleting a wrist region from the hand region according to the direction; a step of displacing the hand region from which the wrist region is deleted to a predetermined location on the image; a step of calculating a rotation angle in such a manner that the hand in the hand region is oriented to a predetermined direction; a step of rotating, according to the rotation angle, the hand region in such a manner that the hand therein is oriented to the predetermined direction; and a step of normalizing the rotated hand region to be in a predetermined size.
According to a thirty-ninth aspect, in the thirtieth aspect, the recording medium further comprises: an instruction storage step of storing an instruction corresponding respectively to the shape information and the position information; and a step of receiving the shape information and the position information outputted in the output step, and obtaining, for output, the instruction respectively corresponding to the shape information and the position information stored in the instruction storage step.
As described above, in the twenty-seventh to thirty-ninth aspects, the program for carrying out the method for recognizing hand shape and position in the fourteenth to twenty-sixth aspects is recorded on the recording medium. This is to supply the method in the form of software.
These and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.