The present invention relates generally to man-machine interfaces, and more particularly to gesture-based input interface systems for communicating information to computers or other display-based processing systems via user hand gestures.
Speech and gestures are the most commonly used means of communication among humans. Yet, when it comes to communicating with computers, the typical home or business user is still bound to devices such as the keyboard and the mouse. While speech recognition systems are finding their way into low-cost computers, there is a real need for gesture recognition systems that provide robust, real-time operation at low cost, so as to be readily available to the typical home or business user.
Interest in vision-based gesture recognition has been growing since the early 1990s. See T. Huang and V. Pavlovic, “Hand Gesture Modeling, Analysis and Synthesis,” Proc. International Conference on Automatic Face and Gesture Recognition, pp. 73-79, June 1995, for a review of various conventional techniques.
Much of this effort has been devoted specifically to gesture-based computer interfaces, as described in, e.g., A. Azarbayejani, T. Starner, B. Horowitz, and A. Pentland, “Visually Controlled Graphics,” IEEE Transactions on Pattern Recognition and Machine Intelligence, 15(6):602-605, June 1993; R. Kjeldsen and J. Kender, “Visual Hand Recognition for Window System Control,” Proc. International Conference on Automatic Face and Gesture Recognition, pp. 184-188, June 1995; R. Kjeldsen and J. Kender, “Towards the use of Gesture in Traditional User Interfaces,” Proc. International Conference on Automatic Face and Gesture Recognition, pp. 151-156, October 1996; M. W. Krueger, “Artificial Reality II,” Addison-Wesley, 1991; C. Maggioni, “GestureComputer - New Ways of Operating a Computer,” Proc. International Conference on Automatic Face and Gesture Recognition, pp. 166-171, June 1995; J. M. Rehg and T. Kanade, “DigitalEyes: Vision Based Human Hand Tracking,” CMU Tech Report CMU-CS-93-220, 1993; W. T. Freeman and C. D. Weissman, “Television Control by Hand Gestures,” Proc. International Conference on Automatic Face and Gesture Recognition, pp. 179-183, June 1995; A. Utsumi and J. Ohya, “Multiple-Hand-Gesture Tracking Using Multiple Cameras,” Proc. International Conference Computer Vision and Pattern Recognition, pp. 473-478, June 1999; M. Kohler, “System Architecture and Techniques for Gesture Recognition in Unconstraint Environments,” Proc. Int. Conf. Virtual Systems and Multimedia, 1997; H. Nishino et al., “Interactive Two-Handed Gesture Interface in 3D Virtual Environments,” Proc. ACM Symp. Virtual Reality Software and Technology, 1997; J. Segen, “Controlling Computers with Gloveless Gestures,” Proceedings of Virtual Reality Systems, 1993; V. J. Vincent, “Delving in the depth of the mind,” Proc. Interface to Real and Virtual Worlds, 1991; D. Weimer and S. K. Ganapathy, “Interaction Techniques using Hand Tracking and Speech Recognition,” Multimedia Interface Design, ed. M. Blattner and R. Dannenberg, pp. 109-126, Addison-Wesley, 1992; and P. Wellner, “The DigitalDesk Calculator: Tangible Manipulation on a Desktop Display,” Proc. ACM Symposium on User Interface Software and Technology, November 1991.
By way of example, the above-cited C. Maggioni reference describes a system using two cameras that detects the position of the palm of a user's hand in three dimensions (3D). The system can recognize six static gestures, and is used as an interface to a virtual environment. As another example, the above-cited R. Kjeldsen and J. Kender references describe a neural-net-based gesture recognition and hand tracking system that can be used in place of a mouse to move and resize computer windows.
A gesture-based input interface system is described in U.S. patent application Ser. No. 08/887,765, filed Jul. 3, 1997, now U.S. Pat. No. 6,252,298, issued Jun. 26, 2001, in the name of inventor J. Segen, which application is commonly assigned herewith and incorporated by reference herein.
A known multiple-camera gesture-based input interface system referred to as GestureVR is described in J. Segen and S. Kumar, “GestureVR: Vision-Based 3D Hand Interface for Spatial Interaction,” Proc. Sixth ACM International Multimedia Conference, Bristol, U.K., September 1998, which is incorporated by reference herein. This system provides a number of advantages over the other systems noted above.
Additional details regarding the GestureVR system and other gesture-based input interface systems are disclosed in U.S. patent application Ser. No. 09/208,079, filed Dec. 9, 1998, now U.S. Pat. No. 6,204,852, issued Mar. 20, 2001, in the name of inventors S. Kumar and J. Segen and entitled “Video Hand Image Three-Dimensional Computer Interface,” and U.S. patent application Ser. No. 09/208,196, filed Dec. 9, 1998, now U.S. Pat. No. 6,147,678, issued Nov. 14, 2000, in the name of inventors S. Kumar and J. Segen and entitled “Video Hand Image Three-Dimensional Computer Interface With Multiple Degrees of Freedom,” both commonly assigned herewith and incorporated herein by reference.
It is also known in the art to utilize shadows in computer vision image processing applications. An example of one such application is in the area of extracting buildings from aerial images, with shadows being used to generate or verify building hypotheses and to estimate building heights. Such techniques are referred to as “shape from shading” techniques. See, e.g., D. G. Lowe and T. O. Binford, “The Interpretation of Geometric Structure from Image Boundaries,” ARPA IUS Workshop, pp. 39-46, 1981, and C. Lin and R. Nevatia, “Building Detection and Description from a Single Intensity Image,” Computer Vision and Image Understanding, 72(2):101-121, 1998. Shadows have also been used to infer object shapes, as described in, e.g., S. A. Shafer and T. Kanade, “Using Shadows in Finding Surface Orientations,” CVGIP, 22:145-176, 1983; J. R. Kender and E. M. Smith, “Shape from Darkness: Deriving Surface Information from Dynamic Shadows,” Proc. ICCV, 1987; D. Raviv, Y. Pao, and K. A. Loparo, “Reconstruction of Three-Dimensional Surfaces from Two Dimensional Binary Images,” IEEE Trans. Rob. and Auto, 5(10):701-710, 1989; and L. Wang and J. J. Clark, “Shape from Active Shadow Motion,” Proc. SPIE Conf. on Intelligent Robots and Computer Vision: Active Vision and 3D Methods, Boston, Mass., 1993. Compared to “shape from shading” techniques, these “shape from shadow” techniques have an advantage in that they do not require surface reflectance maps.
Although shadow processing has been applied in the above-noted computer vision applications, it has not heretofore been applied to improving detection of gestures in a gesture-based input interface system.
In view of the foregoing, a need remains for a gesture-based input interface system that utilizes shadow processing and is capable of providing robust, real-time operation in a low-cost manner more readily accessible to typical home and business users.
The present invention provides an improved gesture-based input interface system which meets the above-identified need.
An input interface system in accordance with the invention provides gesture-based user control of an application running on a computer. Image signals generated by a camera are processed to determine whether the image signals contain one of a number of designated user gestures, e.g., a point gesture, a reach gesture and a click gesture, each of the gestures being translatable to a particular control signal for controlling the application.
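By way of illustration only, the translation of a recognized gesture into a control signal can be sketched as a simple lookup. The gesture names below come from the text above; the control signal names (MOVE_POINTER, STOP, SELECT) and the function to_control_signal are hypothetical placeholders chosen for this sketch and are not taken from the described system.

```python
from enum import Enum
from typing import Optional


class Gesture(Enum):
    """The three designated gestures named in the text."""
    POINT = "point"
    REACH = "reach"
    CLICK = "click"


# Hypothetical control-signal names, assumed for illustration only.
CONTROL_SIGNALS = {
    Gesture.POINT: "MOVE_POINTER",
    Gesture.REACH: "STOP",
    Gesture.CLICK: "SELECT",
}


def to_control_signal(gesture: Optional[Gesture]) -> Optional[str]:
    """Map a recognized gesture to a control signal.

    Returns None when no designated gesture was found in the image signal.
    """
    if gesture is None:
        return None
    return CONTROL_SIGNALS[gesture]
```

A table of this kind keeps the recognition stage independent of the application: a different application could supply its own mapping without changing how gestures are detected.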
In accordance with the invention, if a given image signal is determined to contain a point gesture, the image signal is further processed to determine position and orientation information for a pointing finger of a hand of the user and its corresponding shadow. The position and orientation information for the pointing finger and its shadow are then utilized to generate a three-dimensional pose estimate for the pointing finger in the given gesture. The generation of a three-dimensional pose estimate for the point gesture can be used to allow user manipulation of objects in three dimensions within the application running on the computer.
For example, the position and orientation for the pointing finger may comprise a pair of two-dimensional poses, one representing an extracted image signal peak corresponding to the pointing finger and the other representing an extracted image signal peak corresponding to the shadow of the pointing finger. More particularly, the pair of two-dimensional poses may be of the form {(x, y, θ), (xs, ys, θs)}, where (x, y) and θ denote the position of the tip and orientation, respectively, of the pointing finger in two-dimensional space, and (xs, ys) and θs denote the position of the tip and orientation, respectively, of the shadow of the pointing finger in two-dimensional space. The three-dimensional pose estimate generated from the pair of two-dimensional poses may be in the form of a set of five parameters (X, Y, Z, α, ε), where (X, Y, Z) denotes the position of the tip of the pointing finger in three-dimensional space, and (α, ε) denotes the respective azimuth and elevation angles of an axis of the pointing finger.
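The relationship between the pair of two-dimensional poses and the five-parameter three-dimensional pose can be illustrated with a minimal sketch. It does not reproduce the computation used by the described system. It assumes a deliberately simplified geometry: an orthographic camera looking straight down at a flat surface, image coordinates already expressed in the surface's units, and a distant light source whose shadow shift per unit height above the surface, (kx, ky), is known from calibration. The names Pose2D, Pose3D, estimate_pose_3d, kx and ky are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float      # tip position in the image
    y: float
    theta: float  # orientation in the image, radians


@dataclass
class Pose3D:
    X: float
    Y: float
    Z: float        # height of the tip above the surface
    alpha: float    # azimuth of the finger axis, radians
    epsilon: float  # elevation of the finger axis, radians


def estimate_pose_3d(finger: Pose2D, shadow: Pose2D,
                     kx: float, ky: float) -> Pose3D:
    """Estimate (X, Y, Z, alpha, epsilon) under the simplified model.

    (kx, ky) is the assumed horizontal displacement of a shadow per unit
    height above the surface. A point at height Z then casts its shadow
    at (x + Z*kx, y + Z*ky).
    """
    k2 = kx * kx + ky * ky
    if k2 == 0.0:
        raise ValueError("light is directly overhead; height is unobservable")

    # Height: project the tip-to-shadow offset onto the shift direction.
    Z = ((shadow.x - finger.x) * kx + (shadow.y - finger.y) * ky) / k2

    # With a top-down orthographic view the image orientation is the azimuth.
    alpha = finger.theta

    # A finger axis (cos e cos a, cos e sin a, sin e) casts a shadow along
    # (cos e cos a + kx sin e, cos e sin a + ky sin e); requiring that this
    # be parallel to the observed shadow orientation gives tan(e).
    denom = ky * math.cos(shadow.theta) - kx * math.sin(shadow.theta)
    if abs(denom) < 1e-12:
        raise ValueError("finger is aligned with the shadow shift; "
                         "elevation is unobservable from orientation")
    epsilon = math.atan(math.sin(shadow.theta - alpha) / denom)

    return Pose3D(finger.x, finger.y, Z, alpha, epsilon)


# Usage: finger pointing along +y, shadow shifted along +x.
pose = estimate_pose_3d(Pose2D(2.0, 3.0, math.radians(90)),
                        Pose2D(2.5, 3.0, math.radians(60)),
                        kx=1.0, ky=0.0)
```

In this example the tip-to-shadow offset yields a height of about 0.5 and the difference in orientation yields an elevation of about 30 degrees. A real camera is perspective rather than orthographic and a nearby lamp is not a distant source, so a practical system would need a calibrated projection model in place of these assumptions.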
Advantageously, the gesture-based input interface system of the present invention can be used as an input interface to many different types of multi-dimensional computer or other processing device-based applications, such as virtual flight simulators, graphical editors and video games. The system provides robust, real-time operation in a substantially user-independent manner. Moreover, the system can be implemented using an inexpensive off-the-shelf camera or other image capture device and requires minimal computational resources. The gesture-based system of the present invention thus offers an efficient, low-cost solution that is readily accessible to the typical home or business user.