The interaction between computing devices and users continues to improve as computing platforms become more powerful and able to respond to a user in many new and different ways, so that a user is not required to type on a keyboard in order to control applications and input data. The development of a graphical user interface system, like that provided by Microsoft Corporation's WINDOWS™ operating system, has greatly improved the ease with which a user can interact with a computing device, by enabling a user to input control actions and make selections in a more natural and intuitive manner.
The ease with which a user can input control actions is particularly important in electronic games and other virtual environments, because of the need to provide input quickly and efficiently. Users typically interact with virtual environments by manipulating a mouse, joystick, wheel, game pad, track ball, or other user input device to carry out some function as defined by the software program that produces the virtual environment. The virtual environment and the effects of the user interaction with objects in the virtual environment are generally visible on a display, so that the user has immediate feedback regarding the results of the user's input and control actions.
Another form of user input employs displays that are responsive to the touch of a user's finger or a stylus. Touch-responsive displays can be pressure activated, respond to electrical capacitance or changes in magnetic field intensity, employ surface acoustic waves, or respond to other conditions that indicate the location of a finger or stylus on the display. Another type of touch-sensitive display includes a plurality of optical sensors that are spaced apart around the periphery of the display screen so that the location of a finger or stylus touching the screen can be detected. Using one of these touch-sensitive displays, a user can more directly control a virtual object that is being displayed. For example, the user may touch the displayed virtual object with a finger to select the virtual object and then drag the selected virtual object to a new position on the touch-sensitive display.
Capacitive, electromagnetic, optical, or other types of sensors used in conventional touch-sensitive displays typically cannot detect the location of more than one finger or object touching the display screen at a time. Capacitive, resistive, or surface acoustic wave sensing display surfaces that can detect multiple points of contact are unable to image objects on a display surface with any degree of resolution. Prior art systems of these types cannot detect patterns on an object or detailed shapes that might be used to identify each object among a plurality of different objects that are placed on a display surface.
Another approach that has been developed in the prior art uses cameras mounted to the side and above a horizontal display screen to visually capture an image of a user's finger or other objects that are touching the display screen. This multiple camera mounting configuration is clearly not a compact system that most people would want to use in a residential setting. In addition, the accuracy of this type of multi-camera system in responding to an object that is on or proximate to the display surface depends upon the capability of the software used with the system to visually recognize objects and their location in three-dimensional space. Furthermore, the view of one object by one of the cameras may be blocked by an intervening object.
To address many of the problems inherent in the types of touch-sensitive and other displays discussed above, a user interface platform was developed in the MIT Media Lab, as reported by Brygg Ullmer and Hiroshi Ishii in “The metaDESK: Models and Prototypes for Tangible User Interfaces,” Proceedings of UIST 10/1997:14-17. The metaDESK includes a near-horizontal graphical surface used to display two-dimensional geographical information. An arm-mounted, flat-panel display disposed above the graphical surface serves as an “active lens” for use in displaying three-dimensional geographical information. A computer vision system inside the desk unit (i.e., below the graphical surface) includes infrared (IR) lamps, an IR camera, a video camera, a video projector, and mirrors. The mirrors reflect the graphical image projected by the projector onto the underside of the graphical display surface. The IR camera can detect a distinctive pattern provided on the undersurface of passive objects called “phicons” that are placed on the graphical surface. Magnetic-field position sensors and electrical-contact sensors are also included in the metaDESK. For example, in response to the IR camera detecting the IR pattern (which is transparent to visible light) applied to the bottom of a “Great Dome phicon,” a map of the MIT campus is displayed on the graphical surface, with the actual location of the Great Dome in the map positioned where the Great Dome phicon is located. Moving the Great Dome phicon over the graphical surface manipulates the displayed map by rotating or translating the map in correspondence to the movement of the phicon by a user.
A similar approach to sensing objects on a display surface is disclosed in several papers published by Jun Rekimoto of Sony Computer Science Laboratory, Inc. in collaboration with others. These papers briefly describe a “HoloWall” and a “HoloTable,” both of which use IR light to detect objects that are proximate to or in contact with a display surface on which a rear-projected image is visible. The rear-projection panel, which is vertical in the HoloWall and horizontal in the HoloTable, is semi-opaque and diffusive, so that objects become more clearly visible as they approach and then contact the panel. The objects thus detected can be a user's fingers, hands, or other types of objects.
By using an interactive display that can optically detect an object on or near the display surface, it should be possible to detect movement of the object in a specific manner. Accordingly, it would be desirable for an interactive display surface to respond to specific gestures made with the user's hand that are detected by the interactive display surface. Mike Wu and Ravin Balakrishnan of the University of Toronto Department of Computer Science have pointed out the advantages of using predefined gestures for interacting with an application (“Multi-Finger and Whole Hand Gestural Interaction Techniques for Multi-User Tabletop Displays,” UIST '03) using a Mitsubishi DiamondTouch table that employs capacitive coupling to sense hand and finger positions as the user makes a gesture. Gonzalo Ramos and Ravin Balakrishnan (also from the University of Toronto Department of Computer Science) demonstrated controlling video using gestures (“Fluid Interaction Techniques for Control and Annotation of Digital Video,” UIST '03) on a pressure-sensitive TabletPC (or, as they noted, a higher-end workstation equipped with a digitizer tablet). In each of these prior art systems, the system must learn the functions that are activated by the gestures. However, these two prior art systems suffer from the same limitations as other touch-sensitive, capacitive, or electromagnetic-sensitive display surfaces, i.e., the lack of good imaging resolution, the inability to properly distinguish shape and orientation of objects, and the difficulty in sensing multiple objects in contact with the display surface at one time. Also, a pressure-sensitive display surface requires actual contact with the display surface and cannot respond to objects that are in proximity with the display surface.
What is clearly needed is an interactive display system that is capable of controlling interactions with applications and selecting functions or objects using natural gestures that are intuitive in their relationship to the functions they cause to occur. It should be possible to use one or more fingers on one or more hands in making the gestures, if desired, and not require that the user's appendage(s) be in actual contact with the display surface.