The use of a keyboard, mouse, trackpad, or touch screen to control or to interact with a display-equipped device such as a computer, tablet, smart phone, etc. has given way to attempts to enable control and interaction using natural gestures in (x,y,z) space. Gestures are a form of communication made with portions of a user's body, arms, hands, fingers, etc. Interfaces based on such gestures are often called natural interfaces because the gestures can be performed without any additional item, e.g., no stylus, no special gloves, and no special marker need be used. Such gestures require little user practice compared to mastering a keyboard or the like, and do not require the user to look away from what might be displayed on or by the device. Further, there is less repetitive motion, which can decrease the likelihood of repetitive stress injury (RSI) to the user. When such gestures are made within the usable field of view of appropriate devices, the gestures can be interpreted to enable device control or user interaction with device-displayed images.
In recent years there has been a migration toward ever smaller and more portable electronic devices such as netbooks, smart phones, tablets, and the like. The use of even smaller and more portable devices such as eye glasses to enable user interaction with displayed images is known in the art. Zhao describes the ornamental design for glasses with a camera in U.S. Pat. No. D645,493. Zhao's design appears to house the camera and perhaps associated signal processing system electronics in the rather bulky temple or temples of the user-wearable glasses. Tang and Fadell describe a “near to eye” display and optics that are mounted on a user-worn visor, helmet, or glasses in U.S. Pat. No. 8,212,859. The '859 patent optically splits a source image to project a left image and a right image, respectively viewable by the user's left and right eyes. A system processor determines left and right image periphery colors that are emitted by respective left and right peripheral light elements. The method is said to enhance the user's viewing experience. Mooring and Fitzgerald disclose in US 2010/0156676 A1 a gesture-based user interface for a wearable portable device, although eyeglasses are not mentioned. A gesture signal is received by sensing movements of the wearable device itself, or perhaps by sensing taps on the wearable device. Clearly, if the wearable device were glasses, performing (x,y,z) gestures would require moving the user's head or tapping the glasses. Such requirements would fatigue the user and would restrict the range of gestures that could be made and recognized.
In another prior art approach, Gomez et al. disclose in U.S. Pat. No. 8,179,604 a user-wearable marker for interaction with a computing device such as a head mounted display. The user-wearable marker may be a ring, bracelet, artificial fingernail, etc. having a particular surface pattern whose reflections of IR energy are uniquely identifiable to the computing device. A pair of camera sensors spaced apart by a baseline distance detects such IR reflections for signal processing. Known patterns of marker motion are said to correlate to user hand (and thus marker) movements. If sufficient levels of IR energy are absent, an active source of IR emissions may be required. In Gomez's approach, successfully detecting user-system interaction requires adequate detectable levels of IR energy reflecting off the user-worn marker. As such, the detected user object in Gomez is adorned, in that the user must wear an external device for detection to occur, e.g., a marker such as a ring, a bracelet, or an artificial fingernail whose surface-reflected IR patterns are known a priori. Unfortunately, in a Gomez system, ambient IR energy can often reflect from the many surfaces of the user's hand and interfere with sensor detection of the desired marker-reflected IR energy pattern. The baseline distance between the sensor pair is known, and Gomez uses triangulation to track marker location. The two spatial line-of-sight angles (β1,β2) are determined, namely the angle between the ray from the center of each camera lens and a respective normal to the camera image plane. Gomez asserts that this information enables determination of marker locations, whose (x,y,z) movements can be matched to pre-determined marker trajectories to recognize gestures, including command gestures. Possibly Gomez estimates angles β1,β2 from (x,y) pixel locations on the respective pixel sensor array plane.
But absent certain geometric relationships between the two camera sensors, actually determining the (x,y,z) marker location is unsolvable. In addition to requiring the user to be adorned with a marker, Gomez lacks the detection accuracy and robustness needed to detect three-dimensional user interactions, including user pointing gestures.
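The triangulation step attributed to Gomez can be illustrated with a minimal sketch. It assumes exactly the kind of geometric relationship noted above: two ideal pinhole cameras with parallel optical axes whose centers lie a known baseline apart on a common x axis. The function names, the sign convention for β1 and β2 (measured from each optical axis, positive toward +x), and the pinhole model are illustrative assumptions, not Gomez's disclosed method.

```python
import math
from typing import Tuple


def pixel_to_angle(u_px: float, cx_px: float, focal_px: float) -> float:
    """Angle (radians) between the optical axis and the ray through pixel column u_px.

    Assumes an ideal pinhole camera with principal point cx_px and focal
    length focal_px, both in pixels. Positive angles point toward +x.
    """
    return math.atan2(u_px - cx_px, focal_px)


def triangulate(beta1: float, beta2: float, baseline: float) -> Tuple[float, float]:
    """Return (x, z) of a point seen at angle beta1 by the left camera and beta2 by the right.

    Assumed geometry: left camera at x = 0, right camera at x = baseline,
    both optical axes parallel and pointing along +z.
    """
    t1, t2 = math.tan(beta1), math.tan(beta2)
    # Left ray: x = z*t1.  Right ray: x = baseline + z*t2.  Equate and solve for z.
    denom = t1 - t2
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    z = baseline / denom
    return z * t1, z


# Usage: a point 0.30 m in front of the cameras, midway across a 0.06 m baseline.
b1 = math.atan2(0.03, 0.30)          # seen from the left camera
b2 = math.atan2(0.03 - 0.06, 0.30)   # seen from the right camera
x, z = triangulate(b1, b2, 0.06)
print(f"x={x:.3f} m, z={z:.3f} m")   # x=0.030 m, z=0.300 m
```

Under the same pinhole assumption, the remaining y coordinate follows from the vertical pixel row v as y = z·(v − c_y)/f. If the cameras are not parallel and coplanar, the two rays generally do not intersect and this closed form no longer applies.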
Prior art systems and methods to detect and recognize gestures, including three-dimensional gestures, range from the highly complex and expensive to the relatively straightforward and inexpensive. Whatever the approach, implementing such systems and methods in or on glasses requires components that are lightweight, have a small form factor, consume little operating power, and preferably are inexpensive. The graphical user interface (GUI) is a favored type of user-device interaction, and refers to user selection of objects presented on a display screen. For example, devices such as computers, smart phones, etc. may display a so-called desktop with target items such as icons, buttons, or menu options that the user can select, or a spreadsheet with cells that a user may select to enter or modify data, etc. In other applications the GUI might be a map from which the user can select and perhaps zoom in on a particular country, region, or city, or the GUI might be the trigger of a weapon in a video game display. As used herein, the term selection means the act by the user of selecting a selectable GUI object. As used herein, the term detection refers to software code that detects and maps the user object to a predefined coordinate system, and includes identifying displayed user-selectable target items. Determining the user's intent, i.e., what the user wants to do with the selected object, may be considered part of the selection process, or part of the next user action in the sense that user selection of a displayed menu item may open up a list of sub-menus, or a so-called ribbon. In some applications the user can select an edge or corner of the display, where perhaps no object is shown, to reveal hidden menus that can provide additional user options. The software application controlling the GUI can determine responses to user interactions.
For example, if the displayed object is a geographic map, user selection of a map location can signal the software and device detection system to execute a grab function. Subsequent user-created gestures then move the selected map point on the display and scroll the map in the direction of the detected motion. Gestures can also input data to devices, e.g., computers, and thus augment other forms of natural interfaces such as device detection of user speech.
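The grab-and-scroll behavior described above can be sketched as a small state machine: selecting a map point starts a grab, later fingertip motion scrolls the map by the same displacement, and releasing ends the grab. The `MapPanner` class, its `gain` parameter, and the assumption that fingertip positions arrive already mapped to display units are all hypothetical, offered only to make the interaction concrete.

```python
from typing import Optional, Tuple


class MapPanner:
    """Hypothetical grab-and-drag controller for a displayed map.

    Positions are fingertip (x, y) coordinates already mapped to display units;
    gain scales fingertip motion into map scroll distance.
    """

    def __init__(self, gain: float = 1.0) -> None:
        self.gain = gain
        self.offset = [0.0, 0.0]                           # current map scroll offset
        self._last: Optional[Tuple[float, float]] = None   # last position while grabbed

    def grab(self, x: float, y: float) -> None:
        """User selected a map point: start tracking from here."""
        self._last = (x, y)

    def move(self, x: float, y: float) -> None:
        """Scroll the map by the fingertip displacement since the last sample."""
        if self._last is None:
            return  # not grabbed, so motion does not scroll the map
        dx, dy = x - self._last[0], y - self._last[1]
        self.offset[0] += self.gain * dx
        self.offset[1] += self.gain * dy
        self._last = (x, y)

    def release(self) -> None:
        """End the grab; further motion is ignored until the next grab."""
        self._last = None


panner = MapPanner(gain=2.0)
panner.grab(0, 0)
panner.move(1, 0)
panner.move(3, 1)
panner.release()
panner.move(10, 10)      # ignored: the map is no longer grabbed
print(panner.offset)     # [6.0, 2.0]
```

Tracking incremental displacement, rather than absolute position, lets the map continue scrolling naturally across several successive grabs.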
Consider now the various types of gestures that a device must recognize to provide the user with a meaningful natural user gesture experience.
A first type of gesture may be termed a static gesture, and involves detecting certain pre-determined shapes, which are mapped to at least one function on the device. For example, if the eye glasses device optical detection system detects the user's open palm with stretched fingers, such a gesture may be interpreted as stopping a playlist presented on the virtual or other display.
By contrast, a dynamic gesture may be determined by the motion and trajectory properties of the gesture object, typically the user's hand(s) or finger(s). For example, if the optical detection system detects a user's hand being waved left-to-right, the gesture may be used to scroll a list left or right, or a user rolling/unrolling gesture may be interpreted to indicate a forward or backward command, perhaps during an internet web browser session.
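As a minimal sketch of recognizing a dynamic gesture from trajectory properties, the function below labels a tracked hand path as a horizontal swipe using only its net displacement. The 0.10 m travel threshold, the "mostly horizontal" test, and the `scroll_left`/`scroll_right` labels are hypothetical choices; the hand positions are assumed to be supplied by an upstream tracker.

```python
from typing import Optional, Sequence, Tuple


def classify_swipe(track: Sequence[Tuple[float, float]],
                   min_travel: float = 0.10) -> Optional[str]:
    """Label a hand/fingertip track as a horizontal swipe, or return None.

    track: (x, y) samples in meters, oldest first (assumed already tracked).
    min_travel: hypothetical minimum net horizontal travel, in meters.
    """
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    # Require motion that is mostly horizontal and travels far enough.
    if abs(dx) < min_travel or abs(dx) <= abs(dy):
        return None
    return "scroll_right" if dx > 0 else "scroll_left"


print(classify_swipe([(0.0, 0.0), (0.08, 0.01), (0.20, 0.02)]))  # scroll_right
print(classify_swipe([(0.0, 0.0), (0.02, 0.30)]))                # None
```

Net displacement suffices for a simple wave; gestures such as rolling/unrolling would need richer trajectory features (for example, curvature or direction changes over time).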
So-called mouse/touch gestures accommodate software applications intended for use with a graphical user interface (GUI) designed for a traditional point-and-click mouse, an electronic pen, or touch gestures in which the user touches the screen displaying the GUI. Understandably, using a mouse would be rather awkward with wearable eye glasses devices, as would requiring the user to touch the display to interact with the GUI. But with the present invention, natural gestures may be defined with a user's fingertip, including tracking the fingertip in (x,y,z) space. The detected three-dimensional trajectory can be mapped, e.g., by a processor system on or in the glasses, to provide application updates on imagery viewed on the display screen, thus enabling the applications to be driven without any physical controller or special marker. The display screen may be part of a video display system that may be part of the glasses, or may be externally generated and/or displayed.
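One simple way a processor might map a tracked fingertip onto display imagery is a linear mapping from a fixed interaction region in front of the glasses to screen pixels, much as a mouse maps desk motion to a cursor. The sketch below assumes such a rectangular region (`box`), uses only the fingertip x and y, and clamps positions outside the region to the screen edge; all of these are illustrative assumptions rather than the method of the invention.

```python
from typing import Tuple


def fingertip_to_screen(x: float, y: float,
                        box: Tuple[float, float, float, float],
                        screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Map a fingertip (x, y), in meters, to (column, row) display pixel coordinates.

    box = (x_min, x_max, y_min, y_max) is a hypothetical interaction region in
    front of the glasses. Camera y points up while screen rows grow downward,
    so the vertical axis is flipped. Positions outside the box are clamped.
    """
    x_min, x_max, y_min, y_max = box
    u = min(max((x - x_min) / (x_max - x_min), 0.0), 1.0)
    v = min(max((y_max - y) / (y_max - y_min), 0.0), 1.0)
    col = min(int(u * screen_w), screen_w - 1)
    row = min(int(v * screen_h), screen_h - 1)
    return col, row


BOX = (-0.20, 0.20, -0.15, 0.15)
print(fingertip_to_screen(0.0, 0.0, BOX, 1280, 720))    # (640, 360)
print(fingertip_to_screen(1.0, -1.0, BOX, 1280, 720))   # (1279, 719), clamped
```

A practical system would likely also smooth the fingertip samples over time to suppress jitter before mapping them to the display.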
Consider now the role of z-depth measurements. Accurately detecting how far from the device static and/or dynamic gestures are performed may bring significant value to the user. For example, whether a gesture is made within 12″ of the device, or within an approximately 36″ arm's length of the device, can augment the definition of the gesture. The advantage could be that the user need not remember many individual gestures, but instead just a few that most users can remember most of the time, with distance then used to multiply the meanings of those few gestures.
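The idea of using distance to multiply the meaning of a few gestures can be sketched as a lookup keyed on both the recognized gesture and a depth zone. The zone limits follow the 12″ and roughly 36″ figures above; the gesture names, command names, and the choice to ignore gestures beyond arm's length are hypothetical.

```python
from typing import Optional

INCH_M = 0.0254
NEAR_LIMIT_M = 12 * INCH_M   # gestures within about 12 inches of the glasses
ARM_LIMIT_M = 36 * INCH_M    # roughly an arm's length, about 36 inches

# Hypothetical command table: the same gesture means different things by depth zone.
COMMANDS = {
    ("open_palm", "near"): "pause_playlist",
    ("open_palm", "arm"): "stop_playlist",
    ("swipe_right", "near"): "next_item",
    ("swipe_right", "arm"): "next_page",
}


def depth_zone(z_m: float) -> Optional[str]:
    """Classify the measured gesture depth z (meters from the glasses)."""
    if z_m <= NEAR_LIMIT_M:
        return "near"
    if z_m <= ARM_LIMIT_M:
        return "arm"
    return None  # beyond arm's length: treated as not a gesture


def interpret(gesture: str, z_m: float) -> Optional[str]:
    """Return the command for a recognized gesture at depth z_m, or None."""
    zone = depth_zone(z_m)
    return COMMANDS.get((gesture, zone)) if zone else None


print(interpret("open_palm", 0.20))   # pause_playlist
print(interpret("open_palm", 0.60))   # stop_playlist
```

With two zones, four table entries come from only two gestures the user must learn, which is the multiplication of meaning described above.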
What is needed is an eye glasses mounted system and method enabling recognition of gestures made in (x,y,z) space, within an arm's length of the glasses, by a user wearing the glasses. Such system and method should not require the user to wear a marker or the like, and should provide detection accuracy and robustness to recognize three-dimensional user gestures that enable a user to interact using a natural user interface. Such interface may be a video image created and presented as a display by the glasses mounted system and method, or may be a video image created and displayed by some other system. In either case, the method and system should be implementable using preferably inexpensive components that are lightweight with a small form factor, perhaps less than 5 mm in thickness, and that consume relatively little electrical operating power, preferably less than 250 mW. Further, such system and method should not require the user's head to be moved (or not moved) to detect and recognize user gestures and to create appropriate responses thereto. Preferably such method and system provide detection granularity similar to what a conventional mouse can provide. Preferably such method and system can optionally implement multi-modal detection, for example detecting user sound(s) to help augment gesture recognition.
The present invention provides eye glasses mounted methods and systems with such features.