Manufacturers of computing devices as well as developers of operating systems that execute on such computing devices are continuously improving their respective products to facilitate intuitive and convenient user interaction with such computing devices, operating systems, and applications that execute thereon. Conventionally, input devices, such as a keyboard and mouse, have been employed to receive input from a user, wherein the input is utilized to perform some computing operation. Accordingly, if the user wishes for the computing device, the operating system, and/or an application to perform a certain task, the user transmits instructions to the computing device by a series of mouse clicks, movements of the mouse, and/or keystrokes.
Recently, consumer-level computing devices have been equipped with technologies that facilitate more intuitive and convenient interaction than the aforementioned conventional user input devices. For example, many mobile telephones are equipped with touch-sensitive display screens, such that the user can interact with a graphical object on the display screen by contacting the display screen with one or more fingers and performing a gesture relative to the graphical object. The gestures that a touch-sensitive display can recognize are limited, however, as conventional touch-sensitive display screens support neither finger/hand disambiguation nor depth recognition. Further, because the user must interact directly with the display screen, gestures are limited by the size of the display screen.
Recognizing gestures made by a user in three-dimensional space can expand the instruction set that a user can set forth to a computing device through such gestures. Conventional technologies for recognizing depth of an object (e.g., a human hand) relative to a reference point or plane (e.g., a particular point on a computing device or a display screen) are either too expensive to be practically deployed for mass production or lack sufficient resolution to support recognition of relatively granular gestures. For example, technologies currently employed to perform three-dimensional depth recognition include binocular vision systems, structured light systems, and time of flight systems. Binocular vision systems compute depth of a point on an object by matching images from stereoscopically arranged RGB cameras. A deficiency commonly associated with binocular vision systems is the requirement that an object whose depth from a reference point is to be ascertained must have a particular type of texture. Further, the resolution of a resultant depth image may be insufficient to allow for accurate recognition of a granular gesture, such as slight motion of a finger.
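By way of illustration only, the relationship between matched image points and depth in a binocular vision system can be sketched as follows. The sketch assumes a rectified pinhole camera pair with a known focal length (in pixels) and baseline (in meters); the function name, parameter names, and numeric values are hypothetical and are not drawn from any particular system.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth (meters) of a point matched across a rectified stereo pair.

    Assumes the standard pinhole relation Z = f * B / d, where d is the
    horizontal offset (disparity) between the matched pixels.
    """
    if disparity_px <= 0:
        # Zero disparity means no usable match or a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 600 px focal length, 6 cm baseline.
near = depth_from_disparity(600.0, 0.06, 72.0)  # hand at roughly 0.5 m
far = depth_from_disparity(600.0, 0.06, 71.0)   # one pixel less disparity
depth_step_m = far - near                       # smallest resolvable change
```

Under these assumed values, a one-pixel change in disparity corresponds to a depth change of about 7 mm at a distance of half a meter, which suggests why slight finger motion can fall below the resolution of such a depth image.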
Structured light systems use an infrared light source that irradiates a scene with patterns of infrared light, and depth of an object in the scene relative to the infrared light source is computed based upon deformations detected in such patterns in a captured infrared image. When generating a depth image, numerous pixels in the captured infrared image must be analyzed to recognize the pattern; thus, again, resolution of a resultant depth image may be insufficient to accurately recognize certain gestures. Time of flight systems include sensors that measure the amount of time between when infrared light is transmitted from an infrared emitter and when such light is received by a detector (after reflecting off an object in a scene). Such systems are currently prohibitively expensive to include in consumer-level devices; if less expensive sensors are employed, resultant depth images again may lack sufficient resolution to allow for accurate detection of granular gestures.
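For illustration, the time of flight measurement described above can be sketched as a conversion from round-trip time to distance. This is a minimal sketch that assumes an idealized direct measurement with a co-located emitter and detector; the function names and the 1 mm example are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by definition of the meter


def distance_from_round_trip(round_trip_s):
    """Distance (meters) to an object from the emitter-to-detector travel time.

    The light covers the distance twice (out and back), hence the factor 1/2.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def round_trip_for_distance(distance_m):
    """Inverse relation: travel time (seconds) for a given distance."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


# Timing difference corresponding to a 1 mm change in depth.
timing_step_s = round_trip_for_distance(0.001)
```

Under this idealized model, resolving a 1 mm change in depth corresponds to distinguishing arrival times that differ by roughly 6.7 picoseconds, which gives a sense of the timing precision such sensors involve.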