Imaging systems that optically examine a scene to discern target object(s) within it, and then try to discern three-dimensional information about the imaged scene and target object(s), are known in the art. Imaging systems typically employ an optical acquisition system to acquire images of a scene that may include at least one target object of interest, perhaps a human user or a portion of such user's body. Imaging systems further include a processing system to process data acquired by the optical acquisition system, to discern desired three-dimensional information regarding the imaged scene.
In so-called time-of-flight (TOF) imaging systems, the optical acquisition system emits optical energy whose return echoes are examined by a TOF camera system to acquire true three-dimensional data from the imaged scene. Exemplary TOF imaging systems were developed by Canesta, Inc. and are described in numerous patents to Canesta, Inc., now assigned to Microsoft, Inc. However, TOF imaging systems can be expensive and may be unsuitable for battery-operated portable use due to their large form factor and substantial operating power requirements.
Other imaging systems that employ two-dimensional optical acquisition systems are also known in the art. Such optical acquisition systems acquire two-dimensional image data that is processed to reconstruct three-dimensional image data. In some such systems the optical acquisition system includes at least two spaced-apart two-dimensional cameras. Exemplary such systems have been developed by Imimtek, Inc. (subsequently renamed Aquifi, Inc.) and are described in numerous patents assigned to Aquifi, Inc. of Palo Alto, Calif. The acquired two-dimensional data is processed such that a small number of landmark points sufficient to recognize an imaged target object are rapidly determined. Other less sophisticated two-camera imaging systems attempt to acquire stereographic two-dimensional images from which three-dimensional data can perhaps be discerned. But three-dimensional space/time reconstruction algorithms commonly used with such systems are not very useful when imaging dynamic scenes. This is because stereo matching must confront fundamental problems associated with triangulation and, more challengingly, with correspondence estimation, that is, associating points between images of the same scene acquired by the two spaced-apart two-dimensional cameras. Estimation of correspondences generally involves locally comparing the region of one image in proximity to a specific point with the region of the second image in proximity to each possible match. Local comparison is based on spatial image similarity, e.g., absolute difference. In practice, the imaged scene may change too fast for stereo matching data to be computed in real time.
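By way of illustration only, the local comparison described above can be sketched as a window search that minimizes a sum of absolute differences (SAD). The sketch below is not taken from any of the cited systems. It assumes rectified grayscale images stored as lists of rows, so that a point at column c in the left image appears at column c - d in the right image on the same row. The function names, the window half-width, and the disparity search range are hypothetical, and the caller is assumed to keep the left-image window inside the image.

```python
def sad(left, right, row, col_l, col_r, half):
    """Sum of absolute differences between two square windows,
    one centered at (row, col_l) in left, one at (row, col_r) in right."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def best_disparity(left, right, row, col, half=1, max_disp=8):
    """Return the disparity d in [0, max_disp] with the lowest SAD cost.
    Ties are resolved in favor of the smallest d."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if col - d - half < 0:
            break  # candidate window would fall outside the right image
        cost = sad(left, right, row, col, col - d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Triangulation for rectified cameras: Z = f * B / d.
    focal_px is the focal length in pixels, baseline_m the camera spacing."""
    if d <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / d
```

Even this simplified search evaluates every candidate disparity for every pixel of every frame, which suggests why such matching can struggle to keep pace with a rapidly changing scene.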
In other so-called structured light imaging systems the optical acquisition system includes a generator that projects a calibrated pattern of light onto the imaged scene, and employs a pair of two-dimensional cameras that image the scene. Typically the light pattern is generated using a passive immutable diffractive optical element (DOE) that transforms an incoming optical wavefront into a desired but immutable (i.e., not changeable) output light pattern for projection onto the imaged scenery. DOEs are diffraction-based and redirect light away from dark pattern regions, thus promoting efficiency and low power consumption.
In structured-light systems, the projected light pattern typically becomes altered when falling upon different surfaces in the imaged scene. For example, a projected light pattern may appear distorted when projected onto differently shaped target object surfaces in the imaged scene, or may appear less focused and less intense when projected onto more distant regions of the imaged scene. The scene and projected light patterns are acquired by an optical acquisition system. Two-dimensional image data from the optical acquisition system is processed to determine surfaces and shapes of imaged object(s) that could produce the acquired observed light pattern distortion. Exemplary structured-light systems are described in patents to Prime Sense, Inc., now assigned to Apple, Inc. Some structured light systems employ the above-described space/time methodology by repeatedly computing the absolute difference for several acquisitions of the same scene on which different patterns are projected. But while this approach may work with fairly stationary images, it is difficult in practice to carry out the real-time computations needed to reconstruct three-dimensional data where object(s) in the imaged scene are dynamic rather than stationary.
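The space/time methodology mentioned above can be sketched, purely as an illustrative assumption, by accumulating the absolute-difference cost over several acquisitions of the same scene, each taken under a different projected pattern. The sketch assumes rectified grayscale image pairs stored as lists of rows and a scene that does not move between acquisitions. The function names and parameters are hypothetical and do not describe any particular prior art system.

```python
def spacetime_sad(lefts, rights, row, col_l, col_r, half):
    """Absolute-difference cost summed over a spatial window and over
    all acquisitions (one left/right image pair per projected pattern)."""
    total = 0
    for left, right in zip(lefts, rights):
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                total += abs(left[row + dy][col_l + dx]
                             - right[row + dy][col_r + dx])
    return total


def best_disparity_spacetime(lefts, rights, row, col, half=1, max_disp=8):
    """Return the disparity with the lowest accumulated cost.
    Assumes the scene is static across the acquisitions, so a single
    disparity explains every left/right pair. Ties favor the smallest d."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if col - d - half < 0:
            break  # candidate window would fall outside the right images
        cost = spacetime_sad(lefts, rights, row, col, col - d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

A pattern that repeats along a row can leave a single acquisition ambiguous between several disparities, and summing the cost over acquisitions with different patterns can remove that ambiguity. The static-scene assumption is what breaks down when imaged objects move between acquisitions, and the accumulated cost multiplies the per-frame computation by the number of patterns.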
Structured light systems would benefit if projected patterns could somehow be changed dynamically in real-time. For example such dynamically changeable patterns could better accommodate target objects lacking suitable texture and/or shape to better enable a processing system to discern small shifts or disparities between frames of optical data acquired from at least two two-dimensional cameras in an optical acquisition system. Other projectable patterns might be useful to discern over a spatial dynamic range to more readily determine depth estimates to target objects that may be relatively close or far, or to more rapidly accommodate temporally rapidly changing target objects as opposed to less dynamically changing imagery. But while DOEs are robust, passive, and inexpensive to fabricate, in optical projection applications they are designed and fabricated to satisfy a specific optical energy input/output transfer function. In response to incoming optical energy, the DOE produces, or outputs, a single immutable pattern of structured optical energy in the so-called spatial frequency or holographic order space. However, the output pattern is immutable and cannot be changed without physically altering the internal construction of the DOE to alter its transfer function. In practice internally modifying a DOE on-the-fly to dynamically change its output pattern of optical energy is not possible.
One prior art approach to creating changing patterns of light projections on-the-fly might use a digital light processing (DLP) projection system, including MEMS digital micro-mirror devices (DMD). But in practice, DLP systems are not suited for battery operable mobile structured light systems. This is due to their relatively high cost, complex optics with resultant large form factor, high power consumption in the many watt range, and relatively narrow projection fields of view. Such prior art projectors redirect light rays onto a scene to generate bright pattern regions. But such projectors waste optical energy by redirecting light away from the scene onto a heatsink to generate dark pattern regions. This is very inefficient and wasteful of operating power, especially when compared to inexpensive, small form factor diffraction-based DOEs that merely redirect light away from dark pattern regions. Other prior art projection technologies, including liquid crystal on silicon projectors, are also characterized by substantial energy losses. While DOEs operate over a more limited wavelength range than projector-type devices, they provide a larger effective aperture and promote efficiency. In short, creating and projecting dynamically reprogrammable projection patterns for use in a low power consumption, inexpensive, small form factor system is not a trivial problem.
What is needed is a method and system whereby three-dimensional image data can be rapidly reconstructed for an optical acquisition system comprising two-dimensional cameras and a pattern generator. Three-dimensional reconstruction, including space/time methods of three-dimensional reconstruction, should work successfully even if the optical acquisition system images dynamic scenes including dynamic target object(s), and/or target objects that are relatively near or relatively far from the optical acquisition system, and/or target objects whose surface may be texturally unremarkable or even planar. Preferably such three-dimensional reconstruction should be implementable such that an overall system is inexpensive to fabricate and has a small form factor and low power consumption, enabling battery operation in mobile devices. Embodiments of such a system should be useful to scan a target object, and to recognize user gestures made by a target object.
The present invention provides such methods and systems.