The physical world is full of three-dimensional objects, and there is growing interest in portable, cost-effective systems and methods to produce depth data and depth-based information from such objects. Such objects, or target objects in an imaged scene, may include portion(s) of a human being. Without limitation, if accurate depth data can be acquired, such data can assist in computational photography, implementing three-dimensional filters, scene layering, object tracking, motion tracking, three-dimensional scanning, homography, simultaneous localization and mapping (SLAM), dense tracking and mapping (DTAM), localization, measurement, metrology, and recognition of human-made gestures.
Imaging systems that optically examine a scene to discern target object(s) within it, and then try to discern three-dimensional information about the imaged scene and target object(s), are known in the art. Imaging systems typically employ an optical acquisition system to acquire images of a scene that may include at least one target object of interest, perhaps a human user or a portion of such user's body. Imaging systems further include a processing system to process data acquired by the optical acquisition system, to discern desired three-dimensional information regarding the imaged scene. As described herein, camera(s) used in the optical acquisition system may include ordinary color RGB cameras, depth or range cameras, or a combination of both, sometimes referred to as RGB-D, where D denotes a depth image in which each pixel in the camera pixel sensor array encodes the z-axis depth (or distance) information of the imaged scene. In brief, the depth image can be obtained by different methods, including geometric or electronic methods. Examples of geometric methods include passive or active stereo camera systems and structured light camera systems. Examples of electronic methods to capture a depth image include Time of Flight (TOF) cameras, or general scanning or fixed LIDAR cameras.
Some known three-dimensional depth sensors use a companion projection technique to assist in three-dimensional reconstruction. These approaches may include projecting one or more encoded patterns (typically used in so-called structured-light methods), projecting a pattern to create texture on the scene, or projecting a pattern that is optimized for three-dimensional reconstruction. The latter two techniques may be used in systems with two or more cameras (e.g., stereoscopic systems). The advantages of using two or more cameras, as opposed to the single camera generally used in structured-light methods, include robustness against deviations of the projection pattern from the ideal design specification, and the ability to operate in high ambient light situations in which the projection pattern cannot be distinguished by the camera system.
In so-called time-of-flight (TOF) imaging systems, the optical acquisition system emits optical energy whose return echoes are examined by a TOF camera system to acquire true three-dimensional data from the imaged scene. Exemplary TOF imaging systems were developed by Canesta, Inc. and are described in numerous patents to Canesta, Inc., now assigned to Microsoft, Inc. However, TOF imaging systems can be expensive and may be unsuitable for battery-operated portable use due to their large form factor and substantial operating power requirements.
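By way of illustration only, the relationship between a return echo and depth can be sketched as follows. The sketch assumes a simple pulsed model in which the round-trip travel time of the emitted optical energy is measured directly; commercial TOF systems may instead measure phase shift of modulated light, which is not modeled here. The function name tof_distance_m is hypothetical and is not drawn from any referenced system.

```python
# Illustrative sketch only: assumes a pulsed time-of-flight model in which
# the round-trip time of the emitted optical energy is measured directly.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_s):
    """Return the distance in meters for a measured round-trip time in seconds.

    The emitted energy travels to the target and back, so the one-way
    distance is half of the total path length.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0
```

Under this model, a target roughly three meters away returns an echo after about 20 nanoseconds, which suggests why such systems require fast, and often power-hungry, timing electronics.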
Other imaging systems that employ two-dimensional optical acquisition systems are also known in the art. Such optical acquisition systems acquire two-dimensional image data that is processed to reconstruct three-dimensional image data. Exemplary systems of this kind, in which the optical acquisition system includes at least two spaced-apart two-dimensional cameras, have been developed by Imimtek, Inc. (subsequently renamed Aquifi, Inc.) and are described in numerous patents assigned to Aquifi, Inc. of Palo Alto, Calif. The acquired two-dimensional data is processed such that a small number of landmark points sufficient to recognize an imaged target object are rapidly determined. Other, less sophisticated two-camera imaging systems attempt to acquire stereographic two-dimensional images from which three-dimensional data can perhaps be discerned.
However, three-dimensional space-time reconstruction algorithms commonly used with such systems are not very useful when imaging general dynamic scenes. This is because stereo matching must confront fundamental problems associated with triangulation and, more challengingly, with correspondence estimation, which is the task of associating points between images of the same scene acquired by the two spaced-apart two-dimensional cameras. Estimation of correspondences generally involves locally comparing one image in proximity to a specific point with the second image in proximity to any possible match. Local comparison is based on spatial image similarity, e.g., absolute difference. In practice, the imaged scene may change too fast for stereo matching data to be computed in real time.
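The correspondence estimation and triangulation described above can be sketched, purely for illustration, as window-based matching using a sum of absolute differences (SAD). The sketch assumes a rectified camera pair, so that a point at column c in the left image appears at column c - d on the same row of the right image, with d being the disparity. The function names, the window half-width half, the search limit max_disparity, and the pinhole relation depth = focal length x baseline / disparity are assumptions made for this example and do not describe any particular prior art system.

```python
# Illustrative sketch only: images are lists of rows of gray levels, and the
# camera pair is assumed rectified so matches lie on the same image row.

def sad(left, right, row, col_left, col_right, half):
    """Sum of absolute differences between two square windows."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col_left + dc]
                         - right[row + dr][col_right + dc])
    return total


def best_disparity(left, right, row, col, half=1, max_disparity=4):
    """Return the disparity whose right-image window best matches the
    left-image window centered at (row, col). Ties keep the smaller value."""
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        if col - d - half < 0:
            break  # candidate window would fall off the image
        cost = sad(left, right, row, col, col - d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth in meters, assuming a pinhole model in which
    focal_px is the focal length in pixels and baseline_m the camera spacing."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px
```

Because every candidate disparity requires a full window comparison at every pixel, the cost of this search grows with image size, window size, and disparity range, which is one reason real-time operation on dynamic scenes can be difficult.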
In other so-called structured light imaging systems, the optical acquisition system includes a pattern generator that projects a predefined pattern of light onto the imaged scene, and employs a pair of two-dimensional cameras that image the scene. Typically the light pattern is generated using a passive diffractive optical element (DOE) that transforms an incoming optical wavefront into a desired but immutable (i.e., not changeable) output light pattern for projection onto the imaged scene. DOEs are diffraction-based and redirect light away from dark pattern regions, thus promoting efficiency and low power consumption.
In structured-light systems, the projected light pattern typically becomes altered when falling upon different surfaces in the imaged scene. For example, a projected light pattern may appear distorted when projected onto differently shaped target object surfaces in the imaged scene, or may appear less focused and less intense when projected onto more distant or less reflective regions of the imaged scene. The scene and projected light patterns are acquired by an optical acquisition system. Two-dimensional image data from the optical acquisition system is processed to determine surfaces and shapes of imaged object(s) that could produce the observed light pattern distortion. Exemplary structured-light systems are described in patents to Prime Sense, Inc., now assigned to Apple, Inc. Some structured-light systems employ the above-described space/time methodology by repeatedly computing the absolute difference for several acquisitions of the same scene on which different patterns are projected. But while this approach may work with fairly stationary images, it is difficult in practice to carry out the real-time computations needed to reconstruct three-dimensional data where object(s) in the imaged scene are dynamic rather than stationary.
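The space/time methodology mentioned above can be sketched, for illustration only, by accumulating the absolute difference over several acquisitions of the same scene, each made under a different projected pattern. The sketch assumes a rectified two-camera arrangement and a scene that does not move between acquisitions; the function names and parameters are hypothetical.

```python
# Illustrative sketch only: each frame is a list of rows of gray levels, and
# frame k of left_frames and right_frames was acquired under the same
# projected pattern k. The scene is assumed static across all frames.

def space_time_sad(left_frames, right_frames, row, col, disparity, half=1):
    """Sum absolute differences over a spatial window and over all frames."""
    total = 0
    for left, right in zip(left_frames, right_frames):
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                total += abs(left[row + dr][col + dc]
                             - right[row + dr][col + dc - disparity])
    return total


def best_space_time_disparity(left_frames, right_frames, row, col,
                              half=1, max_disparity=4):
    """Return the disparity with the lowest accumulated cost.

    Candidates whose window would leave the image are skipped, and ties
    resolve to the smallest disparity.
    """
    candidates = [d for d in range(max_disparity + 1) if col - d - half >= 0]
    return min(candidates,
               key=lambda d: space_time_sad(left_frames, right_frames,
                                            row, col, d, half))
```

A pattern that repeats every two columns cannot distinguish disparities of 0, 2, and 4 on its own, but adding a second acquisition under a pattern with a different period can remove the ambiguity. The static-scene assumption is what fails for dynamic target objects: if the scene moves between acquisitions, the accumulated costs no longer refer to the same surface point.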
Structured light systems would further benefit if projected patterns could somehow be changed dynamically in real time. For example, such dynamically changeable patterns could better accommodate target objects lacking suitable texture and/or shape, improving the ability of a processing system to discern small shifts or disparities between frames of optical data acquired from at least two two-dimensional cameras in an optical acquisition system. Other projectable patterns might be useful over a wide spatial dynamic range, to more readily determine depth estimates for target objects that may be relatively close or far, or to more rapidly accommodate temporally rapidly changing target objects as opposed to less dynamically changing imagery. But while DOEs are robust, passive, and inexpensive to fabricate, in optical projection applications they are designed and fabricated to satisfy a specific optical energy input/output transfer function. In response to incoming optical energy, the DOE produces, or outputs, a single immutable pattern of structured optical energy in the so-called spatial frequency or holographic order space. This output pattern cannot be changed without physically altering the internal construction of the DOE to alter its transfer function. In practice, internally modifying a DOE on-the-fly to dynamically change its output pattern of optical energy is not possible. U.S. Pat. No. 9,325,973 (2016) entitled Dynamically Reconfigurable Optical Pattern Generator Module Useable With a System to Rapidly Reconstruct Three-Dimensional Data, assigned to Aquifi, Inc. (assignee herein), describes one such modern system in which a second DOE is dynamically moveable with respect to a first DOE to intelligently reconfigure the optically projected pattern.
Another prior art approach to creating changing patterns of light projections on-the-fly might use a digital light processing (DLP) projection system, including micro-electro-mechanical systems (MEMS) digital micro-mirror devices (DMD). But in practice, DLP systems are not suited for battery-operable mobile structured light systems due to their relatively high cost, multi-watt power consumption, complex optics with resultant large form factor, and relatively narrow projection fields of view. Such prior art DLP projectors redirect light rays onto a scene to generate bright pattern regions. But to produce dark pattern regions, much optical energy is inefficiently dissipated by being redirected away from the scene and onto a heat sink. By contrast, prior art DOEs have a much more compact form factor, and are more efficient in that they merely direct light away from dark pattern regions in the scene. Some prior art projection systems incorporate liquid crystal-on-silicon projectors, which like many projector-type devices may operate over a larger wavelength range than DOEs. But such projection systems operable over a larger wavelength range are characterized by high energy losses.
What is needed is a method and system whereby three-dimensional image data can be rapidly reconstructed for an optical acquisition system comprising two-dimensional cameras and a pattern generator system using a single DOE. Such systems and methods should enable three-dimensional reconstruction including use of so-called space-time methods of pattern generation and three-dimensional reconstruction. Such systems and methods should function well even if what is imaged by the optical acquisition system includes dynamic scenes including dynamic target object(s), and/or target objects that are relatively near or relatively far from the optical acquisition system, and/or target objects whose surfaces may be texturally unremarkable or even planar, or that are dimly lit by ambient light. Preferably, embodiments of such methods and systems should be useful to scan a target object and, in some embodiments, to carry out recognition of user gestures made by a target object. Such systems and methods should be implementable with a small form factor, perhaps even a wearable form factor, with efficient low power consumption, and should include a pattern generator system that is dynamically reprogrammable to project patterns most suitable to the scene currently being imaged, including so-called space-time patterns.