Depth sensing may be utilized by computing devices to support a variety of functionality. For example, conventional techniques such as time-of-flight cameras, structured-light cameras, and so on may be used to determine a location of an object within an object scene in two-dimensional space (i.e., an “X” axis and a “Y” axis) as well as a depth of the object from a camera that captures the image, i.e., a “Z” axis. This may be used to map an object space in three dimensions, detect gestures as part of a natural user interface (NUI), and so forth.
There are primarily two kinds of imaging techniques utilized for generating depth images: passive and active. Passive depth cameras rely on the amount of texture information present in the scene, which makes it challenging to achieve reliable performance in diverse environments. Active depth cameras, on the other hand, have significantly higher reliability because these cameras estimate depth information by measuring the response to a light source that is part of the system. Conventional time-of-flight techniques utilized to perform depth sensing, however, require high-powered and high-frequency illumination and are susceptible to multipath degradation, as these techniques are based on timing the return of reflected light to a custom image sensor. The customization of the sensor also adds significant cost to the device. Conventional structured-light cameras, on the other hand, require calibration against a diffractive optical element (DOE), and there are stringent requirements on maintaining this calibration through the life of the system in order to guarantee correct depth. This calibration requirement makes it hard to implement structured-light depth systems in products that might undergo significant mechanical and thermal distortions, such as mobile products. Additionally, the depth spatial resolution is dependent on the spacing of the dots in the DOE pattern, and there is often a tradeoff between resolution and the ability to identify each dot within a very dense pattern.
Depth from defocus is another technique, which makes use of known lens properties to infer depth based on the amount of defocus/blur rendered by scene points, as such blur is dependent on depth and on lens properties such as depth of field in object space and depth of focus at the sensor. This technique can be implemented in both active and passive forms. The active mode (e.g., with an illumination system including a laser and a DOE) has the advantage that there is no dependence on the amount of scene texture, because texture is added or overlaid onto the object scene by the structured-light illuminator. The challenge for this technique has been that, in order to achieve good depth accuracy, a large-aperture lens is required, which limits viability in mobile products due to factors including size and weight as well as cost.
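By way of illustration only, the relationship between blur, depth, and aperture described above may be sketched using an idealized thin-lens model. The function names, the 4 mm focal length, the 1 m focus distance, and the f-numbers below are assumptions chosen for the example and are not taken from any particular camera or from the techniques discussed here; real lenses depart from this model.

```python
# Minimal thin-lens sketch of depth from defocus (illustrative assumptions only).
# All distances are in meters.

def image_distance(object_distance_m, focal_length_m):
    """Image distance v from the thin-lens equation 1/f = 1/s + 1/v."""
    return focal_length_m * object_distance_m / (object_distance_m - focal_length_m)


def blur_diameter(object_distance_m, focus_distance_m, focal_length_m, f_number):
    """Blur-circle diameter on the sensor for a point at object_distance_m.

    The sensor is assumed to sit where a point at focus_distance_m is sharp.
    """
    aperture_m = focal_length_m / f_number  # aperture diameter
    v_sensor = image_distance(focus_distance_m, focal_length_m)
    v_point = image_distance(object_distance_m, focal_length_m)
    # Similar triangles: the light cone of width aperture_m converges at v_point,
    # so its width at the sensor plane scales with the focus error.
    return aperture_m * abs(v_point - v_sensor) / v_point


# Hypothetical values: 4 mm lens focused at 1 m, compared at two f-numbers.
FOCAL_LENGTH_M = 0.004
FOCUS_DISTANCE_M = 1.0

for f_number in (1.4, 2.8):
    for depth_m in (0.5, 0.75, 1.0, 2.0):
        blur_um = 1e6 * blur_diameter(
            depth_m, FOCUS_DISTANCE_M, FOCAL_LENGTH_M, f_number
        )
        print(f"f/{f_number}: depth {depth_m:.2f} m -> blur {blur_um:.2f} um")
```

In this model the blur diameter scales linearly with the aperture diameter, so halving the f-number doubles the blur produced by the same depth offset. This is one way to see why a larger aperture tends to improve depth discrimination, at the cost of a larger and heavier lens.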