Determining the distance to objects from images captured by an image capture device, such as a still or moving image camera, has become increasingly important.
For example, with the advent of autonomous systems capable of adapting their functionality and characteristics to their environment, information about the presence of humans and their precise location is often highly desirable. Examples include monitoring systems for elderly care or hospital wards, and adaptive lighting systems capable of dimming locally to save energy or of changing beam shapes and spectral signatures to increase comfort. The precise position of human eyes, in particular, can be of interest for display purposes where the view provided is adapted to the detected eye position, for example to provide optimal views to both eyes for 3D imaging or to create a private view in the direction of the nearest pair of eyes while shielding the view in other directions.
It has been proposed to determine scene depth information by locally analyzing the defocus blur, since this blur depends on the distance from the lens to the object. However, conventional lens apertures introduce defocus blur that is mathematically irreversible and discriminates poorly between depths. Accordingly, such approaches tend to be complex and to yield relatively unreliable depth data.
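The dependence of defocus blur on object distance can be illustrated with a minimal sketch based on the standard thin-lens model. This model, the function name `blur_circle_diameter`, and all parameter names and numeric values are assumptions introduced here for illustration only; they are not taken from the cited approaches.

```python
import math


def blur_circle_diameter(object_distance_m, focus_distance_m,
                         focal_length_m, aperture_diameter_m):
    """Diameter (in metres, on the sensor) of the defocus blur circle.

    Assumes an ideal thin lens focused at focus_distance_m; a point at
    object_distance_m is imaged as a disc of the returned diameter.
    """
    if focus_distance_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    if object_distance_m <= 0:
        raise ValueError("object distance must be positive")
    # Magnification of the in-focus plane for a thin lens.
    magnification = focal_length_m / (focus_distance_m - focal_length_m)
    # Relative defocus: zero when the object lies in the focus plane.
    relative_defocus = abs(object_distance_m - focus_distance_m) / object_distance_m
    return aperture_diameter_m * magnification * relative_defocus


# Hypothetical 50 mm lens with a 25 mm aperture, focused at 2 m.
in_focus = blur_circle_diameter(2.0, 2.0, 0.05, 0.025)
near = blur_circle_diameter(1.5, 2.0, 0.05, 0.025)
far = blur_circle_diameter(3.0, 2.0, 0.05, 0.025)
```

In this simplified model the blur diameter grows as the object moves away from the focus plane in either direction, so the assumed objects at 1.5 m and 3 m produce the same blur diameter. The blur size alone therefore does not always identify a unique distance in this model.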
In the article “Image and Depth from a Conventional Camera with a Coded Aperture” by A. Levin, R. Fergus, F. Durand and W. T. Freeman (SIGGRAPH, ACM Transactions on Graphics, August 2007), it has been proposed to alleviate this shortcoming by introducing a coded aperture in the lens of an image sensor, thereby shaping the spectral properties of the resulting defocus blur. This system is based on a coded aperture having a broadband pattern and on a statistical model of images, which are used to recover depth and reconstruct an all-focus image of the scene. However, the approach results in a highly complex system requiring substantial computational resources. Furthermore, the approach may result in depth measurements that are less reliable and accurate than desired. Indeed, the method is particularly difficult to apply in the presence of depth contrasts for objects with similar intensities. In this case, the resulting images represent a superposition of two differently blurred planes.
Hence, an improved system for determining distances to objects based on an image from an image capturing device would be advantageous. In particular, a system allowing for increased flexibility, reduced complexity, reduced resource usage (in particular reduced computational resource usage), improved distance determination and/or improved performance would be advantageous.