Computer vision is a growing research field that includes methods for acquiring, processing, analysing, and understanding images. The main driving idea in this field is to duplicate the abilities of the human visual system by electronically perceiving and understanding images of a scene. Notably, one theme of research in computer vision is depth perception or, in other words, three-dimensional (3D) vision.
For human beings, depth perception originates from the so-called stereoscopic effect, by which the brain fuses two slightly different images of a scene captured by the two eyes and retrieves, among other things, depth information. Moreover, recent studies have shown that the ability to recognize objects in a scene also contributes greatly to depth perception.
For camera systems, depth information is not easily obtained and requires complex methods and systems. When imaging a scene, a conventional two-dimensional (2D) camera system associates each point of the scene with RGB colour information. At the end of the imaging process, a 2D colour map of the scene is created. A standard 2D camera system cannot easily recognize objects in a scene from that colour map, because colour is highly dependent on varying scene illumination and does not intrinsically contain any dimensional information. New technologies have been introduced to advance computer vision, and notably 3D imaging, enabling in particular the direct capture of depth-related information and the indirect acquisition of dimensional information about the scene or its objects. Recent advances in 3D imaging systems are impressive and have led to growing interest from industry, academia and consumers.
The most common technologies used to create 3D images are based on the stereoscopic effect. Two cameras take pictures of the same scene from positions separated by a distance, exactly like human eyes. A computer compares the images by shifting one over the other to find the parts that match and those that do not. The amount of shift is called the disparity. The disparity at which objects in the image best match is used by the computer, together with the geometrical parameters of the camera sensors and the lens specifications, to calculate distance information, namely a depthmap.
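The compare, shift and triangulate procedure can be sketched in Python. This is a minimal sketch assuming a rectified image pair (so matching points lie on the same row), a sum-of-absolute-differences (SAD) cost over a small 1D window, and the pinhole relation Z = f·B/d, where f is the focal length in pixels, B the baseline in metres and d the disparity in pixels. The function names, window size and camera parameters are illustrative assumptions, not taken from any particular system; practical matchers use 2D windows, sub-pixel refinement and consistency checks.

```python
import numpy as np


def disparity_scanline(left, right, x, window=2, max_disp=16):
    """Illustrative SAD block matching on one rectified scanline.

    Returns the disparity (in pixels) of column x of the left scanline,
    i.e. the shift d for which left[x] best matches right[x - d].
    """
    patch = left[x - window : x + window + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        xr = x - d  # in a rectified pair, a point appears further left in the right image
        if xr - window < 0:
            break  # candidate window would fall off the image edge
        cost = np.abs(patch - right[xr - window : xr + window + 1]).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d_px, focal_px, baseline_m):
    """Pinhole stereo triangulation: Z = f * B / d (zero disparity means infinitely far)."""
    if d_px <= 0:
        return float("inf")
    return focal_px * baseline_m / d_px
```

For example, with a hypothetical 800-pixel focal length and a 5 cm baseline, a 5-pixel disparity gives 800 × 0.05 / 5 = 8 m. Because depth is inversely proportional to disparity, stereo depth resolution degrades as objects get farther away.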
Another, more recent and different technology is the Time-Of-Flight (TOF) camera system 3 illustrated in FIG. 1. TOF camera system 3 includes a camera 1 with a dedicated illumination unit 18 and data processing means 4. TOF camera systems are capable of capturing 3D images of a scene 15 by analysing the time of flight of light from a light source 18 to an object. Such 3D camera systems are now used in many applications where depth or distance measurement is required. Standard 2D camera systems, such as Red-Green-Blue (RGB) camera systems, are passive technologies, i.e. they use the ambient light to capture images and do not rely on emitting additional light. In contrast, the basic operating principle of a TOF camera system is to actively illuminate a scene 15 with modulated light 16 at a predetermined wavelength using the dedicated illumination unit, for instance with light pulses of at least one predetermined frequency. The modulated light is reflected back from objects within the scene. A lens collects the reflected light 17 and forms an image of the objects on an imaging sensor 1. Depending on the distance of objects from the camera, a delay occurs between the emission of the modulated light, e.g. the so-called light pulses, and the reception of those light pulses at the camera. In one common embodiment, the distance between reflecting objects and the camera may be determined as a function of the observed time delay and the speed of light. In another, more complex and more reliable embodiment, a plurality of phase differences between the emitted reference light pulses and the captured light pulses may be determined and used to estimate depth information, as introduced in Robert Lange's PhD thesis entitled “3D time-of-flight distance measurement with custom solid-state image sensors in CMOS/CCD technology”.
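For the simple embodiment, the distance is d = c·Δt/2, the factor 2 accounting for the round trip. The phase-based embodiment can be sketched as below, assuming a commonly used four-sample scheme in which each pixel records correlation samples A_k at 0°, 90°, 180° and 270°, modelled here as A_k = B + A·cos(φ − kπ/2) with φ = 4πf·d/c. The sampling convention, the sample model and the function names are assumptions for illustration; real sensors differ in tap arrangement, sign conventions and calibration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def simulate_samples(distance_m, f_mod_hz, amplitude=1.0, offset=0.5):
    """Hypothetical pixel model: four correlation samples taken 90 degrees apart,
    A_k = offset + amplitude * cos(phi - k*pi/2), with phi = 4*pi*f*d/c."""
    phi = 4.0 * math.pi * f_mod_hz * distance_m / C
    return [offset + amplitude * math.cos(phi - k * math.pi / 2) for k in range(4)]


def distance_from_phase(samples, f_mod_hz):
    """Four-sample demodulation: recover the phase shift, then convert it to distance."""
    a0, a1, a2, a3 = samples
    # A1 - A3 = 2A*sin(phi) and A0 - A2 = 2A*cos(phi) under the model above
    phi = math.atan2(a1 - a3, a0 - a2) % (2.0 * math.pi)  # wrap into [0, 2*pi)
    return C * phi / (4.0 * math.pi * f_mod_hz)
```

Because the phase is recovered from differences of samples, the constant offset B, which in this simplified model includes steady background light, cancels out. In a real pixel, strong background light still adds shot noise and can saturate the pixel.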
A TOF camera system comprises several elements, each of them having a distinct function.
1) A first component of a TOF camera system is the illumination unit 18. When pulses are used, the pulse width of each light pulse determines the camera range. For instance, for a pulse width of 50 ns, the range is limited to 7.5 m. As a consequence, the illumination of the scene is critical to the operation of a TOF camera system, and the high-speed driving frequency requirements for illumination units necessitate specialised light sources, such as light emitting diodes (LEDs) or lasers, to generate such short light pulses.
2) Another component of a TOF camera system is the imaging sensor 1, or TOF sensor. The imaging sensor typically comprises a matrix array of pixels forming an image of the scene. Here, a pixel is understood as the picture element sensitive to electromagnetic radiation together with its associated electronic circuitry. The output of the pixels can be used to determine the time of flight of light from the illumination unit to an object in the scene and back from the object to the imaging TOF sensor. The time of flight can be calculated in a separate processing unit, which may be coupled to the TOF sensor or integrated directly into the TOF sensor itself. Various methods are known for measuring the timing of the light as it travels from the illumination unit to the object and from the object back to the imaging sensor.
3) Imaging optics 2 and processing electronics 4 are also provided within a TOF camera system. The imaging optics are designed to gather the reflected light from objects in the scene, usually in the IR domain, and to filter out light that is not at the same wavelength as the light emitted by the illumination unit. In some embodiments, the optics may enable the capture of infrared illumination for TOF measurements and of visible illumination for RGB colour measurements.
The processing electronics drive the TOF sensor so as, among other functions, to filter out light whose modulation frequency differs from that of the light emitted by the illumination unit but whose wavelength is similar (typically sunlight). By filtering out unwanted wavelengths or frequencies, background light can be effectively suppressed. The processing electronics further include drivers for both the illumination unit and the imaging sensor so that these components can be accurately controlled in synchrony, ensuring accurate image capture and a reliable depthmap of the scene.
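The 50 ns to 7.5 m figure given for the illumination unit follows from the round trip of the light. Assuming, as is usual for pulsed TOF, that the echo must arrive within a gating window equal to the pulse width T, the maximum range is R = c·T/2. A minimal check in Python, with hypothetical function names:

```python
C = 299_792_458.0  # speed of light in m/s


def max_range_from_pulse_width(pulse_width_s):
    """The echo must return within one pulse window: round trip c*T, one-way range c*T/2."""
    return C * pulse_width_s / 2.0


def pulse_width_for_range(range_m):
    """Inverse relation: pulse width needed to cover a given one-way range."""
    return 2.0 * range_m / C


print(f"{max_range_from_pulse_width(50e-9):.2f} m")      # 50 ns pulse
print(f"{pulse_width_for_range(7.5) * 1e9:.1f} ns")      # pulse needed for 7.5 m
```

This prints about 7.49 m and 50.0 ns; the 7.5 m in the text corresponds to rounding c to 3 × 10^8 m/s. It also shows why short ranges need very short pulses, hence the fast LED or laser sources mentioned above.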
The choice of the elements constituting a TOF camera system is crucial. Depending on the type and performance of the elements used, TOF camera systems can cover ranges from a few millimetres up to several kilometres, with distance accuracy varying from sub-centimetre to several centimetres or even metres. Technologies that can be used with TOF camera systems include pulsed light sources with digital time counters, radio frequency (RF) modulated light sources with phase detectors, and range-gated imagers.
TOF camera systems suffer from several drawbacks. In current TOF imagers or TOF sensors, pixel pitches usually range from 10 μm to 100 μm. Owing to the novelty of the technology and the high complexity of the TOF pixel architecture, it is difficult to design a small pixel while maintaining a good signal-to-noise ratio (SNR) and meeting the requirements of low-cost mass production. This results in relatively large chip sizes for TOF image sensors. With conventional optics, such large image sensors require large and thick optical stacks to fit onto the die. Generally, a compromise has to be found between the required resolution and the thickness of the device so that it can be embedded in portable mass-market consumer products.
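The link between pixel pitch, die size and optics thickness can be made concrete with a pinhole approximation: the active area is the pixel count times the pitch, and the focal length needed for a horizontal field of view θ is f = w / (2·tan(θ/2)), where w is the sensor width. The resolution (320 × 240), the pitches and the 60° field of view below are hypothetical values chosen for illustration, and the total lens stack height depends on the lens design, so focal length is only a rough proxy for it.

```python
import math


def sensor_size_mm(cols, rows, pitch_um):
    """Active-area width and height in mm for a pixel matrix with the given pitch."""
    return cols * pitch_um / 1000.0, rows * pitch_um / 1000.0


def focal_length_mm(sensor_width_mm, hfov_deg):
    """Pinhole model: focal length at which the sensor width spans the horizontal field of view."""
    return sensor_width_mm / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))


for pitch in (50, 10):  # hypothetical pitches within the 10-100 um range quoted above
    w, h = sensor_size_mm(320, 240, pitch)
    print(f"{pitch} um pitch: {w:.1f} x {h:.1f} mm, f = {focal_length_mm(w, 60.0):.1f} mm")
```

Under these assumptions, a 50 μm pitch gives a 16 mm × 12 mm active area and a focal length of about 13.9 mm, whereas a 10 μm pitch gives 3.2 mm × 2.4 mm and about 2.8 mm: at fixed resolution and field of view, focal length scales linearly with pixel pitch.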
Furthermore, the depth measurement obtained by a TOF camera system may be erroneous for several reasons. Firstly, the resolution of such systems needs to be improved: large pixels require a large sensor chip, so the sensor resolution is limited by the TOF sensor size. Secondly, the accuracy of depth measurement in such systems still needs to be improved since, among a plurality of parameters, it is highly dependent on the signal-to-noise ratio and on the modulation frequency (the modulation frequency determining both the depth accuracy and the operating depth measurement range). In particular, uncertainty or inaccuracy in depth measurement may be due to an effect called “depth aliasing”, which will be described in detail later. Moreover, uncertainty can originate from additional light in the background. Indeed, the pixels of TOF camera systems comprise a photosensitive element which receives incident light and converts it into an electrical signal, for example a current signal. During the capture of a scene, if the background light is too intense at the wavelength the sensor is sensitive to, pixels may receive additional light not reflected from objects within the scene, which may alter the measured distance.
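Depth aliasing arises because the phase of the reflected signal wraps every 2π, so a single modulation frequency f only measures distance unambiguously up to R = c/(2f); beyond that, distances fold back into [0, R). A minimal sketch, with hypothetical function names:

```python
C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(f_mod_hz):
    """Phase wraps every 2*pi, which corresponds to a distance of c / (2f)."""
    return C / (2.0 * f_mod_hz)


def aliased_distance(true_distance_m, f_mod_hz):
    """Distance reported by a single-frequency phase measurement:
    the true distance folded back into [0, unambiguous range)."""
    return true_distance_m % unambiguous_range(f_mod_hz)


print(f"R at 20 MHz: {unambiguous_range(20e6):.2f} m")
print(f"object at 9 m reads {aliased_distance(9.0, 20e6):.2f} m")
```

At 20 MHz, R ≈ 7.49 m, so an object at 9 m would be reported at about 1.51 m. Raising f to 100 MHz improves depth resolution for a given phase noise (σ_d = c·σ_φ/(4πf)) but shrinks R to about 1.5 m, which is the trade-off between accuracy and operating range noted above.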
At present, in the field of TOF imaging, several options are available to overcome, at least partially, the major individual drawbacks of the technology, for instance improved modulation frequency systems enabling more robust and accurate depth measurement, dealiasing mechanisms, or background light robustness mechanisms.
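One common family of dealiasing mechanisms measures the scene at two modulation frequencies and keeps the distance on which both wrapped readings agree. The brute-force sketch below is one such approach, not necessarily the one intended here. It assumes noise-free wrapped distances d1 and d2 and a known maximum operating range, which must stay below the combined unambiguous range c/(2·gcd(f1, f2)), about 37.5 m for the hypothetical 20 MHz and 16 MHz pair used in the example.

```python
C = 299_792_458.0  # speed of light in m/s


def dealias_two_frequencies(d1, f1, d2, f2, max_range_m):
    """Brute-force dual-frequency dealiasing (illustrative sketch).

    d1 and d2 are wrapped distances measured at modulation frequencies f1 and f2.
    Every wrap count up to max_range_m is tried for each frequency, and the pair
    of candidate distances that agree best is averaged and returned.
    """
    r1, r2 = C / (2.0 * f1), C / (2.0 * f2)  # unambiguous range of each frequency
    best, best_err = None, float("inf")
    n1 = 0
    while d1 + n1 * r1 <= max_range_m:
        c1 = d1 + n1 * r1  # candidate true distance for frequency 1
        n2 = 0
        while d2 + n2 * r2 <= max_range_m:
            c2 = d2 + n2 * r2  # candidate true distance for frequency 2
            err = abs(c1 - c2)
            if err < best_err:
                best, best_err = (c1 + c2) / 2.0, err
            n2 += 1
        n1 += 1
    return best
```

Called with the wrapped readings of an object at 9 m (about 1.51 m at 20 MHz and 9.0 m at 16 MHz) and a 30 m search limit, it returns approximately 9.0 m. With noisy inputs the smallest disagreement is still selected, but candidates can be confused once the noise approaches the spacing between them.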
A solution remains to be proposed to address these drawbacks together and, additionally, to improve the resolution of TOF camera systems while limiting the thickness of the complete system and reducing parallax issues, making it compatible with integration into mass-produced portable devices.