Ranging devices are used for range sensing, i.e. measuring distances from these devices to objects or persons. Range sensing is also known as ranging, distance sensing or depth sensing. Certain ranging devices, sometimes referred to as imaging range sensors, can simultaneously perform multiple distance measurements in different directions within a certain solid angle or field of view (FOV). A set of distance values measured in this way can be converted into a 3-dimensional (3D) image of objects present in the FOV. This is why descriptions of imaging range sensors often resemble descriptions of more traditional cameras, which produce 2-dimensional (2D) images that are less informative about the shapes of imaged objects and more informative about their other properties, like color and reflectivity. Information provided by a traditional camera can be combined with distance data from a ranging device imaging the same FOV, to create color 3D images of objects or for other purposes. Cameras able to sense distances to objects simultaneously with the intensity and color of light coming from the objects are also known in the art.
Most modern imaging devices use arrays of microscopic light sensors to capture images, i.e. patterns of light, produced by lenses, mirrors or similar optical, light-gathering components. Each individual light sensor, also known as a pixel, responds to light directed onto it by the optical component, that is, light that ultimately comes from a certain narrow solid angle within the field of view (FOV) of the optical component. This narrow solid angle can be called the acceptance angle or FOV of the pixel. The FOV of an imaging device is simply the sum of the non-overlapping fields of view of all its active pixels. A pixel's response to light involves absorption of photons, resulting in generation of electrically charged photoelectrons that are then captured in a potential well. The charge accumulated in the well over a certain period of time called “integration time” or “exposure time” is ultimately converted, by dedicated electronic circuits located inside and/or outside the pixel, to a digital value that is read out of the imaging device, and can be stored in a digital memory. A set of such digital values, produced roughly simultaneously by different light-sensing pixels (and therefore representing contemporaneous values of light intensity at different points of the light pattern projected onto the pixel array of an imaging device), is called a digital image, with the individual digital values being called image pixels.
The traditional use of the word “pixel” (a contraction of “picture element”) to signify both a basic component of a digital image and a microscopic light sensor in an imaging device often results in these two distinct concepts being confused or, at least, being thought of as related in a very straightforward way. However, the relation between an array of light-sensing pixels and a digital image ultimately obtained from it can be quite complicated. A digital image consisting of simple “readings” of light intensity from different light-sensing pixels, and containing exactly one “reading” from each pixel, is only the simplest possibility, a data set often called a “raw image” or “raw frame”. One such raw image or a batch of raw images read out sequentially from an imaging device can be subjected to digital processing that will, firstly, destroy the one-to-one correspondence between the pixels of the resulting image and the light-sensing pixels of the imaging device, and, secondly, make the image pixels represent something other than light intensity at different locations. A good example of image processing that does both is the color interpolation or demosaicing that converts raw images obtained from image sensors with color filters on their pixels (typically arranged in the so-called Bayer pattern) to digital images in which each pixel is a three-component vector representing light intensity and color (or luminance and chrominance; luma and chroma for short). A ranging imaging device may use light-sensing pixels of the same design as a color image sensor, but its ultimate output is images consisting not of luma-chroma vectors, but of distance data, obtained by processing raw pixel readouts in a different way than that used to produce digital color images. These distance data can be structured into so-called depth images, in a raster or vector form.
In a vector depth image, every image pixel can usually be expected to represent a direction and measured distance to some point on the surface of some object—in other words, the position of such a point in some coordinate system. In raster depth images, one should generally expect to find both pixels representing successful attempts to measure distance in specific directions and pixels representing failed attempts. The latter may be present due to, for example, the absence of sufficiently near and sufficiently reflective objects in certain parts of the ranging device's FOV.
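The relation between the raster and vector forms of a depth image can be illustrated with a minimal sketch. It assumes, purely for illustration, that failed measurements are marked with None, that each raster pixel looks along a direction spaced at equal angular steps across the FOV, and that the stored value is the radial distance along that direction. The function name and parameters are hypothetical, and a real device would use its own calibrated pixel directions.

```python
import math

def depth_raster_to_points(depth, h_fov_deg, v_fov_deg):
    """Convert a raster depth image into a vector depth image (list of 3D points).

    depth      : list of rows; each entry is a radial distance or None
                 (None marks a failed measurement; an assumed convention).
    h_fov_deg  : horizontal field of view of the whole array, in degrees.
    v_fov_deg  : vertical field of view of the whole array, in degrees.
    """
    rows, cols = len(depth), len(depth[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            d = depth[r][c]
            if d is None:
                continue  # failed attempts have no place in the vector form
            # Viewing direction of the pixel centre (assumed equiangular layout).
            az = math.radians(((c + 0.5) / cols - 0.5) * h_fov_deg)
            el = math.radians((0.5 - (r + 0.5) / rows) * v_fov_deg)
            # Direction and distance give a position in device coordinates.
            x = d * math.cos(el) * math.sin(az)
            y = d * math.sin(el)
            z = d * math.cos(el) * math.cos(az)
            points.append((x, y, z))
    return points

if __name__ == "__main__":
    raster = [[None, 2.0], [2.0, 2.0]]
    for p in depth_raster_to_points(raster, 60.0, 40.0):
        print(p)
```

In this sketch the vector image keeps only the successful measurements, whereas the raster image keeps a slot for every pixel, including the failed ones.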
Ranging is often performed using different time-of-flight (T-O-F) techniques. The most popular of these involves illuminating objects in the FOV of a ranging device with a beam of light of periodically modulated intensity, and having the ranging device capture a series of “samples” of reflected light intensity at different points of its modulation period. An “intensity sample” is actually an analog or digital signal proportional to a finite time integral of the product of the oscillating light intensity and a certain co-periodic “demodulation function” or “gating function”. This demodulation function makes the integral representative of the light intensity at a certain point of the intensity modulation period, or within a certain fraction of that period, by giving that favored part of the period a higher weight in the integral than the rest of the period. The description of the integral as an intensity sample becomes more accurate as the ratio of the weight of the favored part to the weight of the remainder of the period (often referred to as demodulation contrast) increases, as the favored part of the period becomes smaller, and as the number of periods in the integration time decreases. The description is completely accurate only in the ideal, unrealistic case when the demodulation function is a Dirac delta function. The first step away from this ideal is to repeat the sampling by the delta function at a specific modulation phase over a number of modulation periods, to improve the signal-to-noise ratio (SNR) of the resulting multi-period integral. This only works if the intensity modulation is constant and the intensity noise is random throughout the integration time. The situation does not change when the delta function is replaced by any demodulation function that is less extreme in giving unequal weights to different light modulation phases. The only effect of this change is to weaken the phase-specificity of any single- or multi-period integral of the product of the function and light intensity.
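The effect of gate width and demodulation contrast on an intensity sample can be illustrated numerically. The following is a minimal sketch under assumed conditions: a noise-free light intensity of the form 1 + cos(2πt) with a modulation period of 1, and a rectangular gating function whose weight is 1 inside the favored window and 1/contrast elsewhere. The function names, the waveform and the normalization by the integral of the gate are illustrative choices, not taken from any particular device.

```python
import math

def intensity(t):
    """Assumed light intensity: period 1, mean 1, modulation amplitude 1."""
    return 1.0 + math.cos(2.0 * math.pi * t)

def gate(t, centre, width, contrast):
    """Rectangular demodulation (gating) function with period 1.

    Weight 1 inside a window of the given width around `centre`,
    weight 1/contrast for the remainder of the period.
    """
    d = (t - centre + 0.5) % 1.0 - 0.5  # signed offset from centre, in periods
    return 1.0 if abs(d) <= width / 2.0 else 1.0 / contrast

def intensity_sample(centre, width, contrast, periods=1, steps=2000):
    """Integral of intensity * gate, normalized by the integral of the gate.

    Uses the midpoint rule with `steps` points per modulation period.
    """
    dt = 1.0 / steps
    num = 0.0
    den = 0.0
    for k in range(periods * steps):
        t = (k + 0.5) * dt
        g = gate(t, centre, width, contrast)
        num += intensity(t) * g * dt
        den += g * dt
    return num / den

if __name__ == "__main__":
    true_value = intensity(0.0)  # intensity at the sampled phase
    for width, contrast in [(0.01, 1e9), (0.5, 1e9), (0.5, 2.0)]:
        s = intensity_sample(0.0, width, contrast, periods=4)
        print(f"width={width}, contrast={contrast:g}: "
              f"sample={s:.4f}, true={true_value:.4f}")
```

Under these assumptions, the narrow, high-contrast gate returns a value close to the true intensity at the sampled phase, while widening the gate or lowering its contrast pulls the sample toward the mean intensity, which is the loss of phase-specificity described above.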
The intensity of the reflected light at the imaging device can be sampled at different phases of its oscillation by changing the phase difference between the demodulation function and the known modulation of light intensity at the light source. The changing of this modulation-demodulation phase difference or shift can be done by changing the phase of the demodulation function or the phase of the light intensity modulation, whichever is more convenient. After collecting at least three light intensity samples, separated by sufficiently large phase shift intervals and having sufficient signal-to-noise ratios, one can determine, using computation methods known in the art, the modulation phase difference between the light reaching the ranging device and the light at its source. This phase difference is a result of the light having traveled a distance from the source to the ranging device. Up to a certain maximum distance, proportional to the light modulation period, the phase difference is proportional to the distance traveled by the light. If the source of the modulated light is close to the ranging device, a reflection of the light from any object into the ranging device results in a travel distance close to twice the distance from the device to the object. Thus, distances to different objects can be readily obtained from phase shifts of modulated light beams reflected from those objects.
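One commonly used computation of this kind takes four samples at modulation-demodulation phase shifts of 0°, 90°, 180° and 270°. The sketch below assumes ideal sinusoidal samples of the form offset + amplitude·cos(φ − θ), where φ is the phase delay of the returning light and θ the applied phase shift. Sign conventions and sample ordering differ between systems, so the function names and the sample model here are illustrative assumptions rather than a description of any specific device.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def phase_from_four_samples(s0, s90, s180, s270):
    """Phase delay in [0, 2*pi) from four samples 90 degrees apart.

    Assumes s_theta = offset + amplitude * cos(phi - theta); the offset
    cancels in the differences and the amplitude cancels in atan2.
    """
    return math.atan2(s90 - s270, s0 - s180) % (2.0 * math.pi)

def distance_from_phase(phase, f_mod):
    """Device-to-object distance, assuming source and sensor are co-located.

    The light travels the distance twice, hence 4*pi rather than 2*pi.
    """
    return C * phase / (4.0 * math.pi * f_mod)

def unambiguous_range(f_mod):
    """Maximum distance before the phase wraps; proportional to the period."""
    return C / (2.0 * f_mod)

def simulate_samples(distance, f_mod, offset=1.0, amplitude=0.5):
    """Ideal noise-free samples for an object at `distance` (test helper)."""
    phi = 4.0 * math.pi * f_mod * distance / C
    return [offset + amplitude * math.cos(phi - k * math.pi / 2.0)
            for k in range(4)]

if __name__ == "__main__":
    f_mod = 20e6  # example modulation frequency, Hz
    samples = simulate_samples(3.0, f_mod)
    phase = phase_from_four_samples(*samples)
    print("estimated distance, m:", distance_from_phase(phase, f_mod))
    print("unambiguous range, m:", unambiguous_range(f_mod))
```

An object located beyond the unambiguous range produces the same four samples as a nearer object one range interval closer, which is why the proportionality between phase and distance holds only up to that maximum distance.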
For a demodulation function to be present in the light intensity integrals captured by a T-O-F ranging device, the device must include some demodulating component or demodulator, permitting the integration of modulated light over a certain fraction of every modulation period, and preventing or inhibiting the integration for the remainder of the period. An imaging range sensor may have pixels designed to perform both the demodulation and integration of the incident light. However, combining these two functions in very small pixels is difficult, which limits the spatial resolution of sensors with demodulating pixels. Moreover, the effectiveness of demodulation in CMOS pixels drops sharply if the inverse of the period of the demodulation function, called the demodulation frequency, is pushed above 20 MHz.
Instead of using a specially designed sensor with demodulating pixels, a T-O-F 3D imaging device can use an ordinary image sensor with a separate demodulator, in the form of an optical gate or shutter whose transmission oscillates at a high frequency. The transmission of the shutter is the demodulation function in this case. This function can usually be considered “global”, i.e. the same for all pixels of the sensor behind the shutter, at least in terms of having the same period and phase of oscillation, if not necessarily the same oscillation amplitude and midpoint. While the light input to all the pixels is controlled by the global demodulation function, the pixels generally do not have to integrate light in complete synchrony, i.e. start and end the integration all at the same time and integrate as much of the light input as permitted by the demodulation function. For example, different rows of pixels may start and end the integration at different times, while sharing the length of the integration time. This is equally possible for the demodulating pixels mentioned earlier. The usual reason for making the timing of light integration pixel-position-dependent or local in this way is to enable reading and outputting of the resulting pixel data at a steady, maximized data rate. When the timing of light integration in pixels is local, the demodulating pixels have an advantage over a global demodulator, in that the necessary changes in phase shifts of the pixels' demodulation functions can also have local timing, and it can be selected so that no local phase shift change ever coincides with light integration at the same location. The requirement to make this possible is to be able to change the phases of the demodulation functions of the pixels, while keeping the phase of the light intensity modulation constant.
Avoiding changes of demodulation function phase shift during light integration is necessary for the resulting intensity integral to be usable as a sample of light intensity at a certain phase of its modulation. Formulae for computing distance from 3 or 4 such “pure one-phase intensity samples” are simple, well known in the art, and widely used in T-O-F ranging systems. Also, analysis of noise or error propagation through these formulae is quite straightforward. All this makes light integration at a constant demodulation function phase shift preferable to other possibilities. However, holding on to this preference in a T-O-F depth imaging system combining a global demodulation function (provided, for instance, by an optical shutter) with a local light integration timing (e.g. the timing scheme known in the art as Electronic Rolling Shutter or ERS) may require an inefficient use of pixels, in the sense that successive light integrations in a particular pixel producing the “pure one-phase light intensity samples” may have to be separated by long periods of idle waiting for other pixels to complete their similar integrations. The integration timing being local implies that, for every light integration in a particular pixel, at least one integration takes place elsewhere that is of the same duration, and requires the same phase shift of the global demodulation function, but differs in terms of start time and end time. From this, it follows that the time intervals between changes of the demodulation function phase shift must be longer than the duration of any particular integration. These time intervals are also intervals between the starts of consecutive integrations in any particular pixel, because each of these integrations should be done at a different demodulation function phase shift. If a pixel integrates light over significantly less than 100% of every interval between two consecutive integration starts, the pixel is used inefficiently.
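The size of this inefficiency can be estimated with a simple timing model. The sketch below assumes an idealized ERS scheme in which consecutive rows start integrating at a fixed row-to-row delay, every row integrates for the same time, and the global phase shift may change only when no row is integrating. The function names and the example numbers are hypothetical and serve only to illustrate the argument.

```python
def min_phase_change_interval(n_rows, row_time, t_int):
    """Shortest interval between changes of the global phase shift.

    The first row starts at time 0 and the last row starts
    (n_rows - 1) * row_time later, so all integrations at one phase
    shift are finished only after (n_rows - 1) * row_time + t_int.
    """
    return (n_rows - 1) * row_time + t_int

def pixel_duty_cycle(n_rows, row_time, t_int):
    """Fraction of each interval during which a given pixel integrates light."""
    return t_int / min_phase_change_interval(n_rows, row_time, t_int)

if __name__ == "__main__":
    # Hypothetical sensor: 480 rows, 30 us row-to-row delay, 5 ms integration.
    n_rows, row_time, t_int = 480, 30e-6, 5e-3
    interval = min_phase_change_interval(n_rows, row_time, t_int)
    duty = pixel_duty_cycle(n_rows, row_time, t_int)
    print(f"interval between phase changes: {interval * 1e3:.2f} ms")
    print(f"pixel duty cycle: {duty:.1%}")
```

In this model the duty cycle reaches 100% only when all rows integrate in complete synchrony, i.e. when the row-to-row delay is zero or the array has a single row. With any rolling delay, each pixel idles for the time the remaining rows need to finish their integrations at the same phase shift.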
Using an off-the-shelf ERS sensor behind a fast optical shutter in a T-O-F 3D camera has some advantages over using a specialized depth sensor with demodulating pixels. These advantages include lower sensor cost, smaller pixel size and higher pixel count, and will likely include in the future a higher maximum attainable modulation/demodulation frequency. A number of disadvantages of using an optical shutter or other global demodulator can also be listed. Some of the disadvantages are unique to the combination of a global demodulator with local integration timing, as exemplified by the integration timing in ERS sensors. The main disadvantage of this combination is its inability to produce raw frames consisting of “pure one-phase light intensity samples” at the maximum frame rate permitted by the ERS timing scheme. A high frame rate is a highly desirable characteristic of every T-O-F 3D camera that does sequential capture of light intensity samples, because the camera's ability to accurately image moving objects grows with the frame rate. A T-O-F camera with a global demodulator can only capture intensity samples sequentially. Demodulating pixels, by contrast, can be designed and operated in a number of ways permitting every pixel to simultaneously capture two or more different-phase intensity samples. When the capture of different-phase intensity samples is partly or fully parallelized in demodulating pixels, a T-O-F 3D camera using these pixels usually images moving objects better than a camera using a fully sequential sample capture to produce depth images at the same frame rate. Correspondingly, there is more need to maximize the frame rate in T-O-F cameras with global demodulators, including those cameras that also use the ERS integration timing scheme.