Electronic imaging sensors usually have an array of m×n photo-sensitive pixels, with m>=1 rows and n>=1 columns. Each pixel of the array can individually be addressed by dedicated readout circuitry for column-wise and row-wise selection. Optionally a block for signal post-processing is integrated on the sensor.
The pixels typically have four basic functions: photo detection, signal processing, information storage, and analog or digital conversion. Each of these functions consumes a certain area on the chip.
A special group of smart pixels, called demodulation pixels, is well known for the purpose of three-dimensional (3D) time-of-flight (TOF) imaging. Other applications of such demodulation pixels include fluorescence lifetime imaging (FLIM). The pixels of these demodulation imaging sensors typically demodulate the incoming light signal by means of synchronous sampling or correlating the signal. Hence, the signal processing function is substituted more specifically by a sampler or a correlator. The output of the sampling or correlation process is a number n of different charge packets or samples (A0, A1, A2 . . . An−1) for each pixel. Thus, n storage sites are used for the information storage. The typical pixel output in the analog domain is accomplished by standard source follower amplification. However, analog-to-digital converters could also be integrated at the pixel level.
The image quality of demodulation sensors is defined by the per-pixel measurement uncertainty. Similar to standard 2D imaging sensors, a larger number of signal carriers improves the signal-to-noise ratio and thus the image quality. For 3D imaging sensors, more signal carriers mean lower distance uncertainty. In general, the distance measurement standard deviation σ is inversely proportional either to the square root of the signal A or to the signal A itself, depending on whether or not the photon shot noise is dominant.
$$\sigma \propto \frac{1}{\sqrt{A}}$$
if photon shot noise is dominant, and
$$\sigma \propto \frac{1}{A}$$
if other noise sources are dominant.
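The two scaling regimes can be illustrated numerically. The following is a minimal sketch, assuming that the uncertainty is proportional to the ratio of total noise to signal and that shot noise equals the square root of the signal expressed in electrons. The function name, the `other_noise_electrons` parameter, and the omitted proportionality constant are illustrative assumptions and are not taken from the disclosure.

```python
import math

def relative_sigma(signal_electrons, other_noise_electrons=0.0):
    """Return total noise divided by signal (unitless, constant factor omitted).

    Assumption: shot noise is sqrt(signal) and adds in quadrature with a
    signal-independent noise term (e.g. readout noise).
    """
    shot_noise = math.sqrt(signal_electrons)
    total_noise = math.hypot(shot_noise, other_noise_electrons)
    return total_noise / signal_electrons

# Shot-noise-limited case: quadrupling the signal roughly halves the uncertainty.
shot_ratio = relative_sigma(40_000) / relative_sigma(10_000)

# Other noise dominant (hypothetical, very large fixed noise floor):
# quadrupling the signal reduces the uncertainty to roughly one quarter.
other_ratio = relative_sigma(400, 1e6) / relative_sigma(100, 1e6)
```

Under these assumptions, `shot_ratio` comes out at approximately 0.5 and `other_ratio` at approximately 0.25, matching the 1/√A and 1/A behavior, respectively.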
A common problem for all demodulation pixels used in demodulation sensors, whether for TOF imaging, FLIM, or other applications, arises when trying to shrink the pixel size to realize arrays of higher pixel counts. Since the storage nodes require a certain area in the pixel in order to maintain adequate full well capacity and thus image quality, the pixel's fill factor suffers from the shrinking process associated with moving to these larger arrays. Thus, there is a trade-off between the storage area needed for obtaining a certain image quality and the pixel's photo-sensitivity expressed by the fill-factor parameter. For a given minimum required image quality, the minimum size of the pixel is set by the minimum size of the total storage area.
In 3D imaging, typically a few hundred thousand up to several million charge carriers, typically electrons, need to be stored in order to achieve centimeter down to millimeter resolution. This performance requirement, in turn, means that the storage nodes typically cover areas of some hundreds of square micrometers in the pixel. Consequently, pixel pitches of 10 micrometers or less become almost impossible without compromises in terms of distance resolution and accuracy.
The aforementioned trade-off problem becomes even more critical if additional post-processing logic is to be integrated on a per-pixel basis. Such post-processing could include, for example, analog-to-digital conversion, logic for a common signal subtraction, integrators, and differentiators.
Another challenge of the demodulation pixels is the number of samples required to unambiguously derive the characteristics of the impinging electromagnetic wave. Using a sine-modulated carrier signal, the characteristics of the wave are its amplitude A, the offset B and the phase P. Since these are three unknowns, at least three samples need to be acquired per period. However, for design and stability reasons, most common systems use four samples. Implementing a pixel capable of capturing and storing n=4 samples generally requires a four-fold duplication of per-pixel electronics, such as the storage and readout electronics. The result is a further increase in the electronics per pixel and a further reduction in fill factor.
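The recovery of amplitude, offset, and phase from four samples can be sketched as follows. This is a minimal illustration, assuming four equally spaced samples per modulation period and the signal model A_k = B + A·cos(2πk/4 − P). The sign convention, the function names, and the sample ordering are assumptions made here for illustration; actual systems may define them differently.

```python
import math

def synthesize_samples(amplitude, offset, phase, n=4):
    """Generate n ideal samples per period under the assumed cosine model."""
    return [offset + amplitude * math.cos(2.0 * math.pi * k / n - phase)
            for k in range(n)]

def demodulate_four_samples(a0, a1, a2, a3):
    """Recover (amplitude, offset, phase) from four samples spaced 90 degrees apart."""
    in_phase = a0 - a2        # equals 2*A*cos(P) under the assumed model
    quadrature = a1 - a3      # equals 2*A*sin(P) under the assumed model
    amplitude = 0.5 * math.hypot(in_phase, quadrature)
    offset = (a0 + a1 + a2 + a3) / 4.0
    phase = math.atan2(quadrature, in_phase) % (2.0 * math.pi)  # wrap to [0, 2*pi)
    return amplitude, offset, phase

# Round trip with hypothetical values A=100, B=500, P=0.7 rad.
samples = synthesize_samples(100.0, 500.0, 0.7)
amplitude, offset, phase = demodulate_four_samples(*samples)
```

With noise-free samples, the round trip returns the original amplitude, offset, and phase to within floating-point precision. The differences A0 − A2 and A1 − A3 cancel the offset B, which is one reason the four-sample scheme is convenient in practice.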
In order to avoid this loss in sensitivity, most common approaches use so-called 2-tap pixels, which are demodulation pixels able to sample and store two samples within the same period. This type of pixel architecture is ideal in terms of sensitivity, since all the photo-electrons are converted into a signal and no light is wasted; on the other hand, it requires at least two consecutive measurements to obtain the four samples. Due to sampling mismatches and other non-idealities, even four images might be required to cancel or at least reduce pixel mismatches. Such an approach has been presented by Lustenberger, Oggier, Becker, and Lamesch in U.S. Pat. No. 7,462,808, entitled Method and device for redundant distance measurement and mismatch cancellation in phase measurement systems, which is incorporated herein by this reference in its entirety. Because several images are taken and combined to deduce one depth image, motion in the scene or a moving camera produces artifacts in the measured depth map. The more those different samples are separated in time, the worse the motion artifacts are.
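The reason four images can cancel a mismatch that two images cannot may be illustrated with a simplified model. The sketch below assumes a gain-only mismatch between the two taps, taps that sample 180 degrees apart, and a static scene; it is not the method of the cited patent, and all function names and parameter values are hypothetical.

```python
import math

def two_tap_acquire(amplitude, offset, phase, shift, gain_a=1.0, gain_b=1.0):
    """One acquisition of a 2-tap pixel: tap B samples 180 degrees after tap A.

    Assumption: the only non-ideality is a differing gain per tap.
    """
    tap_a = gain_a * (offset + amplitude * math.cos(shift - phase))
    tap_b = gain_b * (offset + amplitude * math.cos(shift + math.pi - phase))
    return tap_a, tap_b

def phase_from_two_acquisitions(acq_0, acq_90):
    """Phase estimate from acquisitions at 0 and 90 degrees only."""
    in_phase = acq_0[0] - acq_0[1]
    quadrature = acq_90[0] - acq_90[1]
    return math.atan2(quadrature, in_phase)

def phase_from_four_acquisitions(acq_0, acq_90, acq_180, acq_270):
    """Phase estimate in which the 180/270 degree acquisitions swap the tap roles."""
    in_phase = (acq_0[0] - acq_0[1]) - (acq_180[0] - acq_180[1])
    quadrature = (acq_90[0] - acq_90[1]) - (acq_270[0] - acq_270[1])
    return math.atan2(quadrature, in_phase)

true_phase = 0.7
acquisitions = [
    two_tap_acquire(100.0, 500.0, true_phase, k * math.pi / 2,
                    gain_a=1.0, gain_b=0.9)  # hypothetical 10% gain mismatch
    for k in range(4)
]
error_two = phase_from_two_acquisitions(acquisitions[0], acquisitions[1]) - true_phase
error_four = phase_from_four_acquisitions(*acquisitions) - true_phase
```

In this model, the two-acquisition estimate carries a residual term proportional to the offset times the gain difference, so `error_two` is clearly nonzero, whereas that term cancels in the four-acquisition combination and `error_four` vanishes to within floating-point precision. The cancellation relies on the scene being unchanged across all four acquisitions, which is why motion between them produces artifacts.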
A new architecture has been disclosed by Oggier and Buettgen in U.S. Pat. Pub. No. 2011/0164132A1. The architecture enables the shrinking of the pixel size without significantly reducing the pixel's fill factor and without compromising the image quality due to smaller storage nodes. The solution even provides the possibility for almost arbitrary integration of any additional post-processing circuitry for each pixel's signals individually. Furthermore, it can reduce the motion artifacts of time-of-flight cameras to a minimum. Specifically, this demodulation sensor comprises a pixel array comprising pixels that each produce at least two samples and a storage or proxel array comprising processing and/or storage elements, each of the storage elements receiving the at least two samples from a corresponding one of the pixels. The pixels comprise photosensitive regions in which incoming light generates charge carriers and demodulators/correlators that transfer the charge carriers among multiple storage sites in the pixels. A transfer system is provided that transfers the samples generated by the pixels to the corresponding storage elements of the proxel array. In example embodiments, the transfer system performs analog-to-digital conversion of the samples received by the storage elements. The proxel array then accumulates multiple subframes in time until the entire frame is read out from the proxel array.
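The data flow described above, in which samples are converted during transfer and accumulated over several subframes before readout, can be sketched in software. This is only an illustrative model under stated assumptions: the class name `ProxelArray`, the `quantize` helper, the 10-bit resolution, and the normalized full-scale value are hypothetical and do not describe the circuitry of the cited publication.

```python
def quantize(sample, full_scale=1.0, bits=10):
    """Hypothetical ideal ADC: map an analog sample to an integer code, with clamping."""
    max_code = (1 << bits) - 1
    code = round(sample / full_scale * max_code)
    return max(0, min(max_code, code))

class ProxelArray:
    """Software stand-in for a storage array with one accumulator set per pixel."""

    def __init__(self, rows, cols, samples_per_pixel=2):
        self.rows = rows
        self.cols = cols
        self.samples_per_pixel = samples_per_pixel
        self._reset()

    def _reset(self):
        self.accumulators = [
            [[0] * self.samples_per_pixel for _ in range(self.cols)]
            for _ in range(self.rows)
        ]

    def accumulate(self, subframe):
        """Digitize one subframe of analog pixel samples and add it to the accumulators."""
        for r, row in enumerate(subframe):
            for c, pixel_samples in enumerate(row):
                for k, value in enumerate(pixel_samples):
                    self.accumulators[r][c][k] += quantize(value)

    def read_out(self):
        """Return the accumulated frame and clear the accumulators for the next frame."""
        frame = self.accumulators
        self._reset()
        return frame

# Three identical subframes from a 1x2 array of 2-tap pixels (hypothetical values).
proxels = ProxelArray(rows=1, cols=2)
for _ in range(3):
    proxels.accumulate([[(1.0, 0.0), (0.0, 1.0)]])
frame = proxels.read_out()
```

In this sketch, the accumulated values live outside the pixel model, so the number of subframes that can be summed is limited by the accumulator width rather than by any in-pixel storage capacity.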