Time-of-flight (ToF) cameras capture scene depth by measuring the phase delay of a modulated signal that an infrared LED emits toward the scene and that the scene reflects back to the sensor. ToF cameras have become a popular choice for many computer vision applications, including human-computer interaction, 3D reconstruction, and object detection. They offer low cost, high speed, and a compact form factor. However, compared to 3D sensors based on laser or structured light, ToF cameras are noisier. Noise in a ToF camera can be classified into two major categories: scene-independent noise and scene-dependent noise.
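As an illustration of the phase measurement described above, the following minimal sketch converts a phase delay into a range using the standard continuous-wave relation d = c * phi / (4 * pi * f). The 20 MHz modulation frequency and the function names are assumptions chosen for illustration; they do not describe any particular camera.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def phase_to_range(phase_rad, mod_freq_hz):
    """Convert a measured phase delay (radians) into a range (metres)."""
    # The light travels to the scene and back, so one full 2*pi cycle
    # corresponds to a round trip of one modulation wavelength.
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)

def unambiguous_range(mod_freq_hz):
    """Largest range measurable before the phase wraps around."""
    return C / (2.0 * mod_freq_hz)

# Hypothetical modulation frequency, assumed for this example only.
F_MOD = 20e6
half_cycle_range = phase_to_range(math.pi, F_MOD)
max_range = unambiguous_range(F_MOD)
```

At the assumed 20 MHz, the phase wraps at about 7.5 m, so any phase error caused by the noise sources discussed next maps directly into a range error through the same linear relation.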
Scene-independent noise arises from manufacturing limitations of the ToF camera hardware, including the infrared emitter (non-ideal sinusoidal modulation), the sensor (CMOS gate differences between pixels), and the optics. These limitations result in a measurement bias that depends on the pixel location, the measured range value, and the measured amplitude value.
Scene-dependent noise is the result of the multipath behavior of the emitted light. It leads to two kinds of distortion: 1) range over-shooting, due to the superposition of light reflected from neighboring structures; and 2) range smoothing, due to the superposition of signals reflected from foreground and background regions.
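Both distortions can be illustrated with a simple phasor model, in which each light path contributes a complex number whose angle encodes its range and the pixel reports the angle of the sum. This is a minimal sketch under assumptions: the 20 MHz modulation frequency, the amplitudes, the ranges, and the function names are all hypothetical.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s
F_MOD = 20e6       # assumed modulation frequency in Hz

def range_to_phase(range_m):
    return 4.0 * math.pi * F_MOD * range_m / C

def phase_to_range(phase_rad):
    return C * phase_rad / (4.0 * math.pi * F_MOD)

def measured_range(returns):
    """Range reported by one pixel that receives several returns.

    `returns` is a list of (amplitude, range_m) pairs, one per light path.
    """
    total = sum(a * cmath.exp(1j * range_to_phase(r)) for a, r in returns)
    phase = cmath.phase(total) % (2.0 * math.pi)
    return phase_to_range(phase)

# Range smoothing: a pixel on a depth edge sees foreground (1 m) and
# background (2 m) with equal amplitude and reports a value in between.
smoothed = measured_range([(1.0, 1.0), (1.0, 2.0)])

# Range over-shooting: a direct return at 1 m plus a weaker indirect
# return with a longer path (1.6 m) pushes the reading beyond 1 m.
overshoot = measured_range([(1.0, 1.0), (0.3, 1.6)])
```

In this model the smoothed pixel reports 1.5 m, a range at which no surface exists, and the over-shooting pixel reports a value slightly larger than the true 1 m.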
Most existing methods for reducing scene-independent noise use a global calibration model, in which the same parametric model is applied to all pixels in the image. The global calibration model has a small number of parameters and hence requires only a small amount of data for fitting. However, it fails to model the pixel-location-dependent bias in the ToF range image, which limits the achievable accuracy.
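The limitation of a global model can be seen in a small synthetic example. The sketch below assumes a linear model (true range = a + b * measured range) shared by all pixels, which is only one possible parametric form, and uses made-up data in which two pixels have opposite fixed offsets of 2 cm.

```python
def fit_global_linear(measured, true):
    """Least-squares fit of true = a + b * measured, shared by all pixels."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(true) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, true))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic calibration data (hypothetical): true ranges of 1, 2, 3 m.
true_ranges = [1.0, 2.0, 3.0]
pixel_a = [r + 0.02 for r in true_ranges]  # this pixel reads 2 cm too far
pixel_b = [r - 0.02 for r in true_ranges]  # this pixel reads 2 cm too near

measured = pixel_a + pixel_b
true = true_ranges + true_ranges

a, b = fit_global_linear(measured, true)
residuals = [a + b * x - y for x, y in zip(measured, true)]
max_residual = max(abs(e) for e in residuals)
```

Because the two offsets cancel on average, the fitted global model is close to the identity and leaves a residual of roughly 2 cm at every sample. Only a model with per-pixel parameters could remove it, at the cost of needing more calibration data.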
Simulation using ray tracing has been used for reducing the scene-dependent bias. Several methods exploit multiple modulation frequencies to achieve the same goal. Simulation-based methods require good initial range estimates and are generally slow. Methods based on multiple modulation frequencies require special hardware and are inapplicable to off-the-shelf ToF cameras, where only a single modulation frequency is available. A stereo ToF camera can improve the range measurements by modeling occlusions. However, stereo imaging requires two ToF cameras and a baseline.
Several methods aim to enhance or upsample ToF range images, using either the range images alone or the range images jointly with high-quality color images. Those methods effectively remove random noise but cannot correct measurement biases or recover fine structure, because they do not model the characteristics of ToF images.
Deep neural network methods can be used for object recognition and for other image processing tasks, such as image denoising and super-resolution. However, those methods are not directly applicable to denoising a ToF range image, since a ToF range image has different characteristics from a conventional optical image.