Disparity is a geometrical quantity associated with each point of a two-dimensional image; it reflects the depth of the corresponding point of the scene in a three-dimensional representation.
If one considers an object-point observed by a left camera and a right camera, which produce a left image and a right image respectively, the disparity is the difference between the position of this object-point in the left image and its position in the right image. It depends directly on the “depth” of the object-point, i.e. its distance to the cameras.
Disparity is expressed as a number of pixels, which may include a fractional part. For images taken by two cameras whose axes converge in a plane at a distance zconv (in meters) from the cameras, the cameras having an inter-axis distance B (in meters) and a focal distance F expressed in pixels, the disparity d is:
d = F·B·[(1/zconv) − (1/z)]
where z is the distance (or depth), in meters, from the cameras to the observed point.
Conventionally, disparities are negative for objects placed between the cameras and the convergence plane, zero for objects in the convergence plane, and positive for objects situated beyond this plane. If the cameras are parallel, the convergence plane is at infinity (1/zconv = 0), so all disparities are negative and tend towards zero as the distance tends to infinity.
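The relation and the sign convention above can be checked numerically. The following is a minimal sketch only; the function name disparity_px, its parameter names, and the numeric values (F = 1000 pixels, B = 0.1 m, zconv = 2 m) are illustrative assumptions and do not come from the text.

```python
import math


def disparity_px(z, focal_px, baseline_m, z_conv=math.inf):
    """Disparity in pixels of a point at depth z (meters).

    Implements d = F * B * (1/z_conv - 1/z).
    z_conv = math.inf models parallel cameras (assumed default).
    """
    if z <= 0:
        raise ValueError("depth z must be positive")
    inv_conv = 0.0 if math.isinf(z_conv) else 1.0 / z_conv
    return focal_px * baseline_m * (inv_conv - 1.0 / z)


# Illustrative values, not taken from the text.
F, B, Z_CONV = 1000.0, 0.1, 2.0

d_near = disparity_px(1.0, F, B, Z_CONV)   # in front of the convergence plane
d_conv = disparity_px(2.0, F, B, Z_CONV)   # in the convergence plane
d_far = disparity_px(4.0, F, B, Z_CONV)    # beyond the convergence plane
d_parallel = disparity_px(10.0, F, B)      # parallel cameras
```

With these assumed values the disparity is negative (about -50 pixels) at 1 m, zero at 2 m, and positive (about +25 pixels) at 4 m; with parallel cameras it is negative at every finite depth.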
In a video sequence of images to be viewed in three dimensions, successive disparity maps may be built from the successive pairs of images produced by the two cameras; each map has one disparity value for each pixel of the image. In fact, two maps may be built for each pair: one reflects the disparities of the right image with respect to the left image, the other reflects the disparities of the left image with respect to the right image. Within a single frame time of the video sequence, a computer analyses the two images of the pair and derives, from the discrepancies between them, a disparity value for each pixel. For a given pixel in one image, the computer tries to find the corresponding destination pixel in the other image, i.e. the pixel of the second image (for instance the right image) which most likely represents the same object-point as the given pixel in the first image (the left image). The computer does this for every pixel and builds a disparity map which has as many pixels as the images and in which the value at each pixel is the disparity of that pixel.
The disparity map is attached to a pair of images and is used for rendering them on a stereoscopic or auto-stereoscopic display.
The computation of disparities is not exact or fully reliable; it is an estimation. The reason is that it may be difficult to know exactly which pixel is the destination pixel for a given pixel, and it often happens that several pixels are plausible candidates. The estimation is generally made by correlating portions of the two images, and the solution provided by the computer for each pixel is the one with the maximum likelihood, but there is no absolute certainty in it.
For instance, if the observed scene comprises a well characterized zone, such as a square or a circle with sharp contours, the estimator will easily correlate this shape in the left image with the same shape in the right image. However, for less distinctive portions of images, such as rather uniformly colored zones with no precise contour, the task is much more difficult and the estimator will not be able to determine precisely which point corresponds to which.
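The difference between a well characterized zone and a uniform zone can be illustrated on a single image row. The sketch below is a simplification under stated assumptions: it uses a sum of absolute differences (SAD) as the matching cost in place of a correlation measure, it searches along one row only, and the function names, the window size, the pixel values, and the sign chosen for the shift (position in the right row minus position in the left row) are all hypothetical choices made for the illustration.

```python
def sad(window_a, window_b):
    """Sum of absolute differences between two equally sized windows."""
    return sum(abs(a - b) for a, b in zip(window_a, window_b))


def match_costs(left_row, right_row, x, half_win, max_shift):
    """Cost of matching the window centred at x in left_row against
    windows of right_row shifted by s, for s in [-max_shift, max_shift].
    Shifts whose window would leave the row are skipped."""
    ref = left_row[x - half_win:x + half_win + 1]
    costs = {}
    for s in range(-max_shift, max_shift + 1):
        lo = x + s - half_win
        hi = x + s + half_win + 1
        if lo < 0 or hi > len(right_row):
            continue
        costs[s] = sad(ref, right_row[lo:hi])
    return costs


def best_shifts(costs):
    """All shifts that reach the minimum cost (more than one = ambiguous)."""
    best = min(costs.values())
    return [s for s, c in costs.items() if c == best]


# A bright feature on a uniform background; the right row is the
# left row moved two pixels to the left (illustrative data).
left_row = [10, 10, 10, 10, 50, 90, 50, 10, 10, 10, 10, 10]
right_row = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10, 10]

edge_candidates = best_shifts(match_costs(left_row, right_row, 5, 1, 3))
flat_candidates = best_shifts(match_costs(left_row, right_row, 9, 1, 3))
```

For the pixel on the bright feature a single shift minimises the cost, whereas for the pixel in the uniform zone several shifts tie, so the estimator has no basis for preferring one destination pixel over another.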
An error in the disparity computation is not very disturbing if the images are static: the image seen in three dimensions may contain false evaluations of depth, but the viewer will see it without discomfort. It is much more disturbing in a video sequence, because the computer performs the estimation for each pair of images and may arrive at different estimates for different pairs even though the depth of the object in the real scene has not changed from one pair to the next.
This variation in the estimates from image to image results, at the time of reproduction on a display, in artifacts such as flickering or jittering, false colors, etc. These artifacts are uncomfortable for the viewer.
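One simple way to make this frame-to-frame variation visible is to compare successive disparity maps pixel by pixel. The following sketch is illustrative only: the mean absolute change used here is an assumed measure, not one named in the text, and the function names and the small 2×2 maps are hypothetical.

```python
def mean_abs_change(prev_map, next_map):
    """Mean absolute per-pixel disparity change between two maps
    of identical size (lists of rows)."""
    total = 0.0
    count = 0
    for row_prev, row_next in zip(prev_map, next_map):
        for d_prev, d_next in zip(row_prev, row_next):
            total += abs(d_next - d_prev)
            count += 1
    return total / count


def temporal_variation(maps):
    """Mean absolute change for each pair of successive maps."""
    return [mean_abs_change(a, b) for a, b in zip(maps, maps[1:])]


# Three maps of a static scene; one pixel is estimated differently
# in the middle frame (illustrative data).
maps = [
    [[4, 4], [4, 4]],
    [[4, 6], [4, 4]],
    [[4, 4], [4, 4]],
]
variation = temporal_variation(maps)
```

For a static scene with consistent estimates every value would be zero; here the single unstable pixel produces a non-zero change between each pair of successive maps.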