The present disclosure relates to an image processing system and methods.
In stereo vision, a three dimensional model of a scene is reconstructed from two images that are taken some distance apart by essentially identical rectilinear (i.e., distortion free) cameras. The distance and direction of this separation are together known as the stereo baseline. In the most common case, the two images are created by moving a single camera from one location to another, the two locations being separated by the stereo baseline.
Because the camera moves between exposures, the objects in the scene appear to move relative to one another. This apparent parallax image motion is a function of, among other things, the distance of the objects from the stereo baseline: the farther away an object is, the smaller its true parallax motion. By measuring these true parallax motions for multiple objects, it is possible to reconstruct their three dimensional relationships within the natural scene.
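By way of illustration only, the inverse relationship between distance and parallax can be sketched for the simplest case. The sketch assumes an ideal pinhole camera moved sideways without rotation, for which the image shift equals the focal length (in pixels) times the baseline divided by the depth. The function names and numeric values are hypothetical and are not taken from the disclosure.

```python
def parallax_px(focal_px, baseline_m, depth_m):
    """Image shift in pixels of a point at depth_m for a sideways camera move.

    Assumes an ideal pinhole camera and no rotation between exposures.
    """
    return focal_px * baseline_m / depth_m


def depth_from_parallax(focal_px, baseline_m, shift_px):
    """Invert the relation above to recover depth from a measured shift."""
    return focal_px * baseline_m / shift_px


# Hypothetical numbers: 1000 px focal length, 0.5 m baseline.
near = parallax_px(1000.0, 0.5, 5.0)    # 100.0 px for an object 5 m away
far = parallax_px(1000.0, 0.5, 50.0)    # 10.0 px for an object 50 m away
```

The nearer object shifts ten times as far as the distant one, which is the cue that allows relative depths to be recovered from measured parallax.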
If the camera is moved from the first position to the second without any change in its orientation, then the apparent parallax motions remain well behaved and three dimensional scene reconstruction is relatively easy. However, if the camera rotates when moved between the two positions, the parallax motion may become twisted into a swirl pattern. FIG. 1 shows such a pattern, with the apparent motions of features from the first image to the second drawn as “flow lines”. Three dimensional reconstruction can be very difficult in this circumstance.
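The contrast between the two cases can be reproduced with synthetic points. The following is a minimal sketch, assuming an ideal pinhole camera, a sideways baseline, and a small roll about the optical axis as the rotation; the helper names, the random scene, and the angle are illustrative assumptions and do not reproduce FIG. 1.

```python
import numpy as np


def project(points, R, t, focal_px=1000.0):
    """Pinhole projection of Nx3 world points into a camera at position t
    with orientation R, using X_cam = R @ (X_world - t)."""
    cam = (points - t) @ R.T
    return focal_px * cam[:, :2] / cam[:, 2:3]


def roll(angle_rad):
    """Rotation about the optical (z) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


rng = np.random.default_rng(0)
points = np.column_stack([
    rng.uniform(-2.0, 2.0, 50),   # x (m)
    rng.uniform(-2.0, 2.0, 50),   # y (m)
    rng.uniform(4.0, 20.0, 50),   # depth (m)
])
baseline = np.array([0.5, 0.0, 0.0])

first = project(points, np.eye(3), np.zeros(3))

# Pure translation: every flow line is horizontal and points the same way.
flow_translate = project(points, np.eye(3), baseline) - first

# Translation plus a 5 degree roll: flow lines acquire vertical components
# that vary across the image, giving a swirl-like pattern.
flow_rotated = project(points, roll(np.deg2rad(5.0)), baseline) - first
```

In the translation-only case the vertical flow component is zero for every point, so depth can be read from the horizontal shift alone. Once the roll is added, the flow direction depends on image position as well as depth.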
In practice, the camera pan, tilt and roll angles have coupled effects on the imagery. For example, camera pan and tilt together will produce image rotations which emulate roll. These couplings are generally difficult to untangle. FIG. 1 shows these rotation coupling effects as well as the effects of the translation of the camera in the usual three spatial dimensions.
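One way to see the coupling numerically is to compose a pan and a tilt, neither of which contains any roll, and inspect the axis-angle form of the result. This is a minimal sketch under assumed axis conventions (tilt about the camera x axis, pan about the y axis, optical axis along z); the function names and the 20 degree angles are hypothetical.

```python
import numpy as np


def tilt(a):
    """Rotation about the camera x axis by a radians."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def pan(b):
    """Rotation about the camera y axis by b radians."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rotation_vector(R):
    """Axis-angle vector (radians) of a rotation matrix, for 0 < angle < pi."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return angle * axis


# Pan alone has no component about the optical (z) axis...
pan_only_z = rotation_vector(pan(np.deg2rad(20.0)))[2]

# ...but pan followed by tilt does, although no roll was applied.
combined = tilt(np.deg2rad(20.0)) @ pan(np.deg2rad(20.0))
roll_like_deg = np.rad2deg(rotation_vector(combined)[2])
```

For these angles the combined rotation carries a component of a few degrees about the optical axis, which in the image is hard to tell apart from a genuine camera roll.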
Altogether there are nine degrees of freedom for the two camera position system: pan, tilt and roll for each camera position, and x, y, z relative camera translation. In many cases the distance that the camera has traveled is not known. The combined tilt of the cameras with respect to horizontal may not be known either. Thus, there may be from six to nine degrees of freedom to be discovered during the course of the stereo imaging.
Existing methods for extracting three dimensional information from arbitrary pairs of images of a scene assume that the initial camera poses (pan, tilt, roll and x, y, z position) are unknown but may be extracted from the imagery. These existing methods are generally based on the construction and refinement of the Essential Matrix, which can be used to determine the direction of the stereo baseline and the relative orientations of the cameras with respect to the stereo baseline. The relative camera orientation is defined as the camera pose. The Essential Matrix is one of the most useful mathematical tools for creating stereo models from pairs of images.
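For reference, the standard textbook form of the Essential Matrix is E = [t]x R, where R and t relate the two camera frames by X2 = R X1 + t, and corresponding normalized image points satisfy x2' E x1 = 0. The following minimal sketch checks that constraint on synthetic points. The pose values and helper names are hypothetical, and the sketch builds E from a known pose rather than estimating it from imagery as the existing methods do.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def essential_matrix(R, t):
    """E = [t]x R for the convention X2 = R @ X1 + t."""
    return skew(t) @ R


# Hypothetical relative pose: 10 degree pan plus a mostly sideways baseline.
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-0.5, 0.0, 0.05])

rng = np.random.default_rng(1)
X1 = np.column_stack([rng.uniform(-2.0, 2.0, 20),
                      rng.uniform(-2.0, 2.0, 20),
                      rng.uniform(4.0, 20.0, 20)])   # points in camera 1 frame
X2 = X1 @ R.T + t                                    # same points in camera 2 frame

x1 = X1 / X1[:, 2:3]   # normalized image coordinates, camera 1
x2 = X2 / X2[:, 2:3]   # normalized image coordinates, camera 2

E = essential_matrix(R, t)
residuals = np.einsum("ni,ij,nj->n", x2, E, x1)   # x2' E x1 for each point
singular_values = np.linalg.svd(E, compute_uv=False)
```

The residuals vanish to within floating-point error, and E has two equal singular values and one zero singular value. Because the constraint is unchanged when E is scaled, the length of the baseline cannot be recovered from E alone; only its direction and the relative rotation can.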
However, these existing methods generally require extensive non-real-time computation and often produce unsatisfactory results. Furthermore, they are not satisfactory for forming a high precision, fully automated, real-time three dimensional model from video image streams.