1. Field of the Invention
Embodiments of the invention described herein pertain to the field of machine vision systems. More particularly, but not by way of limitation, these embodiments yield improved calculation of distance in environments comprising substantially horizontal and substantially vertical features through use of stereo digital cameras that are rotated in at least one axis comprising at least the roll axis.
2. Description of the Related Art
Machine vision systems allow computers to view the physical world. A machine vision system comprises at least one camera coupled with a computer. The computer interprets images taken by the camera, thereby enabling the machine vision system to perform various tasks. Tasks performed by machine vision systems are diverse and include distance estimation, which is used in applications involving robot navigation. The use of two cameras to calculate a distance to an object is known as binocular or stereo machine vision. Because of their low cost and the richness of the data they provide, CMOS and CCD cameras are used for machine vision applications, such as robot navigation, that make use of a three-dimensional image of an object or of the environment in which a robot is situated.
Ultrasonic, radar and lidar sensors are used to actively sense the environment. Active sensors transmit a signal and analyze the reflection of that signal. Cameras are passive sensors, and mapping an image from camera data requires a more intricate analysis than is needed with active sensors. According to Computer Vision, Three-Dimensional Data from Images by Klette, Schluns and Koschan, binocular stereo vision is a process that transforms two images seen from slightly different viewpoints into a perception of the three-dimensional space. Hence, the use of stereo digital cameras is of great interest for machine vision systems.
Stereo machine vision, or stereovision, involves the use of two or more cameras separated from each other to view an object or environment. Features comprise points on objects, edges or other visible markings. Features as seen by digital cameras are located in different relative positions in the images, depending on their orientations and distance from the cameras. The difference of a feature's location in two images is called the feature's pixel disparity or disparity. The position of a feature in three-dimensional real world coordinates is determined by the feature's disparity and the camera specifications and geometry.
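The relationship between disparity and position can be illustrated with a minimal sketch. It assumes the common textbook case of two identical, rectified pinhole cameras mounted side by side, which is an assumption made for illustration and is not taken from the text above. The function names and the parameters focal_length_px, baseline_m, cx and cy (the principal point) are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Distance along the optical axis for a rectified, parallel camera pair.

    Assumes the standard pinhole relation Z = f * B / d, where f is the focal
    length in pixels, B the distance between camera centers in meters, and d
    the feature's disparity in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


def triangulate(x_left_px, y_px, disparity_px,
                focal_length_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in meters in the left camera's coordinate frame."""
    z = depth_from_disparity(disparity_px, focal_length_px, baseline_m)
    # Back-project the pixel through the pinhole model at depth z.
    x = (x_left_px - cx) * z / focal_length_px
    y = (y_px - cy) * z / focal_length_px
    return (x, y, z)


if __name__ == "__main__":
    # Illustrative numbers only: 700 px focal length, 10 cm baseline.
    print(triangulate(420.0, 240.0, 20.0, 700.0, 0.1, 320.0, 240.0))
```

Under these assumptions, a larger disparity corresponds to a closer feature, and the camera specifications (focal length, baseline, principal point) are all that is needed to convert a matched pixel pair into a three-dimensional point.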
Two key technical aspects of stereovision analysis are determining which points in the two images correspond to one another and determining, as accurately as possible, where each such point lies in the physical world.
There are many known methods for matching features between images. A feature is otherwise known as a point of interest. Example methods for matching points of interest include pixel-by-pixel correspondences and disparities; image patch correlation that divides one image into rectangular patches of pixels and then searches for similar patches in the other image; shading and gradient analysis; edge detection and matching; and object matching. Various combinations of these approaches can also be used. Once features are matched, the feature disparities can be calculated. There are many texts that describe the geometry to determine the position of a feature based on the disparity between the images.
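As an illustration of image patch correlation, the following minimal sketch searches one image row for the patch that best matches a reference patch from the other image. It assumes grayscale images stored as lists of rows, a horizontally mounted and rectified camera pair, and a sum-of-absolute-differences cost. These choices, and all function names, are assumptions made for the example rather than a method described above.

```python
def extract_patch(image, top, left, size):
    """Return a size x size block of pixels as a list of row slices."""
    return [row[left:left + size] for row in image[top:top + size]]


def sad(patch_a, patch_b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(a - b)
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))


def match_patch(left_image, right_image, top, left, size, max_disparity):
    """Find the disparity whose right-image patch best matches the left one.

    Candidates are taken from the same rows of the right image, shifted
    toward smaller column indices. Returns (best_disparity, best_cost).
    """
    reference = extract_patch(left_image, top, left, size)
    best = None
    for d in range(max_disparity + 1):
        col = left - d
        if col < 0:
            break  # candidate patch would fall off the image
        cost = sad(reference, extract_patch(right_image, top, col, size))
        if best is None or cost < best[1]:
            best = (d, cost)
    return best


if __name__ == "__main__":
    left_img = [[0, 0, 0, 0, 0, 9, 7, 0, 0, 0] for _ in range(3)]
    right_img = [[0, 0, 0, 9, 7, 0, 0, 0, 0, 0] for _ in range(3)]
    print(match_patch(left_img, right_img, top=0, left=5, size=2,
                      max_disparity=4))
```

The disparity returned by such a search is the quantity that the geometry then converts into a position. Practical systems typically use more robust cost functions and sub-pixel refinement.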
As the accuracy of the measured stereo geometry or of the feature's pixel disparity decreases, so does the accuracy of the estimated position of the feature in three-dimensional space. Any feature in one image that can be matched with several features in the other image is problematic: it must either be ignored or it leads to a low-accuracy estimate of the feature's three-dimensional position. It is therefore desirable to minimize the number of features of this type that appear in typical environments.
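The sensitivity of position to disparity error can be made concrete with a short sketch. It assumes the textbook relation Z = f * B / d for a rectified, parallel camera pair, and the default focal length and baseline values are arbitrary illustrative numbers, not values from this description.

```python
def depth_m(disparity_px, focal_length_px=700.0, baseline_m=0.1):
    """Depth under the assumed pinhole relation Z = f * B / d."""
    return focal_length_px * baseline_m / disparity_px


def depth_error_m(disparity_px, disparity_error_px,
                  focal_length_px=700.0, baseline_m=0.1):
    """Depth error caused by underestimating the disparity by a given amount."""
    true_depth = depth_m(disparity_px, focal_length_px, baseline_m)
    measured_depth = depth_m(disparity_px - disparity_error_px,
                             focal_length_px, baseline_m)
    return measured_depth - true_depth


if __name__ == "__main__":
    # The same one-pixel error at a large and at a small disparity.
    for d in (20.0, 5.0):
        print(d, depth_m(d), depth_error_m(d, 1.0))
```

With these assumed values, a one-pixel disparity error on a nearby feature (20 pixels of disparity) shifts the estimated depth by under 0.2 m, while the same error on a distant feature (5 pixels of disparity) shifts it by 3.5 m. A wrong match, which can be off by many pixels, is correspondingly more damaging.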
The most problematic type of feature is any line that is parallel to the axis running between the camera centers. Every portion of the line in the first image matches every portion of the line in the second image equally well, so the match is completely ambiguous and unusable. Lines that are not quite parallel to the line between the camera centers are also problematic. While there is a theoretical best match, slight problems such as lighting discontinuities render these nearly parallel lines unusable. Lines that are nearly parallel to the line between the camera centers are easily mismatched, and such a mismatch produces an erroneous feature location estimate, which is worse than having no estimate at all.
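The ambiguity can be demonstrated with a minimal sketch that computes a matching cost at every candidate disparity along a single image row. It assumes a horizontally mounted, rectified camera pair, so that a row of pixels lies parallel to the line between the camera centers, and it uses a sum-of-absolute-differences cost. The function names and the margin parameter are hypothetical.

```python
def row_match_costs(left_row, right_row, start, width, max_disparity):
    """Cost of matching left_row[start:start+width] at each disparity."""
    reference = left_row[start:start + width]
    costs = []
    for d in range(max_disparity + 1):
        col = start - d
        if col < 0:
            break  # candidate window would fall off the image
        candidate = right_row[col:col + width]
        costs.append(sum(abs(a - b) for a, b in zip(reference, candidate)))
    return costs


def is_ambiguous(costs, margin=0):
    """True when more than one disparity is within margin of the best cost."""
    best = min(costs)
    return sum(1 for c in costs if c - best <= margin) > 1


if __name__ == "__main__":
    # A row running along a horizontal line: every pixel looks the same.
    line_left = [255] * 12
    line_right = [255] * 12
    # A row crossing a vertical edge that is shifted by two pixels.
    edge_left = [0] * 6 + [255] * 6
    edge_right = [0] * 4 + [255] * 8

    print(row_match_costs(line_left, line_right, 6, 3, 4))
    print(row_match_costs(edge_left, edge_right, 5, 3, 4))
```

In this sketch the row along the horizontal line yields the same cost at every disparity, so no match can be preferred, whereas the row crossing the vertical edge has a single minimum at the true shift. Raising margin above zero models the nearly parallel case, in which small intensity differences are not enough to trust the nominal best match.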
Most stereo camera systems consist of two horizontal coplanar cameras. Vertical coplanar cameras also exist but are less common. Researchers have also experimented with “Trinocular” systems, stereovision using three cameras. In these systems, the cameras are typically mounted on the same plane either with all three cameras mounted along one axis or in a right angle configuration with two cameras mounted side-by-side and the third camera mounted vertically above one of the other two.
These vertical and horizontal mounting configurations are the standard used in all machine vision systems. In addition to providing the simplest geometry, these configurations mimic nature; human eyes are essentially mounted horizontally on a planar surface. Camera images are typically rectangular, and the planar-horizontal configuration aligns well with typical coordinate systems.
The world contains many horizontal lines, particularly in indoor environments. These include moldings and the horizontal edges of doors, windows and furniture. These objects are very strong features that would greatly aid the navigation of mobile robots, but they are unusable by a vision system with cameras configured horizontally. Conversely, a vertical camera configuration makes it virtually impossible to correlate features on vertical lines. These include corners between walls and vertical legs on furniture. In outdoor environments, trees and other plants contain many vertical edges.
These systems and methods fail to exploit strong features such as horizontal and vertical lines when correlating features between images in a stereovision system, and are therefore limited in their ability to estimate distances.