Visual information is indispensable to the successful operation of an autonomous land vehicle. Even with the use of sophisticated inertial navigation systems, the accumulation of position error requires periodic corrections. Operation in unknown environments and mission tasks involving search, rescue, or manipulation depend critically upon visual feedback.
Assessment of scene dynamics becomes vital when moving objects may be encountered, e.g., when the autonomous land vehicle follows a convoy, approaches other vehicles, or has to detect moving threats. For the given case of a moving camera, such as one mounted on the autonomous land vehicle, image motion can supply important information about the spatial layout of the environment ("motion stereo") and the actual movements of the land vehicle.
Previous work in motion analysis has mainly concentrated on numerical approaches for the recovery of three-dimensional (3-D) motion and scene structure from two-dimensional (2-D) image sequences. The most common approach is to estimate 3-D structure and motion in one computational step by solving a system of linear or non-linear equations. This technique has several severe limitations. First, it is notoriously sensitive to noise. To overcome this problem, some researchers have extended the technique to cover multiple frames. Secondly, it is designed to analyze the relative motion and 3-D structure of a single rigid object. To estimate the egomotion of an autonomous land vehicle (ALV) carrying the imaging device or camera, together with the accompanying scene structure, the environment would have to be treated as one large rigid object. However, rigidity of the environment cannot be guaranteed because moving objects may be present in the scene. If a moving 3-D point is accidentally included in the system of equations representing the imaged environment, the best case is a solution (in terms of motion and structure) exhibiting a large residual error, indicating some non-rigid behavior. The point in motion, however, could not be immediately identified from this solution alone. In the worst case (for some forms of motion), the system may converge towards a rigid solution (with small error) in spite of the actual movement in the point set. This reveals a third limitation: there is no suitable means of expressing the ambiguity and uncertainty inherent in dynamic scene analysis.
The invention, which solves the aforementioned problems, is novel in two important aspects. First, the scene structure is not treated as a mere by-product of the motion computation but as a valuable means of overcoming some of the ambiguities of dynamic scene analysis.
The key idea is to use the description of the scene's 3-D structure as a link between motion analysis and other processes that deal with spatial perception, such as shape-from-occlusion, stereo, spatial reasoning, etc. A 3-D interpretation of a moving scene can only be correct if it is acceptable to all the processes involved.
Secondly, numerical techniques are largely replaced by a qualitative strategy of reasoning and modeling. Instead of having a system of equations converge on a single rigid (but possibly incorrect) numerical solution, multiple qualitative interpretations of the scene are maintained. All currently existing interpretations are kept consistent with the observations made in the past. The main advantage of this approach of the present invention is that a new interpretation can be supplied immediately when the currently favored interpretation turns out to be implausible.
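The idea of keeping several qualitative interpretations alive, rather than committing to one numerical solution, can be illustrated with a minimal sketch. It is an illustration under stated assumptions and not the method of the invention: the class name InterpretationSet, the methods observe and favored, the labels "stationary" and "mobile", the example entities, and the preference for the interpretation with the fewest mobile entities are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical qualitative labels; the text does not prescribe this vocabulary.
STATIONARY, MOBILE = "stationary", "mobile"


class InterpretationSet:
    """Keeps every labeling of the scene that agrees with all observations so far."""

    def __init__(self, entities):
        self.entities = list(entities)
        # Start with every possible assignment of a label to each entity.
        self.alive = [
            dict(zip(self.entities, labels))
            for labels in product((STATIONARY, MOBILE), repeat=len(self.entities))
        ]

    def observe(self, constraint):
        """Discard interpretations that contradict a new observation.

        `constraint` is a predicate over one interpretation (a dict that maps
        an entity name to a label). Survivors have passed every earlier
        constraint as well, so they stay consistent with past observations.
        """
        self.alive = [i for i in self.alive if constraint(i)]

    def favored(self):
        """Return the preferred surviving interpretation, or None if none is left.

        Assumed preference (not from the source): fewest mobile entities.
        """
        if not self.alive:
            return None
        return min(self.alive, key=lambda i: sum(v == MOBILE for v in i.values()))


scene = InterpretationSet(["rock", "truck", "tree"])

# Observation 1: rock and truck move relative to each other,
# so at least one of the two must be mobile.
scene.observe(lambda i: MOBILE in (i["rock"], i["truck"]))
print(scene.favored())

# Observation 2: later evidence shows that the truck is at rest. The favored
# interpretation is ruled out, and a consistent alternative is already available.
scene.observe(lambda i: i["truck"] == STATIONARY)
print(scene.favored())
```

Because the surviving alternatives are stored rather than recomputed, replacing a refuted interpretation requires only a filter and a selection. The exhaustive enumeration used here grows exponentially with the number of entities and serves only to keep the sketch short.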
The problem of determining the motion parameters of a moving camera relative to its environment from a sequence of images is important for applications of computer vision in mobile robots. Short-term control (such as steering and braking), navigation, and obstacle detection and avoidance are all tasks that can effectively utilize this information.