Visual information plays a key role in mobile robot operation. Even with sophisticated inertial navigation systems, accumulated position errors require periodic correction. Operation in unknown environments, and missions involving search, rescue, or manipulation, depend critically on visual feedback. Motion understanding becomes vital as soon as moving objects are encountered, e.g., while following a convoy, approaching other vehicles, or detecting moving threats. When the camera itself moves, image motion can supply important information about the spatial layout of the environment and about the actual movement of the autonomous mobile robot or platform carrying the camera.
For intelligent action in the presence of potential threats and targets, or for navigation in traffic, information about actual motion in the scene is indispensable. Moving objects must be detected and isolated from the stationary environment, their current motions must be estimated to track them, and expectations about their future behavior must be formed. Because the camera itself is moving, the stationary part of the scene cannot be assumed to be registered in successive images, as it would be with a stationary sensor. Simple frame-differencing or feature-matching techniques for detecting and isolating moving objects therefore fail in this case: image changes caused by sensor motion generate too many false alarms in cluttered scenes. More sophisticated image-based techniques, which apply 2-D transformations (warping) to the image to compensate for background motion, work well only when objects move in front of a relatively flat background, as in some air-to-ground applications, because a single 2-D warp cannot compensate for the differing image motions (parallax) of near and far surfaces under camera translation. To detect actual object motion in the complex scenario of a robotic vehicle, the 3-D structure of the observed environment, together with the vehicle's motion, must be taken into account.
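To make the parallax argument concrete, here is a minimal sketch under simplifying assumptions: a pinhole camera with unit focal length, pure sideways translation, and no rotation. All function names and coordinates are hypothetical. It compensates background motion with the simplest possible 2-D warp, a single global image shift, and measures the image motion of static points that remains. For a flat background the residual vanishes; for static points at different depths it does not, and a frame-differencing detector would report those residuals as motion. A full homography would likewise cancel the motion of one plane, but not of points off that plane.

```python
from statistics import mean


def project(point, cam, f=1.0):
    """Pinhole projection of a world point for a camera at position `cam`
    looking along +Z with no rotation (a simplifying assumption)."""
    X, Y, Z = (p - c for p, c in zip(point, cam))
    return (f * X / Z, f * Y / Z)


def residual_after_global_shift(points, cam0, cam1, f=1.0):
    """Image motion of static points that remains after subtracting one
    global 2-D shift (the simplest possible background warp)."""
    flows = []
    for p in points:
        x0, y0 = project(p, cam0, f)
        x1, y1 = project(p, cam1, f)
        flows.append((x1 - x0, y1 - y0))
    # Estimate the background motion as the mean image displacement.
    sx = mean(u for u, _ in flows)
    sy = mean(v for _, v in flows)
    # Whatever is left over would show up as "change" in a differenced frame.
    return [((u - sx) ** 2 + (v - sy) ** 2) ** 0.5 for u, v in flows]


cam0, cam1 = (0.0, 0.0, 0.0), (0.5, 0.0, 0.0)  # sideways camera translation
flat = [(-2.0, 1.0, 10.0), (1.0, -1.0, 10.0), (3.0, 0.5, 10.0)]
cluttered = [(-2.0, 1.0, 2.0), (1.0, -1.0, 10.0), (3.0, 0.5, 50.0)]
print(max(residual_after_global_shift(flat, cam0, cam1)))       # ~0
print(max(residual_after_global_shift(cluttered, cam0, cam1)))  # ~0.147
```

In the cluttered case the nearest point (Z = 2) moves five times farther in the image than the point at Z = 10, so no single shift can register both, even though neither point is moving in the world.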
Previous work in motion understanding has focused mainly on numerical approaches for reconstructing 3-D motion and scene structure from 2-D image sequences. In the classic numerical approach, the structure and motion of a rigid object are computed simultaneously from successive perspective views by solving systems of linear or nonlinear equations. This technique is reported to be noise-sensitive even when more than two frames are used. Non-rigid motion, or the presence of several moving objects in the field of view, tends to cause a relatively large residual error in the solution of the system of equations. Moreover, for some non-rigid motions an acceptable numerical solution may exist that corresponds to a rigid-motion interpretation; in such situations, the movements of individual entities in the field of view are not detectable by the classic scheme. This approach has been generalized to handle multiple moving objects by using a complex grouping process to segment the optical flow field.
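The residual behavior described above can be illustrated with a deliberately simplified stand-in for the classic numerical approach. Instead of general rigid motion, this sketch assumes pure camera translation, so the rigid-scene hypothesis reduces to a linear least-squares estimate of the focus of expansion (FOE). It is not the general structure-from-motion algorithm, and the names `fit_foe` and `radial_flow`, as well as the sample data, are hypothetical. A single independently moving point is enough to leave a clearly nonzero residual.

```python
def fit_foe(samples):
    """Least-squares focus of expansion for a purely translating camera.

    samples: list of (x, y, u, v) = image position and optical flow.
    For a static scene every flow vector is radial with respect to the FOE
    (x0, y0): u*(y - y0) - v*(x - x0) = 0, which is linear in x0 and y0.
    Returns (x0, y0, rms_residual).
    """
    # Each sample gives one equation a*x0 + b*y0 = c.
    rows = [(v, -u, v * x - u * y) for x, y, u, v in samples]
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-15:
        raise ValueError("flow vectors are (nearly) parallel; FOE is undetermined")
    # Solve the 2x2 normal equations by Cramer's rule.
    x0 = (sac * sbb - sab * sbc) / det
    y0 = (saa * sbc - sab * sac) / det
    rms = (sum((a * x0 + b * y0 - c) ** 2 for a, b, c in rows) / len(rows)) ** 0.5
    return x0, y0, rms


def radial_flow(x, y, foe, k):
    """Flow of a static point under pure translation; k stands for Tz / Z."""
    return (x, y, k * (x - foe[0]), k * (y - foe[1]))


FOE = (0.2, -0.1)
static = [radial_flow(x, y, FOE, k) for x, y, k in
          [(-0.5, -0.5, 0.1), (0.5, -0.4, 0.3), (-0.3, 0.6, 0.05),
           (0.4, 0.3, 0.2), (0.0, 0.8, 0.15)]]
mover = (0.5, 0.5, 0.0, -0.3)  # an independently moving object

print(fit_foe(static))            # FOE ~(0.2, -0.1), residual ~0
print(fit_foe(static + [mover]))  # residual clearly > 0
```

A large residual only signals that the single-rigid-motion model is violated somewhere; it does not say which points are responsible, which is why separating multiple movers requires an additional grouping or segmentation step.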
For situations dominated by translational camera motion, such as robotic land vehicles, alternative systems have been developed to exploit this particular form of self-motion. To reconstruct the 3-D scene structure, some researchers have assumed planar motion or even pure camera translation. Unlike the present invention, these approaches usually assume a completely static environment.
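A minimal sketch of why pure translation simplifies structure recovery, assuming an ideal pinhole camera with focal length f, no rotation, a static point, and a known translational velocity T = (Tx, Ty, Tz) with Tz ≠ 0. The function names are hypothetical. Under these assumptions each flow vector points away from the FOE (x0, y0) = (f·Tx/Tz, f·Ty/Tz), and its length is proportional to Tz/Z, so a single flow vector yields the depth of the point.

```python
import math


def translational_flow(x, y, T, depth, f=1.0):
    """Instantaneous image flow of a static point at depth Z seen by a camera
    translating with velocity T = (Tx, Ty, Tz) and no rotation."""
    Tx, Ty, Tz = T
    return ((x * Tz - f * Tx) / depth, (y * Tz - f * Ty) / depth)


def depth_from_translation(x, y, u, v, T, f=1.0):
    """Recover Z from one flow vector, assuming pure translation with known T.

    With FOE (x0, y0) = (f*Tx/Tz, f*Ty/Tz) the flow is
    (u, v) = (Tz / Z) * (x - x0, y - y0), so Z / Tz is the least-squares
    scale relating the FOE offset to the flow vector.
    """
    Tx, Ty, Tz = T
    if Tz == 0.0:
        raise ValueError("Tz must be nonzero (FOE at infinity)")
    x0, y0 = f * Tx / Tz, f * Ty / Tz
    dx, dy = x - x0, y - y0
    speed_sq = u * u + v * v
    if speed_sq == 0.0:
        return math.nan  # zero flow: depth is not observable from this point
    tau = (dx * u + dy * v) / speed_sq  # time to contact, Z / Tz
    return tau * Tz


T = (0.2, 0.1, 1.0)            # hypothetical camera velocity
X, Y, Z = 1.0, 0.5, 4.0         # a static world point
x, y = X / Z, Y / Z             # its image position (f = 1)
u, v = translational_flow(x, y, T, Z)
print(depth_from_translation(x, y, u, v, T))  # ~4.0
```

If only the direction of T is known, the same computation yields the time to contact Z/Tz, i.e. depth up to the unknown speed. A point on an independently moving object generally violates this radial-flow model, which is exactly where the static-environment assumption breaks down.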