Feature extraction and feature matching for short baseline stereo (two cameras) are well studied. For example, Speeded-Up Robust Features (SURF) can be extracted from two images captured by two short baseline cameras, and the feature matches can be based on the Mahalanobis distance or the Euclidean distance between the SURF descriptors in the two images.
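As an illustration of descriptor-based matching, the following minimal sketch pairs each descriptor from one image with its nearest neighbor in the other image under the Euclidean distance. The function names, the optional `max_dist` threshold, and the toy 4-dimensional descriptors are assumptions made for this example only; actual SURF descriptors are typically 64- or 128-dimensional and would come from a feature extraction library.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(desc_a, desc_b, max_dist=float("inf")):
    """For each descriptor in desc_a, find its nearest neighbor in desc_b.

    Returns a list of (index_a, index_b, distance) tuples. A descriptor with
    no neighbor closer than max_dist (a hypothetical threshold) is skipped.
    """
    matches = []
    for i, da in enumerate(desc_a):
        best_j, best_d = None, max_dist
        for j, db in enumerate(desc_b):
            d = euclidean(da, db)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j, best_d))
    return matches

# Toy descriptors standing in for SURF output from two images.
desc_a = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
desc_b = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.1, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
matches = match_descriptors(desc_a, desc_b)
```

A Mahalanobis variant would replace `euclidean` with a distance weighted by an inverse covariance matrix of the descriptors. This kind of appearance-only matching is what tends to degrade as the baseline widens.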
In one application, Global Positioning System (GPS) denied navigation, there are two or more moving platforms. Each platform has an inertial measurement unit, which uses a combination of accelerometers and gyroscopes. An electro-optical (EO) sensor, such as a camera, is also mounted on each moving platform. The cameras on two of the platforms can share the same field of view, that is, they can observe common landmarks. Communication between the moving platforms can enable collaborative navigation and improve wide baseline feature matching.
For two cameras, the baseline is the line joining the camera centers. For two moving cameras with a short baseline, a geometric constraint can be used to guide feature matching. Some geometric constraints exploit the epipolar geometry. Others use the inertial measurements to represent the motion of the camera, provided that the inertial sensor and the camera sensor are mounted together.
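A minimal sketch of an epipolar constraint check follows. It assumes that the relative rotation `R` and translation `t` between the two cameras are known (for example, from inertial measurements), that a point transforms from the first camera frame to the second as X2 = R X1 + t, and that image points are given in normalized homogeneous coordinates. The function names and the tolerance `tol` are hypothetical. Under these assumptions, a correct match (x1, x2) satisfies x2^T E x1 = 0, where E = [t]x R is the essential matrix.

```python
def skew(t):
    """3x3 skew-symmetric matrix [t]x, so that skew(t) applied to v equals t cross v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def essential_matrix(R, t):
    """Essential matrix E = [t]x R for the assumed convention X2 = R X1 + t."""
    return matmul3(skew(t), R)

def epipolar_residual(E, x1, x2):
    """Algebraic residual x2^T E x1 (zero for a perfect correspondence)."""
    Ex1 = [sum(E[i][k] * x1[k] for k in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))

def satisfies_epipolar(E, x1, x2, tol=1e-3):
    """True if the candidate match is consistent with the epipolar geometry."""
    return abs(epipolar_residual(E, x1, x2)) < tol

# Example: second camera displaced along the x axis with no rotation.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)
E = essential_matrix(R, t)

x1 = (0.2, 0.1, 1.0)       # point in the first image
x2 = (0.05, 0.1, 1.0)      # same image row in the second image: consistent
x2_bad = (0.05, 0.3, 1.0)  # different image row: inconsistent
```

In this pure sideways translation case the constraint reduces to requiring that matched points lie on the same image row. The residual used here is algebraic rather than a geometric distance, so a practical threshold would need to be chosen with the camera calibration in mind.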
In a wide baseline situation, where two cameras are widely separated, existing feature matching approaches are not appropriate because they are not robust to perspective distortions and to the increase in occluded areas. Thus, wide baseline feature matching presents various technical challenges that need to be addressed.
One existing approach for wide baseline feature matching uses pixel differencing (or correlation) on a small window instead of a large window, and thereafter uses graph-cuts and partial differential equations to emphasize spatial consistency. However, this method will fail when the image quality or resolution is insufficient or when the illumination changes. Another existing wide baseline feature matching approach uses small feature sets and triangulation for feature matching, followed by local rectification of the image features. This approach is highly dependent on the small feature sets and the triangulation, which can be problematic.