The present invention relates to systems for processing structure from motion (SFM).
Vision-based structure from motion (SFM) is rapidly gaining importance for autonomous driving applications. Monocular SFM is attractive due to lower cost and calibration requirements. However, unlike stereo, the lack of a fixed baseline leads to scale drift, which is the main bottleneck that prevents monocular systems from attaining accuracy comparable to stereo. Robust monocular SFM that effectively counters scale drift in real-world road environments has significant benefits for mass-produced autonomous driving systems.
A popular way to tackle scale drift is to estimate the height of the camera above the ground plane. Prior monocular SFM works use sparse feature matching for ground plane estimation. However, in autonomous driving, the ground plane corresponds to a rapidly moving, low-textured road surface, which renders sole reliance on such feature matches impractical. Also, conventional monocular SFM systems correct for scale by estimating the ground plane from a single cue (sparse feature matching). Prior cue combination frameworks do not adapt the weights according to per-frame visual data. Prior localization systems use a fixed ground plane, rather than adapting it to per-frame visual estimates.
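By way of illustration only, the height-based scale correction described above may be sketched as follows. The sketch assumes that triangulated road points are available in camera coordinates at the (drifting) SFM scale and that the true mounting height of the camera is known from calibration. The function names `fit_ground_plane` and `scale_correction`, and the least-squares plane fit itself, are hypothetical choices made for this example and are not taken from any particular prior system.

```python
import numpy as np


def fit_ground_plane(points):
    """Least-squares plane fit to an (N, 3) array of 3D points.

    Returns (normal, height): a unit normal of the fitted plane and the
    distance from the camera center (the origin) to that plane.
    Assumption: the points are in camera coordinates, so this distance
    is the camera height at the current SFM scale.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    height = abs(float(normal @ centroid))
    return normal, height


def scale_correction(estimated_height, known_height):
    """Factor that rescales the reconstruction to metric units."""
    if estimated_height <= 0:
        raise ValueError("estimated height must be positive")
    return known_height / estimated_height


if __name__ == "__main__":
    # Synthetic road points on the plane y = 0.75 (hypothetical data).
    rng = np.random.default_rng(0)
    x = rng.uniform(-2.0, 2.0, size=50)
    z = rng.uniform(2.0, 20.0, size=50)
    road = np.column_stack([x, np.full(50, 0.75), z])

    _, h_est = fit_ground_plane(road)
    s = scale_correction(h_est, known_height=1.5)
    print("estimated height:", h_est, "scale factor:", s)
```

In such a sketch, multiplying the estimated camera translations and 3D points by the returned factor restores metric scale. The quality of the correction depends entirely on the ground plane fit, which is why reliance on sparse feature matches alone is problematic on low-textured road surfaces.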