The advent of autonomous unmanned aerial vehicles (UAVs) such as multirotors has had a strong impact on the robotics field. They have opened new avenues of research and are rapidly being deployed in search and exploration situations that were not accessible to unmanned ground vehicles (UGVs). Tasks such as constructing 3D structures have now been accomplished with autonomous multirotors. Many commercially available multirotors provide video feedback from a monocular camera. Researchers can readily apply structure-from-motion techniques to generate a reliable dense structure from these videos, but the metric scale is absent from this estimation procedure. Autonomous control, however, requires accurate and frequent estimates of system states such as position and orientation. A number of research works have utilized external motion capture systems, but deploying such systems outside laboratory conditions is not feasible. State estimates can also be obtained through GPS, but the update frequency is low and the error covariance is high. GPS also often fails in indoor environments due to signal interference.
Another widely used method is a robotic vision-based framework wherein an on-board camera, such as a monocular camera or a stereo camera, is used for state estimation through algorithms such as simultaneous localization and mapping (SLAM) and visual odometry (VO). Stereo cameras provide accurate state estimates in metric scale but incur a weight penalty. Monocular cameras, on the other hand, are a low-cost and low-weight solution but are scale agnostic, i.e., their estimates are scaled by an arbitrary factor. With reference to FIG. 1, in order to provide scale information, researchers have fused metric-scale data from an IMU or an altitude sensor with the monocular vision sensor. Conventional scale estimation for a monocular camera thus requires input from one or more metric sensors, such as an IMU, an ultrasound sensor, or an altitude sensor.
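By way of illustration only, the conventional sensor-based approach described above can be sketched as a least-squares fit between altitude changes reported by a monocular vision pipeline (in arbitrary units) and altitude changes reported by a metric sensor (in meters). The function name `estimate_scale`, its parameters, and the sample values are hypothetical and are not taken from any particular prior-art system. The sketch assumes time-aligned, noise-free samples and uses displacements rather than absolute readings so that an unknown offset between the two sensors cancels out.

```python
def estimate_scale(vision_z, metric_z):
    """Return the scale s that minimizes sum((s * dv - dm)^2).

    vision_z: altitudes from a monocular pipeline, arbitrary units (assumed input).
    metric_z: altitudes from a metric sensor such as an ultrasound
              altimeter, in meters, sampled at the same instants (assumed input).
    """
    if len(vision_z) != len(metric_z) or len(vision_z) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    # Displacements between consecutive samples remove any constant offset.
    dv = [b - a for a, b in zip(vision_z, vision_z[1:])]
    dm = [b - a for a, b in zip(metric_z, metric_z[1:])]

    denom = sum(v * v for v in dv)
    if denom == 0.0:
        raise ValueError("no vertical motion observed; scale is unobservable")

    # Closed-form least-squares solution for a single scalar factor.
    return sum(v * m for v, m in zip(dv, dm)) / denom


# Hypothetical samples: the metric track is exactly twice the vision track,
# shifted by a constant offset of 1 m.
vision_track = [0.0, 0.5, 1.25, 1.0]
metric_track = [1.0, 2.0, 3.5, 3.0]
scale = estimate_scale(vision_track, metric_track)
print(scale)
```

Multiplying the vision-based displacements by the returned factor expresses them in meters. The sketch also shows the dependency noted above: without the additional metric sensor, or without motion along the measured axis, the scale cannot be recovered this way.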