Most recent advances in the area of 3D scene reconstruction are related to structure from motion (SfM), where 3D information is extracted from a plurality of images of the scene taken from different locations at different times, such as from a video sequence. Another popular method uses laser-based light detection and ranging (LIDAR) for distance measurements, and the range data is combined with conventional high-resolution 2D images during post-processing. Neither method is suitable (or each is of limited use) for dynamic scenes, such as those encountered in autonomous vehicle navigation, because different parts of the scene are sampled at different times. Being active (laser light is emitted by the system), the LIDAR method additionally consumes more power and reveals the location of the scanning device, which may be unwanted.
What remains is probably the oldest method of ranging, one that evolved in living organisms including humans: narrow-baseline multi-view stereo (MVS), usually binocular. This approach has long been used in military rangefinders, and many binocular cameras are available. The main challenge for such visual ranging is the low distance precision when the distance to the object is hundreds or thousands of times larger than the baseline. “Accurate range estimates for objects that are a kilometer away, however, requires a 10 m baseline” [Fields, John Richard, James Russell Bergen, and Garbis Salgian. “System and method for providing mobile range sensing.” U.S. Pat. No. 8,428,344. 23 Apr. 2013]. Rangefinders with such long baselines were available on WWII warships, but they are not practical for autonomous vehicle navigation. Another challenge is that image matching may be difficult for poorly textured objects of low contrast; linear features parallel to the baseline are also virtually undetectable.
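The dependence of distance precision on the baseline can be illustrated with the first-order error model of a rectified pinhole stereo pair, in which range is Z = f·B/d and a disparity error Δd therefore produces a range error of approximately Z²·Δd/(f·B). The following is a minimal sketch under that assumption; the function names, the 2000-pixel focal length and the 0.1-pixel disparity error are illustrative values and are not taken from the cited reference.

```python
def disparity_px(distance_m, baseline_m, focal_length_px):
    """Disparity (in pixels) of a point at distance_m for a rectified pinhole pair."""
    return focal_length_px * baseline_m / distance_m


def range_error_m(distance_m, baseline_m, focal_length_px, disparity_error_px):
    """First-order range uncertainty: |dZ| ~= Z**2 * |dd| / (f * B)."""
    return distance_m ** 2 * disparity_error_px / (focal_length_px * baseline_m)


if __name__ == "__main__":
    # Assumed (hypothetical) camera: f = 2000 px, disparity known to 0.1 px.
    for baseline in (0.25, 1.0, 10.0):
        err = range_error_m(1000.0, baseline, 2000.0, 0.1)
        print(f"baseline {baseline:5.2f} m -> range error at 1 km: {err:7.1f} m")
```

Under these assumed values, the range error at 1 km is about 5 m for a 10 m baseline and about 200 m for a 0.25 m baseline. Because the error grows with the square of the distance, a narrow-baseline system depends on subpixel disparity resolution.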
On the other hand, matching of long-range/short-baseline images is simpler than matching of images acquired with long-baseline MVS, as there is less perspective distortion between the individual images and the ranged objects may be considered fronto-parallel in most cases. Matching of almost identical images may be efficiently handled by fixed-window correlation such as phase-only correlation (PoC) [Nagashima, Sei, et al. “A subpixel image matching technique using phase-only correlation.” Intelligent Signal Processing and Communications, 2006. ISPACS'06. International Symposium on. IEEE, 2006.]. PoC may be efficiently implemented in the frequency domain and is used in video compression for motion vector estimation [Kresch, Renato, and Neri Merhav. “Method and apparatus for performing motion estimation in the DCT domain.” U.S. Pat. No. 6,611,560. 26 Aug. 2003.].
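As an illustration of the principle only, phase-only correlation of two fixed windows may be sketched as follows. This is a one-dimensional, integer-shift sketch that uses a naive discrete Fourier transform from the Python standard library so that it runs without dependencies; the function names are hypothetical, and a practical implementation would use a two-dimensional fast transform and the subpixel peak fitting described in the cited reference.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def phase_only_correlation(a, b, eps=1e-12):
    """1-D phase-only correlation of two equal-length windows."""
    fa, fb = dft(a), dft(b)
    cross = [x * y.conjugate() for x, y in zip(fa, fb)]
    # Discard magnitude: scale every frequency bin to (at most) unit modulus.
    normalised = [c / max(abs(c), eps) for c in cross]
    return [v.real for v in dft(normalised, inverse=True)]


def estimate_shift(a, b):
    """Integer circular shift s such that a[t] matches b[(t - s) mod N]."""
    poc = phase_only_correlation(a, b)
    n = len(poc)
    peak = max(range(n), key=poc.__getitem__)
    # Peaks in the upper half of the window correspond to negative shifts.
    return peak if peak <= n // 2 else peak - n
```

For two windows that differ only by a circular shift, the normalised cross-spectrum is a pure phase ramp and its inverse transform is a single sharp peak at the shift. The peak stays sharp regardless of the spectral shape of the windows.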
An additional challenge for MVS systems with subpixel accuracy that use modern high-resolution sensors is the handling of lens optical distortions and aberrations that reduce image resolution, especially in the off-center areas. The effect of optical aberrations may be reduced by space-variant image deconvolution [Harmeling, Stefan, et al. “Method and device for recovering a digital image from a sequence of observed digital images.” U.S. patent application Ser. No. 13/825,365.].
Image distortions alone do not reduce resolution, but they usually require image rectification to a common rectilinear projection before matching. The process of rectification involves pixel re-sampling, which either reduces image quality by adding re-sampling noise or requires up-sampling, with increased computational resources and memory requirements.
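The loss of quality caused by re-sampling can be seen in a toy one-dimensional case, given here as a sketch that is not taken from any cited reference. It assumes linear interpolation, and the function names are hypothetical. A row that alternates at the highest representable spatial frequency is re-sampled at a half-pixel offset, as may happen locally during rectification.

```python
def resample_linear(samples, offset):
    """Resample a row at positions t + offset (0 <= offset < 1), clamping at the edge."""
    n = len(samples)
    out = []
    for t in range(n):
        left = samples[t]
        right = samples[min(t + 1, n - 1)]
        out.append((1.0 - offset) * left + offset * right)
    return out


def peak_to_peak(samples):
    """Contrast of a row, measured as max minus min."""
    return max(samples) - min(samples)


if __name__ == "__main__":
    row = [1.0, -1.0] * 8                      # finest detail the pixel grid can hold
    shifted = resample_linear(row, 0.5)        # half-pixel re-sampling
    print(peak_to_peak(row), peak_to_peak(shifted[:-1]))  # last sample is edge-clamped
```

In this case the interior of the re-sampled row has no contrast left: the finest detail is removed by the interpolation itself. Up-sampling before interpolation reduces this loss, at the cost of the additional computation and memory noted above.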
It is therefore an object of the invention to improve the resolution of long-range MVS, increase the robustness of the method for poorly textured objects, provide means for correcting optical aberrations, and avoid image re-sampling, while finding a solution that can be efficiently implemented in an FPGA for real-time applications.