The present invention relates to machine vision systems that create models from two-dimensional images captured from multiple perspectives and with multiple camera settings (where camera settings include system states, e.g., vehicle speed and direction), together with active and passive range sensors, for a variety of applications. In particular, machine vision systems can be provided for various machine vision applications.
Common imaging systems are not designed with specific requirements to provide three-dimensional (3D) imaging or range finding capability. However, 3D scene and range information can be recovered from a collection of two-dimensional (2D) images. Among the various 3D reconstruction algorithms are Structure from Motion (SFM), which requires translational movement of the camera, and Depth from Defocus (DFD), which restricts camera movement.
Research and development was undertaken to address the limitations of each of these methods, examining how the limitations and disadvantages of current approaches that use 2D images to create 3D models of structures within those images could be mitigated through modifications, combinations, and additions to SFM and DFD. In particular, efforts were undertaken to compare the precision and accuracy of DFD and SFM and to develop embodiments that modify and combine useful aspects of DFD and SFM while eliminating or addressing their mutually exclusive limitations on use.
Various SFM, stereo vision, and DFD algorithms have been developed for various applications. For example, one approach, known as the eight-point algorithm and described in Hartley, R. I., Zisserman, A., “Multiple View Geometry in Computer Vision”, Cambridge University Press, 2nd edition, ISBN: 0521540518; 2004, models the translational and rotational movement of a camera as a linear transform in order to determine camera locations and 3D point locations, but it exhibits substantial error in several failure modes (e.g., when the distance between camera positions is small). Translation and rotation of a camera in use can also be represented as a system of symbolic polynomials, but this approach still relies on pre-existing point matching, which generates errors as discussed herein, and it requires substantial computing power that is impracticable in various applications. Many real-time 3D estimation techniques (e.g., Simultaneous Localization And Mapping (SLAM)) also rely on parallel processing with more costly or power-consuming resources to provide the necessary computing power. Early DFD techniques required that two pictures be captured from a matching camera location and angle (pose). In addition, methods used with DFD to estimate relative blur between points in two 2D images are sensitive to image shifting and scaling, which are common occurrences in real-world image recording processes.
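For reference, the eight-point algorithm cited above can be sketched as follows. This is a minimal NumPy version of Hartley's normalized eight-point estimate of the fundamental matrix F, which encodes the relative camera rotation and translation as a linear constraint between matched points; it is not the method of the present invention. The function names, the synthetic camera (800 px focal length), and the scene are hypothetical and exist only for illustration.

```python
import numpy as np


def _normalize(pts):
    """Hartley normalization: move the centroid to the origin and scale so the
    mean distance from it is sqrt(2). Returns homogeneous points and the 3x3 transform."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T


def eight_point(x1, x2):
    """Estimate F such that x2_h^T F x1_h = 0 from N >= 8 pixel matches (N x 2 arrays)."""
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    # Each match gives one linear equation in the nine entries of F (row-major).
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    # Least-squares solution: right singular vector of the smallest singular value.
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # A valid fundamental matrix has rank 2, so zero the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization and fix the arbitrary overall scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)


# Synthetic check with a hypothetical camera and scene.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])


def project(points, R, t):
    cam = points @ R.T + t           # world -> camera coordinates
    pix = cam @ K.T                  # apply intrinsics
    return pix[:, :2] / pix[:, 2:]   # perspective divide


theta = 0.05  # small rotation about the y axis, in radians
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
x1 = project(X, np.eye(3), np.zeros(3))
x2 = project(X, R, np.array([0.5, 0.0, 0.0]))
F = eight_point(x1, x2)
h1 = np.column_stack([x1, np.ones(len(x1))])
h2 = np.column_stack([x2, np.ones(len(x2))])
residuals = np.abs(np.einsum("ij,jk,ik->i", h2, F, h1))  # x2^T F x1 per match
```

With noise-free matches the epipolar residuals are essentially zero. As the translation shrinks toward zero (pure rotation), F is no longer uniquely determined by the matches, so small matching errors dominate the estimate; this is the failure mode the paragraph above refers to.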
Existing SFM algorithms do not work well under certain degenerate conditions (e.g., failure modes or conditions) such as pure rotation and certain combinations of image point locations and camera motions. Many imaging systems, such as rotating security cameras or a forward-facing camera on a vehicle, experience various errors or failure modes when used with SFM algorithms or systems. In particular, a system that attempts to determine or estimate depth (i.e., the z coordinate associated with pairs of feature-matched x, y coordinates in multiple images) using SFM techniques does not work well (e.g., produces significant errors) when the distances between cameras are small relative to the distances between the cameras and structures within the cameras' fields of view. Existing SFM approaches also experience errors when performing feature matching between two images of the same structures taken from two perspectives (with small distances between image capture positions) to find x, y coordinates for feature-matched pixels. These feature-matched pixel x, y coordinates are later used in triangulation steps. Errors occur at least in part because the differences in two-dimensional coordinates that traditional SFM systems derive from such feature-matched 2D images are small, so the systems end up measuring mostly noise from the feature matching step. Also, traditional DFD methods assume that a camera stays in one place while its settings change; this creates sensitivity to camera motion error, and difficult, complex, or costly relative defocus blur estimation is needed to perform depth from defocus estimation with cameras in motion.
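Why a small baseline leaves SFM "measuring mostly noise" can be made concrete with the standard rectified two-view relation Z = f·B/d (focal length f in pixels, baseline B, disparity d). This is a simplifying assumption: a moving camera with rotation does not produce a rectified pair, but the first-order sensitivity behaves the same way. The helper names and numbers below are hypothetical.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Rectified two-view depth: Z = f * B / d."""
    return f_px * baseline_m / disparity_px


def depth_std(f_px, baseline_m, depth_m, disparity_noise_px):
    """First-order depth uncertainty caused by feature-matching noise.

    Since dZ/dd = -f * B / d**2 = -Z**2 / (f * B), a disparity error of
    sigma_d pixels produces a depth error of about Z**2 / (f * B) * sigma_d.
    """
    return depth_m ** 2 / (f_px * baseline_m) * disparity_noise_px


# Hypothetical setup: 800 px focal length, object at 20 m, 0.5 px matching noise.
for baseline in (0.1, 1.0):
    d = 800.0 * baseline / 20.0
    sigma_z = depth_std(800.0, baseline, 20.0, 0.5)
    print(f"B = {baseline} m: disparity = {d:.1f} px, depth std = {sigma_z:.2f} m")
```

With a 0.1 m baseline the disparity is only 4 px, so 0.5 px of matching noise already yields about 2.5 m of depth uncertainty at 20 m; a 1 m baseline reduces it to about 0.25 m. The error grows with the square of the depth and inversely with the baseline.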
Real-time applications based on existing SFM and DFD methods require substantial computational resources and therefore face significant barriers to such use. Passive depth sensing using a single monocular image has been used in real time, but it still requires substantial resources and has inherent limitations and tradeoffs, including the need for a pre-trained machine learning system. Typically, existing machine learning algorithms used for monocular depth estimation are trained offline using an active range finder and a set of 2D images, and the trained algorithm is then used to determine depth in real time.
Accordingly, existing systems and technologies have a variety of disadvantages when used for applications such as range finding, 3D mapping, and machine vision under the failure modes or conditions described above. Thus, improvements to the existing art were needed to address these disadvantages and enable various applications.
Improved combinations of SFM, DFD, monocular depth estimation processes, machine learning systems, and apparatuses including imager systems allow a vehicle with mounted two-dimensional cameras that are relatively close together to explore the surrounding environment and mitigate measurement errors using multiple camera settings, without a pre-trained machine learning system, and to operate both while moving and while stationary.
An embodiment of the invention can include live training of the machine learning algorithm, performed using output from SFM and DFD measurements, a series of 2D images acquired live (while a vehicle is stationary or moving in an environment), and computer-generated images and data. At the same time, the machine learning algorithm is trained in a parallel process, and the newly trained algorithm can be used for depth prediction once training is complete. In addition, because the accuracy of passive range sensing depends on the selected camera settings, an optimal camera setting for a specific environment, weather condition, and object distance or characteristics can be selected using machine learning that searches for camera settings that minimize the algebraic or geometric errors obtained in the passive depth estimation calculation.
Machine learning systems also generally require a significant amount of training data, and even when pre-trained they may not be able to adapt to or operate in different or anomalous environments. Embodiments of the invention enable an exemplary system to rapidly adapt to new environments not found in its prior training data. Also, a number of systems use only one camera, or two cameras that are set close to each other and do not move. Such systems need an improved ability to perform tasks such as range finding or 3D mapping with significant accuracy; however, existing systems and methods do not accommodate these needs.
By integrating and modifying DFD and SFM, camera and depth locations can be recovered in some conditions where structure from motion is unstable. Existing algorithms that combine multiple depth cues from a monocular camera focus mainly on triangulating a 3D scene from known, pre-calibrated camera positions.
In at least some embodiments of the invention, a 3D reconstruction is performed from near-focused and far-focused 2D synthetic images without prior knowledge of the camera locations. Depth cues from DFD and SFM machine instructions, logic sequences, or algorithms are combined to mitigate the errors and limitations of individual SFM or DFD algorithms. Several SFM and DFD approaches, including state-of-the-art techniques, are discussed or provided herein. A robust relative blur estimation algorithm and a method to integrate DFD and SFM cues from multiple images are also provided.
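Why a near-focused and a far-focused image together carry depth information can be illustrated with the ideal thin-lens model (the document states an exemplary algorithm was tested with a thin lens model). For an object at distance u, a lens of focal length f and aperture diameter A, and a sensor at distance s, similar triangles give a blur-circle diameter b = A·s·|1/f − 1/u − 1/s|. This is a minimal sketch under stated assumptions: blur diameters are measured directly, both images share the same magnification, and depth is recovered by a brute-force grid search. It is not the invention's DFD method, and the lens parameters and function names are hypothetical.

```python
import numpy as np


def blur_diameter(u, f, s, aperture):
    """Thin-lens blur-circle diameter on the sensor for an object at distance u (metres)."""
    return aperture * s * np.abs(1.0 / f - 1.0 / u - 1.0 / s)


def sensor_distance_for_focus(u_focus, f):
    """Lens-to-sensor distance that focuses an object at u_focus: 1/s = 1/f - 1/u_focus."""
    return 1.0 / (1.0 / f - 1.0 / u_focus)


def depth_from_two_blurs(b_near, b_far, f, s_near, s_far, aperture, candidates):
    """Pick the candidate depth whose predicted blur pair best matches the measured pair."""
    pred_near = blur_diameter(candidates, f, s_near, aperture)
    pred_far = blur_diameter(candidates, f, s_far, aperture)
    cost = (pred_near - b_near) ** 2 + (pred_far - b_far) ** 2
    return candidates[np.argmin(cost)]


# Hypothetical lens: 50 mm focal length at f/2 (25 mm aperture).
f, A = 0.050, 0.025
s_near = sensor_distance_for_focus(2.0, f)   # near-focused setting (2 m)
s_far = sensor_distance_for_focus(10.0, f)   # far-focused setting (10 m)

true_depth = 4.0
b_near = blur_diameter(true_depth, f, s_near, A)
b_far = blur_diameter(true_depth, f, s_far, A)

candidates = np.linspace(0.5, 50.0, 100_000)
estimate = depth_from_two_blurs(b_near, b_far, f, s_near, s_far, A, candidates)
```

A single blur measurement is ambiguous: with the near setting, an object at 4/3 m produces exactly the same blur as one at 4 m, because they sit on opposite sides of the focal plane. The far-focused image blurs those two depths differently, which is why the pair resolves depth uniquely in this model.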
An exemplary point cloud SFM-DFD fusion method improves the robustness or capacity of depth measurement in situations where multiple cameras are separated by a distance that is relatively small, but significant, compared to the depth being measured, mitigating or addressing prior-art difficulties in feature matching (e.g., inconsistent matches). For example, embodiments of the invention are capable of operating in cases where several existing generic SFM and DFD software systems failed to generate 3D reconstructions. Exemplary improved point cloud SFM-DFD fusion methods or systems were able to generate an improved 3D point cloud reconstruction of structures in scenes captured by exemplary systems.
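The fusion rule itself is not specified in this section. As a minimal illustration only, the sketch below combines per-point SFM and DFD depths by inverse-variance weighting, a standard way to merge two independent noisy estimates; it is a swapped-in technique, not the invention's fusion method. It assumes each cue supplies a depth and a variance per point, and the function name and values are hypothetical. A point that one cue fails to measure (NaN) falls back to the other cue.

```python
import numpy as np


def fuse_depths(z_sfm, var_sfm, z_dfd, var_dfd):
    """Inverse-variance weighted fusion of two per-point depth estimates.

    Entries that are NaN (cue failed) or have non-positive variance get zero weight.
    Returns the fused depths and their variances (inf where neither cue is valid).
    """
    z = np.stack([np.asarray(z_sfm, float), np.asarray(z_dfd, float)])
    var = np.stack([np.asarray(var_sfm, float), np.asarray(var_dfd, float)])
    with np.errstate(invalid="ignore", divide="ignore"):
        valid = np.isfinite(z) & np.isfinite(var) & (var > 0)
        # Weight = 1 / variance for valid entries, 0 otherwise.
        w = np.where(valid, 1.0 / np.where(valid, var, 1.0), 0.0)
        total = w.sum(axis=0)
        fused = (w * np.where(valid, z, 0.0)).sum(axis=0) / total
        fused_var = 1.0 / total
    return fused, fused_var


# Hypothetical per-point estimates (metres, metres squared).
fused, fused_var = fuse_depths(
    z_sfm=[22.0, np.nan, 5.1], var_sfm=[6.25, np.nan, 0.04],
    z_dfd=[19.5, 8.0, 5.3], var_dfd=[1.0, 0.5, 0.25],
)
```

For the first point, the noisy SFM estimate (variance 6.25 m²) is pulled toward the more certain DFD estimate, giving about 19.84 m with a fused variance below either input; the second point, which SFM could not measure, keeps the DFD value of 8.0 m.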
Passive range finder and 3D mapping applications. Generally, exemplary control systems, algorithms, or machine instructions are provided to combine passive 3D depth cues from 2D images taken at different sensor locations (e.g., SFM) or settings (e.g., DFD) to construct 3D depth maps from 2D images using a symbolic-numeric approach that delivers robustness against, or the capacity to operate under, various degenerate conditions in which traditional techniques were unable to operate effectively.
In particular, embodiments are provided which include an algorithm to estimate relative blur between elements of sets of two or more images taken from different locations and/or sensor settings that is robust against multi-view image transformation and multi-view/stereo correspondence errors. Embodiments can include machine-readable instructions or control instructions including: an algorithm to measure relative blur in sets of 2D images taken from different perspectives and camera settings; an algorithm to combine active depth sensor data and passive 3D depth cues from images taken at different sensor locations (SFM) or settings (DFD); an algorithm to estimate 3D information from a single monocular image using statistical or machine learning techniques that can be trained with live and computer-generated images and video, and that can provide adjustable processing speed by constraining the size of feature vectors and probability distribution function(s) based on limited computing resources; and a machine learning algorithm to find optimal camera settings for 3D depth estimation.
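The basic idea of relative blur can be illustrated under a Gaussian point-spread assumption: if one image has blur sigma1 and a second, blurrier image has sigma2, then the second is approximately the first convolved with a Gaussian of sigma_r, where sigma2² = sigma1² + sigma_r². The sketch below estimates sigma_r by brute-force search using only NumPy. It is explicitly not the transformation-robust algorithm described above: it assumes the two images are already perfectly registered, which is exactly the sensitivity to shifting and scaling that the document attributes to existing methods. All function names and parameters are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D sampled Gaussian, truncated at 4 sigma."""
    radius = max(1, int(np.ceil(4 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders; output has the input's shape."""
    if sigma <= 0:
        return img.copy()
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Convolve rows, then columns; 'valid' mode trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def estimate_relative_blur(sharper, blurrier, sigmas, margin=16):
    """Return the sigma_r in `sigmas` that best explains blurrier ~ G(sigma_r) * sharper."""
    inner = (slice(margin, -margin), slice(margin, -margin))  # ignore border effects
    costs = [np.mean((gaussian_blur(sharper, s)[inner] - blurrier[inner]) ** 2)
             for s in sigmas]
    return sigmas[int(np.argmin(costs))]


# Synthetic, pre-registered pair: the same random texture blurred with sigma 1 and 2.
rng = np.random.default_rng(1)
scene = rng.standard_normal((96, 96))
img1 = gaussian_blur(scene, 1.0)
img2 = gaussian_blur(scene, 2.0)
est = estimate_relative_blur(img1, img2, np.arange(0.5, 3.0, 0.02))
```

The estimate lands near sqrt(2² − 1²) ≈ 1.73. Shifting or rescaling img2 by even a pixel before the search would bias this estimator, which illustrates why robustness to multi-view transformation matters when the images come from different camera positions.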
Additional uses. Exemplary embodiments can also be used in a manner similar to other 3D vision, range finder, and blur estimation methods. Examples include potentially novel applications of 3D imaging, including 3D imaging and image fusion with multiple sensors on multiple platforms and at multiple locations, for example to generate a synthetic aperture sensor. In addition, the algorithm can estimate relative blur between two images, and the physical mechanism of blurred image formation can be modeled based on the sensors and scene that produce such images. A combination of relative blur information and the blur formation model may yield information on camera settings or scene characteristics.
An exemplary algorithm was tested with a thin lens model, but different models can be used for other hardware (e.g., a fish-eye lens). Additional applications can include integration of exemplary algorithms with other computer vision algorithms. Estimates of the relative blur of objects or scenes in a 2D picture, as well as the 3D map, can be utilized in super-resolution imaging algorithms, occlusion removal, light field cameras, motion blur estimation, image de-blurring, image registration, and as an initial 3D scene estimate for an iterative 3D estimation algorithm. Exemplary algorithms can be used to extract camera information from a collection of 2D pictures, including camera calibration, camera pose, and sensor vibration correction. Relative blur estimation can be used for autofocus. Exemplary blur estimation algorithms provide robustness against image transformations and transformation-induced errors, including hand shake, which occurs when images are taken with a hand-held camera. Another application can include use with 3D imaging for triangulation, target tracking (e.g., with a Kalman filter), passive range finders, and fire control systems. Another application can include use as part of gesture recognition system interfaces. Another application can include use with SFM to extrapolate 2D images from an arbitrary point of view, render a 3D world from sample 2D images, produce visual effects in movies, and support virtual and augmented reality applications. Another application can include embodiments incorporating exemplary methods for 3D imaging of microscopic objects (including electronics and biological samples) with microscopes, and for applications in astronomy with telescopes. Exemplary methods can be applicable to 3D scanners, object recognition, manufacturing, product inspection and counterfeit detection, and structural and vehicle inspections.
Embodiments can also be used in relation to mitigating atmospheric changes, which may also generate relative blur information that can be modeled and used for depth measurement; or, if the depth of an object is known, the relative blur may provide information on scintillation, weather, and atmospheric conditions (e.g., fog, turbulence). Embodiments can also be used with consumer cameras or medical tools (e.g., ophthalmoscopes, otoscopes, endoscopes), where exemplary designs or methods can be applied to obtain 3D measurements and other characterizations of bumps, plaque, swellings, polyps, tissue samples, or other medical conditions. Exemplary methods can also be applied to vehicle navigation, sensor and vehicle pose estimation, visual odometry (e.g., measuring the position of a vehicle based on visual cues), and celestial navigation. With knowledge of the camera movement, embodiments of the invention can be modified to measure the movement of objects in the image.
Additional features and advantages of the present invention will become apparent to those skilled in the art upon consideration of the following detailed description of the illustrative embodiment exemplifying the best mode of carrying out the invention as presently perceived.