The advent of high-performance processors has led to systems that can perform 3D reconstruction of images. Generally, 3D reconstruction from multiple images refers to the creation of 3D models from a set of images, and is the reverse process of obtaining two-dimensional (“2D”) images from 3D scenes. Some 3D reconstruction systems incorporate texture mapping, which refers to a method for defining high-frequency detail, surface texture, or color information on a computer-generated graphic or 3D model. A textured model is a computer-generated graphic or 3D model obtained by texture mapping. Some 3D reconstruction systems provide a level-of-detail texture model. Level-of-detail refers to decreasing the complexity of a 3D model representation as it moves away from the viewer, or according to other metrics such as object importance, viewpoint-relative speed, or position.
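As an illustration of the level-of-detail concept described above, the following minimal sketch selects a mesh variant by viewer distance. The distance thresholds and variant names are illustrative assumptions, not taken from any particular system.

```python
# Hypothetical sketch: level-of-detail (LOD) selection by viewer distance.
# Thresholds and mesh variant names below are assumptions for illustration.

def select_lod(distance: float) -> str:
    """Return the mesh variant to render for a given viewer distance."""
    if distance < 10.0:
        return "high"    # full-resolution mesh near the viewer
    elif distance < 50.0:
        return "medium"  # simplified mesh at mid range
    return "low"         # coarse mesh far from the viewer
```

Real systems may also switch variants based on the other metrics mentioned (object importance, viewpoint-relative speed, or position) rather than distance alone.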
Some 3D reconstruction systems implement structure from motion (“SfM”) techniques. SfM refers to recreating a 3D high-resolution model (i.e., a “reconstruction”) from nothing but a stream of images (e.g., still captures or sampled video frames) and knowledge of the intrinsic parameters of a camera. Intrinsic parameters of a camera describe the math that models how a camera lens bends light. Generally, extrinsic camera calibration is based on where the camera was when it acquired the image, and its orientation, relative to some frame of reference. An intrinsic calibration, on the other hand, provides a set of coefficients that serve as a mathematical description of the lens, whether fisheye, perspective, or another model. Intrinsic calibration accounts for both the linear and non-linear components of a camera's optics. With reference to the non-linear component, many cameras stretch images toward the corners. This effect can be corrected by modeling the lens distortion of that camera, such as with the Brown model for radial and tangential distortion. The linear component (in a pinhole camera) pertains to, at least, the focal length and principal point, and can include per-axis focal lengths as well as skew. This is a linear system that describes how to convert a Euclidean ray originating at the optical center of the camera into a 2D pixel coordinate. These rays are used to extract camera positions and the shape of objects in view from pixels in the images that are found as keypoints during feature detection and matching. This is done by taking advantage of the geometric relationships between cameras and the points that they observe. A keypoint is the location in the image of an interesting feature (sometimes called an interest point or a “corner”).
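The linear and non-linear components of an intrinsic calibration can be sketched as follows. The focal lengths, principal point, and distortion coefficients below are illustrative assumptions; the linear step maps a Euclidean ray from the optical center to a 2D pixel coordinate, and the non-linear step applies Brown-model radial distortion to the normalized coordinates.

```python
import numpy as np

# Assumed intrinsic parameters: per-axis focal lengths (fx, fy),
# principal point (cx, cy), and zero skew.
fx, fy = 800.0, 820.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# A Euclidean ray originating at the optical center, in camera coordinates.
ray = np.array([0.1, -0.05, 1.0])

# Linear component: the intrinsic matrix converts the ray to a pixel coordinate.
uvw = K @ ray
pixel = uvw[:2] / uvw[2]   # divide by depth to get (u, v)

# Non-linear component: Brown-model radial distortion on the normalized
# coordinates (k1, k2 are illustrative coefficients; tangential terms omitted).
k1, k2 = -0.2, 0.05
x, y = ray[0] / ray[2], ray[1] / ray[2]
r2 = x * x + y * y
scale = 1.0 + k1 * r2 + k2 * r2 * r2
pixel_distorted = np.array([fx * x * scale + cx, fy * y * scale + cy])
```

With the negative `k1` assumed here, the distorted pixel is pulled radially toward the principal point, which models the corner stretching described above.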
SfM provides a photogrammetric range imaging technique for estimating 3D structures from 2D image sequences that may be coupled with local motion signals. In biological vision, SfM refers to the phenomenon by which humans (and other living creatures) can recover 3D structure from the projected 2D (retinal) motion field of a moving object or scene.
An example of an SfM reconstruction is creating a 3D point cloud, a textured mesh of that cloud, and recreating the relative (or absolute) position of the sensor for each source image. A point cloud refers to a set of world points. A world point is a Cartesian representation of a location in 3D space (e.g., (x,y,z)) of a triangulated feature. The term “world” refers to the frame of reference context for the point, in that the point is situated in, and defined with respect to, some Cartesian coordinate system, and this basis frame of reference defines a new relative “world” in which to spatially relate points. That is, the world point represents a single point in a reconstructed structure in 3D space, and is representable as a 3D vector. Triangulation refers to the process of using two or more camera poses with corresponding matched features across those poses to determine the location of a world point relative to a frame of reference of the camera poses. Generally, the combination of position and orientation of an object relative to some coordinate system is referred to as a pose of the object, even though this concept is sometimes used only to describe the orientation. Relative pose refers to relative camera motion between two calibrated views. A mesh refers to the 3D shape of a model. Texture represents a sheet lying on the surface of the model. Further details may be painted on the texture. A textured mesh is a mesh with a texture added thereon.
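The triangulation process described above can be sketched with a standard linear (direct linear transform) method: given two camera poses and a matched keypoint in each view, solve for the world point that projects to both. The camera matrices and the test point below are illustrative assumptions.

```python
import numpy as np

def triangulate(P1, P2, pt1, pt2):
    """Recover a world point from matched keypoints in two calibrated views.

    P1, P2: 3x4 camera projection matrices (intrinsics times [R|t]).
    pt1, pt2: matched (u, v) keypoint locations in each image.
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.vstack([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]   # de-homogenize to (x, y, z)

# Two assumed poses: a camera at the origin, and one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known world point into both views, then triangulate it back.
X_true = np.array([0.5, 0.2, 4.0])
p1 = P1 @ np.append(X_true, 1.0); p1 = p1[:2] / p1[2]
p2 = P2 @ np.append(X_true, 1.0); p2 = p2[:2] / p2[2]
X_hat = triangulate(P1, P2, p1, p2)
```

In a full SfM pipeline this step would run over many matched keypoints across many poses, with the resulting world points forming the point cloud.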
Generally, an image processing system may include a graphics processing unit (“GPU”). A GPU, occasionally called a visual processing unit (“VPU”), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel architecture makes them more efficient than general-purpose central processing units (“CPUs”) for algorithms where the processing of large blocks of data is done in parallel. In a personal computer, a GPU can be present on a video card, or it can be embedded on the motherboard or, in certain CPUs, on the CPU die. GPU global memory is memory on a GPU device that is accessible to all threads that execute on that GPU. In one example, in GPUs provided by NVIDIA Corp. of Santa Clara, Calif., the global memory is accessible to CUDA® kernels executing in parallel. CUDA® is a parallel computing platform and application programming interface (“API”) model created by NVIDIA Corp. of Santa Clara, Calif.
GPUs are sometimes described as “programmable logic chips” that might be implemented with programmable logic controllers (“PLCs”). However, GPUs such as those provided by NVIDIA are not PLCs, but rather a sort of specialized general-purpose computer. There is no change in the circuit mappings in these GPUs, and they are programmed generically, such as with C for graphics (“Cg”), the OpenGL Shading Language (“GLSL”), CUDA®, the Open Computing Language (“OpenCL”), and other languages, even though they are designed for certain special tasks. Generally, CUDA® is one tool that can be used for GPU acceleration. However, the same functionality may be provided, for example, with OpenCL on graphics cards from both NVIDIA and Advanced Micro Devices (“AMD”) Inc. of Sunnyvale, Calif.