Data capture of objects or an environment is used in countless circumstances, and can enable a processor to use, act on, or manipulate the captured data to create an image of the object and/or environment. For example, a camera can capture still two-dimensional (2D) images of a scene, and a video camera can capture video data (i.e., 2D data over time) for later editing and display. Additionally, multiple-view stereo 2D images focused around an object can be used to simulate a 3D image of the object. Another example of data capture that can be used to simulate a 3D image involves the use of Light Detection and Ranging (LiDAR) (also known as Laser Detection and Ranging (LADAR)) sensors, which are commonly used for mapping terrain or buildings. LiDAR directs light from a light source at a surface that reflects the light back to a receiver. A LiDAR system then calculates the distance from the light source to the surface based on the round-trip time of the light (known as “time of flight” (TOF) data). In this way, data regarding distance or depth of objects in an environment can be captured with a LiDAR system. LiDAR systems collect many thousands of data points by fanning or pulsing a beam across an environment and measuring the time of flight for each object that reflects the light back to a receiver in the LiDAR system. LiDAR can determine the depth only of objects that reflect light back to the receiver, and cannot detect image data pertaining to color. LiDAR and other types of TOF cameras are in a class of camera referred to herein as “depth cameras.” Other examples of depth cameras include any device that can determine depth information, such as stereo cameras, structured light scanners, or other devices that emit electromagnetic (EM) radiation and capture the time of flight of the radiation reflected back.
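The time-of-flight calculation described above can be illustrated with a minimal sketch. It assumes the light travels at its vacuum speed and that the source and receiver are co-located, so the one-way distance is half of the round-trip path; the function name `tof_to_distance_m` is hypothetical and is not taken from any particular LiDAR system.

```python
# Minimal sketch of a time-of-flight (TOF) range calculation.
# Assumptions (not from the source): vacuum speed of light, and a
# co-located light source and receiver.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # defined value, in vacuum


def tof_to_distance_m(round_trip_seconds: float) -> float:
    """Return the one-way distance in meters for a round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels to the surface and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


if __name__ == "__main__":
    # A 100-nanosecond round trip corresponds to a surface roughly 15 m away.
    print(tof_to_distance_m(100e-9))
```

A practical system would also have to account for timing resolution and for the propagation speed in air, which this sketch ignores.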
Existing technologies for capturing imaging data have limitations or drawbacks. For example, current means of 2D documentation media, including film, digital photography, and panoramic 2D photography (i.e., cylindrical or cubic interactive panoramas), are limited in their ability to communicate the whole of a real-life experience or event when accessed by a user or viewer. 2D images may have excellent resolution and color but have no depth information associated with them.
Simulated 3D images can be created in various ways. Typically, images are recorded using two spaced-apart lenses that record a scene from slightly different angles, and these images are provided to a person so that the information is presented to each eye differently, such as by use of glasses with color filters or different polarization. As with 2D images, the color and resolution may be excellent and the 3D effects of these stereoscopic images may be compelling; however, such systems still lack the ability to collect depth information of the scene.
The information captured by a depth camera may be combined with the image data of a 2D camera to provide images that simulate 3D and contain depth information. For example, a LiDAR image may be combined with data from an overlapping 2D image to simulate a 3D image with depth data; that is, the image data contains information about the distance of objects in the image and can be used to calculate the size and relative spacing of objects in the image.
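One way to see how combined depth and 2D image data can yield size and spacing is to back-project pixels into 3D. The sketch below assumes an ideal pinhole camera whose depth map is already registered pixel-for-pixel to the 2D image; the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) and the function names are hypothetical values chosen for illustration, not details from the source.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def back_project(u: float, v: float, depth_m: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Convert pixel (u, v) with a measured depth into a camera-frame point.

    Assumes a pinhole model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. All are illustrative assumptions.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def spacing_m(p: Point3D, q: Point3D) -> float:
    """Euclidean distance in meters between two back-projected points."""
    return math.dist(p, q)


if __name__ == "__main__":
    FX = FY = 500.0          # hypothetical focal lengths, in pixels
    CX, CY = 320.0, 240.0    # hypothetical principal point
    # Two pixels 100 columns apart, both measured at a depth of 2 m.
    left = back_project(320, 240, 2.0, FX, FY, CX, CY)
    right = back_project(420, 240, 2.0, FX, FY, CX, CY)
    print(spacing_m(left, right))  # physical separation, in meters
```

With these assumed intrinsics, the two pixels correspond to points roughly 0.4 m apart, a measurement that the 2D image alone could not provide.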
Current 3D imaging systems, whether stereoscopic cameras or depth cameras combined with 2D cameras, use a single camera that records a single point of view (POV) at a given time to collect data representing a single static moment. Using a single camera can result in gaps in the collected 3D data. For example, a single camera capturing a scene with a tree from a single perspective cannot capture what is behind the tree. Current methods for reducing data gaps when capturing a scene with a single camera require a lengthy process of recording single-camera viewpoints and repositioning the camera multiple times. Once the scene is captured through multiple sessions of repositioning and capturing with the single camera, extensive secondary post-processing is necessary to register the multiple scans and organize the captured data. In addition to requiring large amounts of time (due to, for example, the algorithms used and the large quantities of data), such post-processing generally requires extensive user input and technical knowledge on the part of the user.
In addition to the extensive post-processing required to organize data captured with these 3D cameras, additional post-processing can be required for rendering the data, including digital graphic interpolation through the construction of a digital model, vectorization, and/or image model building.
Building a data cloud to represent an environment through which the depth camera moves requires that the camera have a means to track its movements and location, or communicate with a remote means for tracking the camera's movement. This is a common problem in mobile robotics, where an environment must be mapped so that the robot can navigate it. Similarly, where the data is used to create an immersive environment, the camera must track its location and tag that information to the image data. The technique of tracking location while mapping the environment is generally known as simultaneous localization and mapping (SLAM).
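The role of the tracked camera location can be sketched as follows. This is not a SLAM implementation: it assumes the camera pose (a position plus a rotation about the vertical axis) is already known for each capture, and shows only how tagging captured points with that pose lets them be placed in one shared data cloud. The names `camera_to_world` and `merge_captures` are hypothetical.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]
# A pose is (position, yaw in radians); yaw-only rotation is a
# simplifying assumption made for this illustration.
Pose = Tuple[Point3D, float]


def camera_to_world(points: List[Point3D], pose: Pose) -> List[Point3D]:
    """Rotate camera-frame points by the yaw, then translate by the position."""
    (px, py, pz), yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + px, s * x + c * y + py, z + pz)
            for x, y, z in points]


def merge_captures(captures: List[Tuple[List[Point3D], Pose]]) -> List[Point3D]:
    """Combine pose-tagged captures into a single world-frame point list."""
    cloud: List[Point3D] = []
    for points, pose in captures:
        cloud.extend(camera_to_world(points, pose))
    return cloud


if __name__ == "__main__":
    # The same camera-frame point, seen from two different tracked poses.
    first = ([(1.0, 0.0, 0.0)], ((0.0, 0.0, 0.0), 0.0))
    second = ([(1.0, 0.0, 0.0)], ((1.0, 0.0, 0.0), math.pi / 2))
    for point in merge_captures([first, second]):
        print(point)
```

Without the pose tags, both captures would place their point at the same coordinates, and the merged cloud would misrepresent the environment.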
Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the embodiments described below, as well as discussing other potential embodiments or implementations of the inventive concepts presented herein. An overview of embodiments of the invention is provided below, followed by a more detailed description with reference to the drawings.