Computer vision (CV) is a technical discipline that allows computers, electronic machines, and connected devices to gain high-level understanding from digital images or videos. Typical CV tasks include scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, object tracking, facial recognition, object counting, 3D imaging, image enhancement, and image restoration. 3D imaging is the process of capturing the shape and appearance of real objects. Digital camera devices for capturing 3D content are devices that can concurrently capture image data and depth information associated with the image data. To display 3D content, these devices then perform 3D reconstruction post capture by combining image data and depth information.
Stereo camera systems are one subset of digital camera devices for capturing 3D content. Stereo camera systems capture image data and depth information simultaneously by capturing right and left stereo views of a scene. To perform 3D imaging, depth information contained in the stereo images is extracted post capture by mapping the disparity between the right and left stereo views. In stereo vision, large disparity between right and left stereo views is associated with near objects, while objects that are farther away from the capturing device are closer to the zero disparity plane and therefore have smaller disparity values. Rendering image data with its corresponding depth information generates 3D content wherein every pixel contains the distance to a point in the imaged scene.
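The inverse relationship between disparity and distance described above can be illustrated with a minimal sketch. It assumes a rectified stereo pair and the standard pinhole relation, depth = focal length × baseline / disparity, which is not stated in this application. The function names and the parameters focal_length_px and baseline_m are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert one disparity value (pixels) to depth (meters).

    Assumes a rectified stereo pair and the pinhole relation
    depth = focal_length * baseline / disparity (an assumption,
    not a method taken from the application).
    Returns None when the disparity is zero or negative, since such
    a point lies at or beyond the zero disparity plane.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparity(disparity_map, focal_length_px, baseline_m):
    """Apply depth_from_disparity to every pixel of a 2D disparity map."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


if __name__ == "__main__":
    # Hypothetical calibration values: 800 px focal length, 0.5 m baseline.
    disparities = [[64.0, 8.0], [0.0, 16.0]]
    for row in depth_map_from_disparity(disparities, 800.0, 0.5):
        print(row)
```

In this sketch a pixel with a large disparity maps to a small depth (a near object), and a pixel with a small disparity maps to a large depth, which matches the behavior described for near and far objects.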
Other digital camera devices may leverage monocular techniques for generating 3D content. Monocular systems may capture image data from one camera module and depth information from a discrete depth sensor, for example, a time of flight sensor, dot field projector, or LIDAR system. Post capture, a monocular system generates 3D content by associating image data with its corresponding depth information provided by the discrete depth sensor. Stereo camera systems may also incorporate a discrete depth sensor to improve the efficiency, accuracy, or precision of depth information and/or reduce the processing power, time requirements, or power consumption needed to generate depth information. Machine learning models and artificial intelligence may also be used to provide or enhance image data, depth information, or both.
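One way to picture the association of image data with depth information is to back-project each pixel into a colored 3D point. The sketch below assumes a pinhole camera model with hypothetical intrinsics fx, fy, cx, and cy, and assumes the depth map is already aligned pixel-for-pixel with the image. Neither assumption comes from this application, and the function names are illustrative only.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with a known depth to a 3D point (x, y, z).

    Uses the pinhole camera model (an assumption for illustration):
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def colored_point_cloud(image, depth_map, fx, fy, cx, cy):
    """Pair every pixel color with the 3D point given by its depth.

    image and depth_map are row-major 2D lists of the same size.
    Pixels whose depth is None or not positive are skipped.
    """
    points = []
    for v, (color_row, depth_row) in enumerate(zip(image, depth_map)):
        for u, (color, z) in enumerate(zip(color_row, depth_row)):
            if z is None or z <= 0:
                continue  # no usable depth sample for this pixel
            points.append((backproject(u, v, z, fx, fy, cx, cy), color))
    return points


if __name__ == "__main__":
    # Tiny 2x2 RGB image and an aligned depth map in meters (made-up values).
    image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
    depth = [[2.0, 2.0], [None, 4.0]]
    for point, color in colored_point_cloud(image, depth, 2.0, 2.0, 0.0, 0.0):
        print(point, color)
```

The same pairing step applies whether the depth values come from a discrete depth sensor or from stereo disparity; only the source of the depth map changes.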
Applications of 3D imaging and computer generated depth are expanding to a wide variety of critically important fields including construction, medicine, manufacturing, entertainment, research, retail, security, and transportation. These applications often require devices that are portable, cheap, and capable of performing 3D imaging and calculating depth information in real time in a variety of capture conditions with low power consumption and limited processing resources. To generate accurate depth information, current 3D capture solutions typically require emissions-based methods of depth detection (e.g., LIDAR, dot field projection, and time of flight sensors). These techniques add considerable cost, increase power consumption, and require more complex processing than stereoscopic capture methods of 3D imaging.
Thus, there is a need in the field of CV to create new and useful devices for capturing 3D images and generating depth information that primarily leverage image sensors. The embodiments of the present application provide a novel digital camera device for 3D imaging that improves upon existing systems to deliver such new and useful methods for 3D image capture and depth generation.