Computer vision (CV) is a technical discipline that allows computers, electronic machines, and connected devices to gain high-level understanding from digital images or videos. Typical CV tasks include scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, and image restoration. Scene reconstruction, or 3D reconstruction, is the process of capturing the shape and appearance of real objects. 3D cameras are devices that can perform 3D reconstruction using, for example, monocular cues or binocular stereo vision. 3D cameras process image information from one or more camera modules to generate realistic scenes that provide the appearance of depth when rendered on a 3D display.
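In binocular stereo vision, depth can be recovered from the horizontal disparity between corresponding pixels in the left and right images. A minimal sketch of this relation follows; the focal length and baseline values are illustrative placeholders, not parameters of any real device:

```python
import numpy as np

# Hypothetical stereo rig parameters (illustrative values only).
focal_px = 700.0    # focal length expressed in pixels
baseline_m = 0.06   # distance between the two camera centers, in meters

def depth_from_disparity(disparity_px):
    """Recover depth via the standard binocular relation Z = f * B / d.

    Non-positive disparities (no match or a point at infinity) map to inf.
    """
    d = np.asarray(disparity_px, dtype=float)
    return np.where(d > 0, focal_px * baseline_m / np.maximum(d, 1e-9), np.inf)

# A pixel that shifts 35 px between the left and right images lies at
# roughly 1.2 m from the rig (700 * 0.06 / 35).
z = depth_from_disparity(35.0)
```

Because depth is inversely proportional to disparity, small disparity errors translate into large depth errors for distant points, which is one reason the calibration discussed below matters so much.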
Scenes captured by 3D cameras can be used to produce virtual reality (VR) content, i.e., content that replicates a sensory experience (e.g., sight, touch, hearing, or smell) in a way that allows a user to interact with the simulated environment. In particular, some virtual reality technologies focus on the visual experience, which is displayed on a computer screen or with a virtual reality headset (also referred to as a head mounted display or HMD). Virtual reality technology simulates an immersive environment that closely approximates the real world in order to replicate a lifelike experience.
Successful application of CV techniques requires precise and accurate calibration of the camera modules that capture the image data processed by CV methods. 3D cameras, stereo camera systems, and other 3D reconstruction devices, especially those including multiple camera modules, are particularly difficult to calibrate because even small manufacturing variations or slight shifts in the position of one or more camera components (e.g., lenses or image sensors) can significantly impact the calibration parameters required for accurate calibration. Calibration of a multi-camera 3D device involves computing intrinsic parameters for each camera independently and then computing the relative extrinsic parameters between the intrinsically calibrated cameras. Rectification matrices derived from the intrinsic and extrinsic parameters are used to rectify the right and left images. Subsequent processing steps may then be performed on the rectified images to accurately sense depth, track objects, enhance images, reconstruct scenes, and perform other CV tasks.
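The second stage of the calibration described above, computing the relative extrinsics between two intrinsically calibrated cameras, can be sketched as a composition of the per-camera world-to-camera poses. The poses below are toy placeholders (an ideal rig with a 6 cm baseline), not real calibration output; in practice a library routine such as OpenCV's stereoCalibrate would estimate these quantities from observed checkerboard corners:

```python
import numpy as np

def relative_extrinsics(R_l, t_l, R_r, t_r):
    """Given world-to-camera poses (x_cam = R @ x_world + t) for the left
    and right cameras, return the pose of the right camera expressed in
    the left camera's coordinate frame:
        x_right = R_rel @ x_left + t_rel
    """
    R_rel = R_r @ R_l.T
    t_rel = t_r - R_rel @ t_l
    return R_rel, t_rel

# Toy example: left camera at the world origin, right camera center shifted
# 6 cm along +x with identical orientation. For x_cam = R @ x_world + t,
# a camera centered at C with identity rotation has t = -C.
I = np.eye(3)
R_rel, t_rel = relative_extrinsics(I, np.zeros(3), I, np.array([-0.06, 0.0, 0.0]))
# For this ideal rig, R_rel is the identity and t_rel encodes the baseline.
```

This decomposition is why per-camera intrinsic calibration must precede the extrinsic step: the relative pose is only meaningful once each camera's own projection model is fixed.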