This invention relates to augmenting traditional multiview stereo (MVS) reconstruction methods with semantic information such as semantic priors.
Recent years have seen rapid strides in dense 3D shape recovery, with MVS systems capable of reconstructing entire monuments. Despite this progress, MVS has remained largely limited to favorable imaging conditions. Lack of texture leads to extended troughs in photoconsistency-based cost functions, while specularities violate the Lambertian assumptions inherent in those cost functions. When only a few images are available and the baselines between them are wide, diffuse photoconsistency is not a reliable metric, leading to sparse, noisy MVS outputs. Under these circumstances, MVS reconstructions often display holes or artifacts.
On the other hand, there have been developments in two seemingly disjoint areas of computer vision. With the advent of cheap commercial scanners and depth sensors, it is now possible to easily acquire 3D shapes. Concurrently, the performance of modern object detection algorithms has rapidly improved to allow inference of reliable bounding boxes in the presence of clutter, especially when information is shared across multiple views.