An autonomous car is a vehicle that is capable of sensing its environment and navigating without human input. Autonomous cars use a variety of techniques to detect their surroundings, such as radar, laser light, GPS, odometry, and computer vision.
Estimating three-dimensional (3D) information from two-dimensional (2D) monocular images using computer vision is an important task in applications such as autonomous driving and personal robotics. In general, a 2D bounding box is first created for an object in an image, and a 3D model is then constructed from the 2D box.
To find the 2D box for bounding an object, conventional technologies have generally used template-based methods. One conventional method for creating the 2D box is the sliding window method. This method slides a window-like box repeatedly over the whole image at varying scales and detects each object inside the window-like box. That is, because the objects in the image can be of very different sizes or scales, the image is scaled down several times, and the window-like box is slid over each scaled image again to find objects of different sizes.
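The sliding window search described above can be illustrated with a minimal Python sketch. The function name `sliding_windows`, the window size, the stride, and the downscaling factor are all hypothetical values chosen for illustration; the sketch only enumerates candidate boxes and assumes that a separate detector, not shown here, would score the contents of each one.

```python
def sliding_windows(image_w, image_h, window=64, stride=32, scale=0.5):
    """Yield candidate boxes (x, y, w, h) in original-image coordinates.

    The image is conceptually scaled down by `scale` at each pyramid level,
    and a fixed-size window is slid over every level. All parameter values
    are illustrative assumptions.
    """
    factor = 1.0
    # Stop once the scaled image can no longer contain a full window.
    while image_w * factor >= window and image_h * factor >= window:
        scaled_w = int(image_w * factor)
        scaled_h = int(image_h * factor)
        for y in range(0, scaled_h - window + 1, stride):
            for x in range(0, scaled_w - window + 1, stride):
                # Map the window back to original-image coordinates, so a
                # window on a smaller level covers a larger original region.
                yield (x / factor, y / factor, window / factor, window / factor)
        factor *= scale


boxes = list(sliding_windows(128, 128))
```

Even for a small 128 by 128 image with a coarse stride, the search produces several candidate boxes, and the count grows quickly with image size and with finer strides or scale steps, since every candidate must be evaluated by the detector.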
Another conventional method is the anchor box method. In this method, various anchor boxes are centered at a certain position, and the anchor box with the highest probability, e.g., the one having the largest overlapping region with a ground truth object, is determined from among the various anchor boxes by using regression analysis.
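As a rough illustration of the anchor box selection described above, the following Python sketch generates several anchor boxes centered at one position and picks the one that overlaps a ground truth box the most, measured by intersection over union (IoU). The function names, anchor sizes, and aspect ratios are hypothetical, and the sketch covers only the overlap comparison; the regression step that refines the selected box is not shown.

```python
import math


def make_anchors(cx, cy, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x1, y1, x2, y2) centered at (cx, cy).

    `sizes` and `ratios` (width / height) are illustrative assumptions.
    """
    anchors = []
    for size in sizes:
        for ratio in ratios:
            w = size * math.sqrt(ratio)
            h = size / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_anchor(anchors, ground_truth):
    """Return the anchor with the largest IoU against the ground truth box."""
    return max(anchors, key=lambda anchor: iou(anchor, ground_truth))


anchors = make_anchors(100, 100)
selected = best_anchor(anchors, (68, 68, 132, 132))
```

In this example the ground truth box coincides with the square anchor of size 64, so that anchor is selected with an IoU of 1.0. In practice the best anchor rarely matches exactly, which is why a regression step is used to adjust it.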
A 3D bounding box is then constructed from the determined anchor box; however, this approach has limitations. First, the 3D bounding box may have six surfaces, and three of the six surfaces may require exhaustive searches. Second, if a single template is used for determining the three surfaces of the 3D bounding box, accuracy may be low because the boundary conditions of the regression may vary as the 3D orientation of the object changes. Third, conventional methods of acquiring the 3D bounding box require substantial computational resources. For example, matching a cuboid template or voxels to find a 3D bounding box takes considerable computation time.
Thus, the present disclosure proposes a new method for removing such redundant computation and improving the accuracy of detection.