Environment maps and map data are pivotal for robotics, augmented reality and virtual reality applications. The next generation of robots, such as self-driving cars, is likely to rely on data extracted from environment maps and would therefore operate more robustly with accurately annotated or described map features.
The precision of a map's metric and semantic components plays a major role in ensuring that robots operate safely and efficiently in their environments, with improved perception. Semantic components of maps typically contain static objects such as road signs, traffic lights and road markings, which are currently labelled manually. Although this may be possible in suburban and rural environments, it becomes extremely time- and cost-intensive at city scale, where manual labelling is practically impossible due to the ever-changing landscape.
Accurately localising and differentiating objects in maps has been problematic for many methods and systems devised to visually match similar objects together. Such systems lack the capability to differentiate objects which inherently look similar (e.g., traffic lights), and the ability to account for factors such as lighting, time of day and weather conditions. For this reason, machine learning techniques have become the dominant approach for detecting static 3D objects in an environment.
A basic component of vision-based systems is to establish an accurate 2D detection of a static 3D object in a single image or video, from which the 3D position of the object is commonly recovered using triangulation techniques. For example, if the same object is detected in two images captured by a stereo camera, it is possible to determine the 3D position of the object by triangulation. This method can be extended by using multiple cameras to observe the same object, which can improve the triangulation calculations and the resulting estimated position.
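By way of illustration only, the stereo case described above may be sketched as follows. The sketch assumes an idealised, rectified stereo pair, so that the same object lies on the same image row in both views and differs only in its horizontal pixel position. The names StereoCamera and triangulate_rectified, and all numerical values, are hypothetical and are chosen purely for the example; they are not taken from any particular system.

```python
from typing import NamedTuple, Tuple


class StereoCamera(NamedTuple):
    """Hypothetical rectified stereo rig; every field is an illustrative assumption."""
    focal_px: float    # focal length in pixels, shared by both cameras
    cx: float          # principal point, x coordinate in pixels
    cy: float          # principal point, y coordinate in pixels
    baseline_m: float  # distance between the two camera centres in metres


def triangulate_rectified(cam: StereoCamera,
                          u_left: float,
                          v_left: float,
                          u_right: float) -> Tuple[float, float, float]:
    """Return (X, Y, Z) in metres, expressed in the left camera frame.

    (u_left, v_left) is the 2D detection in the left image and u_right is the
    horizontal position of the same object in the right image.
    """
    # Horizontal shift of the detection between the two views.
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")

    # Similar triangles: depth is inversely proportional to disparity.
    z = cam.focal_px * cam.baseline_m / disparity
    # Back-project the left detection to the recovered depth.
    x = (u_left - cam.cx) * z / cam.focal_px
    y = (v_left - cam.cy) * z / cam.focal_px
    return x, y, z


if __name__ == "__main__":
    cam = StereoCamera(focal_px=700.0, cx=640.0, cy=360.0, baseline_m=0.5)
    # A detection at column 710 in the left image and column 675 in the right.
    print(triangulate_rectified(cam, u_left=710.0, v_left=325.0, u_right=675.0))
    # prints (1.0, -0.5, 10.0)
```

In this sketch the estimated depth depends on the disparity, so a small error in either 2D detection translates directly into an error in the estimated 3D position. This is one reason why observing the same object from additional viewpoints can improve the estimate.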
However, a common problem underlying these triangulation approaches is the need to accurately localise a set of sensors, or cameras, in a certain area. To address this problem, GPS systems are often used to provide highly precise location information for the sensor(s). In dense urban environments, however, the accuracy of GPS systems is limited by the restricted direct visibility of the sky.
It is therefore desired that a method and system are provided for overcoming the aforementioned problems.