Highly automated driving (HAD) has become increasingly important in the automotive industry. HAD applications use various sensors (e.g., cameras, Lidar and Radar systems) to perceive the environment of the vehicle. Based on the information provided by these sensors, all kinds of dynamic road users (e.g., vehicles, pedestrians and bicycles) as well as static objects such as signs and road markings can be detected.
Although many current ADAS (advanced driver assistance system) applications are based on traditional techniques, mainly computer vision algorithms, machine learning techniques, especially neural networks and variants of neural networks such as CNNs (convolutional neural networks) or RCNNs (region-based convolutional neural networks), are increasingly employed.
In particular, RCNNs processing camera information are regarded as state-of-the-art systems for detecting, classifying and localizing dynamic and static road objects. The quality of the detection, classification and localization of objects depends heavily on many different factors, such as the underlying neural network structure or the training data used for training the parameters of the neural network. The training is a very time-consuming process that can take place offline on servers and that requires labeled training data. Labeled training data consists of both the sensor data (e.g., a camera image) and classification and localization information (e.g., bounding boxes around vehicles or pedestrians). After the training is completed, the neural network, consisting of code and configuration data, is deployed to the HAD unit in the vehicle. The neural network in the vehicle then allows the online detection, classification and localization of static and dynamic road objects from camera image streams in real time. Such a process is depicted in FIG. 1.
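By way of illustration only, the labeled training data described above could be represented as in the following minimal Python sketch. All names used here (BoundingBox, ObjectLabel, LabeledSample, class_counts, the file name and the pixel-coordinate convention) are hypothetical and chosen for this example; they are not taken from any particular HAD system or library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BoundingBox:
    # Localization information: box corners in pixel coordinates
    # (assumed convention for this sketch).
    x_min: int
    y_min: int
    x_max: int
    y_max: int


@dataclass
class ObjectLabel:
    class_name: str   # classification information, e.g. "vehicle"
    box: BoundingBox  # localization information for that object


@dataclass
class LabeledSample:
    image_path: str   # sensor data, here a reference to one camera image
    labels: List[ObjectLabel] = field(default_factory=list)


def class_counts(samples: List[LabeledSample]) -> Dict[str, int]:
    """Count how many labeled objects of each class the samples contain."""
    counts: Dict[str, int] = {}
    for sample in samples:
        for label in sample.labels:
            counts[label.class_name] = counts.get(label.class_name, 0) + 1
    return counts


# One hypothetical labeled sample: a camera image with two annotated objects.
sample = LabeledSample(
    image_path="frame_000001.png",
    labels=[
        ObjectLabel("vehicle", BoundingBox(120, 200, 380, 360)),
        ObjectLabel("pedestrian", BoundingBox(500, 180, 560, 340)),
    ],
)

print(class_counts([sample]))
```

A collection of such samples would form the input to the offline training, while the deployed network would receive only the camera images and would itself produce the class names and bounding boxes.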