The development of autonomous vehicles has been an important research and development topic in recent years, and the detecting or sensing apparatus has been especially important. A detecting apparatus could be improved by providing more reliable sensing data and by providing more precise information regarding the surroundings of a vehicle body within a specific type of environment. The detecting apparatus could also create enhanced information based on sensor readings of the surroundings of a vehicle body. In the construction of the detecting apparatus, object detection is one of the indispensable technologies. By identifying the location and type of objects that appear in front of the vehicle based on images taken by a high-resolution camera mounted on the body of the vehicle, and by combining technologies such as computer vision and deep learning, a vehicle can make an accurate decision as to whether to dodge an object or to apply the brakes. The decision making could be similar to that of an artificial intelligence which adopts a strategy based on observation through its eyes.
However, many object detection approaches rely on deep learning methods. Deep learning is a general term for using training data in order to modify a learning model. Deep learning may require a large amount of computing resources to train and approximate the learning model. When the detection apparatus performs object detection, the trained model would be used for forward propagation calculation. The computational amount could be substantial during both the training phase and the prediction phase. Without a hardware device with high computing power, such an endeavor could be nearly impossible, as a large number of image frames must be processed per second, which leaves only a very small time interval for each frame. Therefore, continuous optimization at the algorithm level for object detection would still be necessary at this point in time.
The object detection algorithm could help the autonomous vehicle to sense any object within the sensing range while a person is driving, and the algorithm would also provide other systems with early path planning. To meet this demand, an excellent detecting apparatus has to satisfy at least three important characteristics: high volume (i.e. the number of identifiable objects), accuracy (i.e. correctly identifying the type and the location of an object), and speed (i.e. the reaction time needed to reach an instantaneous computation rate). In order to satisfy the above characteristics, it is necessary to make improvements and modifications to the existing deep learning models.
Table 1 shows a comparison of characteristics among three kinds of object detection models, including the existing deep learning models.
TABLE 1

| Constraints | Traditional ROI Proposal | Single Step DL Object Detection | Double Steps DL Object Detection |
|---|---|---|---|
| detection accuracy | low | mild | high |
| false positive rate | high | mild | low |
| computational cost (inference) | low | mild | high |
| training process | no | average | large |
Table 1 shows a predicament in which object detection must trade off detection performance against computational complexity, as higher performance in object detection would entail a higher computational complexity. Herein, the Double Steps DL Object Detection model has the highest detection accuracy, but it typically requires the largest computational cost. In detail, the Double Steps DL Object Detection model adopts convolution layers similar to those in the Single Step DL Object Detection model, with the difference being that the Double Steps DL Object Detection model employs a region proposal network (RPN) after those convolution layers to propose region(s) of interest (ROI) from the provided feature maps. FIG. 1 illustrates the regions of interest proposed by the RPN based on the feature map extracted from the last convolution layers. In further detail, in the Double Steps DL Object Detection model the processor would process a set of the provided feature maps (i.e. the input frame illustrated in FIG. 1) by using the RPN to propose some ROI, and the feature map includes a plurality of unused features (i.e. unused features UNU). The plurality of unused features UNU would require a certain amount of computational cost, which is spent ineffectively because the plurality of unused features UNU does not contribute to any detection result.
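The share of a feature map that ends up as unused features can be illustrated with a minimal sketch. The function name, the box convention, and the 8x8 feature map with two proposals below are hypothetical values chosen only for illustration; they are not taken from FIG. 1 or from any particular RPN implementation.

```python
from typing import List, Tuple

# Hypothetical box convention: (row0, col0, row1, col1), half-open,
# measured in feature-map cells.
Box = Tuple[int, int, int, int]


def unused_fraction(height: int, width: int, rois: List[Box]) -> float:
    """Return the fraction of feature-map cells covered by no proposed ROI."""
    covered = [[False] * width for _ in range(height)]
    for r0, c0, r1, c1 in rois:
        # Clip each box to the feature-map bounds before marking its cells.
        for r in range(max(r0, 0), min(r1, height)):
            for c in range(max(c0, 0), min(c1, width)):
                covered[r][c] = True
    used = sum(cell for row in covered for cell in row)
    return 1.0 - used / (height * width)


# Assumed example: an 8x8 feature map with two overlapping proposals.
example_rois = [(1, 1, 4, 4), (3, 3, 6, 5)]
print(unused_fraction(8, 8, example_rois))  # 0.78125
```

In this assumed example, the two proposals cover 14 of the 64 cells, so roughly 78% of the feature map was computed by the convolution layers and scanned by the RPN without contributing to any proposal.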
In other words, the RPN in the Double Steps DL Object Detection model has two drawbacks that reduce the efficiency of the detection framework. Firstly, as the RPN analyzes the provided feature maps for any potential candidate for ROI proposal, there could be a plurality of unused features UNU that would not contribute to any ROI proposal, yet these unused features UNU demand a certain amount of computational cost to be computed by the RPN. The first drawback is therefore the unnecessarily calculated parts (i.e. the plurality of unused features), which result in a computational waste caused by operations in regions where no ROI occurs. Secondly, although the current location of the RPN in the Double Steps DL Object Detection model enables robust detection performance, it may lead to inefficient learning and inference. Instead, an ROI proposal could be positioned in front of the convolution layers to significantly reduce the network size as well as its computational effort.
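The potential saving from placing the ROI proposal in front of the convolution layers can be estimated with a rough sketch. It assumes stride-1, same-padded convolution layers, whose cost in multiply-accumulate operations (MACs) scales with the spatial area of the input. The layer widths, the frame size, and the ROI size are hypothetical, and the cost of producing the ROI itself is ignored.

```python
from typing import List


def conv_macs(height: int, width: int, c_in: int, c_out: int, kernel: int = 3) -> int:
    """MACs for one stride-1, same-padded convolution layer."""
    return height * width * c_in * c_out * kernel * kernel


def stack_macs(height: int, width: int, channels: List[int]) -> int:
    """Total MACs for a stack of such layers; channels lists widths, input first."""
    return sum(
        conv_macs(height, width, c_in, c_out)
        for c_in, c_out in zip(channels, channels[1:])
    )


# Assumed values: a 3-channel input followed by two convolution layers.
channels = [3, 16, 32]
full_frame = stack_macs(480, 640, channels)  # convolve the whole frame
roi_only = stack_macs(240, 320, channels)    # convolve only a cropped ROI
print(roi_only / full_frame)  # 0.25
```

Under these assumptions, restricting the convolution layers to an ROI that covers one quarter of the frame area reduces their cost to one quarter. A real network with pooling, strides, or several ROIs per frame would show a different ratio.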