Deep learning is a technology used to cluster or classify objects or data. For example, a computer cannot distinguish dogs from cats in photographs on its own, whereas a human can easily do so. To address this, a method called “machine learning” was devised. It is a technique that allows a computer to classify similar items among large amounts of data input into it. When a photo of an animal similar to a dog is input, the computer classifies it as a dog photo.
Many machine learning algorithms for classifying data have already been developed, such as decision trees, Bayesian networks, support vector machines (SVMs), and artificial neural networks. Deep learning is a descendant of the artificial neural network.
Deep Convolutional Neural Networks (Deep CNNs) are at the heart of the remarkable development in deep learning. CNNs were already used in the 1990s to solve the problem of character recognition, but they owe their current widespread use to recent research. A deep CNN won the 2012 ImageNet image classification competition, outperforming the other competitors by a wide margin. Since then, the convolutional neural network has become a very useful tool in the field of machine learning.
FIG. 1 shows an example of various outputs to be acquired from a photograph using a deep CNN according to the prior art.
Classification is a method for identifying the class of an object in a photograph, for example, as shown in FIG. 1, determining whether an acquired object is a person, a lamb, or a dog. Detection is a method for finding every object and displaying each found object enclosed in a bounding box. Segmentation is a method for distinguishing the region of a specific object from other objects in a photograph. As deep learning has recently become popular, classification, detection, and segmentation have come to rely heavily on deep learning.
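To make the three output forms concrete, a minimal NumPy sketch follows. The class list, score values, and helper names (`classify`, `bounding_box`, `segment_labels`) are hypothetical, and the bounding box is derived from a mask only to show its (x_min, y_min, x_max, y_max) form, not how a detector actually predicts boxes.

```python
import numpy as np

CLASSES = ["background", "person", "lamb", "dog"]  # hypothetical label set


def classify(scores):
    """Classification: one class name for the whole image, from per-class scores."""
    return CLASSES[int(np.argmax(scores))]


def bounding_box(mask):
    """Detection-style output: the tightest (x_min, y_min, x_max, y_max) box around a mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def segment_labels(pixel_scores):
    """Segmentation: a class index for every pixel; pixel_scores has shape (num_classes, H, W)."""
    return np.argmax(pixel_scores, axis=0)


print(classify(np.array([0.1, 0.2, 0.1, 0.6])))  # dog

pixel_scores = np.zeros((len(CLASSES), 6, 8))
pixel_scores[0] = 1.0                # background everywhere...
pixel_scores[3, 1:4, 2:6] = 2.0      # ...except a dog-shaped region
label_map = segment_labels(pixel_scores)
print(bounding_box(label_map == 3))  # (2, 1, 5, 3)
```

Classification yields a single label per image, detection a box per object, and segmentation a label per pixel, which is the form used by the lane detection methods discussed next.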
FIG. 2 is a simplified drawing of a conventional lane detection method using a CNN, and FIG. 3 is a simplified drawing of a general CNN segmentation process.
Referring first to FIG. 3, in the conventional lane detection method, a learning device receives an input image, acquires feature maps through multiple convolution operations and non-linear operations, such as ReLU, in multiple convolutional layers, and acquires a segmentation result by performing multiple deconvolution operations in multiple deconvolutional layers on the last of the feature maps, followed by SoftMax operations.
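The conventional pipeline of FIG. 3 can be sketched as follows, assuming NumPy, a single three-channel image, two output classes (background and lane), 3x3 kernels with stride 2, and random weights standing in for learned ones. The helper names (`conv2d`, `deconv2d`, `segment`) are hypothetical, and the naive loops favor readability over speed.

```python
import numpy as np


def conv2d(x, w, stride=1):
    """Valid (unpadded) convolution as used in CNNs, i.e. cross-correlation.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernels.
    """
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    h_out = (h - k) // stride + 1
    w_out = (wd - k) // stride + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Sum over input channels and the k x k window for every output channel.
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y


def deconv2d(x, w, stride=1):
    """Transposed convolution ("deconvolution") without padding.

    x: (C_in, H, W); w: (C_in, C_out, k, k).
    Output shape: (C_out, (H - 1) * stride + k, (W - 1) * stride + k).
    """
    _, c_out, k, _ = w.shape
    _, h, wd = x.shape
    y = np.zeros((c_out, (h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            # Scatter each input pixel, weighted by the kernels, onto a k x k output window.
            y[:, i * stride:i * stride + k, j * stride:j * stride + k] += np.tensordot(
                x[:, i, j], w, axes=([0], [0]))
    return y


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def segment(image, conv_weights, deconv_weights, stride=2):
    """Convolution + ReLU layers, then deconvolution layers, then a per-pixel SoftMax."""
    x = image
    for w in conv_weights:
        x = relu(conv2d(x, w, stride))
    for w in deconv_weights:
        x = deconv2d(x, w, stride)
    # Channel 0: background, channel 1: lane (assumed ordering).
    return softmax(x, axis=0)


rng = np.random.default_rng(0)
image = rng.random((3, 15, 15))                       # toy RGB input
conv_weights = [rng.normal(0, 0.1, (8, 3, 3, 3)),     # 15x15 -> 7x7
                rng.normal(0, 0.1, (16, 8, 3, 3))]    # 7x7 -> 3x3
deconv_weights = [rng.normal(0, 0.1, (16, 8, 3, 3)),  # 3x3 -> 7x7
                  rng.normal(0, 0.1, (8, 2, 3, 3))]   # 7x7 -> 15x15, 2 classes
probs = segment(image, conv_weights, deconv_weights)
print(probs.shape)  # (2, 15, 15)
```

The input size of 15x15 is chosen so that two stride-2 deconvolutions restore exactly the resolution removed by the two stride-2 convolutions; each pixel then carries a background probability and a lane probability that sum to 1.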
Referring to FIG. 2, the segmentation result of the conventional lane detection method is composed of two classes, i.e., lanes and background, as shown in the middle of FIG. 2. The segmentation result is expressed as a probability estimate for each pixel. The lanes are found by sampling, from the candidate pixels, those with high probabilities of being on a lane, and the lanes are then finally determined by using a lane modeling function acquired from the pixels on the found lanes.
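A minimal sketch of the sampling and fitting steps is shown below, assuming NumPy and a single lane. The fixed threshold of 0.5 for selecting pixels and the second-degree polynomial x = f(y) used as the lane modeling function are assumptions, since neither is specified here, and the helper names are hypothetical.

```python
import numpy as np


def sample_lane_pixels(lane_prob, threshold=0.5):
    """Keep only pixels whose lane probability exceeds `threshold`.

    lane_prob: (H, W) lane-class channel of the SoftMax output.
    Returns the row (y) and column (x) indices of the sampled pixels.
    """
    ys, xs = np.nonzero(lane_prob > threshold)
    return ys, xs


def fit_lane_model(ys, xs, degree=2):
    """Least-squares fit of x = f(y); coefficients are ordered highest power first."""
    return np.polyfit(ys, xs, degree)


# Hypothetical probability map: one straight lane along x = y + 5.
lane_prob = np.full((20, 30), 0.05)
rows = np.arange(20)
lane_prob[rows, rows + 5] = 0.9
lane_prob[2:5, 20:23] = 0.4  # lane-like background region, below the threshold

ys, xs = sample_lane_pixels(lane_prob)
coeffs = fit_lane_model(ys, xs)
print(coeffs)  # close to [0, 1, 5], i.e. x = y + 5
```

Fitting x as a function of y, rather than the reverse, suits lanes that run roughly vertically in the image. Background pixels whose probability exceeds the threshold would be sampled as well and would distort the fit, which is the weakness described in the next paragraph.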
However, a conventional CNN device for detecting one or more specific objects, such as lanes, must classify various background parts as a single class (i.e., the class with label=0), and it is difficult to detect the specific objects accurately because of the large in-class variation (the variation of detected values within the same class) in the background parts. For example, when detecting one or more lanes, the background parts of an input image other than the lanes include various shapes such as signs, buildings, and the like. Since some of these background parts have shapes similar to that of a lane, the label values estimated for the background class are not close to 0, and the variation of the label values within the background class becomes large. Namely, if an object belonging to a background part has a shape similar to that of a lane, its label value becomes close to neither the lane (label=1) nor the background (label=0), which makes lane detection difficult.
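For illustration, the in-class variation can be quantified as the variance of the predicted lane probability over the pixels whose ground-truth label is background (label=0). In the hypothetical values below, two background pixels belong to a lane-like object such as a sign; they raise both the mean and the variance of the background class even though their labels remain 0. The function name is hypothetical and NumPy is assumed.

```python
import numpy as np


def in_class_variance(pred, labels, cls):
    """Variance of predicted values over the pixels whose ground-truth label is `cls`."""
    return float(np.var(pred[labels == cls]))


labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # 0: background, 1: lane
# Hypothetical lane probabilities: plain background vs. background containing lane-like shapes.
pred_plain = np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.95, 0.95])
pred_lane_like = np.array([0.05, 0.05, 0.05, 0.05, 0.50, 0.50, 0.95, 0.95])

var_plain = in_class_variance(pred_plain, labels, 0)
var_lane_like = in_class_variance(pred_lane_like, labels, 0)
print(var_plain, var_lane_like)  # approximately 0.0 and 0.045
```

The two lane-like pixels sit at 0.5, close to neither label, so any single threshold either admits them as lane pixels or risks discarding genuine lane pixels of similar confidence.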