Deep learning is a technology used to cluster or classify objects or data. For example, computers cannot distinguish dogs and cats from photographs only. But a human can easily distinguish those two. To this end, a method called “machine learning” was devised. It is a technique to allow a computer to classify similar things among lots of data inputted thereto. When a photo of an animal similar to a dog is inputted, the computer will classify it as a dog photo.
There have already been many machine learning algorithms to classify data. For example, a decision tree, a Bayesian network, a support vector machine (SVM), an artificial neural network, etc. have been developed. The deep learning is a descendant of the artificial neural network.
Deep Convolution Neural Networks (Deep CNNs) are at the heart of the remarkable development in deep learning. CNNs have already been used in the 90's to solve the problems of character recognition, but their use has become as widespread as it is now thanks to recent research. These deep CNNs won the 2012 ImageNet image classification tournament, crushing other competitors. Then, the convolution neural network became a very useful tool in the field of the machine learning.
Image segmentation is a method of generating at least one label image by using at least one input image. As the deep learning has recently become popular, the segmentation is also performed by using the deep learning. The segmentation had been performed with methods using only an encoder, such as a method for generating the label image by one or more convolution operations. Thereafter, the segmentation has been performed with methods using an encoder-decoder configuration for extracting features of the image by the encoder and restoring them as the label image by the decoder.
FIG. 1 is a drawing schematically illustrating a process of performing the image segmentation by using a conventional neural network.
By referring to FIG. 1, according to a conventional lane detection method, a learning device receives an input image, generates at least one feature map by instructing one or more multiple convolutional layers to apply one or more multiple convolution operations and one or more non-linear operations like ReLU to the input image, and then generates a segmentation result by instructing one or more deconvolutional layers to apply one or more deconvolution operations and SoftMax operations to the feature maps.
However, there is a problem that it is difficult to clearly recognize each of classes by using only the neural network shown in FIG. 1, and especially, it is difficult to precisely recognize information on at least one edge of each of the classes.
In addition, another conventional method of instance segmentation for detecting objects such as lanes is performed through a clustering process after a process of the segmentation. However, since these two processes are disparate processes, there is a problem of poor performance if the two processes are learned together.