Deep learning is a technology used to cluster or classify objects or data. For example, computers cannot distinguish dogs and cats from photographs only. But a human can easily distinguish those two. To this end, a method called “machine learning” was devised. It is a technique to allow a computer to classify similar things among lots of data inputted thereto. When a photo of an animal similar to a dog is inputted, the computer will classify it as a dog photo.
There have already been many machine learning algorithms to classify data. For example, a decision tree, a Bayesian network, a support vector machine (SVM), an artificial neural network, etc. have been developed. The deep learning is a descendant of the artificial neural network.
Deep Convolution Neural Networks (Deep CNNs) are at the heart of the remarkable development in deep learning. CNNs have already been used in the 90's to solve the problems of character recognition, but their use has become as widespread as it is now thanks to recent research. These deep CNNs won the 2012 ImageNet image classification tournament, crushing other competitors. Then, the convolution neural network became a very useful tool in the field of the machine learning.
Image segmentation is a method of generating at least one label image by using at least one input image. As the deep learning has recently become popular, the segmentation is also performed by using the deep learning. The segmentation had been performed with methods using only an encoder, such as a method for generating the label image by one or more convolution operations. Thereafter, the segmentation has been performed with methods using an encoder-decoder configuration for extracting features of the image by the encoder and restoring them as the label image by the decoder.
FIG. 1 is a drawing schematically illustrating a process of a conventional segmentation by using a CNN.
By referring to FIG. 1, according to a conventional lane detection method, a learning device receives an input image, instructs one or more multiple convolutional layers to generate at least one feature map by applying the convolution operations and one or more non-linear operations, e.g., ReLU, to the input image, and then generates a segmentation result by instructing one or more deconvolutional layers to apply one or more deconvolution operations and SoftMax operations to the feature maps.
However, there is a problem that many of edge parts are missed in the process of encoding and decoding the input image as illustrated in FIG. 1. Recently, a network called U-Net was developed which uses each information, outputted from each convolutional layer of the encoder, during the process of decoding. But there are still problems that a learning for detecting the edge parts is not performed effectively and that much energy is needed to reconstruct the edge parts.