Deep learning is a technology used to cluster or classify objects or data. For example, a computer cannot distinguish dogs from cats in photographs on its own, whereas a human can do so easily. To this end, a method called “machine learning” was devised. It is a technique that allows a computer to classify similar things among a large amount of data inputted thereto. When a photo of an animal similar to a dog is inputted, the computer may classify it as a dog photo.
Many machine learning algorithms have already been developed to classify various data, for example, a decision tree, a Bayesian network, a support vector machine (SVM), and an artificial neural network. Deep learning is a descendant of the artificial neural network.
Deep Convolutional Neural Networks (Deep CNNs) are at the heart of the remarkable development in deep learning. CNNs were already used in the 1990s to solve character recognition problems, but they owe their current widespread use to recent research. Deep CNNs won the 2012 ImageNet image classification competition, decisively outperforming the other competitors. Since then, the convolutional neural network has become a very useful tool in the field of machine learning.
Image segmentation is a method of receiving an image as an input and producing a labeled image as an output. As deep learning technology has recently become popular, image segmentation has also adopted deep learning. According to a conventional technology, the image segmentation was performed by (i) applying one or more convolution operations to an input image to thereby generate a feature vector and (ii) applying one or more fully connected operations to the feature vector to thereby generate a label image. According to another conventional technology, an encoder-decoder configuration was designed to extract features from an input image by using an encoder and to reconstruct a label image by using a decoder.
FIG. 1 is a drawing schematically illustrating a process of performing the segmentation by using a general convolutional neural network (CNN).
Referring to FIG. 1, according to a conventional lane detection method, a learning device receives an input image and applies a plurality of convolution operations and non-linear operations such as ReLU to the input image at a plurality of convolutional layers, to thereby obtain a plurality of feature maps. The learning device then applies a plurality of deconvolution operations and SoftMax operations to the feature maps at a plurality of deconvolutional layers, to thereby obtain segmentation results.
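As a non-limiting illustration of the flow described with reference to FIG. 1, the following minimal sketch applies a convolution operation and a ReLU operation to obtain a feature map, and then applies deconvolution operations and a SoftMax operation to obtain per-position class labels. It is a simplification under stated assumptions: the input is one-dimensional rather than an image, only one convolutional layer and one deconvolutional layer are used, and the function names (conv1d, relu, deconv1d, softmax, segment) and kernel values are hypothetical and do not appear in the figures.

```python
import math

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation form, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Non-linear operation: negative values are clamped to zero."""
    return [max(0.0, v) for v in values]

def deconv1d(signal, kernel):
    """Transposed 1-D convolution (stride 1): each input scatters a scaled kernel."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, v in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += v * w
    return out

def softmax(scores):
    """Convert class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def segment(signal, enc_kernel, dec_kernels):
    """Encoder (conv + ReLU), then decoder (one deconv per class + SoftMax)."""
    features = relu(conv1d(signal, enc_kernel))
    per_class = [deconv1d(features, k) for k in dec_kernels]
    probs = [softmax([scores[p] for scores in per_class])
             for p in range(len(signal))]
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, labels

# Hypothetical input: class 1 is scored only where the encoder responds.
probs, labels = segment([0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
                        [1.0, 1.0],
                        [[0.0, 0.0], [1.0, 1.0]])
print(labels)  # [0, 1, 1, 1, 1, 0]
```

In this sketch the deconvolution kernels are assumed to have the same length as the convolution kernel, so that the output has the same length as the input and one label is produced per input position.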
FIGS. 2A and 2B are drawings illustrating dropout in a neural network.
The dropout is frequently used in a fully connected (FC) layer. Since all nodes of the FC layer are connected, overfitting may frequently occur. For example, while performing a learning process of recognizing a dog, if the training images include only images of white dogs, only a white dog may be recognized as a dog. This problem arises from a small number of the training images or a large capacity of the network.
FIGS. 2A and 2B illustrate a configuration of a neural network without the dropout and a configuration of a neural network with the dropout, respectively. Herein, the dropout is one of the methods introduced to solve such a problem. FIG. 2A is a drawing illustrating a configuration of a neural network having two FC layers. Referring to FIG. 2A, the value of each node in one FC layer is calculated by applying its corresponding weights to (i) the element values of the inputs or (ii) the values of the nodes in the previous FC layer.
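The calculation described with reference to FIG. 2A may be sketched as follows. This is a minimal illustration only: the function names (fc_layer, two_layer_forward) and the weight values are hypothetical, and bias terms and activation functions are omitted because they are not described above.

```python
def fc_layer(inputs, weights):
    """Each output node is the weighted sum of every input value.

    weights[n][i] is the weight connecting input i to output node n.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def two_layer_forward(inputs, weights_1, weights_2):
    """Two FC layers: the second layer takes the first layer's node values."""
    hidden = fc_layer(inputs, weights_1)
    return fc_layer(hidden, weights_2)

# Hypothetical example: 2 inputs -> 3 hidden nodes -> 1 output node.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [[1.0, 1.0, 1.0]]
print(two_layer_forward([1.0, 2.0], w1, w2))  # [6.0]
```

Because every node of one FC layer is connected to every node of the adjacent layer, the number of weights grows with the product of the layer sizes, which is one reason the capacity of such a network may become large.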
FIG. 2B illustrates a configuration of a neural network in which some nodes are dropped out. Herein, the neural network performs a learning process while one or more arbitrary nodes in one or more FC layers are dropped out. The nodes denoted as “X” in FIG. 2B are dropped out.
In this manner, not all of the weights are set to contribute to the learning process; only some of the nodes are set to contribute. The selected nodes are changed randomly for each learning process.
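The random selection of nodes described above may be sketched as follows. This is a minimal illustration under stated assumptions: the function names (dropout_mask, apply_dropout), the drop probability of 0.5, and the seed are hypothetical, and the rescaling of the remaining node values that practical implementations typically perform is omitted for brevity.

```python
import random

def dropout_mask(num_nodes, drop_prob, rng):
    """Return 0 for a dropped node and 1 for a kept node, chosen at random."""
    return [0 if rng.random() < drop_prob else 1 for _ in range(num_nodes)]

def apply_dropout(node_values, mask):
    """Dropped nodes output zero, so they do not contribute to learning."""
    return [v * m for v, m in zip(node_values, mask)]

rng = random.Random(0)  # hypothetical seed, used only for reproducibility
node_values = [0.2, 1.5, 0.7, 0.9]
for step in range(3):
    # A new mask is drawn at every learning iteration.
    mask = dropout_mask(len(node_values), 0.5, rng)
    print(step, mask, apply_dropout(node_values, mask))
```

At inference time no mask is applied in this sketch, so all nodes contribute to the output.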
However, a method is required for improving the performance of the learning process by using one or more residual networks, without dropping out inner nodes of each layer of the neural network.