Deep learning is a technology used to cluster or classify objects or data. For example, a computer cannot distinguish dogs from cats in photographs on its own, whereas a human can do so easily. To address this, a method called “machine learning” was devised. It is a technique that allows a computer to group similar items among the large amounts of data input to it. When a photo of an animal resembling a dog is input, the computer may classify it as a dog.
Many machine learning algorithms for classifying data have already been developed, for example, the decision tree, the Bayesian network, the support vector machine (SVM), and the artificial neural network. Deep learning is a descendant of the artificial neural network.
Deep Convolutional Neural Networks (Deep CNNs) are at the heart of the remarkable development in deep learning. CNNs were already used in the 1990s to solve character recognition problems, but they owe their current widespread use to recent research. Deep CNNs won the 2012 ImageNet image classification competition, outperforming the other competitors by a wide margin. Since then, the convolutional neural network has become a very useful tool in the field of machine learning.
FIG. 1 is a simplified drawing of a general CNN segmentation process.
Referring to FIG. 1, according to a conventional lane detection method, a learning device receives an input image, acquires encoded feature maps by applying multiple convolution operations and non-linear operations such as ReLU in multiple convolutional layers, and acquires a segmentation result by performing multiple deconvolution operations in multiple deconvolutional layers and then applying a SoftMax operation to the last of the decoded feature maps.
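The encode, decode, and SoftMax flow described above may be sketched as follows. This is only a minimal illustration in NumPy under stated assumptions, not the learning device of FIG. 1: the function names (conv2d, deconv2d, segment), the 2×2 kernels with stride 2, the channel counts, the two output classes, and the random weights are all hypothetical choices made for the example.

```python
import numpy as np

def conv2d(x, w, stride=2):
    """Unpadded strided convolution. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Sum over input channels and kernel positions for every output channel.
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y

def deconv2d(x, w, stride=2):
    """Transposed convolution. x: (C_in, H, W), w: (C_in, C_out, k, k)."""
    _, c_out, k, _ = w.shape
    h_out = (x.shape[1] - 1) * stride + k
    w_out = (x.shape[2] - 1) * stride + k
    y = np.zeros((c_out, h_out, w_out))
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            # Each input pixel scatters a weighted copy of the kernel into the output.
            y[:, i * stride:i * stride + k, j * stride:j * stride + k] += np.tensordot(
                x[:, i, j], w, axes=([0], [0]))
    return y

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def segment(image, enc_weights, dec_weights):
    """Encode with convolution + ReLU, decode with deconvolution, then SoftMax."""
    feat = image
    for w in enc_weights:
        feat = relu(conv2d(feat, w))      # encoded feature maps
    for w in dec_weights:
        feat = deconv2d(feat, w)          # decoded feature maps
    probs = softmax(feat, axis=0)         # per-pixel class probabilities
    return probs, probs.argmax(axis=0)    # probabilities and label map

# Hypothetical sizes: a 3-channel 8x8 image, two encoder and two decoder layers.
rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))
enc_weights = [rng.normal(size=(4, 3, 2, 2)), rng.normal(size=(8, 4, 2, 2))]
dec_weights = [rng.normal(size=(8, 4, 2, 2)), rng.normal(size=(4, 2, 2, 2))]
probs, labels = segment(image, enc_weights, dec_weights)
print(probs.shape, labels.shape)
```

In this sketch the encoder halves the spatial size at each layer and the decoder doubles it, so the label map has the same height and width as the input image. No non-linear operation is placed between the deconvolutional layers, since the description above does not specify one.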
FIGS. 2A and 2B respectively illustrate configurations of convolutional layers for encoding images by using conventional inception methods.
In the conventional inception method illustrated in FIG. 2A, convolution operations are applied to an input feature map transmitted from the previous layer through convolution units having various kernel sizes, for example, 1×1, 3×3, and 5×5, or a combination thereof, and the resulting convolved feature maps are then concatenated. In this way, an intermediate feature map, i.e., an inception feature map, that considers various receptive fields at a single scale is obtained.
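The parallel-branch structure described above may be illustrated with the following minimal NumPy sketch. The names conv2d_same and inception_block, the channel counts, and the random weights are hypothetical. Zero padding that preserves the spatial size is also an assumption of the example, adopted so that the branch outputs can be concatenated along the channel axis.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 convolution with zero padding that keeps H and W unchanged.
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad height and width only
    _, h, wd = x.shape
    y = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y

def inception_block(x, kernels):
    """Apply each kernel to the same input and concatenate along channels."""
    branches = [conv2d_same(x, w) for w in kernels]
    return np.concatenate(branches, axis=0)

# Hypothetical sizes: 16 input channels; 1x1, 3x3, and 5x5 branches.
rng = np.random.default_rng(0)
x = rng.random((16, 6, 6))
kernels = [
    rng.normal(size=(4, 16, 1, 1)),  # 1x1 branch, 4 output channels
    rng.normal(size=(8, 16, 3, 3)),  # 3x3 branch, 8 output channels
    rng.normal(size=(2, 16, 5, 5)),  # 5x5 branch, 2 output channels
]
out = inception_block(x, kernels)
print(out.shape)
```

Each branch sees the same input feature map but a different receptive field, and the output channel count is the sum of the branch channel counts.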
The conventional inception method shown in FIG. 2B adds a process of reducing the number of channels by using a 1×1 convolution filter in order to reduce the amount of computation.
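The effect of the 1×1 channel reduction on the amount of computation may be estimated by counting multiply-accumulate operations. The sketch below is illustrative only: the helper name conv_macs and the sizes (a 28×28 feature map, 192 input channels reduced to 16 before a 5×5 convolution with 32 output channels) are assumed values chosen for the example, not figures taken from FIG. 2B.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a stride-1 k x k convolution
    that produces an h x w output with c_out channels."""
    return h * w * c_in * c_out * k * k

# Hypothetical sizes for illustration.
H, W = 28, 28
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_macs(H, W, C_IN, C_OUT, 5)

# 1x1 convolution reducing 192 channels to 16, followed by the 5x5 convolution.
reduced = conv_macs(H, W, C_IN, C_REDUCED, 1) + conv_macs(H, W, C_REDUCED, C_OUT, 5)

print(f"direct 5x5:       {direct:,} MACs")
print(f"1x1 then 5x5:     {reduced:,} MACs")
print(f"reduction factor: {direct / reduced:.2f}")
```

Under these assumed sizes, the 1×1 reduction adds a small cost of its own but makes the 5×5 convolution far cheaper, so the total is roughly an order of magnitude lower.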
The conventional image encoding methods using the above inception concept can consider various receptive fields with various kernel sizes when applying the convolution operations to the feature map, but have the problem that only kernel sizes of 1×1 or larger can be considered. Thus, it is not possible to consider the full variety of features of the image. Accordingly, a new method is required for extracting features with more diverse characteristics by considering a wider variety of kernel sizes.