Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Deep Convolutional Neural Networks (CNNs) currently produce state-of-the-art accuracy on many machine learning tasks, including image classification. Early Deep Learning (DL) architectures used only convolution, fully connected, and/or pooling operations, yet still provided large improvements over classical vision approaches. Recent advances in the field have improved performance further by using new, more complex building blocks that involve operations such as branching and skip connections. Finding the best deep model therefore requires finding both the right architecture and the correct set of parameters for that architecture.
Since the set of operations to be used for each branch remains an active area of research, finding the correct building block involves searching over the possible configurations of branch components. This expanded search space means that, in addition to traditional deep CNN hyperparameters such as layer size and the number of filters, training a model now involves searching over the various combinations of components used to construct an effective network. This added complexity increases training time, and finding the right architecture or configuration often still requires extensive search. The complexity also hinders generalization, since larger networks overfit the data more easily. Some research has addressed these issues by automating the architecture discovery process, typically using techniques such as reinforcement learning or evolutionary algorithms to search through the architecture space. However, these search techniques are computationally expensive.