Since the work by Krizhevsky and others on ImageNet large scale visual recognition challenge (ILSVRC) in 2012, additional research on convolutional neural networks (CNN) have been occurring. For example, research has been looking into transferring pre-trained CNN models on a large-scale dataset like ImageNet to other visual recognition tasks with limited new training data. The research appears focused on taking middle-layer activations of the pre-trained CNN models as rich feature representations to apply to various applications such as object detection, object recognition, image retrieval, etc. To achieve advanced and robust performance, people either fine-tune the pre-trained CNN models on their own tasks or make extensively data augmentation to get robust classifiers. These developed techniques have shown promising results in comparison to conventional methods using standard feature representations such as bag-of-words, sparse-coding, etc. However, the neural codes from the middle-layer have less semantic meaning, which could lead to the well-known semantic gap. In addition, such approaches may encounter the curse of dimensionality problem when employing pyramid or grid extension to middle-layer neural codes.