Object recognition may include the task of identifying objects in an image or in a video sequence of images. Such object recognition techniques may have a wide range of applications. For example, human body recognition applications may include surveillance, robotics, automotive driving, and caring for the elderly. Furthermore, human body recognition may be important for computer vision applications such as pedestrian detection, human body tracking, human body identification, human pose estimation, human action recognition, image-based people searching, and the like. Therefore, developing automated computer vision systems for performing object recognition in images or videos may be increasingly important.
For example, in human body recognition, current techniques may generally be divided into two categories: techniques based on handcrafted features and techniques based on learned deep features. Handcrafted feature based techniques may use manually designed features such as histograms of oriented gradients (HOG) features, a combination of HOG and local binary pattern (HOG-LBP) features, color self-similarity (CSS) features, or multi-scale HOG features and deformable part models (HOG-DPM) to describe human body appearances. In contrast, learned deep feature based techniques may employ a deep convolutional neural network (CNN) in object recognition implementations. Results of such deep CNN implementations indicate that hierarchical neural features learned from large-scale datasets may be more robust than handcrafted features in handling complex object recognition tasks, including human body recognition in challenging scenarios such as changes in pose, changes in lighting conditions, changes in viewpoint, objects with partial occlusion, and the like.
However, such deep CNN implementations may include hundreds of millions of parameters or more and complex feed-forward computations, which place a heavy burden on devices during implementation. Even a deep CNN implementation with about 60 million floating point parameters may occupy about 232 MB of memory space. Such intensive memory and computation requirements may make such deep CNN implementations unsuitable in many contexts, particularly in mobile devices.
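The memory figure above can be approximated with simple arithmetic. The following is a minimal sketch, assuming each parameter is stored as a 32-bit (4-byte) floating point value and that memory is reported in binary megabytes (MiB); neither assumption is stated in the text, and the function name `parameter_memory_mib` is hypothetical. Under these assumptions, 60 million parameters occupy roughly 229 MiB, which is in line with the approximate 232 MB figure; the exact value would depend on the precise parameter count and storage format.

```python
# Assumption: parameters are stored as 32-bit floats (4 bytes each).
BYTES_PER_FLOAT32 = 4
BYTES_PER_MIB = 1024 ** 2


def parameter_memory_mib(num_parameters: int,
                         bytes_per_parameter: int = BYTES_PER_FLOAT32) -> float:
    """Return the storage needed for the parameters alone, in MiB.

    This counts only the parameter values; activations and other
    runtime buffers are not included.
    """
    return num_parameters * bytes_per_parameter / BYTES_PER_MIB


if __name__ == "__main__":
    # 60 million parameters, as in the example in the text.
    print(f"{parameter_memory_mib(60_000_000):.1f} MiB")
```

Storing each parameter in fewer bytes, or using fewer parameters, reduces this footprint proportionally.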
It may be advantageous to perform object recognition with high accuracy and with lower computational and memory resource requirements. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to perform object recognition becomes more widespread.