Visual search technologies have developed rapidly over the past twenty years. With the popularity of the Internet in particular, a wide range of applications built around visual search has greatly changed the way people live, e.g., search-engine-based image retrieval and object identification in surveillance applications. Generally, the primary goal of visual search is to detect a representative set of feature points in an image and then perform matching based on that set of feature points.
MPEG-7 CDVS (Compact Descriptors for Visual Search), disclosed in Literature [1] (Duan L Y, Chandrasekhar V, Chen J, et al. Overview of the MPEG-CDVS Standard[J]. IEEE Transactions on Image Processing, 2015, 25(1):323-332), is a visual search standard proposed by ISO/IEC MPEG in July 2014. The standard defines the format of the compact descriptors, the descriptor extraction process, and the visual search framework, so as to enable interoperability between visual search applications that comply with the standard.
SIFT (Scale Invariant Feature Transform), a typical algorithm disclosed in Literature [2] (Lowe D G. Distinctive Image Features from Scale-Invariant Keypoints[J]. International Journal of Computer Vision, 2004, 60(2):91-110), uses all feature points of an image for image matching. In practical applications this approach incurs two extra overheads: the time delay of matching against a large number of target images and the time delay of transmission over a low-bandwidth network. These two delays account for most of the time consumed by the overall image matching process. CDVS addresses both drawbacks. As described by Duan et al., CDVS uses only a subset of all feature points during image matching and constructs the compact descriptor from that subset, thereby improving matching efficiency while reducing the transmission load.
Although MPEG-7 CDVS reduces the amount of data transmitted over the network and improves the performance of pairwise image matching, the algorithms used in its extraction process have high computational complexity and are rather time-consuming in practical applications. For example, an Intel® Core™ i7-2600K 3.4 GHz processor can process only about 4 standard VGA-size (640×480) images per second. Extraction is even slower on mobile devices: a Samsung Galaxy S3 mobile phone, for example, can process only about 1 image of the same size per second. Such a slow extraction speed largely limits the adoption of the standard; accelerating the CDVS extraction process is therefore particularly urgent and important for applying and promoting the standard.
Currently, some works address algorithms for accelerating the CDVS extraction process, while others address its technical application; overall, however, such works are rare. Literature [3] (Chen J, Duan L Y, Gao F, et al. A Low Complexity Interest Point Detector[J]. IEEE Signal Processing Letters, 2015, 22(2):172-176) proposed BFLoG (Block based Frequency Domain Laplacian of Gaussian), a fast detection method that moves the Gaussian convolution from the spatial domain to the frequency domain. This method achieved a 2.0× speed-up over the whole feature detection stage of the extraction process. Literature [4] (Garbo A, Loiacono C, Quer S, et al. CDVS Feature Selection on Embedded Systems[C]. IEEE International Conference on Multimedia & Expo Workshops. IEEE, 2015:1-6) proposed an algorithmic improvement that computes local descriptors only for the selected feature points, thereby saving time in the local descriptor computation stage. Garbo et al. also attempted to obtain a further speed-up on a mobile device using OpenCL and the GPU; however, their method does not comply with the OpenCL standard. Overall, the method of Garbo et al. achieves approximately a 2.0× speed-up over the whole CDVS extraction process. Literature [5] (Zhang Shen, Wang Ronggang, Wang Qiusi, Wang Wenmin. Accelerating CDVS Extraction on Mobile Platform[C]. IEEE International Conference on Image Processing (ICIP). IEEE, 2015: 3837-3840) proposed a method for accelerating the CDVS extraction process on the ARM platform, using the NEON SIMD multimedia instructions and the multi-core parallelism of the ARM processor, and achieved approximately a 2.3× speed-up over the overall CDVS extraction process.
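To illustrate the general principle behind moving Gaussian filtering into the frequency domain, the following minimal Python sketch checks the convolution theorem in one dimension, assuming circular (wrap-around) boundary handling. It is not the block-based BFLoG detector of Literature [3]; the helper names (gaussian_kernel, dft, conv_spatial, conv_frequency) are hypothetical, and a naive O(n²) DFT is used only to keep the example free of dependencies.

```python
import cmath
import math

def gaussian_kernel(sigma, n):
    """Length-n circular Gaussian kernel centred at index 0, normalised to sum to 1."""
    # Distance is measured circularly so the kernel wraps around the signal ends.
    k = [math.exp(-min(i, n - i) ** 2 / (2 * sigma ** 2)) for i in range(n)]
    s = sum(k)
    return [v / s for v in k]

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def conv_spatial(x, k):
    """Direct circular convolution: y[i] = sum_j x[j] * k[(i - j) mod n]."""
    n = len(x)
    return [sum(x[j] * k[(i - j) % n] for j in range(n)) for i in range(n)]

def conv_frequency(x, k):
    """Same convolution via the convolution theorem: y = IDFT(DFT(x) * DFT(k))."""
    X, K = dft(x), dft(k)
    # Inputs are real, so the imaginary parts are only rounding noise.
    return [v.real for v in dft([a * b for a, b in zip(X, K)], inverse=True)]

signal = [0, 0, 1, 4, 2, 0, 3, 0]
kernel = gaussian_kernel(1.0, len(signal))
a = conv_spatial(signal, kernel)
b = conv_frequency(signal, kernel)
print(all(abs(p - q) < 1e-9 for p, q in zip(a, b)))  # True
```

In practice the frequency-domain route pays off when a fast Fourier transform (O(n log n)) replaces the naive DFT, because the cost then no longer grows with the size of the Gaussian kernel.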
GPGPU (General-Purpose Computing on Graphics Processing Units) technology, as disclosed in Literature [6] (Luebke D, Harris M, Ger J, et al. GPGPU: General Purpose Computation on Graphics Hardware[C]. ACM SIGGRAPH 2004 Course Notes. ACM, 2004:87-91.), has emerged over the past decade. Although GPUs (Graphics Processing Units) had long been widely known and extensively applied in professional fields such as image rendering (e.g., through the OpenGL 3D rendering framework), neither the GPU nor its programming framework was “general-purpose.” Earlier GPUs were designed with fixed pipelines whose fixed programming interfaces fully constrained higher-layer applications; as a result, such non-general-purpose rendering pipelines could not handle tasks with relatively complex algorithmic logic, such as image processing. The advent of GPGPU removed the restrictions of fixed pipelines and made more flexible higher-layer programming frameworks possible. In addition, advances in GPU fabrication processes have significantly increased the per-core operating frequency and cache capacity as well as the number of integrated cores. Consequently, current GPGPUs offer greater parallel computing capability than CPUs (Central Processing Units).
In view of the above, accelerating the MPEG-7 CDVS extraction process is of great practical significance. However, the speed-ups achieved by existing acceleration methods do not meet the performance requirements of visual search applications based on this standard. The strong parallel computing capability of GPGPUs offers a new possibility for achieving a better acceleration effect.