Technical Field
The present description relates to processing digital images.
One or more embodiments may apply to the early detection and removal of interest points from still and video contents.
One or more embodiments may be used in search and retrieval applications.
Discussion of the Related Art
An increasing amount of digital still and video contents is produced and consumed every day.
Still image and video search and retrieval applications have become increasingly important for content-based image retrieval, that is, for searching objects, places and people which are part of the image contents.
Extraction of interest points and compact descriptors from still and video signals may play a significant role in those applications.
The paper by Miroslaw Bober et al. “Test Model 4: Compact Descriptors for Visual Search, Video Subgroup”, Shanghai 2012, China, ISO/IEC JTC1/SC29/WG11/W13145, discloses a model of MPEG Compact Descriptors for Visual Search (CDVS).
For instance, FIG. 1 in that document presents an extraction module which produces a compact descriptor including two main elements, namely a selected number of compressed local descriptors and a single global descriptor (e.g. a digest of the local descriptors) representing the whole image.
This known model may exhibit a serious lack of efficiency on the interest point side (for example, in the DoG, Difference of Gaussians, computation), e.g. when a still or video image is processed.
This lack of efficiency may become increasingly serious as the number of DoGs per octave increases. For instance, if 4 octaves are processed with 5 scales computed for each octave, then 4 DoG responses are computed per octave, for a total of 16.
Avoiding DoG computation as much as possible may reduce processing and computational complexity. In fact, each DoG is a per-pixel difference between consecutive scales, which in turn are computed by applying complex Gaussian filters multiple times to the whole image.
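By way of illustration, the cost structure just described can be sketched as follows (a minimal Python sketch using SciPy; the function name and parameters are hypothetical and not part of the Test Model):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, octaves=4, scales=5, sigma0=1.6):
    """Build a Difference-of-Gaussians (DoG) pyramid.

    Each octave holds `scales` Gaussian-blurred images, so each
    octave yields `scales - 1` per-pixel DoG responses; with
    4 octaves and 5 scales this amounts to 16 DoG responses.
    """
    k = 2.0 ** (1.0 / (scales - 1))  # scale step within an octave
    dogs = []
    current = image.astype(np.float64)
    for _ in range(octaves):
        # complex Gaussian filters applied repeatedly to the whole image
        blurred = [gaussian_filter(current, sigma0 * k ** i)
                   for i in range(scales)]
        # each DoG is the per-pixel difference of consecutive scales
        dogs.append([blurred[i + 1] - blurred[i]
                     for i in range(scales - 1)])
        current = current[::2, ::2]  # halve resolution for next octave
    return dogs
```

For a 4-octave, 5-scale configuration this loop performs 20 Gaussian filterings and 16 per-pixel subtractions, which illustrates why avoiding DoG computation early can pay off.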
Also, the Test Model considered in the foregoing involves a Keypoint Selection block using a combination of statistical features. These include the distances of key points from the image center, used to weight (and hence rank for selection) key points before passing them to the description stage and the global vector description stage.
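The center-distance weighting mentioned above can be illustrated with a short sketch (the function name and the purely distance-based weight are hypothetical simplifications; the actual Test Model combines several statistical features, of which distance from the image center is only one):

```python
import math

def rank_by_center_distance(keypoints, width, height):
    """Rank (x, y) keypoints so that points closer to the image
    center come first (simplified, distance-only weighting)."""
    cx, cy = width / 2.0, height / 2.0
    max_d = math.hypot(cx, cy)  # farthest possible distance from center
    def weight(kp):
        x, y = kp
        return 1.0 - math.hypot(x - cx, y - cy) / max_d
    return sorted(keypoints, key=weight, reverse=True)
```

The highest-ranked key points would then be the first candidates passed on to the descriptor stages.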
Such an approach may have benefits, e.g. a “native” adaptivity to the statistical nature of the input data without any a priori knowledge. However, it fails to demonstrate adaptivity to the different nature of the visual descriptors that feed the local and global descriptor encoding processes. This may play a role in predicting points of interest in still and video images from a semantic point of view (e.g. the location of faces or other objects), for the purpose of computing local and/or global descriptors.
The paper by Duy-Nguyen Ta et al. “SURFTrac: Efficient Tracking and Continuous Object Recognition using Local Feature Descriptors”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2009, describes an algorithm (SURFTrac) for extracting descriptors of a series of digital video images.
For the first image, the algorithm initializes a list of interest points by performing a full detection. As new images are received, the interest points are updated. The descriptors are for recognition purposes and the algorithm computes them as needed.
More specifically, the algorithm in question first builds a map of SURF features extracted from a set of key-frame images captured from the surrounding environment. After extracting and matching with this map SURF features of the first video frame, the algorithm tracks those features locally in the subsequent frames. The key-frame which has the largest overlapping area with the current video frame is called a key-node.
The overlapping areas of nearby key-frames are updated in every frame based on their inter-frame homographies, and thus the key-node is continuously switched to the most similar image in the database: this allows constant tracking and exploration of new regions in the video sequences. Also, during initialization, the algorithm in question computes the full SURF feature descriptors from the first video image and matches them against the images in the database.
This method constructs an approximate nearest-neighbour tree for all the image features in the database, followed by geometric verification via the RANSAC (RANdom SAmple Consensus) algorithm.
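The geometric-verification step can be sketched in much simplified form. The toy sketch below fits a pure translation between matched point pairs, whereas an actual system would estimate a homography or fundamental matrix; all names and parameter values here are illustrative:

```python
import random

def ransac_translation(matches, threshold=2.0, iters=100, min_inliers=8):
    """Toy RANSAC: hypothesize a translation (tx, ty) from one random
    match ((sx, sy), (dx, dy)), then count matches consistent with it."""
    def residual(model, m):
        (sx, sy), (dx, dy) = m
        tx, ty = model
        return abs(dx - (sx + tx)) + abs(dy - (sy + ty))

    best = []
    for _ in range(iters):
        (sx, sy), (dx, dy) = random.choice(matches)  # minimal sample
        model = (dx - sx, dy - sy)                   # candidate translation
        inliers = [m for m in matches if residual(model, m) < threshold]
        if len(inliers) > len(best):
            best = inliers
    # accept the candidate only if enough inliers support the model
    return best if len(best) >= min_inliers else []
```

Matches surviving this verification support marking the best-matching image as the key-node.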
Upon successfully identifying the matching images, the best image is marked as the current key-node, and the set of images in play is reduced to only those images that are connected by a path in the database. The database of images is organized as follows: V is a collection of images; G is an undirected graph where the images form the nodes, and the edges describe the relationships between the images.
An edge between two images indicates a geometric relationship when these two images can be related through standard pairwise image matching. Each image is also further identified with one or more identifiers and two images sharing the same identifier are also connected by an additional edge. This organization is similar to a graph of images constructed for hierarchical browsing purposes.
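This organization can be sketched as a small data structure (a simplified, hypothetical rendering of the graph described above; names are illustrative):

```python
from collections import defaultdict

class ImageGraph:
    """Undirected graph of images: nodes are images, edges mark either
    a geometric (pairwise-matching) relationship or a shared identifier."""
    def __init__(self):
        self.edges = defaultdict(set)
        self.by_identifier = defaultdict(set)

    def add_geometric_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_identifier(self, image, identifier):
        # two images sharing an identifier get an additional edge
        for other in self.by_identifier[identifier]:
            self.add_geometric_edge(image, other)
        self.by_identifier[identifier].add(image)

    def reachable_from(self, key_node):
        """Images connected by a path to the key-node, i.e. the reduced
        set of images kept in play after a key-node is identified."""
        seen, stack = {key_node}, [key_node]
        while stack:
            node = stack.pop()
            for nxt in self.edges[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return seen
```

Reducing the in-play set to `reachable_from(key_node)` is what keeps subsequent matching confined to images plausibly related to the current view.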
Additional documents of interest may include, e.g.:
    Agrawal, M., et al.: “CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching”, European Conference on Computer Vision (ECCV), pp. 102-115, 2008; DOI: 10.1007/978-3-540-88693-8_8
    Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: “SURF: Speeded Up Robust Features”, Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008
    Rosten, E., et al.: “Machine Learning for High-Speed Corner Detection”, European Conference on Computer Vision (ECCV), Vol. 1, pp. 430-443; DOI: 10.1007/11744023_34
    Salti, S., Tombari, F., Di Stefano, L.: “A Performance Evaluation of 3D Keypoint Detectors”, 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 236-243, 2011; DOI: 10.1109/3DIMPVT.2011.37