For many visual tasks, the manner in which the image is represented can have a substantial effect on both the performance and the results of the visual task. Convolutional neural networks (CNNs) are known in the art. These artificial networks of neurons can be trained on a training set of images and thereafter be employed for producing multiscale representations of an input image.
An article by Krizhevsky et al., entitled “ImageNet Classification with Deep Convolutional Neural Networks”, published in the proceedings of the conference on Neural Information Processing Systems 2012, describes the architecture and operation of a deep convolutional neural network. The CNN of this publication includes eight learned layers (five convolutional layers and three fully-connected layers). The pooling layers in this publication employ overlapping tiles, such that adjacent tiles cover partly shared regions of their respective input. The CNN detailed in this publication is employed for image classification.
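By way of a non-limiting illustration, overlapping pooling can be sketched as max pooling in which the stride between tiles is smaller than the tile size. The function name `max_pool_2d`, the window size of 3 and the stride of 2 are assumptions chosen for this example and are not taken from the description above.

```python
def max_pool_2d(feature_map, window=3, stride=2):
    """Max-pool a 2-D list of numbers using square tiles of side `window`.

    When stride < window, neighbouring tiles share rows and columns,
    i.e. the tiles overlap.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - window + 1, stride):
        pooled_row = []
        for left in range(0, cols - window + 1, stride):
            tile = [feature_map[r][c]
                    for r in range(top, top + window)
                    for c in range(left, left + window)]
            pooled_row.append(max(tile))
        pooled.append(pooled_row)
    return pooled


# Illustrative 5x5 input whose value at (row, col) is 5 * row + col.
feature_map = [[5 * r + c for c in range(5)] for r in range(5)]

# Tiles of side 3 placed every 2 pixels: adjacent tiles share one row or column.
overlapping = max_pool_2d(feature_map, window=3, stride=2)

# Tiles of side 2 placed every 2 pixels: no pixel belongs to two tiles.
non_overlapping = max_pool_2d(feature_map, window=2, stride=2)
```

In this sketch both calls yield a 2x2 output, but only the first lets a single input pixel contribute to more than one tile.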
An article by Zeiler et al., entitled “Visualizing and Understanding Convolutional Networks” published on http://arxiv.org/abs/1311.2901v3, is directed to a visualization technique that gives insight into the function of intermediate feature layers of a CNN. The visualization technique shows a plausible and interpretable input pattern (situated in the original input image space) that gives rise to a given activation in the feature maps. The visualization technique employs a multi-layered de-convolutional network. A de-convolutional network employs the same components as a convolutional network (e.g., filtering and pooling) but in reverse. Thus, this article describes mapping detected features in the produced feature maps to the image space of the input image. In this article, the de-convolutional networks are employed as a probe of an already trained convolutional network.
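As a hedged illustration of running a pooling component "in reverse", one possible approach is to record, for each pooling tile, the position of its maximum, and to later place each pooled value back at that recorded position. The sketch below assumes this approach for a one-dimensional signal with non-overlapping tiles; the function names `max_pool_with_switches` and `unpool` are hypothetical.

```python
def max_pool_with_switches(signal, window=2):
    """Non-overlapping 1-D max pooling that also records where each maximum was."""
    pooled, switches = [], []
    for start in range(0, len(signal) - window + 1, window):
        tile = signal[start:start + window]
        best = max(range(window), key=lambda i: tile[i])
        pooled.append(tile[best])
        switches.append(start + best)  # index of the maximum in the original signal
    return pooled, switches


def unpool(pooled, switches, length):
    """Approximate inverse of pooling: put each value back at its recorded index."""
    restored = [0.0] * length
    for value, position in zip(pooled, switches):
        restored[position] = value
    return restored


signal = [0.1, 0.9, 0.4, 0.3, 0.7, 0.2]
pooled, switches = max_pool_with_switches(signal)
restored = unpool(pooled, switches, len(signal))
```

The restored signal is zero everywhere except at the positions that produced the pooled activations, which is what allows an activation to be traced back toward the input space.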
An article by Simonyan et al., entitled “Deep Inside Convolutional Networks: Visualizing Image Classification Models and Saliency Maps” published on http://arxiv.org/abs/1312.6034, is directed to visualization of image classification models, learnt using deep Convolutional Networks (ConvNets). This article describes two visualization techniques. The first technique generates an image that maximizes the class score, based on computing the gradient of the class score with respect to the input image. The second technique involves computing a class saliency map, specific to a given image and class.
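The gradient of a class score with respect to the input image can be illustrated on a toy model. The sketch below assumes a hypothetical scoring model with a single hidden ReLU layer, so that the gradient can be written out by hand. It further assumes that the saliency map is taken as the per-pixel magnitude of that gradient, and that class-score maximization is performed by plain gradient ascent. All names and numeric values are illustrative and are not taken from the cited article.

```python
def class_score(image, hidden_weights, class_weights):
    """Score of one class for a flattened image (toy model, one ReLU layer)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, image)))
              for row in hidden_weights]
    return sum(v * h for v, h in zip(class_weights, hidden))


def class_score_gradient(image, hidden_weights, class_weights):
    """Analytic gradient of class_score with respect to every input pixel."""
    gradient = [0.0] * len(image)
    for row, v in zip(hidden_weights, class_weights):
        pre_activation = sum(w * x for w, x in zip(row, image))
        if pre_activation > 0.0:  # ReLU passes gradient only where it is active
            for pixel, w in enumerate(row):
                gradient[pixel] += v * w
    return gradient


def saliency_map(image, hidden_weights, class_weights):
    """Assumed saliency: magnitude of the class-score gradient at each pixel."""
    return [abs(g) for g in class_score_gradient(image, hidden_weights, class_weights)]


def ascend(image, hidden_weights, class_weights, step=0.1):
    """One gradient-ascent step that moves the image toward a higher class score."""
    gradient = class_score_gradient(image, hidden_weights, class_weights)
    return [x + step * g for x, g in zip(image, gradient)]


# Illustrative 4-pixel image and weights (all values are made up).
image = [0.5, 0.2, 0.9, 0.1]
hidden_weights = [[1.0, -2.0, 0.5, 0.0],
                  [-1.0, 0.0, 0.0, 3.0]]
class_weights = [2.0, -1.0]

saliency = saliency_map(image, hidden_weights, class_weights)
score_before = class_score(image, hidden_weights, class_weights)
score_after = class_score(ascend(image, hidden_weights, class_weights),
                          hidden_weights, class_weights)
```

In a trained network the gradient would typically be obtained by back-propagation rather than by a hand-written formula; the toy model is used here only so that the sketch stays self-contained.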
An article by Li et al., entitled “Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network” published on http://arxiv.org/abs/1406.3474, is directed to a method for estimating a pose of a human subject in an image. The method involves backtracking an output of a convolutional layer to the respective patch in the original input image. Specifically, the first convolutional layer receives the complete input image. However, subsequent middle layers are only locally connected therebetween (i.e., are not fully connected) and therefore the activations of some filters in the middle layers are affected only by patches of the original input image. This publication suggests an algorithm for backtracking the filter output to the specific patch of the input image that activated the filter.
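As a minimal sketch of such backtracking, and not the specific algorithm of the cited article, the input patch that affects a given output position can be computed from generic receptive-field arithmetic. The example assumes each layer is described by a hypothetical `(kernel, stride, padding)` triple and works along a single axis; for a two-dimensional image the same computation would be applied to rows and columns separately.

```python
def backtrack_to_input_patch(position, layers):
    """Map one output position back to the inclusive input interval it depends on.

    `layers` lists (kernel, stride, padding) triples from the first layer to the
    last. The returned indices may fall outside the image when padding is used,
    in which case they would need to be clipped to the image bounds.
    """
    first, last = position, position
    for kernel, stride, padding in reversed(layers):
        first = first * stride - padding
        last = last * stride - padding + kernel - 1
    return first, last


# Illustrative stack: 3-wide filter, then 2-wide pooling with stride 2,
# then another 3-wide filter (values are assumptions for this example).
layers = [(3, 1, 0), (2, 2, 0), (3, 1, 0)]
patch = backtrack_to_input_patch(1, layers)
patch_size = patch[1] - patch[0] + 1
```

For this stack, output position 1 of the last layer traces back to an 8-pixel interval of the input.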
Images can be represented as graphs for performing various visual tasks, such as determining image similarity, image retrieval, machine vision, and the like. Techniques for graph matching, for performing such tasks, are known in the art. Reference is now made to U.S. Pat. No. 8,818,106, issued to Chertok et al., and entitled “Method for Organizing a Database of Images and Retrieving Images from That Database According to a Query Image”. This publication describes a method for determining a matching score between a first set of n1 feature points and a second set of n2 feature points. The method includes the steps of producing a triple-wise affinity tensor, determining a leading eigenvector, iteratively producing a binary optimal assignment vector and determining a matching score. First, the triple-wise affinity tensor is produced by ranking the affinity of the different triplets of feature points of each of the images. Specifically, the triple-wise affinity tensor details the affinity score of assignments of triplets of feature points of the first set of feature points to triplets of feature points of the second set of feature points. It is noted that some triplet assignments can be preliminarily neglected by considering the descriptors of the feature points. Then, the leading eigenvector of the triple-wise affinity tensor is determined. The binary optimal assignment vector is produced by discretization of the leading eigenvector. Lastly, the matching score between the first set of feature points and the second set of feature points is determined according to the triple-wise affinity tensor and according to the optimal assignment vector. Other methods for solving an assignment problem are also known in the art, for example, the Hungarian algorithm.
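The sequence of steps described above (affinity, leading eigenvector, discretization, matching score) can be illustrated with a deliberately simplified sketch. The sketch substitutes a pairwise affinity matrix for the triple-wise affinity tensor, so it is not the method of the cited patent. It further assumes an affinity based on preserved inter-point distances, power iteration for the leading eigenvector, and a greedy one-to-one discretization. All function names and point coordinates are hypothetical.

```python
import math
from itertools import product


def pairwise_affinity(points_a, points_b):
    """Affinity between candidate assignments (i -> a) and (j -> b).

    Simplification: pairwise instead of triple-wise. Two assignments agree
    when the distance between i and j equals the distance between a and b.
    """
    candidates = list(product(range(len(points_a)), range(len(points_b))))
    size = len(candidates)
    affinity = [[0.0] * size for _ in range(size)]
    for p, (i, a) in enumerate(candidates):
        for q, (j, b) in enumerate(candidates):
            if i != j and a != b:
                gap = abs(math.dist(points_a[i], points_a[j])
                          - math.dist(points_b[a], points_b[b]))
                affinity[p][q] = math.exp(-gap)
    return candidates, affinity


def leading_eigenvector(matrix, iterations=100):
    """Power iteration starting from the all-ones vector."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        vector = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    return vector


def discretize(vector, candidates):
    """Greedy discretization: accept the strongest non-conflicting assignments."""
    order = sorted(range(len(vector)), key=lambda k: vector[k], reverse=True)
    used_a, used_b, binary = set(), set(), [0] * len(vector)
    for k in order:
        i, a = candidates[k]
        if i not in used_a and a not in used_b:
            binary[k] = 1
            used_a.add(i)
            used_b.add(a)
    return binary


def matching_score(matrix, binary):
    """Total affinity between all pairs of selected assignments."""
    n = len(binary)
    return sum(matrix[p][q] * binary[p] * binary[q]
               for p in range(n) for q in range(n))


# Second set: the first set shifted by (100, 100) and reordered (made-up data).
points_a = [(0, 0), (10, 0), (25, 5)]
points_b = [(125, 105), (100, 100), (110, 100)]

candidates, affinity = pairwise_affinity(points_a, points_b)
binary = discretize(leading_eigenvector(affinity), candidates)
matches = {i: a for (i, a), keep in zip(candidates, binary) if keep}
score = matching_score(affinity, binary)
```

The greedy discretization is only one possible way of turning the leading eigenvector into a binary assignment vector; an assignment solver such as the Hungarian algorithm mentioned above could be applied to the eigenvector entries instead.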