Image registration is the process of transforming different sets of data into one coordinate system. The data may be multiple photographs, or data from different sensors, times, depths, or viewpoints. Image registration is used in computer vision, medical imaging, biological imaging and brain mapping, and in compiling and analyzing images and data from satellites. Registration is necessary in order to compare or integrate the data obtained from these different measurements and to perform various computer vision tasks.
Image registration or image alignment algorithms can be classified into intensity-based and feature-based methods. One of the images is referred to as the reference or source, and the others are respectively referred to as the target, sensed, or subject images. Image registration involves spatially transforming the target image(s) to align with the reference image. Intensity-based methods compare intensity patterns in images via correlation metrics, while feature-based methods find correspondences between image features such as points, lines, and contours. Intensity-based methods register entire images or sub-images; if sub-images are registered, the centers of corresponding sub-images are treated as corresponding feature points. Feature-based methods establish a correspondence between a number of especially distinct points in the images. Knowing the correspondence between these points, a geometrical transformation is then determined to map the target image to the reference image, thereby establishing point-by-point correspondence between the reference and target images.
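The intensity-based approach can be sketched as follows: search over candidate spatial transformations and keep the one that maximizes a correlation metric between the two images. The sketch below restricts the transformation to integer translations and uses normalized cross-correlation (NCC) as the metric; the 5×5 images and the ±2-pixel search range are illustrative assumptions, not part of any particular method.

```python
# Minimal sketch of intensity-based registration: exhaustively search
# integer translations of the target against the reference and keep the
# shift with the highest normalized cross-correlation (NCC).
from math import sqrt

def ncc(a, b):
    """Normalized cross-correlation of two equal-length flat lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sqrt(sum((x - ma) ** 2 for x in a))
    db = sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db) if da and db else 0.0

def register_translation(ref, tgt, max_shift=2):
    """Return (dy, dx, score) for the best integer shift of tgt onto ref."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, -2.0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            ref_patch, tgt_patch = [], []
            for y in range(h):            # collect the overlapping region
                for x in range(w):
                    ty, tx = y + dy, x + dx
                    if 0 <= ty < h and 0 <= tx < w:
                        ref_patch.append(ref[y][x])
                        tgt_patch.append(tgt[ty][tx])
            score = ncc(ref_patch, tgt_patch)
            if score > best[2]:
                best = (dy, dx, score)
    return best

# Illustrative data: the target is the reference shifted one pixel right.
ref = [[0, 0, 0, 0, 0],
       [0, 9, 9, 0, 0],
       [0, 9, 9, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
tgt = [[0, 0, 0, 0, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
dy, dx, score = register_translation(ref, tgt)  # recovers the shift (0, 1)
```

The exhaustive search makes the cost of intensity-based methods explicit: every candidate transformation requires touching every overlapping pixel, which is why practical implementations restrict the search space or work on sub-images.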
Feature descriptors of feature-based image registration methods are used in a variety of imaging applications, including object recognition, 3D reconstruction, image retrieval, camera localization, and the like. Such feature descriptors may be used to compute abstractions of image information. Their widespread use has driven the development of a large number of alternative descriptors based on various concepts, such as Gaussian derivatives, moment invariants, complex features, phase-based local features, or the like. However, an efficient descriptor is expected to have low computational complexity, easy matching characteristics, and high memory efficiency. Current descriptors generally do not combine all of these qualities.
In addition, because images captured by an information processing apparatus, e.g., a camera, may be affected by various environmental factors such as object size, illumination, obstacles, rotation, etc., it may be difficult to recognize objects in the images robustly. Thus, conventional feature point extraction methods such as the Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB) have been used for recognizing objects and registering images.
SIFT is a method for extracting feature points which can be applied to an image processing system such as a surveillance camera or an autonomous navigation system. SIFT derives a high-order descriptor from the feature points of the objects in the image. SURF is likewise a method for extracting feature points, and can be applied to an image processing system such as an object tracking system or a panorama image generating system. In SURF, objects can be recognized by generating an integral image, in which the pixel values of the input image are cumulatively summed, and deriving feature points and a high-order descriptor at each scale of the integral image.
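The integral image mentioned above is what makes SURF's box filters cheap: after a single pass over the image, the sum of any axis-aligned rectangle can be read off with four lookups, independent of the rectangle's size. A minimal pure-Python sketch (the 3×3 image is illustrative):

```python
# Minimal sketch of the integral image used by SURF. ii is zero-padded
# by one row/column so that box sums need no boundary special-casing.

def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four lookups, O(1) per query."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)   # whole image: 45
center = box_sum(ii, 1, 1, 2, 2)  # single pixel img[1][1]: 5
```

Because every box sum costs the same four lookups regardless of scale, SURF can evaluate its filters at each scale of the image pyramid without re-summing pixels.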
Although SIFT and SURF have the advantage of being robust to changes in image (or object) size, illumination, and rotation, they have the disadvantage that complex computation is required to implement their algorithms, and their computational speed may degrade significantly as the number of feature points increases. It is therefore difficult to use these techniques for real-time processing in a mobile communication terminal with low computational ability, or in a mobile environment with a limited power supply.
For example, the SIFT descriptor is highly discriminative but, being a 128-dimensional vector, is relatively slow to compute and match. The SURF descriptor is faster to compute and match. However, since the SURF descriptor is a 64-dimensional vector of floating point values, it is represented by 256 bytes, and this size may become costly as the number of descriptors to be stored increases. Several extensions of SIFT have also been proposed, including dimensionality reduction techniques, quantization-based techniques, descriptor binarization techniques, and the like. However, these techniques remain time- and computation-intensive.
Therefore, binary descriptors have been proposed. For example, in ORB, feature points are extracted by the FAST method and a binary descriptor is generated by the BRIEF method in order to recognize objects. The ORB technique improves the speed of recognizing objects in an input image as compared to SIFT and SURF, which use high-order descriptors. Pixel comparisons are faster to compute than the gradient operations used in common gradient-based descriptors; e.g., ORB is two orders of magnitude faster than SIFT without losing much keypoint matching performance, see E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in International Conference on Computer Vision (ICCV), 2011; D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91-110, November 2004. However, the accuracy of image registration with binary descriptors such as ORB is lower than that of other methods, such as SIFT and SURF. Accordingly, there is a need to improve the accuracy of image registration with binary descriptors while maintaining their computational efficiency. In addition, there is a need to develop customized descriptor parameters that allow accurate performance with different image capture settings (different cameras, viewpoints, times, etc.).
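The speed advantage of binary descriptors comes from two properties: each descriptor bit is a single pixel-intensity comparison inside a patch, and matching reduces to a Hamming distance (an XOR plus a popcount) instead of floating-point arithmetic. The sketch below illustrates a BRIEF-style descriptor; the 5×5 patch, the eight fixed test pairs, and the example patches are illustrative assumptions — ORB itself uses 256 comparison pairs that are learned offline and steered by keypoint orientation.

```python
# Minimal sketch of a BRIEF-style binary descriptor: bit i is 1 iff the
# patch is darker at point p_i than at point q_i. The pair set is fixed
# and shared by all descriptors (illustrative; ORB learns its pairs).
PATCH = 5  # patch side length around a keypoint (assumption)

PAIRS = [((0, 0), (4, 4)), ((1, 3), (2, 0)),
         ((4, 1), (0, 2)), ((2, 2), (3, 1)),
         ((0, 4), (4, 0)), ((3, 3), (1, 1)),
         ((2, 4), (2, 1)), ((4, 2), (0, 0))]

def describe(patch, pairs=PAIRS):
    """Pack one intensity comparison per pair into an integer bit field."""
    bits = 0
    for i, ((py, px), (qy, qx)) in enumerate(pairs):
        if patch[py][px] < patch[qy][qx]:
            bits |= 1 << i
    return bits

def hamming(d1, d2):
    """Number of differing bits: popcount of the XOR."""
    return bin(d1 ^ d2).count("1")

# Illustrative patches: b is a uniformly brightened copy of a, c differs.
patch_a = [[(y * PATCH + x) % 7 for x in range(PATCH)] for y in range(PATCH)]
patch_b = [[v + 10 for v in row] for row in patch_a]
patch_c = [[y for x in range(PATCH)] for y in range(PATCH)]

da, db, dc = (describe(p) for p in (patch_a, patch_b, patch_c))
# Pure comparisons are invariant to a uniform brightness offset, so
# patch_a and patch_b produce identical descriptors (Hamming distance 0),
# while the structurally different patch_c does not.
```

Because the whole descriptor fits in a machine word per 32/64 bits, matching large descriptor sets reduces to XOR-and-popcount loops, which is the source of the two-orders-of-magnitude speedup over SIFT cited above.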