1. Field of Invention
The present invention generally relates to object recognition in computer vision. More specifically, it relates to a biometric identification system using finger vein patterns as the means for recognition and authentication.
2. Description of Related Art
Biometrics refers to the use of intrinsic human traits for personal identification purposes. That is, a person may be identified by one or a combination of multiple different personal trait characteristics of that person. Examples of such personal traits are a fingerprint, a hand print (length and thickness of the fingers, size of the hand itself), a retina scan (pattern of blood vessels in the eye), an iris scan, a facial photograph, a blood vessel pattern (vein pattern), a voice print, a dynamic signature (the shape and time pattern for writing a signature), or a keystroke pattern (key entry timing).
Typically, a person wanting to be identified as being pre-registered within a registry of persons will submit a sample of a particular biometric, and the submitted biometric is then compared to a library of registered biometric samples in an effort to identify a match. Some biometric samples may originate in the form of an image, such as a fingerprint or iris scan. Computer vision techniques, however, are generally not directly applicable to the field of biometrics.
For example, one computer vision technique is the Active Appearance Model (AAM). It typically draws generalities about the look of a specific class (or type) of object from a predefined viewpoint given an extensive library of sample images of that class of object from that predefined viewpoint. That is, an AAM machine examines a large library of training images, identifies commonalities among the sample training images, and then searches for those commonalities (within defined statistical variations) in a test image to determine if a general example of the sought class of object can be found in the test image.
An AAM machine uses the large library of training images of a given object type to define a statistical model of the generally acceptable shape and appearance of the given object, and to further define acceptable variations in the shape and appearance of the object. The prior knowledge gleaned from the training library thus establishes constraints for the AAM machine to search for an instance of the sought object in a test image. AAM machines have found extensive application in face recognition since the human face can generally be described in terms of generally predictable characteristics, such as having two neighboring eyes, one nose below a point between the two neighboring eyes, one mouth below the nose, etc. AAM machines are an example of constraining an object search based on previously established expectations.
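By way of illustration only, the general idea of constraining a search with statistics gathered from a training library may be sketched as follows. The sketch below is hypothetical and deliberately simplified (a per-coordinate mean and standard-deviation gate over aligned landmark shapes); it is not an actual AAM, which jointly models shape and appearance through principal component analysis:

```python
from statistics import mean, stdev

def shape_model(training_shapes):
    """Build a per-coordinate statistical model (mean and standard deviation)
    from aligned landmark shapes, each a flat [x0, y0, x1, y1, ...] list."""
    cols = list(zip(*training_shapes))
    return [mean(c) for c in cols], [stdev(c) for c in cols]

def within_model(shape, means, stds, k=3.0):
    """Accept a candidate shape only if every landmark coordinate lies
    within k standard deviations of the training mean."""
    return all(abs(v - m) <= k * s for v, m, s in zip(shape, means, stds))

# Toy training set: three aligned two-landmark shapes.
train = [[0.0, 0.0, 10.0, 0.0],
         [0.1, -0.1, 10.2, 0.1],
         [-0.1, 0.1, 9.8, -0.1]]
means, stds = shape_model(train)
print(within_model([0.05, 0.0, 10.0, 0.0], means, stds))  # True: plausible shape
print(within_model([5.0, 5.0, 3.0, -4.0], means, stds))   # False: violates constraints
```

In this simplified form, the training library establishes the acceptable variation, and candidate shapes falling outside it are rejected, which is the constraint principle an AAM machine applies in a far more sophisticated manner.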
AAM machines, however, require large libraries and extensive preparation of the training images and the test image. That is, human involvement is required to identify the distinguishing features of an object in the training image, and to mark these features manually. The test image may also require that these distinguishing features be marked prior to being submitted to the AAM machine for identification. In the case of human face recognition, the marking of features in the test image can typically be automated since the general structure of a human face is known. For example, a face detecting algorithm may be used to identify the location of a face within a test image, and a canonical face (i.e. a statistically normalized face based on the library of training images) with its distinguishing features already marked may be fitted onto the located face within the test image.
Unfortunately, most biometrics cannot be condensed to a list of definable, and predictable, distinguishing features shared by a library of training images. For example, finger vein patterns may not necessarily follow consistent, definable, predetermined patterns across training images from multiple different people, from different parts of a finger, and from different viewpoints of the finger. That is, the arrangement, relative thickness, and number of veins visible in an image will likely not follow predictable and definable constraints. Additionally, it is generally not clear to a human observer what characteristic features may be consistent across all training images of finger veins.
Thus, rather than establishing a general model based on expected characteristics of a test sample, biometrics more typically utilize pattern identification techniques that define a pattern in a given diagnostic image and then compare the defined pattern with a library of pre-registered patterns.
For example, one technique for identifying blood vessel patterns is by means of path-based tree matching, such as described in U.S. Pat. No. 7,646,903. Tree matching algorithms require tree structures as input. Each tree structure describes the tree as a series of branches interconnected through branch points. Several known algorithms can be used to obtain the tree structure including tracking, segmentation, and skeletonization. Once the tree structure is obtained, a matching algorithm operates directly on the structure and any data contained therein.
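For illustration only, a vessel tree obtained by skeletonization may be represented as a series of branches interconnected through branch points, and a matching algorithm may then operate directly on that structure. The sketch below is hypothetical and far simpler than the path-based tree matching of U.S. Pat. No. 7,646,903; each tree is a nested (branch-length, children) tuple, and two trees are deemed similar if their branch lengths agree within a relative tolerance and their sub-branches match in order:

```python
def similar(tree_a, tree_b, tol=0.25):
    """Naive recursive tree comparison. Each tree is (branch_length, [children]).
    Trees match if their branch lengths agree within a relative tolerance
    and their child subtrees can be matched in order."""
    len_a, kids_a = tree_a
    len_b, kids_b = tree_b
    if abs(len_a - len_b) > tol * max(len_a, len_b):
        return False  # branch lengths disagree too much
    if len(kids_a) != len(kids_b):
        return False  # different branching structure
    return all(similar(a, b, tol) for a, b in zip(kids_a, kids_b))

# Two skeletonized vessel trees: a root branch splitting into two sub-branches.
t1 = (10.0, [(4.0, []), (6.0, [])])
t2 = (10.5, [(4.2, []), (5.8, [])])
t3 = (10.0, [(4.0, [])])  # different branching structure
print(similar(t1, t2))  # True
print(similar(t1, t3))  # False
```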
An integral part of pattern identification techniques is feature detection. In the field of computer vision, techniques are known for identifying feature points, or individual pixels, in an image that may be used to describe an imaged scene. As an example, if one has a library of identifying feature points obtained from a library of training images, then one may search an input digital (test) image for those identifying features in an effort to determine if an example of the specific object is present in the input digital image. In the field of computer vision, this idea has been extended to matching common features of a common scene in multiple digital images of the common scene taken from different view angles to index, i.e. match or correlate, feature points from one image to the other. This permits the combined processing of the multiple digital images.
For example in FIG. 1, images 2, 4, 6 and 8 each provide partial, and overlapping, views of a building in a real-world scene, but none provide a full view of the entire building. However, by applying edge detection and indexing (i.e. identifying matching pairs of) feature points in the four partial images 2, 4, 6 and 8 that correlate to the same real feature point in the real-world scene, it is possible to stitch together the four partial images (i.e. applying an image stitching tool) to create one composite image 10 of the entire building. The four partial images 2-8 of FIG. 1 are taken from the same view angle, but this approach may be extended to the field of correspondence matching, where images of a common scene are taken from different view angles.
In the field of computer vision, correspondence matching (or the correspondence problem) refers to the matching of objects (or object features or feature points) common to two, or more, images. Correspondence matching tries to figure out which parts of a first image correspond to (i.e. are matched to) which parts of a second image, assuming that the second image was taken after the camera had moved, time had elapsed, and/or the pictured objects had moved. For example, the first image may be of a real-world scene taken from a first view angle with a first field of vision, FOV, and the second image may be of the same scene taken from a second view angle with a second FOV. Assuming that the first and second FOVs at least partially overlap, correspondence matching refers to the matching of common feature points in the overlapped portions of the first and second images.
Correspondence matching is an essential problem in computer vision, especially in stereo vision, view synthesis, and 3D reconstruction. Assuming that a number of image features, or objects, in two images taken from two view angles have been matched, epipolar geometry may be used to identify the positional relationship between the matched image features to achieve stereo vision, view synthesis, or 3D reconstruction.
Epipolar geometry is basically the geometry of stereo vision. For example in FIG. 2, two cameras 11 and 13 create 2D images 15 and 17, respectively, of a common 3D scene 12 consisting of a larger sphere 19 and a smaller sphere 21. 2D images 15 and 17 are taken from two distinct view angles 23 and 24. Epipolar geometry describes the geometric relations between points in 3D scene 12 (for example spheres 19 and 21) and their relative projections in 2D images 15 and 17. These geometric relationships lead to constraints between the image points, which are the basis for epipolar constraints, or stereo constraints.
FIG. 2 illustrates a horizontal parallax where, from the view point of camera 11, smaller sphere 21 appears to be in front of larger sphere 19 (as shown in 2D image 15), but from the view point of camera 13, smaller sphere 21 appears to be some distance to the side of larger sphere 19 (as shown in 2D image 17). Nonetheless, since both 2D images 15 and 17 are of a common 3D scene 12, both are truthful representations of the relative positions of larger sphere 19 and smaller sphere 21. The geometric positional relationships between camera 11, camera 13, smaller sphere 21 and larger sphere 19 thus establish geometric constraints on 2D images 15 and 17 that permit one to reconstruct the 3D scene 12 given only the 2D images 15 and 17, as long as the epipolar, or stereo, constraints are known.
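The epipolar constraint itself may be illustrated with a short sketch. For a fundamental matrix F relating the two views, any valid pair of corresponding points p and q (in homogeneous coordinates) must satisfy qᵀFp = 0. The example below assumes a rectified stereo pair (pure horizontal camera translation), for which the constraint reduces to requiring that matched points lie on the same image row; the point coordinates are illustrative only:

```python
def epipolar_residual(F, p, q):
    """Evaluate the epipolar constraint q^T F p for homogeneous points
    p = (u, v, 1) in the first image and q = (u', v', 1) in the second.
    A residual of zero means q lies on the epipolar line of p."""
    line = [sum(F[i][j] * p[j] for j in range(3)) for i in range(3)]  # l = F p
    return sum(q[i] * line[i] for i in range(3))                      # q . l

# Fundamental matrix of a rectified stereo pair (pure horizontal translation):
# the constraint reduces to "matched points share the same image row".
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]
print(epipolar_residual(F, (120, 80, 1), (95, 80, 1)))  # 0: consistent match
print(epipolar_residual(F, (120, 80, 1), (95, 60, 1)))  # 20: violates the constraint
```

In practice the fundamental matrix is estimated from matched feature points (e.g. by the eight-point algorithm), and candidate correspondences with large residuals are rejected as outliers.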
Feature based correspondence matching algorithms have found wide application in computer vision. Examples of feature based correspondence matching algorithms are the scale-invariant feature transform, SIFT, and the Affine SIFT (or ASIFT). It is noted, however, that feature based correspondence matching algorithms such as SIFT and Affine SIFT purposely exclude edge points from their analysis, and thus are not well suited for edge detection.
As is known in the art, the SIFT algorithm scans an image and identifies points of interest, or feature points, which may be individual pixels, and describes them sufficiently (typically relative to their neighboring pixels within a surrounding window) so that the same feature point (or pixel) may be individually identified in another image. A discussion of the SIFT transform is provided in U.S. Pat. No. 6,711,293 to Lowe, which is herein incorporated in its entirety by reference. Essentially, SIFT uses a library of training images to identify feature points that are characteristic of a specific object. Once a library of the object's characteristic feature points has been identified, the feature points can be used to determine if an instance of the object is found in a newly received test image.
Principally, feature points (i.e. points of interest) of the object are extracted to provide a "feature description" of a specific object. This description, extracted from training images, can then be used to identify the specific object in a test image containing many object-types. To perform reliable recognition, it is preferred that the features extracted from the training images be detectable under changes in image scale, noise, illumination, and rotation. Feature points usually lie near high-contrast regions of the image. However, since distortion of an object (such as if a feature point is located in an articulated or flexible part of the object) may alter a feature point's description relative to its neighboring pixels, changes to an object's internal geometry may introduce errors. To compensate for these errors, SIFT typically detects and uses a large number of feature points so that the effects of errors contributed by these local variations may be reduced.
In a typical SIFT application, feature points of objects are first extracted from a set of training images and stored in a database. An object is recognized in a new image (i.e. a test image) by individually comparing each feature point extracted from the new image with the feature points in this database and finding candidate matching features based on the Euclidean distance of their feature point vectors. From the full set of matches, subsets of feature points that agree on the object and its location, scale, and orientation in the new image are identified in order to filter for good matches. Consistent clusters of good matches are then identified. Typically, each cluster of three or more features that agree on an object and its pose is then subject to further detailed model verification, and subsequently outliers are discarded. Finally, the probability that a particular set of features indicates the presence of a specific object is computed, given the accuracy of fit and the number of probable false matches. Object matches that pass all these tests can be identified as correct.
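The candidate-matching step described above (a nearest-neighbor search over descriptor vectors, keeping only matches whose nearest neighbor is clearly closer than the second-nearest, per Lowe's ratio test) may be sketched as follows. The four-dimensional toy descriptors are illustrative only; actual SIFT descriptors are 128-dimensional:

```python
from math import dist  # Euclidean distance between coordinate sequences

def match_features(test_descs, db_descs, ratio=0.8):
    """Match each test descriptor to the database by Euclidean distance,
    keeping a match only if the nearest database descriptor is clearly
    closer than the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(test_descs):
        ranked = sorted(range(len(db_descs)), key=lambda j: dist(d, db_descs[j]))
        best, second = ranked[0], ranked[1]
        if dist(d, db_descs[best]) < ratio * dist(d, db_descs[second]):
            matches.append((i, best))  # (test index, database index)
    return matches

# Toy database of pre-registered descriptors.
db = [(0.0, 0.0, 1.0, 1.0), (5.0, 5.0, 0.0, 0.0), (0.1, 0.0, 1.0, 0.9)]
test = [(0.05, 0.0, 1.0, 0.95),  # equidistant to db[0] and db[2]: ambiguous, discarded
        (5.1, 4.9, 0.0, 0.1)]    # unambiguously close to db[1]: kept
print(match_features(test, db))  # [(1, 1)]
```

Discarding the ambiguous first descriptor rather than guessing between two near-equal candidates is what makes the subsequent clustering and pose-verification stages tractable.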
An example of a SIFT determination of feature points is illustrated in FIG. 3. Possible feature points are first identified, as indicated by dark dots in image 16. Possible feature points that have a low contrast are then discarded, as illustrated in image 18. Finally, possible feature points located on edges are removed, which leaves the final set of feature points shown in image 20.
Thus, SIFT permits one to match feature points of an identified object from one image to another. This is illustrated in FIG. 4, where three images of the same object, i.e. a happy face, are shown. For illustration purposes, only four feature points, corresponding to points near the eyes and the corners of the mouth, are shown. As indicated in FIG. 4, SIFT can match feature points from a first face 25 to a second face 26 irrespective of a change in scale. SIFT can also match feature points from first face 25 to a third face 27 irrespective of rotation. However, SIFT has been found to have limited immunity to affine transforms of images. That is, SIFT is limited in the amount of change in view angle that an imaged object can undergo and still be identified.
A method of extending a SIFT transform to better handle affine transformations is described in "ASIFT: A New Framework for Fully Affine Invariant Image Comparison" by Morel et al., SIAM Journal on Imaging Sciences, vol. 2, issue 2, 2009, herein incorporated in its entirety by reference.
With reference to FIG. 5, an Affine SIFT transform would be better able to match feature points from first face 25 to representations of the same object that have undergone affine transformations, as illustrated by happy faces 28, 29, and 30.
An example of an application of an Affine SIFT transform is illustrated in FIG. 6, where multiple feature points are matched from a first image 31 of the Statue of Liberty from a first view angle, to a second image 32 of the Statue of Liberty from a different view angle and at a different scale.
It is an object of the present invention to utilize techniques from computer vision to define constraints useful in biometrics to better identify and authenticate a potential registrant.
It is another object of the present invention to combine biometric identification techniques with object recognition techniques to improve biometric matching results.