There are common features between infrared (IR) and visual images of the human body. Using the face as an example, head shape and size, and the relative location, shape, and size of features such as the eyes, mouth, and nostrils are the same in both imaging modes. A database of images can be segmented into classes using metrics derived from those common features, and the same classification will be obtained from either visual or infrared images. Height can also be used as a classification measure when it can be inferred from the collected image or from separate sensor data. An infrared image of an unclothed area of the body, such as the face, presents much more detailed and person-specific information than does a visible image. However, visible images are more commonly collected and large historical databases of visual images exist. It is therefore desirable to automate a process for comparing imagery from both the visual and infrared modes.
Infrared images are unique to each person, even for identical twins. Visual images are not unique because many people look similar and can disguise themselves to look enough like one another that an automated identification system cannot distinguish them. Therefore, in a large database, it is not possible to automatically perform a one-to-one linkage between infrared and visual images because the visual images are not sufficiently unique. However, for each infrared image, an automated system can eliminate all visual images which cannot be a match due to insufficient correspondence between minutiae characteristics. In general, it is estimated that more than 95% of a visual database can be eliminated as a match to a given infrared image. This has application to the use of infrared surveillance cameras to identify wanted persons for whom only visual images are on file. The infrared-visual matching system compares each person it sees in infrared and classifies him as either a possible match to someone on the visual image watch list or not a match. Persons who are possible matches can then receive greater attention from immigration or security authorities. This allows the use of infrared surveillance imagery to proceed without waiting until a large database of infrared images is established.
The use of infrared imagery also provides for the detection of disguises, whether worn or surgical, which may not be detectable from visible imagery. For example, artificial facial hair such as a mustache is readily detectable in an infrared image although it appears natural in visible images. The fact that infrared surveillance imagery shows a man with a fake mustache provides a clue to consider in matching against a visible image database. Surgical disguises such as a face lift leave telltale short and longer term variations in the facial thermogram, while the visual image may appear to be a different person and show no sign of surgery. The ability to detect in IR images that surgical changes have been made to a particular area of the face permits an automated system to broaden the parameters for searching for possible matching visual images in an historical database.
High definition visual images of the face and body are routinely produced and stored for medical, diagnostic and forensic use. Common examples are the photographing of criminal suspects through booking stations producing “mug shots”, driver's license photographs produced by each state, and passport photos used by the State Department. Many such large facial image databases exist, in hardcopy and in electronic form, and there is increasing research ongoing into automated matching of newly taken images with those databases. For example, there are frequent attempts to match surveillance images of a person using a stolen credit card at an ATM with photographs of persons previously convicted of similar crimes.
Visual imagery, particularly from surveillance cameras, is often of poor quality due to dim illumination at the scene. Low light level or infrared cameras are expected to become more widely used for surveillance as their cost diminishes. There is therefore a need to correlate between newly acquired infrared images and existing databases of video images. Even in the future, when simultaneous collection of video and IR images will generate correlated databases, there will always be a need to match images taken in one spectral domain with images taken in another. This can include matching images taken in one IR band (such as 3-5 micron) with images taken in another IR band (such as 8-12 micron).
Since IR cameras are passive, emitting no radiation and therefore presenting no health hazards, they may be used in conjunction with other imaging medical devices such as x-ray, sonogram, CAT scan devices, etc. Minutiae derived from the IR image may then be superimposed or annotated onto the resulting medical image. This presents a standard technique for generating standardized reference points on all medical imagery. Subsequently, the method and apparatus of this invention can be used to search a database of annotated medical images to find a match with a current IR image or current medical image annotated with IR minutiae.
Regions of Interest (ROI) may be utilized instead of minutiae, where the ROI may be elemental or other shapes including fractal or wavelet-derived structures, segments of blood vessels, locations underneath or otherwise relative to tattoos, moles, freckles, or other distinguishable features, or wiremesh or finite elements used for thermodynamic or visible modeling of the body. Rules may relate the shapes and positions of such elements, their centroids and other features. Time sequences of minutiae and ROIs may be compared, with the decision as to a possible match made on the basis of cumulative thresholds and rule tolerances over the sequence.
Facial expression and speech modeling has application to synthetic videoconferencing and face animation. Substantial bandwidth and storage reduction can result. Use of IR minutiae offers more precise modeling than current use of visual images. The present invention provides a technique by which IR images can be tied to the visual image being displayed.
The identification of persons from infrared images is known in the art as evidenced by the Prokoski et al U.S. Pat. No. 5,163,094 which discloses a method and apparatus for analyzing closed thermal contours, called “elemental shapes” which are created by the vascular system interacting with the anatomical structure. Fifty or more elemental shapes can be identified for example in a human face imaged with an IR camera which has an NETD (noise equivalent thermal difference) of 0.07° C. and a spatial resolution of 256×256 pixels. Characteristics of those shapes, such as the centroid location and ratio of area to perimeter, remain relatively constant regardless of the absolute temperature of the face, which varies with ambient and physiological conditions. Two infrared images are compared by comparing the characteristics of corresponding shapes. A distance metric is defined and calculated for each pair of images. If the value is within a threshold, the two images are considered to be from the same person.
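By way of illustration only, the shape-based comparison described above may be sketched as follows. The particular metric (centroid displacement plus a weighted difference in area-to-perimeter ratio), the names ElementalShape, shape_distance, image_distance and same_person, and the threshold value are assumptions made for this sketch; the cited patent's actual metric is not reproduced here. The sketch also assumes that corresponding shapes have already been paired and that centroids are expressed in normalized face coordinates.

```python
import math
from dataclasses import dataclass


@dataclass
class ElementalShape:
    cx: float         # centroid x, normalized face coordinates (assumed)
    cy: float         # centroid y, normalized face coordinates (assumed)
    area: float
    perimeter: float

    @property
    def area_perimeter_ratio(self) -> float:
        return self.area / self.perimeter


def shape_distance(a: ElementalShape, b: ElementalShape, ratio_weight: float = 1.0) -> float:
    """Hypothetical per-shape distance: centroid shift plus weighted ratio difference."""
    centroid_term = math.hypot(a.cx - b.cx, a.cy - b.cy)
    ratio_term = abs(a.area_perimeter_ratio - b.area_perimeter_ratio)
    return centroid_term + ratio_weight * ratio_term


def image_distance(shapes_a, shapes_b) -> float:
    """Mean per-shape distance; shapes_a[i] is assumed to correspond to shapes_b[i]."""
    if not shapes_a or len(shapes_a) != len(shapes_b):
        raise ValueError("shape lists must be non-empty and of equal length")
    total = sum(shape_distance(a, b) for a, b in zip(shapes_a, shapes_b))
    return total / len(shapes_a)


def same_person(shapes_a, shapes_b, threshold: float = 0.05) -> bool:
    """Images are treated as the same person when the distance is within the threshold."""
    return image_distance(shapes_a, shapes_b) <= threshold
```

In practice the threshold would be chosen from the error rates acceptable for the application; the value shown is a placeholder.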
In the Prokoski et al U.S. patent application Ser. No. 08/514,456, there is disclosed a method and apparatus for extracting and comparing thermal minutiae corresponding to specific vascular and other subsurface anatomical locations from two images. Minutiae may be derived from thermal contours, or may be absolutely associated with specific anatomical locations which can be seen in the thermal image, such as the branching of blood vessels. Each minutia is then associated with a relative position in the image and with characteristics such as apparent temperature, the type of branching or other anatomical feature, vector directions of the branching, and its relation to other minutiae.
The comparison of thermal minutiae from two facial images is analogous to the comparison of sets of fingerprint minutiae, in that two images are said to identify the same person if a significant subset of the two sets are found to correspond sufficiently in relative positions and characteristics. Classification of the facial thermograms can be performed to partition a database and reduce the search for matching facial patterns. Alternately, encoding of the minutiae patterns offers a unique FaceCode which may be repeatably derived from each person, minimizing the need for searching a database.
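The fingerprint-style test described above, in which a significant subset of two minutiae sets must correspond in position and characteristics, may be sketched as follows. This is a minimal sketch under stated assumptions: each minutia is reduced to a hypothetical (x, y, kind) tuple in normalized coordinates, pairing is greedy, and the position tolerance and minimum fraction are illustrative values, not values taken from the cited application.

```python
import math


def count_corresponding(minutiae_a, minutiae_b, pos_tol=0.03):
    """Greedily pair minutiae of the same kind that lie within pos_tol of each other."""
    unused = list(minutiae_b)
    matched = 0
    for xa, ya, kind_a in minutiae_a:
        best = None
        best_d = pos_tol
        for candidate in unused:
            xb, yb, kind_b = candidate
            d = math.hypot(xa - xb, ya - yb)
            if kind_a == kind_b and d <= best_d:
                best, best_d = candidate, d
        if best is not None:
            unused.remove(best)   # each stored minutia may be used only once
            matched += 1
    return matched


def identifies_same_person(minutiae_a, minutiae_b, min_fraction=0.6):
    """True when a sufficient fraction of the smaller set finds a counterpart."""
    smaller = min(len(minutiae_a), len(minutiae_b))
    if smaller == 0:
        return False
    return count_corresponding(minutiae_a, minutiae_b) / smaller >= min_fraction
```

Measuring the fraction against the smaller set is a design choice that tolerates partial views, where only a subset of the minutiae is visible in one image.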
Infrared imaging can be used to locate minutiae points over the entire body surface which correspond to specific anatomical locations such as intersection points and branch points of the underlying blood vessels. The thermal minutiae technique and apparatus utilizes a built-in set of whole-body registration points viewable in IR on the face and body surface. The registration points can then be used to compare infrared images taken with different equipment at different times of different people and under different conditions to facilitate comparison of those images.
The IR camera is totally passive, emitting no energy or other radiation of its own, but merely collecting and focusing the thermal radiation spontaneously and continuously emitted from the surface of the human body. Current IR cameras operating in the mid to long wavelength region of 3-12 microns record patterns caused by superficial blood vessels which lie up to 4 cm below the skin surface. Future cameras will have increased sensitivity which will translate into even more defined minutiae. With current IR cameras, approximately 175 thermal facial minutiae may be identified in thermal images from superficial blood vessels in the face. More than 1000 thermal minutiae may be identified over the whole body surface. Using more sensitive infrared cameras, additional minutiae from deeper vascular structures may be identified in the thermal images.
The normal body is basically thermally bilaterally symmetric. Side to side variations are typically less than 0.25 degrees Celsius. This fact is used in assigning axes to the body's image. Where the skin surface is unbroken, there is a gradual variation of temperatures across blood vessels, with the highest temperatures across the body surface being directly on top of major blood vessels. Major thermal discontinuities occur at entrances to body cavities such as the eye sockets, nostrils, or mouth. These provide global reference points for automatic orientation of the thermal image. Local and relatively minor discontinuities in the skin surface occur at scars, moles, burns, and areas of infection. The thermal surface can be distorted through pressures and activities such as eating, exercising, wearing tight hats and other clothing, sinus inflammation, infection, weight gain and loss, and body position. However, the minutiae points remain constant with respect to their position relative to the underlying anatomy.
The technique for thermal minutiae extraction and matching can be summarized as follows:
1. The current thermal image is digitized.
2. The current image is divided into pixels, where the size of the pixel relates to the resolution or quality of the result desired.
3. Certain pixels are selected as minutiae points.
4. Each minutia is assigned characteristics such as one or more vectors having magnitude and directional information in relation to the surrounding areas of the thermal image about that minutia, absolute or relative temperature at or around the minutia location, shape of the surrounding thermal area or areas, curvature of the related shape or shapes, size of the surrounding shape or shapes, location of the minutia relative to the body, distance to other minutiae, vector length and direction to other minutiae, number of crossings of thermal contours between it and other minutiae, number of other minutiae within a certain range and direction, the type of minutiae such as the apparent end point of a blood vessel, a point of maximum curvature of a thermal contour, all points on an anatomical element such as a blood vessel which can be distinguished by thresholding or range gating or focusing the thermal camera or image, the centroid of a lymph node, or the centroid or other reference of an anatomical structure with distinguishing thermal capacitance. Either active or passive infrared imaging can be used. For active imaging, the subject can be subjected to heat or cold by external application of hot or cold air, illumination, dehumidification, ingestion of hot or cold foodstuffs, or ingestion of materials which cause vasodilation or vasoconstriction.
5. A set of minutiae characteristics of the current image is compared by computer to the set of minutiae characteristics of other images.
6. The comparison results are used to determine corresponding minutiae from the two images, and to morph or mathematically adjust one image with respect to the other to facilitate comparison.
7. The differences between the current image and database images are computed for the entire image or for areas of interest.
8. The differences are compared to a threshold and image pairs which exceed the threshold are considered impossible matches.
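The steps summarized above may be sketched, in simplified form, as follows. The sketch is illustrative only and rests on several assumptions: the digitized image is a small grid of apparent temperatures, minutiae are taken to be local thermal maxima (one of many possible selection rules), the morphing of step 6 is omitted, and the difference measure and threshold are placeholders. The function names extract_minutiae, difference and screen are hypothetical.

```python
def extract_minutiae(grid):
    """Step 3 stand-in: select interior pixels warmer than all four neighbours."""
    points = []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            v = grid[r][c]
            neighbours = (grid[r - 1][c], grid[r + 1][c], grid[r][c - 1], grid[r][c + 1])
            if all(v > n for n in neighbours):
                points.append((r, c, v))   # (row, column, apparent temperature)
    return points


def difference(current, stored):
    """Steps 5 and 7 stand-in: mean city-block distance to the nearest stored minutia."""
    if not current or not stored:
        return float("inf")
    total = 0.0
    for r, c, _ in current:
        total += min(abs(r - rs) + abs(c - cs) for rs, cs, _ in stored)
    return total / len(current)


def screen(current, database, threshold=1.0):
    """Step 8: keep only entries whose difference does not exceed the threshold."""
    return [name for name, stored in database.items()
            if difference(current, stored) <= threshold]
```

Entries removed by screen correspond to the image pairs considered impossible matches; the remaining entries are candidates for further comparison.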
Infrared facial minutiae may be derived from elemental shapes (such as by using the centroids of each shape or the zero locations resulting from wavelet compression and expansion). Particularly when high quality infrared images are used, absolute minutiae can be directly extracted without the computationally intensive analysis required for template or shape comparisons.
It is also known in the prior art to compare visible images through fiducial points involving definition of face metrics which may be considered to have aspects in common with the present invention. For example, the Tal U.S. Pat. No. 4,975,969 discloses a method and apparatus for uniquely identifying individuals by measurement of particular physical characteristics viewable by the naked eye or by imaging in the visible spectrum. Tal defined facial parameters which are the distances between identifiable parameters on the human face, and/or ratios of the facial parameters, and used them to identify an individual since he claims that the set of parameters for each individual is unique. Particular parameters such as the distance between the eye retina, the distance from each eye retina to the nose bottom and to the mouth center, and the distance from the nose bottom to the mouth center are set forth, as they may be particularly defined due to the shadowed definable points at each end.
The approach disclosed in the Tal patent utilizes visible features on the face from which a unique set of measurements and ratios allegedly can be developed for each individual. This approach is not particularly satisfactory, nor does it pertain to identical twins. In addition, the “rubber sheeting” effect caused by changes in facial expression, the aging effects which cause lengthening of the nose, thinning of the lips, wrinkles, and deepening of the creases on the sides of the nose, would all cause changes in the parameters and in their ratios. Therefore, very few measurements which can be made on a human face are constant over time, and the paucity of such constant measurements makes it improbable that facial metrics in visible images can be useful for identification of sizable populations. The Tal patent does not deal with comparison of images from other than visible detectors, and so does not consider the specific focus of the present invention which is the comparison of images from different spectral bands. Moreover, the Tal patent does not specifically caution about varying lighting conditions, which could severely limit the utility of the technique, even for classification.
Visible face metrics may be useful as a classification technique, but the visible features can be modified cosmetically or surgically without detection, resulting in misclassification. By contrast, the technique of the present invention utilizes hidden micro parameters which lie below the skin surface, and which cannot be forged. The current patent's use of underlying features which are fixed into the face at birth and remain relatively unaffected by aging provides for less inherent variability in the values of the parameters over time than is provided by the prior art.
Visible metrics require ground truth distance measurements unless they rely strictly upon ratios of measurements. They can be fooled by intentional disguises, and they are subject to variations caused by facial expressions, makeup, sunburns, shadows and similar unintentional disguises. Detecting disguises and distinguishing between identical twins may or may not be possible from visible imagery if sufficient resolution and controlled lighting are available. However, the level of resolution which may be required significantly increases the computational complexity of the identification task, and makes the recognition accuracy vulnerable to unintentional normal variations.
The use of eigenanalysis of visual faces to develop a set of characteristic features is disclosed in Pentland (MIT Media Laboratory Perceptual Computing Section, Technical Report No. 245 View-Based and Modular Eigenspaces for Face Recognition). Faces are then described in terms of weighting of those features. The approach claims to accommodate head position changes and the wearing of glasses, as well as changes in facial expressions. A representative sample of 128 faces was used from a database of 7,562 images of approximately 3000 people. A principal components analysis was performed on a representative sample. The first 20 eigenvectors were used. Each image was annotated by hand as to sex, race, approximate age, facial expression, etc. Pentland does not deal with comparing images from different spectral bands. Nor does his technique perform well in the case of visible images obtained under differing lighting conditions.
Pentland discloses that pre-processing for registration is essential to eigenvector recognition systems. The processing required to establish the eigenvector set is extensive, especially for large databases. Addition of new faces to the database requires the re-running of the eigenanalysis. Pentland and other “eigenface” approaches are database-dependent and computationally intensive. In contrast, the proposed minutiae comparison of the present invention is independent of the database context of any two images. Minutiae are directly derived from each image, visible or IR, and compared using fixed rules, regardless of the number or content of other images in the database.
An approach for comparing two sets of image feature points to determine if they are from two similar objects is disclosed in Sclaroff (Sclaroff and Pentland: MIT Media Laboratory, Perceptual Computing Technical Report #304). He suggests that first a body-centered coordinate frame be determined for each object, and then an attempt be made to match up the feature points. Many methods for finding a body-centered frame have been suggested, including moment of inertia methods, symmetry finders, and polar Fourier descriptors. These methods generally suffer from three difficulties: sampling error; parameterization error; and non-uniqueness.
Sclaroff introduces a shape description that is relatively robust with respect to sampling by using Galerkin interpolation, which is the mathematical underpinning of the finite element method. Next, he introduces a new type of Galerkin interpolation based on Gaussians that allow efficient derivation of shape parameterization directly from the data. Third, he uses the eigenmodes of this shape description to obtain a canonical, frequency-ordered orthogonal coordinate system. This coordinate system is considered the shape's generalized symmetry axes. By describing feature point locations in the body-centered coordinate system, it is straight-forward to match corresponding points, and to measure the similarity of different objects.
Applicant has previously utilized a principal components analysis of thermal shapes found in facial thermograms. The resulting accuracy of 97% from IR images equals or surpasses the results reported by Pentland with visible facial images. Applicant's training database, furthermore, included identical twins and involved non-cooperative imaging of about 200 persons. Thus, the head sizes and orientations were not pre-determined as they were in the Pentland study. As a result, the use of eigenanalysis of thermal shapes is more robust than the use of eigenanalysis of visual facial features. However, the basic requirements of eigenanalysis still pertain to their use in matching of thermal images by consideration of inherent elemental shapes. That is, the approach is computationally intensive, requires a pre-formed database, and requires standardization of the images through pre-processing.
The present invention differs from prior visible and IR recognition approaches in that it does not merely sample a finite number of points on an image grid; it extracts points which have particular meaning in each spectrum and automatically distinguishes between cross-spectrum minutiae which are coincident and those which are related by rules associated with anatomical bases. It assigns a difference or feature space distance to each pair of coincident minutiae, with a total distance calculated over all such pairs. This first step may be used to eliminate candidate matches which produce distances above a threshold. Then the spectrum-dependent minutiae are compared relative to anatomical rules to further eliminate impossible candidate matches. The prior art has not addressed alignment and comparison of visual/IR or IR/IR human images based upon anatomical rules and the characteristics of features viewable in the IR image.
It is a primary object of the present invention to provide a method and apparatus for identifying visual images which may be a match to infrared images of faces or bodies. A thermal image of a portion of the individual's body is generated and is processed to produce a set of minutiae points, together with characteristics which describe each such point and its relation to other minutiae. That combination of minutiae and characteristics is considered unique to the individual and essentially persistent in spite of ambient, physiological, emotional, and other variations which occur on a daily basis. Any portion of the body can be utilized, but the face is preferred due to its availability. Since parts of the face may be blocked by glasses, facial hair, or orientation to the sensor, such as a camera, the system and method allow for identification based on partial faces.
Candidate visual images are processed to extract minutiae characteristic of the subject and the visual spectrum. The IR and visual images are scaled to the same standard and aligned based upon minutiae which are coincident in the two spectra. A measure of the amount of warping required to accomplish the alignment is calculated. Then other spectrum-dependent minutiae are compared, with relation to certain rules which would be met if the two images were of the same person, based upon anatomical structures of the human face and body. A measure of the degree of compliance with the rules is calculated. The decision to include or exclude a given visual image from the class of possible matching images to the infrared image is made based upon these measures relative to thresholds which are established to control possible errors in the system.
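The two measures described above, the amount of warping needed to align coincident minutiae and the degree of compliance with anatomical rules, may be combined as in the following sketch. It is offered as an illustration under stated assumptions: coincident minutiae are supplied as dictionaries keyed by hypothetical landmark names, scaling to a common standard is approximated by centering and normalizing to unit RMS radius, the warp measure is the mean residual displacement after that normalization, and the rules are caller-supplied functions. The thresholds are placeholders.

```python
import math


def normalize(points):
    """Translate to the centroid and scale to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points.values()) / n
    cy = sum(y for _, y in points.values()) / n
    rms = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points.values()) / n)
    if rms == 0:
        raise ValueError("coincident minutiae must not all share one location")
    return {k: ((x - cx) / rms, (y - cy) / rms) for k, (x, y) in points.items()}


def warp_measure(ir_coincident, vis_coincident):
    """Mean residual displacement of shared landmarks after scaling and centering."""
    shared = sorted(set(ir_coincident) & set(vis_coincident))
    if len(shared) < 2:
        raise ValueError("at least two shared coincident minutiae are required")
    ir = normalize({k: ir_coincident[k] for k in shared})
    vis = normalize({k: vis_coincident[k] for k in shared})
    return sum(math.dist(ir[k], vis[k]) for k in shared) / len(shared)


def rule_compliance(rules, ir_image, vis_image):
    """Fraction of anatomical rules (callables returning bool) that are satisfied."""
    if not rules:
        return 1.0
    return sum(1 for rule in rules if rule(ir_image, vis_image)) / len(rules)


def possible_match(warp, compliance, max_warp=0.1, min_compliance=0.8):
    """Include the visual image as a candidate only if both measures pass."""
    return warp <= max_warp and compliance >= min_compliance
```

This sketch does not correct for rotation; a fuller treatment would first bring both images onto common face axes before measuring the residual.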
Just as locating the center of a fingerprint is essential to certain fingerprint matching algorithms, establishing axes for the facial minutiae is also essential. In an interactive system, human operators establish face axes, similar to fingerprint examiners setting the orientation of latents. A human demarcates the eye pupils, canthi and/or nostrils by manipulating a cursor on the system display. Axes are then automatically generated vertically through the center of mass of the eye pupils or canthi and nostrils and horizontally through the pupils or canthi centroids. If the axes are not perpendicular, the vertical axis can be adjusted to not necessarily bisect the nostrils. The human operator also indicates any unusual features, such as a missing eye or eye patch, wearing of bandages, tattoos, deformation of the lips or other visible gross thermal asymmetries of the face. An automated system can perform these as well.
The unknown face is partitioned into segments, and corresponding segments matched. This will accommodate matching of partial faces when faces are partially disguised or hidden behind other faces in a crowd.
In the full-frontal face, the thermal image is grossly symmetrical bilaterally. The canthi or sinus areas in normal individuals are the hottest extended areas of the face. When glasses are not worn, it is a simple process to locate the canthi in the thermal image and use them to establish axes for the face. Other features which may be used are the nostrils, which may present alternately hot and cold bilaterally symmetric areas as the individual breathes in and out. The horizontal axis may be drawn through the outer corners of each eye, which are readily distinguishable in the infrared images or through the pupils which may be seen in some IR imagery. The vertical axis may then be drawn through the bow of the upper lip, or through the center point of the two nostrils, or at the midpoint between the eye corners. The intersection of the two axes will occur at the center of the two eyes. The midpoint between the horizontal through the eyes is defined as the center of the face.
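One way of deriving face axes from two located eye features (for example the canthi or outer eye corners) is sketched below. The sketch assumes image coordinates with y increasing downward, takes the horizontal axis through the two eye points and the vertical axis perpendicular to it through their midpoint, and uses hypothetical function names face_axes and to_face_coords. Use of the nostrils or the bow of the upper lip to refine the vertical axis is not shown.

```python
import math


def face_axes(left_eye, right_eye):
    """Return (origin, horizontal unit vector, vertical unit vector) for the face."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    length = math.hypot(rx - lx, ry - ly)
    if length == 0:
        raise ValueError("eye points must be distinct")
    origin = ((lx + rx) / 2.0, (ly + ry) / 2.0)         # midpoint between the eyes
    horizontal = ((rx - lx) / length, (ry - ly) / length)
    vertical = (-horizontal[1], horizontal[0])          # perpendicular to horizontal
    return origin, horizontal, vertical


def to_face_coords(point, origin, horizontal, vertical):
    """Express an image point in the face-centered coordinate system."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    u = dx * horizontal[0] + dy * horizontal[1]
    v = dx * vertical[0] + dy * vertical[1]
    return (u, v)
```

Expressing minutiae in these coordinates removes in-plane translation and rotation of the head before minutiae from two images are compared.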
If the person is wearing glasses, the pattern of the glasses, which block the infrared emissions from the face and thereby produce an extended cold area with sharp cut-off thermally, can be used to approximate the facial axes. If a sufficient number of minutiae are obtainable from portions of the face not blocked by glasses, facial hair, or other concealments, a person may be identifiable. Alternatively, if fewer than a minimum number of minutiae specified for a particular scenario are extracted by an automated system for a particular person, that person may be considered by the system to be a potential match, but be tagged as having a low number of minutiae.
Various perturbations, such as facial expression changes, can distort the relative locations of minutiae points to an extent. This is analogous to the deformations which occur in fingerprints due to movement between the fingers and the print surface. The minutiae matching algorithms allow for variations in the position and characteristics of the minutiae, as well as in the subset of minutiae which are seen due to the field of view of the camera and to possible obstruction of certain areas of the face in the image.
The face surface presents a smooth continuum of thermal levels, and reflects metabolic activity, ambient and internal temperatures, and ambient sources of thermal energy. Discontinuities occur at breaks in the skin continuum, such as caused by the nostrils, the mouth opening, the eyes, facial hair, moles or other skin disturbances, and any applique such as bandages.
According to a preferred embodiment of the invention, minutiae are used from the face. The minutiae are referenced to axes derived from specific physiological features. Although many different approaches may be used to obtain repeatable minutiae from facial thermograms, the preferred approach uses a number of extraction routines to produce a plurality of minutiae sufficient for an intended purpose. Thus, for a relatively low order of required security, on the order of ten minutiae may be extracted using absolute anatomical positions such as branch locations of the carotid and facial arteries.
For a high security requirement, on the order of 100 derived minutiae may be extracted using additional computations to identify further derived and absolute minutiae. The minutiae extraction and characterization procedure locates the position of each minutia. In addition it may note characteristics of each point such as: a vector indicating the orientation of the corresponding blood vessel; a second vector indicating the relative orientation of the branching blood vessel; the normalized apparent temperature; and the apparent width of the corresponding blood vessels. As with some fingerprint minutiae matching machines, use of the characteristics can enhance the speed and accuracy of identification. Furthermore, it can improve the accuracy and speed of automatic fusion of medical imagery.
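The characteristics noted for each minutia may be held in a simple record, as in the following sketch. The field names, the Minutia class, and the tolerance values are assumptions introduced for illustration; the description above lists the characteristics but does not prescribe a data layout or tolerances.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, float]


@dataclass(frozen=True)
class Minutia:
    x: float                       # position relative to the face axes
    y: float
    vessel_dir: Vector             # orientation of the corresponding blood vessel
    branch_dir: Optional[Vector]   # orientation of the branching vessel, if any
    norm_temp: float               # normalized apparent temperature
    vessel_width: float            # apparent width of the vessel


def branch_angle_deg(m: Minutia) -> Optional[float]:
    """Angle between the vessel and its branch, or None when there is no branch."""
    if m.branch_dir is None:
        return None
    (ax, ay), (bx, by) = m.vessel_dir, m.branch_dir
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def characteristics_agree(a: Minutia, b: Minutia,
                          angle_tol=15.0, temp_tol=0.1, width_tol=1.0) -> bool:
    """Compare branch angle, normalized temperature and width within tolerances."""
    angle_a, angle_b = branch_angle_deg(a), branch_angle_deg(b)
    if (angle_a is None) != (angle_b is None):
        return False
    if angle_a is not None and abs(angle_a - angle_b) > angle_tol:
        return False
    return (abs(a.norm_temp - b.norm_temp) <= temp_tol
            and abs(a.vessel_width - b.vessel_width) <= width_tol)
```

Checking characteristics before position-based pairing can reject many candidate pairs cheaply, which is one way the characteristics may improve speed as well as accuracy.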
This basic technique can be employed on an area-by-area basis when portions of the body cannot be seen or when significant changes have occurred in portions of the thermogram such as when portions of the body have suffered external wounds. This would be done by segmenting the thermogram to consider only the portions of the body in which minutiae can be detected. Functionally this is equivalent to matching a latent partial fingerprint found at a crime scene to a full rolled print filed in the FBI system. The set of minutiae points, together with characteristics which describe each such point and its relation to other minutiae is considered unique to the individual and persistent, for both contact fingerprints and thermal minutiae.
Verification that two images from different spectra may be from the same person can be an end goal in itself or the first step in further processing the two images to extract comparison data.
A change in facial expression or the action of speech causes movements in affected areas of the face, particularly the lips, but also the eye, chin, forehead, and cheek areas. Encoding of facial expressions and facial movements during speech is currently being studied for bandwidth reduction in the transmission of “talking head” video for applications such as videophone, videoconferencing, video email, synthetic speech, and face animation. The intent is to transmit a baseline image followed by encoded changes to that image, with reconstruction of the animated face at the receiving end. This process offers significant bandwidth reduction, but may produce imagery in which the talking face seems stiff and unnatural or does not appear to be synchronized with the audio, giving the unacceptable look of a dubbed foreign film.
All such studies involve modeling the facial movements based upon the relocation of certain observable points of the face, such as the corners of the mouth. The various models differ in the extent to which they consider the underlying facial muscles and nerves. There are few observable reference points on a generalized face, especially under uncontrolled lighting conditions. In particular, there are no observable reference points in the cheek areas, and none in the forehead area except possibly skin creases. When the talking head is that of a dark skinned person, the reconstructed image may show further degradation of subtle facial features.
Use of an IR camera in conjunction with a video camera, or use of a dualband camera at the transmission end offers the potential for marked improvements. Infrared minutiae are more numerous than visible markers and are present throughout the face, including areas of the cheeks and forehead and chin where no visible minutiae may be present. Therefore, modeling of the movements of infrared minutiae can provide finer detailed replication of expressions and speech than modeling based upon visual references.
At the transmitting end, a visual baseline image of the subject face is sent, followed by transmission of only the movement vectors of those infrared minutiae which move from frame to frame. At the receiving end, the baseline face is animated based upon overlaying the IR minutiae movements on the visual image.
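The transmission scheme described above may be sketched as follows. The sketch assumes that each minutia carries a stable identifier from frame to frame and that a movement smaller than a chosen minimum is not worth sending; the names encode_frame and decode_frame and the min_move value are hypothetical. Animation of the visual baseline image from the decoded positions is not shown.

```python
def encode_frame(previous, current, min_move=0.5):
    """Return {minutia_id: (dx, dy)} for minutiae that moved at least min_move."""
    deltas = {}
    for mid, (x, y) in current.items():
        px, py = previous[mid]
        dx, dy = x - px, y - py
        if abs(dx) >= min_move or abs(dy) >= min_move:
            deltas[mid] = (dx, dy)
    return deltas


def decode_frame(previous, deltas):
    """Apply received movement vectors to the receiver's last known positions."""
    updated = {}
    for mid, (x, y) in previous.items():
        dx, dy = deltas.get(mid, (0, 0))
        updated[mid] = (x + dx, y + dy)
    return updated
```

To keep small unsent movements from accumulating into drift, the transmitter would typically encode each new frame against the positions the receiver has reconstructed (the output of decode_frame) rather than against the true previous frame.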
Early results indicate a minimum of 150:1 compression for highly energetic faces, to 400:1 for mildly mobile faces when 30 frames per second are processed. A primary application for this technique is videoconferencing, where the goal is to provide acceptable quality imagery over dial-up lines, at acceptable cost.
Video e-mail and videophone could also utilize the significant bandwidth reduction and automated re-synchronization of voice and image.
By processing sequences of images taken from known expressions and/or known speech elements, a sequence of movements of infrared minutiae can be extracted which corresponds to that expression or speech element for that person or for persons in general. Subsequently, when the same sequence of movements of infrared minutiae is seen, it can be inferred that the person is displaying the same expression or speech element as during the initial sequence. This enables the automated determination of expression or speech, allowing for compression of transmitted video in conjunction with audio. The combination may offer additional composite compression and improved synchronization.
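Recognition of a previously recorded expression or speech element from minutiae movements may be sketched as follows. This is a simplified illustration under stated assumptions: each sequence is a list of per-frame dictionaries mapping a minutia identifier to its movement vector, observed and reference sequences have the same number of frames (no time warping is attempted), and the distance measure and max_distance value are placeholders. The names sequence_distance and recognize are hypothetical.

```python
import math


def sequence_distance(seq_a, seq_b):
    """Mean distance between movement vectors, compared frame by frame."""
    if len(seq_a) != len(seq_b):
        return math.inf
    total, count = 0.0, 0
    for frame_a, frame_b in zip(seq_a, seq_b):
        for mid in set(frame_a) | set(frame_b):
            ax, ay = frame_a.get(mid, (0.0, 0.0))   # absent minutia: no movement
            bx, by = frame_b.get(mid, (0.0, 0.0))
            total += math.hypot(ax - bx, ay - by)
            count += 1
    return total / count if count else 0.0


def recognize(observed, dictionary, max_distance=0.5):
    """Return the label of the closest reference sequence, or None if none is close."""
    best_label, best_d = None, max_distance
    for label, reference in dictionary.items():
        d = sequence_distance(observed, reference)
        if d <= best_d:
            best_label, best_d = label, d
    return best_label
```

Once a label is recognized, the label alone could be transmitted in place of the individual movement vectors, which is the basis for the additional compression noted above.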
The same basic technique can also be used to create a dictionary of facial expressions and speech elements for use in animation of a synthetic face.
The talking head video compression system will have both video and IR cameras, and can be used to recognize and/or generate facial expressions and/or speech-related facial movements from the IR image and superimpose them on a contemporaneous visual image. The use of correlated infrared and video facial images offers significantly better fidelity of expression and speech-related variations in compression and reconstruction of talking head video, while also ensuring the authenticity of the related transmissions.