Traditionally, face recognition methods based on visual information have relied on cues derived from static spatial relationships of facial features such as the eyes, ears, nose, and mouth, taken from one-dimensional profiles or two-dimensional images. An exhaustive list of spatial features for face recognition can be found in Ashok Samal et al., "Automatic Recognition and Analysis of Faces and Facial Expressions: A Survey," Pattern Recognition, Vol. 25, No. 1, pp. 65-77 (1992). The input data is usually obtained from a single "snapshot."
Another example of a facial recognition system is described by Peter Tal, U.S. Pat. No. 4,975,969, for a "Method and Apparatus for Uniquely Identifying Individuals by Particular Physical Characteristics and Security System Utilizing the Same," in which static distances between identifiable points on human faces can be used to identify an individual.
FIG. 1 shows the key facial parameters used by Tal, which include: the distance between eye retina centers (LER); the distance between the left eye retina center and the mouth center (LEM); the distances from each retina center to the nose bottom (LEN and REN); and the distance between the mouth center and the nose bottom (DMN). In addition, various ratios of these static distance features are formed for scale normalization.
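The static distance features described for FIG. 1 can be illustrated with a short Python sketch. This is a hypothetical illustration only, not Tal's implementation: the function name `static_face_features`, the use of 2-D pixel coordinates, and the choice of LER as the normalizing denominator are all assumptions, since the text says only that "various ratios" are formed.

```python
import math


def static_face_features(left_eye, right_eye, mouth, nose_bottom):
    """Compute the five static distances named for FIG. 1 and their ratios.

    Each argument is an (x, y) landmark coordinate. Assumption: the
    landmarks have already been located in a single still image.
    """
    distances = {
        "LER": math.dist(left_eye, right_eye),     # between retina centers
        "LEM": math.dist(left_eye, mouth),         # left retina to mouth center
        "LEN": math.dist(left_eye, nose_bottom),   # left retina to nose bottom
        "REN": math.dist(right_eye, nose_bottom),  # right retina to nose bottom
        "DMN": math.dist(mouth, nose_bottom),      # mouth center to nose bottom
    }
    # Assumption: divide every distance by LER so the features no longer
    # depend on image scale. The source does not say which ratios are used.
    ler = distances["LER"]
    ratios = {f"{name}/LER": value / ler
              for name, value in distances.items() if name != "LER"}
    return distances, ratios


if __name__ == "__main__":
    # Made-up landmark coordinates, for illustration only.
    dists, ratios = static_face_features(
        left_eye=(0.0, 0.0), right_eye=(6.0, 0.0),
        mouth=(3.0, 6.0), nose_bottom=(3.0, 4.0),
    )
    print(dists)
    print(ratios)
```

Because every ratio divides one distance by another, enlarging or shrinking the image leaves the ratios unchanged, which is the purpose of scale normalization.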
The attraction of using a single still image, as opposed to multiple still images or a video recording, is at least two-fold:
(1) a single still image is less demanding of memory storage requirements; and
(2) human observers can recognize faces when presented with a single snapshot, with little evidence that recognition is improved by using a video recording of the speaker.
The prior art, using strictly static facial features, performs facial recognition by "seeing." The present invention uses spatiotemporal (space and time) representations of dynamic, speech-related facial features to identify a speaker. The method uses only visible observations (no acoustic data). This is face recognition by seeing and visual hearing.