Many applications rely on the ability to identify an object from a representation of the object or to verify whether a representation of an object corresponds to an object as purported. These applications may include authentication systems, image-based search engines, identity systems, and so on. An authentication system attempts to verify that a person who purports to be a certain person is really that certain person. An image-based search engine may try to locate images that duplicate or are similar to an input image. For example, a user who wants to know information about a flower may provide a picture of the flower to the search engine. The search engine may search a database that maps images of flowers to their names. When an image matching the picture is found, the search engine may provide the corresponding name to the user. As another example, a medical service provider may want to verify that a diagnosis based on a certain image (e.g., x-ray) is consistent with a data store of previous diagnoses based on images. An identity system attempts to identify the person in an image or video or the person from whom a voice sampling was obtained.
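The flower lookup described above can be sketched as a nearest-neighbor search. This is a minimal illustration only, assuming each image has already been reduced to a numeric feature vector. The names `identify`, `flowers`, and `max_distance`, the sample vectors, and the choice of Euclidean distance are all hypothetical and are not taken from any particular search engine.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(query, database, max_distance):
    """Return the name whose stored vector is closest to the query,
    or None when no stored vector lies within max_distance
    (an assumed cutoff for deciding that nothing matches)."""
    best_name, best_dist = None, float("inf")
    for name, vector in database.items():
        dist = euclidean(query, vector)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

# Hypothetical feature vectors standing in for stored flower images.
flowers = {
    "rose": [0.9, 0.1, 0.3],
    "tulip": [0.2, 0.8, 0.5],
    "daisy": [0.4, 0.4, 0.9],
}

# A query picture whose features lie close to the stored rose.
match = identify([0.85, 0.15, 0.35], flowers, max_distance=0.5)
```

Identification of this kind searches the whole database for the best candidate, whereas verification only checks one claimed identity against one stored representation.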
An authentication system may authenticate a user using voice recognition techniques, face recognition techniques, or other biometric recognition techniques. When using voice recognition, an authentication system compares a previous sampling of a person's voice to a current sampling of a person's voice to determine whether the person who supplied the current sampling is the same person who supplied the previous sampling. Similarly, when using face recognition, an authentication system compares a previous image of a person's face to a current image of a person's face to determine whether the previous image and the current image are of the same person. If the persons are the same, then the authentication system verifies that the person who provided the current voice sampling or the person in the current image is the same person who provided the previous voice sampling or who is in the previous image.
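The comparison of a previous sampling with a current sampling can be sketched as a threshold test on a similarity score. The sketch below assumes that both samplings (voice or face) have already been converted to feature vectors by some unspecified front end. The function names, the use of cosine similarity, and the 0.95 threshold are illustrative assumptions rather than a description of any specific authentication system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def verify(previous, current, threshold=0.95):
    """Accept the claimed identity only when the current sampling is
    sufficiently similar to the previously enrolled sampling.
    The threshold value is an assumption and would need tuning."""
    return cosine_similarity(previous, current) >= threshold

# Hypothetical enrolled vector and two later samplings.
enrolled = [1.0, 2.0, 3.0]
same_person = verify(enrolled, [1.1, 2.0, 2.9])
different_person = verify(enrolled, [3.0, 0.5, 0.1])
```

Raising the threshold makes false acceptances less likely at the cost of rejecting more genuine users, so in practice the value would be chosen from measured error rates.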
Many face recognition techniques have been proposed that attempt to recognize the person whose face is in an image even when the faces have different poses. For example, one pose may be a typical portrait pose and another pose may be with the face turned 45 degrees. Holistic face recognition techniques, such as the principal component analysis (“PCA”) based Eigenface technique, perform well for image sets where pose variation is minimal, but perform poorly when there is a wide range of pose variations. Such techniques may not even perform well with minimal pose variation if there is a wide variation in facial expressions. Facial expressions may include smiling, grimacing, surprise, sadness, and so on. Some face recognition techniques improve their performance by attempting to normalize the variations in poses. Other face recognition techniques improve their performance by using a video of the face or a sequence of images of the face, rather than a single image. However, such techniques are costly in both computation and storage.
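The core idea behind a PCA-based holistic technique can be sketched as follows: each image is flattened into a vector, the direction of greatest variance across the training vectors is found, and images are then compared by their projections onto that direction. The sketch below is a deliberately reduced illustration. It computes only the first principal component, uses power iteration in place of a full eigendecomposition, and operates on tiny made-up vectors rather than real images; all function names are hypothetical.

```python
import math

def top_component(samples, iterations=100):
    """Return (mean, first principal component) of the sample vectors.
    Uses power iteration on the covariance matrix, which is a
    simplification suitable only for small illustrative inputs."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Fixed start vector; it must not be orthogonal to the true
    # component, which is assumed to hold for this example.
    v = [1.0] + [0.0] * (dim - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance in the data
        v = [x / norm for x in w]
    return mean, v

def project(sample, mean, component):
    """Coordinate of a sample along the principal component."""
    return sum((s - m) * c for s, m, c in zip(sample, mean, component))

# Made-up two-pixel "images" that vary along the diagonal.
training = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, component = top_component(training)
coordinate = project([3.0, 3.0], mean, component)
```

Because the component is learned from whole-image pixel variation, a change in pose or expression shifts many pixels at once and can move the projection substantially, which is consistent with the sensitivity to pose and expression noted above.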
Many voice recognition techniques that attempt to identify the speaker of a voice sampling have also been proposed. Some voice recognition systems are text-independent in the sense that a person can say any sequence of words both when training the recognition system and when providing a voice sampling for recognition. Since a text-independent voice recognition system does not have the corresponding text, it bases its models on a division of the samplings into utterances. Since utterances from different speakers who say different things may sound similar, it can be difficult for a text-independent voice recognition system to correctly identify a speaker. Although text-dependent speaker recognition systems have the advantage of knowing what words the speaker is speaking, such systems typically require larger amounts of training data and are more complex than text-independent speaker recognition systems.