1. Field of the Invention
The present invention is directed to the acquisition of facial features, in particular for visual speech processing.
2. Description of the Related Art
Human speech is discerned by acoustic processing. In addition, while acoustically processing speech, the listener simultaneously observes relevant visual information from the speaker's mouth and lips and performs complementary visual speech processing by speechreading or lipreading. Traditionally, acoustic speech recognition (ASR) systems have ignored the visual component and concentrated exclusively on the acoustic component of speech. Because the visual component was ignored, conventional ASR systems tended to perform poorly in less-than-ideal environments with high acoustic noise or multiple speakers. In addition, acoustic system performance is highly dependent on the particular microphone type and its placement; however, people typically find head-mounted microphones uncomfortable during extended use and impractical in many situations.
Recently, ASR system research has begun to focus on the advantages of incorporating visual speech processing technology to provide visual information for improving recognition accuracy. This research has concentrated on developing an efficient method for extracting visual parameters, most notably the size and shape of the lips. Real-time processing of a full image to reliably identify facial features is difficult. Consequently, prior methods of identifying visual speech information have relied on artificial means of tracking a particular facial feature, e.g., lip markers or patterned illumination, to readily recognize and track only the lips. These methods cannot support practical applications unless facial parameters and lighting are fixed or the subject wears markers on the face.
By far the easiest features of a face to identify and track accurately are the nostrils. The nostrils may be readily identified because they are located in the middle of the face, they are darker than the darkest skin color, and they are rarely obscured by facial hair. Previous systems have recognized and used nostril tracking as an efficient and accurate means for locating and tracking the lips, such as the electronic facial tracking and detection system and method and apparatus for automated speech recognition disclosed in U.S. Pat. No. 4,975,960. This patented system is based upon the process of defining a first window around the nostrils and a second window around the mouth region in relation to the first window. Mouth movements during speech are tracked and matched to templates stored in memory. A significant disadvantage associated with this tracking and detection system is that it requires specific lighting arrangements, as, for example, shown in FIG. 7 of the patent, and thus has limited practical application. Moreover, the tracking and detection system described in U.S. Pat. No. 4,975,960 uses non-adaptive thresholding, thereby rendering the system sensitive to lighting variations.
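The sensitivity of non-adaptive thresholding to lighting may be illustrated with a minimal sketch. The sketch is not the procedure of U.S. Pat. No. 4,975,960 or of the present invention. The pixel values, the function names, the fixed level of 60 and the mean-based adaptive rule are all hypothetical and chosen only for illustration.

```python
def fixed_threshold(image, level):
    """Mark pixels darker than a constant gray level (non-adaptive)."""
    return [[pixel < level for pixel in row] for row in image]


def adaptive_threshold(image, fraction=0.5):
    """Mark pixels darker than a fraction of the image's mean brightness.

    The mean-based rule is an assumed stand-in for an adaptive scheme.
    """
    pixels = [p for row in image for p in row]
    level = fraction * sum(pixels) / len(pixels)
    return [[pixel < level for pixel in row] for row in image]


def count_dark(mask):
    """Count the pixels marked as dark in a boolean mask."""
    return sum(cell for row in mask for cell in row)


# Hypothetical 4x4 grayscale patch: skin near 120, two nostril pixels at 30.
bright = [
    [120, 118, 122, 121],
    [119, 30, 30, 120],
    [121, 120, 119, 122],
    [120, 121, 120, 119],
]

# The same patch at 40% of the illumination (integer arithmetic).
dim = [[p * 2 // 5 for p in row] for row in bright]

fixed_bright = count_dark(fixed_threshold(bright, 60))
fixed_dim = count_dark(fixed_threshold(dim, 60))
adaptive_bright = count_dark(adaptive_threshold(bright))
adaptive_dim = count_dark(adaptive_threshold(dim))
```

With these assumed values, the fixed level isolates the two nostril pixels in the bright patch but marks every pixel as dark in the dim patch, whereas the level derived from the image mean isolates the same two pixels in both.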
Thus, it is desirable to develop a method for extracting visual information for practical applications suitable for use in a variety of viewing conditions, such as varied lighting conditions, head motion and variations between speakers, while incorporating the computational efficiency of nostril tracking.