1. Field of the Invention
The present invention relates generally to systems and methods for automatically describing human speech, and more particularly to systems and methods for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing human/animate speech.
2. Discussion of Background Art
Sound characterization, simulation, and noise removal relating to human speech are important, ongoing fields of research and commercial practice. The use of electromagnetic (EM) sensors and acoustic microphones for human speech characterization has been described in the referenced U.S. patent application Ser. No. 08/597,596, which is incorporated herein by reference. Said patent application describes methods by which EM sensors can measure the positions versus time of human speech articulators, along with substantially simultaneously measured acoustic speech signals, for purposes of more accurately characterizing each segment of human speech. Furthermore, said patent application describes valuable applications of said EM sensor and acoustic methods for improved speech recognition, coding, speaker verification, and other applications.
A second related U.S. patent, U.S. Pat. No. 5,729,694, titled “Speech Coding, Reconstruction and Recognition Using Acoustics and Electromagnetic Waves,” issued on Mar. 17, 1998 to J. F. Holzrichter and L. C. Ng, and is also incorporated herein by reference. Patent '694 describes methods by which the speech excitation functions of humans (or similar animate objects) are characterized using EM sensors, and the substantially simultaneously measured acoustic speech signal is then characterized using generalized signal processing techniques. The excitation characterizations described in '694, as well as in application Ser. No. 08/597,596, rely on associating experimental measurements of glottal tissue interface motions with models to determine an air pressure or airflow excitation function. The measured glottal tissue interfaces include the vocal folds, related muscles, tendons, and cartilage, as well as sections of the windpipe (e.g., the glottal region) directly below and above the vocal folds.
The procedures described in application Ser. No. 08/597,596 enable new and valuable methods for characterizing the substantially simultaneously measured acoustic speech signal by using the non-acoustic EM signals from the articulators and acoustic structures as additional information. Those procedures use the excitation information, other articulator information, mathematical transforms, and other numerical methods, and describe the formation of feature vectors that numerically describe each speech unit over each defined time frame using the combined information. This characterizing information is then applied, by the methods and systems described in said patents and applications, to improve speech technologies such as speech recognition, speech coding, speech compression, synthesis, and many others.
Another important patent application herein incorporated by reference is U.S. patent application Ser. No. 09/205,159, entitled “System and Method for Characterizing, Synthesizing, and/or Canceling Out Acoustic Signals From Inanimate Sound Sources,” filed on Dec. 2, 1998 by G. C. Burnett, J. F. Holzrichter, and L. C. Ng. That application relates generally to systems and methods for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources, and more particularly to using electromagnetic and acoustic sensors to perform such tasks.
Existing acoustic speech recognition systems lack sufficient information to recognize words and sentences with high probability. Their performance also drops rapidly in the presence of noise from machines, other speakers, echoes, airflow, and other sources.
In response to the concerns discussed above, what is needed is a system and method for automated characterization of human speech that overcomes the problems of the prior art. The inventions herein describe systems and methods to improve speech recognition and other related speech technologies.