Included with this application is a compact disc named 09641157 which contains five separate files that together comprise Table 1 referenced in this specification. The file names, dates of creation on the compact disc, and file sizes are as follows: Main program file appl 09641,157 Baraff.txt, created Nov. 15, 2002, of size 29.8 KB; Pitch program file appl 09641,157 Baraff.txt, created Nov. 15, 2002, of size 4.11 KB; Synth program file appl 09641,157 Baraff.txt, created Nov. 15, 2002, of size 5.47 KB; LPC program file appl 09641,157 Baraff.txt, created Nov. 15, 2002, of size 1.87 KB; and Vowel program file appl 09641,157 Baraff.txt, created Nov. 15, 2002, of size 1.48 KB.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyrights whatsoever.
1. Field of the Invention
This invention relates in general to the field of artificial speech for laryngectomees (laryngeally impaired individuals). It relates as well to the field of voice analysis and synthesis such as has been used in the field of communications. It also relates to the field of voice instruction and training. It also relates to the field of computer controlled prosthetics, particularly as it involves correction of the speech of a voice impaired individual to enable such an individual to create natural sounding speech by creating or reproducing prosody and other natural inflections in a human voice.
2. Description of Prior Art
There have been attempts in the past to create means to improve impaired speech, particularly from laryngeally impaired individuals. No speech devices to date have been able to capture, in sufficient detail, information about the specific speaker to recreate his/her own voice. Artificial devices to create a simulated glottal pulse with a manual ability to change frequency have been known for many years. One of the more recent devices has utilized a small loudspeaker mounted in the mouth of the laryngectomee, typically on a denture. This was described in U.S. Pat. No. 5,326,349 by Baraff. Some devices which vibrate the neck have been fitted with a control to enable the user to change the pitch of the speech manually, as described in U.S. Pat. No. 5,812,681 by Griffin. All of these devices have the drawback of sounding very mechanical. Even when a user has manually changed the pitch, the sound has not been close to the natural sound of a human voice. In devices without myoelectric control it is still necessary for the user to time the onset and fall of the glottal pulse sound manually. This timing takes practice, and corrective feedback is useful in minimizing the training time.
There are a number of reasons that laryngectomees have not been able to use previous devices to their fullest potential. Firstly, even with devices which have built in pitch control, it is extremely difficult to coordinate the fingers to imitate natural speech prosody. The speaker requires a "good ear" for speech sound coupled with a very strong desire to spend hours of practicing to gain coordination. Many laryngectomees do not possess either the desire or the skill. Secondly, some of the subtleties of creating true prosody may occur in time scales faster than could be manually controlled.
A number of schemes have been developed to create speech from text. One such process is described in the patent by Sharman, U.S. Pat. No. 5,774,854. Conventional speech systems operate in a sequential manner, hence, they do not create prosody until an entire sentence is divided into elements of speech such as words and phonemes. Most of these schemes rely on pre-programmed templates to create prosody. These schemes using a programmed template would not be useful in a real time creation of speech for the laryngectomee because they require the understanding of the word and context to be applied. Although Sharman refers to "real-time" operation, because the text is already present in sentence form, it is not in "real-time" with regard to a speech input such as in the present invention. Real-time speech to speech requires that the analysis be completed within 50 milliseconds or less, that is, well before the entire word has even been spoken. Clearly, techniques which are based on understanding the word before applying prosody will not be useful to solve this problem.
A further element of the disclosed invention, the ability to simulate emotions in speech, is perhaps suggested in U.S. Pat. No. 5,860,064, which creates emotion in speech output only in a text to speech system. This system again does not operate in real time with regard to a speech to speech function.
Another feature of the present invention is its use for training of speech, insofar as it includes pattern recognition of real time speech input. A system for recognizing and coding speech is described in U.S. Pat. No. 5,729,694 by Holzrichter et al. This speech system relies on pre-coding parts of speech including the feature vectors as generated both by classical LPC coefficients and the inclusion of a physical mapping of the vocal tract elements by using electromagnetic radiation. The system disclosed presently does not rely on electromagnetic radiation and includes the ability to pre-program specific lessons as generated by the laryngeally impaired individual in conjunction with his speech pathologist. Other devices found in the prior art have left the control of prosody to the laryngectomee and required a high level of manual dexterity to provide inflection and naturalness. In practice, very few laryngectomees use this capability because the timing and control are too difficult.
The disclosed invention provides natural prosody in real time to the speech of laryngeally impaired people (laryngectomees). The invention provides prosody by means of a software program running in real time on a digital signal processor, thereby providing more natural speech than is achievable through any manually controlled system.
In addition to providing prosody, the disclosed system has other capabilities providing increased naturalness including: noise cancellation of sound from a neck vibrator excitation source, feedback control to allow use of a microphone distant from the mouth, aspiration noise to mimic real speech, selective amplification of consonants over vowels to assist in intelligibility, automatic gain control to allow for movement of the head with respect to the microphone, user selection of mood of speech, volume control, whisper speech, telephone mode, training aids, ability to interface with myoelectric signals to provide automatic hands-free starting and stopping control as well as user controlled intonation, and the extraction of voice parameters from a user before laryngeal impairment to recreate the voice.
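The automatic gain control named in the list above can be illustrated with a minimal sketch. It is not taken from the program listing on the compact disc; the function name `agc`, the envelope-follower design, and all parameter values (`target_level`, `attack`, `release`, `max_gain`) are assumptions chosen only to show how a gain stage might hold the output level roughly constant as the speaker's head moves relative to the microphone.

```python
def agc(samples, target_level=0.1, attack=0.01, release=0.001, max_gain=20.0):
    """Scale samples so their tracked level approaches target_level.

    Hypothetical sketch: every parameter value here is an assumption,
    not a value from the disclosed system.
    """
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Follow rising levels quickly (attack) and falling levels slowly (release).
        coeff = attack if level > envelope else release
        envelope += coeff * (level - envelope)
        # Limit the gain so near-silence is not amplified without bound.
        gain = min(max_gain, target_level / max(envelope, 1e-6))
        out.append(x * gain)
    return out
```

With a design of this kind, a quiet input and a loud input of the same waveform settle to about the same output level once the envelope estimate has converged, which is the behavior needed when the distance between mouth and microphone changes.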
An automatic gain control system has been provided to regulate the output. The unit provides "whisper" speech by using a white noise excitation instead of the glottal pulse excitation. The unit can be used to change the excitation frequency of the sound source in real time. This is useful over the telephone or in a stand-alone unit which may be used without the loudspeaker. Training aids using pattern recognition are programmed into the device to allow speech pathologists to provide lessons whereby the user gets feedback as to whether his articulation and timing are being performed according to instruction. The unit is capable of being adapted to receive myoelectric signals for hands-free operation. In addition, in the case of laryngeally impaired individuals with the larynx nerve replaced to a neck muscle nerve, the myoelectric signal can automatically turn the unit on and off and include user directed intonation. Without the myoelectric attachment, the user can select from moods of speech which help him express himself depending upon the situation. Moods such as relaxed, tense, angry, and confident can be generated by selecting various components of the prosody algorithm in combination with the glottal pulse parameters. The algorithm disclosed with the present invention provides a means to determine and reproduce a speaker's pitch to best reproduce the original voice and inflections of a speaker such as to make the speech more natural. A computer software program listing is included with this disclosure which teaches one means to carry out the pitch determining algorithm which is taught herein.
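The whisper mode described above, in which white noise replaces the glottal pulse excitation, can be illustrated with a short source-filter sketch. It is not the listing supplied on the compact disc. It assumes an all-pole (LPC-style) vocal tract filter, a unit impulse train as a simplified stand-in for the glottal pulse, and hypothetical names and values throughout (`make_excitation`, `all_pole_filter`, `VOCAL_TRACT`, a pitch period of 80 samples).

```python
import random

def make_excitation(num_samples, whisper, pitch_period=80, seed=0):
    """Return the source signal: white noise for whisper, else an impulse train."""
    if whisper:
        rng = random.Random(seed)  # seeded so the sketch is repeatable
        return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    # Simplified glottal source (assumption): one unit impulse per pitch period.
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def all_pole_filter(source, coeffs):
    """Apply y[n] = x[n] - sum over k of coeffs[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(source):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

# Hypothetical stable second-order coefficients standing in for one analysis frame.
VOCAL_TRACT = [-1.2, 0.8]

voiced = all_pole_filter(make_excitation(400, whisper=False), VOCAL_TRACT)
whispered = all_pole_filter(make_excitation(400, whisper=True), VOCAL_TRACT)
```

Both outputs pass through the same filter, so the resonances that carry articulation are shared, while only the voiced output carries a pitch. Changing `pitch_period` from frame to frame is one way a sketch like this could vary the excitation frequency in real time.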
It is, therefore, the primary objective of the present invention to provide intelligible and natural sounding speech for individuals with laryngeal impairment while including the feature of prosody as they speak.
Accordingly, it is an object of this invention to recreate natural prosody without the conscious intervention of the user through use of a computer algorithm to process speech. It is also an object of the disclosed invention to provide for prosody and speech improvement by tapping the nerve signal generated in the larynx nerve, which controls the larynx in normal speakers, so that a signal can be provided for stopping and starting speech. It is also an object of the invention to utilize the same signal to provide information as to the larynx tension, which relates to the pitch of speech, such that the speaker's intent can be realized by utilization of the myoelectric signal to process speech.
A second object of the invention is to recreate speech sounding as much like the original voice of the speaker as possible by applying algorithms which duplicate the frequency range, the rise and fall times and other characteristics of the speaker in the original speech and comparing them with the rise and fall times of speech created using an artificial glottal pulse, utilizing a digital signal processor to correct for the difference to create speech similar to the speaker's original voice.
A third objective of the invention is to provide feedback to the user as to how well he/she is doing in learning some of the fundamentals of how to make the speech device sound clearer by using pattern recognition such that useful information in the form of instruction can be provided for the user.
It is also an object of the invention to allow the user to change the mood of his speech through various algorithms which signal calmness, levity, anger, friendship, command, etc., by altering settings of the disclosed prosody algorithm.
A further object of the invention is to recreate the natural voice of an individual which existed prior to laryngeal damage or removal.