1. Field of the Invention
The invention relates to a method and device for the diagnosis and treatment of speech disorders and more particularly to a device and method for providing biofeedback on the level of nasalization of voiced speech.
2. Description of the Related Technology
A. Velar Control and Oronasal Valving in Speech
During speech or singing, it is necessary to open and close the passageway connecting the oral pharynx with the nasal pharynx, depending on the specific speech sounds to be produced. This is accomplished by lowering and raising, respectively, the soft palate, or velum. Raising the velum puts it in contact with the posterior pharyngeal wall, to close the opening to the posterior nasal airflow passageway.
This oronasal (or velopharyngeal, as it is usually referred to in medical literature) passageway must be opened when producing nasal consonants, such as /m/ or /n/ in English, and is generally closed when producing consonants that require a pressure buildup in the oral cavity, such as /p/, /b/ or /s/. During vowels and vowel-like sonorant consonants (such as /l/ or /r/ in English), the oronasal passageway must be closed or almost closed for a clear sound to be produced. (Some languages, such as French, do include vowels that are properly pronounced with nasalization, such as the first vowels in the words “français” and “manger”. In addition, vowels adjoining a nasal consonant are most often produced with some degree of nasalization.)
There are many disorders that result in inappropriate oronasal valving, usually in the form of a failure to sufficiently close the oronasal passageway during non-nasal consonants or non-nasalized vowels. Such disorders include cleft palate and repairs of a cleft palate, hearing loss sufficient to make the nasality of a vowel not perceptible to the speaker, and many neurological and developmental disorders.
The effect on speech production of insufficient oronasal closure is usually separated into two effects, namely, the nasal escape of pressurized oral air, termed ‘nasal emission’, that limits oral pressure buildup in those speech sounds requiring an appreciable oral pressure buildup (as /p/, /b/, /s/ or /z/), and, secondly, the incomplete velar closure during vowels and sonorant consonants that is often referred to as ‘nasalization’ (Baken and Orlikoff, 2000). The terminology used here is that suggested by Baken and Orlikoff, who also prefer to reserve the term ‘nasality’ for the resulting perceived quality of the voice.
It is well documented, and easy for even a lay person to hear, that a person who has been severely hearing impaired from a time that precedes the learning of spoken language generally learns to speak with an abnormally high degree of nasality. This nasality is primarily due to the nasalization of vowel-like speech sounds. It is commonly associated with ‘deaf speech’ and acts to impede the comprehension of such speech (Stevens, et al., 1976; Baken and Orlikoff, supra). Such abnormal nasalization stems from at least three factors: first, the acoustic effects of improper velar action cannot be perceived by persons with a strong hearing impairment; second, since the action of the velum is not easily observed visually, velar action cannot be learned by visual imitation (as might be motions of the lips, for example); and third, there is little proprioceptive feedback for velar action to aid in learning (Stevens, et al., supra). As a result, there is a need for convenient and reliable systems to provide an alternate means of feedback for the hearing-impaired person trying to learn or improve velar control.
It is also well documented that nasality is important in the speech of persons with a cleft palate. In a summary article, Spreisterbach (1965) concludes, “Clearly, articulation errors and nasality are the two most frequent and significant communicative problems of speakers with cleft palates. Furthermore they are related.” He also concludes that “Velopharyngeal incompetence is undoubtedly the principle factor in accounting for the articulation errors and the nasality.”
B. Previous Methods for Providing Biofeedback for the Control of Nasality
Early speech training methods for the hearing impaired are summarized by Baken and Orlikoff (supra), and range from using a fingertip on the side of the nose to detect sound passing through the nose to electronic devices that picked up such nasal vibration with a vibration sensor (microphone or accelerometer) held against the side of the nose, with visual feedback provided to the user by means of a meter, oscilloscope or computer screen (Stevens, et al., 1976). Though yielding some information, such methods work poorly for women and children, whose normal voice pitch is too high to stimulate significant vibration of the surface of the nose or to be picked up readily by the tactile sense. Thus, though a gross indication of nasalization could be obtained for a held vowel spoken loudly by an adult male speaker, methods based on vibration of the surface of the nose activating a visual display yield results highly dependent on sensor placement, facial anatomy, voice pitch and loudness and speech content. As a result, such methods are not reliable enough to be used for self-monitored real-time biofeedback by a variety of speakers during continuous speech.
The development of digital computers capable of processing speech-like signals in real time, and displaying the results of an analysis, brought more sophisticated visual displays for biofeedback. The more successful of these were displays of ‘nasalance’, where the term nasalance refers to a measure of the ratio of nasally emitted acoustic energy to orally emitted energy (see, e.g., U.S. Pat. No. 3,752,929).
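By way of illustration only, the nasalance measure described above can be sketched as follows. The sketch assumes two time-aligned frames of samples, one from a nasal channel and one from an oral channel, and takes frame energy to be the sum of squared sample values. The function names, the frame representation and the small constant `eps` are hypothetical choices made for this example and are not taken from the cited patent. The second function shows a normalized percentage form, in which nasal energy is divided by the sum of nasal and oral energy, as a possible variant of the plain ratio.

```python
def frame_energy(samples):
    """Energy of one analysis frame: the sum of squared sample values."""
    return sum(s * s for s in samples)


def nasalance_ratio(nasal_frame, oral_frame, eps=1e-12):
    """Ratio of nasally emitted energy to orally emitted energy.

    eps is a hypothetical guard against division by zero in silent frames.
    """
    return frame_energy(nasal_frame) / (frame_energy(oral_frame) + eps)


def nasalance_percent(nasal_frame, oral_frame, eps=1e-12):
    """Normalized variant: nasal energy as a percentage of total energy."""
    nasal = frame_energy(nasal_frame)
    oral = frame_energy(oral_frame)
    return 100.0 * nasal / (nasal + oral + eps)


if __name__ == "__main__":
    # Illustrative frames only; real input would be microphone samples.
    nasal = [0.1, -0.1, 0.1, -0.1]
    oral = [0.2, -0.2, 0.2, -0.2]
    print(nasalance_ratio(nasal, oral))
    print(nasalance_percent(nasal, oral))
```

In such a sketch, a frame in which the nasal channel carries as much energy as the oral channel would give a ratio near 1 and a percentage near 50, while a well-closed oral vowel would give values near zero.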
A visual nasalance display can be a convenient and reliable measure of nasalization for non-real-time analysis and comparison, and can provide real-time biofeedback for a held or prolonged vowel or consonant, or perhaps for unnaturally slow speech. However, during natural speech, the visual sense cannot provide real-time feedback of the time pattern of nasalance as it changes. This is because time-sequential, spatially overlapping visual patterns tend to erase previous patterns in the visual short-term memory, in a process that is referred to as ‘visual masking’ (Breitmeyer, 2007). (It is for this reason that one cannot read if the letters in the message are presented time-sequentially in the same location in the visual space. Reading is made possible by spreading the letters spatially.) In addition, visual displays in general take a hearing-impaired user's eyes from the task of speech reading.
Tactile stimulation has long been considered as a modality for encoding speech, usually in the form of vibration, though sometimes in the form of electrocutaneous stimulation. Methods considered have ranged from arrays of vibrators or electrical contactors, each encoding the energy in a different band of frequencies, to varying the amplitude, waveform or frequency of a stimulus at one location (Reed, et al., 1982; Rothenberg and Molitor, 1979).
An array of vibrators or contactors has proven only marginally successful for encoding speech parameters, probably because there is no natural connection between movement over the surface of the body and contrasting speech parameters (Reed, et al., supra). Thus an array approach would require a lengthy learning process to communicate information meaningfully. The use of the amplitude, waveform or frequency of vibrotactile stimulation at a single location for conveying information has been studied extensively. It is well known that the hearing-impaired can detect rhythmic patterns by putting their hands on a musical instrument or loudspeaker, and voice-related vibration can often be detected by placing the fingertips on the face or neck of the speaker. In addition, this type of stimulation is now used successfully in cell phones and pagers for alerting the user, and has been suggested as a signaling modality for at least one biofeedback application (U.S. Pat. No. 6,384,729). However, the use of such stimulation for conveying more complex forms of speech information is more problematic (Rothenberg, et al., 1977) due to the limited information processing capacity (channel capacity, in information-theoretic terms) of the skin. One attempt to keep the transmitted speech information within the channel capacity of the tactile sense, by encoding only the voice pitch and reducing the pitch information to the frequency range detectable by the skin, was partially successful (Rothenberg and Molitor, 1979). However, a problematic limitation of approximately 200 ms was found in the time resolution of the tactile sense. This limitation, and other sensory limitations on the use of vibration frequency as a sensory modality, restricted the success in vibrotactile encoding of voice pitch.
As discussed above, improvement in the control of nasality in speech is quite important in a number of cases, most especially for many persons who are hearing impaired and for those with a cleft palate. However, a means for providing biofeedback sufficient for enabling a user to improve his or her velopharyngeal valving has to date eluded researchers and other practitioners. Embodiments of the present invention address this need.
C. References Cited
The following references are representative of the background of the invention and are incorporated herein in their entireties.
U.S. Patent Documents
3,752,929    August, 1973      Fletcher (Process and Apparatus for Determining the Degree of Nasality)
6,850,882    February, 2005    Rothenberg (System for measuring velar function during speech)
6,974,424    December, 2005    Fletcher (Palatometer and nasometer apparatus)
6,384,729    May, 2002         Plotkin (Biofeedback exercise stimulation apparatus)