Every year in the United States alone, thousands of people lose their vocal cords because of laryngeal cancer or trauma. For many of these people (i.e., laryngectomees), the only option for regaining a speech capability is through the use of an electro-larynx (E-L), which is a handheld battery-operated shaker or vibrator that is pressed against a predetermined area of the throat to produce a speech-like sound and pattern. The electro-larynxes of the prior art are devices having non-linear transducers which produce speech that is very machine-like in sound, with low levels of loudness and intelligibility. This relatively poor speech sound quality often draws undesired attention to the user and can result in strained, unnatural communication with others.
FIG. 1A shows a partial cross sectional profile view of a human 10 with a normally structured larynx 12, including vocal cords 14, and vocal tract 16. The vocal tract 16 includes the pharynx, tongue, mouth and lips of the person. To form speech, air is forced through the larynx by the lungs and simultaneously, in response to signals from the laryngeal nerve (not shown), the vocal cords 14 are selectively tensioned so that the airflow causes the vocal cords to vibrate and create sound waves. These sound waves are referred to as glottal source waves, and their form is referred to as a “glottal source waveform”. The glottal source waves are modulated by the vocal tract to form speech emitted from the mouth, as depicted by arrows 18. In the case of a laryngectomee, shown in FIG. 1B, air is drawn into the lungs (not shown) via an opening 32 in the trachea 34, as depicted by arrows 38a. Air is then forced out of the lungs and exits opening 32 in trachea 34, as depicted by arrows 38b. Therefore, the airflow never passes through the vocal cords (which have been removed) or the vocal tract 36. Consequently, the airflow from the lungs cannot create glottal source waves and the vocal tract remains idle with regard to the creation of speech.
The possibilities for creating speech without the assistance of an electro-larynx or similar device are few and are commonly considered inadequate. For example, one such process for creating speech without an electro-larynx is called “esophageal speech”. According to this process, a person swallows air (that is, draws air through the mouth into the esophagus), and then regurgitates it through the vocal tract for modulation. This process produces poor quality speech and is generally cumbersome and embarrassing.
Assisted speech using an electro-larynx is typically preferred over unassisted methods, such as the one described above, for producing speech by laryngectomees. In FIG. 1C, a person 50 is shown using a prior art electro-larynx 100. Electro-larynx 100 is pressed against an area of the throat 54 and produces sound waves which are propagated through the tissue of the throat to the vocal tract 56. The waveform entering the vocal tract is an approximation of a glottal source waveform. Vocal tract 56 then modulates the received waveform to form speech, depicted by arrow 58, much the way the vocal tract would modulate glottal source waves supplied by the vocal cords, if they were present.
A partial diagrammatic view of prior art electro-larynx 100 is shown in FIG. 2. The prior art electro-larynx includes a non-linear transducer 210, a power amplifier 250, and a waveform generator 260. The transducer and waveform generator are the heart of the electro-larynx 100 and predominantly dictate the quality of speech that can be produced using the electro-larynx 100. The waveform generator produces a base waveform at the desired fundamental frequency (typically through use of pulsed waveforms), and the power amplifier provides a high output current that drives the transducer. The transducer converts electrical energy into sound waves. Ideally, the waveform that emerges from the tissue against which the electro-larynx is pressed, and that is delivered to the vocal tract, is identical to the glottal source waveform that would be produced by the vocal cords and delivered to the vocal tract. However, due to limitations in prior art non-linear transducers and electro-larynx waveform generators, only rough approximations of the glottal source waveform are possible.
The physical makeup and mechanical characteristics of non-linear transducers used in conventional electro-larynxes compromise the output signal of the electro-larynx. For example, one significant limitation of such an electro-larynx is that there is little control over the achievable speech quality due to the non-linear nature of the transducer. Only the fundamental frequency is controlled by the waveform generator; the spectrum of the resulting sound (reflecting more of the harmonics than the fundamental frequency) is a complex function of the mechanical structure of the transducer, and is not controlled. Furthermore, the mechanical characteristics of the non-linear transducer add spectral limitations to the electro-larynx that often result in a low frequency deficit below approximately 500 Hz, which makes certain vowels hard to distinguish.
The illustrated prior art non-linear transducer 210 of FIG. 2 is generally cylindrical, extending along a principal axis X. A motor assembly 220 is made of a combination of steel and magnetic materials, typically layered, that form a cylindrical void region extending along the X axis, within which a strong radial magnetic field is created. An armature assembly 224 is disposed within the cylindrical gap and consists of a wire voice coil 212 that is wrapped around a bobbin 214, which is attached to an axially-extending rigid striker 218. Bobbin 214 is supported to permit vibratory axial motion (along the X axis) by a suspension assembly 216. A coupler disk 222 is disposed at one end of housing 220, within striking range of striker 218. By appropriate application of electrical current to voice coil 212, operating within the magnetic field of motor housing 220, bobbin 214 is caused to axially pulsate. As a result, the armature assembly 224 vibrates periodically at a pitch frequency, which is a function of the current applied to voice coil 212 and the mechanical characteristics of the transducer components. As armature assembly 224 (supported by suspension assembly 216) vibrates, striker 218 strikes coupler disk 222 and the coupler disk vibrates in response to being struck. As shown in FIG. 2, an external surface A of coupler disk 222 is pressed against the user's throat. As coupler disk 222 vibrates, it couples its vibratory motion to the throat, which in turn creates acoustic waves at the base of the vocal tract. As modulated by the vocal tract, these acoustic waves emerge as speech from the lips of the user.
The striking action of the armature striker against the coupler disk creates sound with a pressure waveform in the form of an impulse train. The spectrum of this pressure waveform is a function of the mechanical properties of the coupler disk and its mounting to the electro-larynx housing. The coupler-striker interaction is more efficient at producing high frequency sound than it is at producing low frequency sound. Thus, the output spectrum of an electro-larynx having a non-linear transducer is inherently narrower than the spectrum needed to create natural sounding speech. Also, a relatively high level of noise is generated by the transducer due to the striking of the armature against the coupler disk. This noise becomes constant interference to the desired signal by filling in spectral and temporal valleys where sound should be absent.
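The spectral behavior described above can be illustrated with a small numerical sketch. It is not a model of any actual transducer: the sample rate, the fundamental frequency, and the use of a first-difference filter as a stand-in for the high-frequency emphasis of the coupler-striker interaction are all assumptions chosen only for illustration. The sketch shows that an ideal impulse train carries equal energy at every harmonic, and that a high-frequency-favoring response then leaves the low harmonics comparatively weak.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # assumed sample rate, for illustration only
FUNDAMENTAL_HZ = 100   # assumed pitch frequency, for illustration only
N_PERIODS = 10         # number of whole periods analyzed


def impulse_train(period_samples, n_periods):
    """One unit impulse at the start of each period, zeros elsewhere."""
    return [1.0 if n % period_samples == 0 else 0.0
            for n in range(period_samples * n_periods)]


def first_difference(signal):
    """Circular first difference y[n] = x[n] - x[n-1].

    Hypothetical stand-in for a response that favors high frequencies;
    it is not derived from any real coupler disk.
    """
    return [signal[n] - signal[n - 1] for n in range(len(signal))]


def magnitude_at(signal, freq_hz, sample_rate_hz):
    """Magnitude of the DFT bin nearest to freq_hz."""
    n_total = len(signal)
    k = round(freq_hz * n_total / sample_rate_hz)
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_total)
              for n, x in enumerate(signal))
    return abs(acc)


if __name__ == "__main__":
    period = SAMPLE_RATE_HZ // FUNDAMENTAL_HZ
    raw = impulse_train(period, N_PERIODS)
    shaped = first_difference(raw)
    for freq in (100, 500, 1000, 2000):
        print(freq,
              round(magnitude_at(raw, freq, SAMPLE_RATE_HZ), 3),
              round(magnitude_at(shaped, freq, SAMPLE_RATE_HZ), 3))
```

In this sketch the raw impulse train has the same magnitude at each harmonic, while the shaped signal is weaker at the low harmonics than at the higher ones, which is qualitatively the kind of low frequency deficit discussed above.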
Waveform generators typically used in electro-larynxes are inherently limited. For example, a typical electro-larynx waveform generator produces a simple periodic (e.g., sinusoid) waveform having a single fundamental frequency. Such a system produces unnatural, monotone speech due to the simplified waveform and the non-linear nature of the transducer. Often such an electro-larynx includes an embedded control (e.g., potentiometer) with which a user may select a fundamental frequency, within a certain predetermined range of frequencies. However, monotone speech is always produced when the electro-larynx is in use.
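As a rough sketch of why such a generator yields monotone output, the following code produces a pulsed waveform at a single, fixed fundamental frequency. The function names, the rectangular pulse shape, the duty cycle, and the numeric values are hypothetical and are not taken from any particular device.

```python
def pulse_train(f0_hz, sample_rate_hz, duration_s, duty=0.1):
    """Rectangular pulse train at a fixed fundamental frequency.

    The period is rounded to a whole number of samples, so the pulse
    spacing never changes over the duration of the signal.
    """
    period = round(sample_rate_hz / f0_hz)   # samples per pitch period
    high = max(1, round(period * duty))      # samples per pulse
    n_samples = round(duration_s * sample_rate_hz)
    return [1.0 if (n % period) < high else 0.0 for n in range(n_samples)]


def rising_edge_indices(samples):
    """Sample indices at which a pulse begins."""
    edges = []
    previous = 0.0
    for n, value in enumerate(samples):
        if value > 0.5 and previous <= 0.5:
            edges.append(n)
        previous = value
    return edges


if __name__ == "__main__":
    wave = pulse_train(f0_hz=100.0, sample_rate_hz=8000, duration_s=0.1)
    edges = rising_edge_indices(wave)
    spacings = {b - a for a, b in zip(edges, edges[1:])}
    print(len(edges), spacings)
```

Every pulse begins exactly one period after the previous one, so the pitch is constant for as long as the device is on, regardless of which fixed frequency the user selected beforehand.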
In another electro-larynx, the frequency is user variable during operation, within a predetermined range, but the waveform is still of a simple shape. In such a case, the frequency is controlled by a pressure sensitive finger control, wherein a change in the pressure exerted on the finger control produces a corresponding change in the frequency of the output wave (and resulting speech). While this ability to change the frequency during operation is useful, it is substantially impossible for a user to produce a wave having the irregular harmonic characteristics needed to approximate that of normal human speech, and the sound quality is still highly machine-like and mechanical.
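The pressure-to-frequency relationship described above might be sketched as a simple clamped linear mapping. The linear form, the normalized pressure scale, and the frequency limits used here are assumptions for illustration; the source does not specify how a given device performs the mapping.

```python
def pressure_to_frequency(pressure, f_min_hz=80.0, f_max_hz=200.0):
    """Map a normalized finger pressure (0.0 to 1.0) to a frequency.

    Hypothetical linear mapping: readings outside the 0.0 to 1.0 range
    are clamped so the output stays within the predetermined range
    [f_min_hz, f_max_hz].
    """
    clamped = min(1.0, max(0.0, pressure))
    return f_min_hz + clamped * (f_max_hz - f_min_hz)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0, 1.3):
        print(p, pressure_to_frequency(p))
```

A mapping of this kind changes only the fundamental frequency. The shape of each cycle stays the same, which is consistent with the observation that the resulting sound remains machine-like even when the pitch varies.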
Accordingly, it is an object of the present invention to provide an electro-larynx system which delivers an improved glottal source waveform to the vocal tract of a user to produce improved, more natural sounding speech.