The present invention generally relates to the field of speech and language impairments, speech therapy, and augmentative and alternative communication (AAC) interfaces. Embodiments of the present invention relate to an apparatus for real-time wireless tracking of a subject's tongue during speech, and to systems and methods of using the apparatus. More particularly, embodiments of the present invention relate to an apparatus for real-time tracking of a subject's tongue during speech by way of a tracer unit carried by the tongue of a subject and a plurality of sensors for detecting at least one position and/or orientation of the tracer in real time, such that movements of the tongue by the subject during speech may be continuously tracked and either displayed directly for review or embedded indirectly in a game-like environment to further motivate the subject to improve the subject's speech.
Embodiments of the present invention can comprise a multi-functional, scalable tongue tracking system (TTS) that can continuously and wirelessly track the tongue's position in real time and use that information, for example and not limitation, to measure, guide, create, and improve speech. The TTS can provide the user with, for example, audiovisual biofeedback through displaying three-dimensional (3D) tongue movements or derivatives of such movements, as well as audible speech/non-speech waveforms or tactile vibrations on the skin, enabling fluid real-time interaction between the end-user (patient), the therapist, and the system. The TTS can also be used as a silent speech interface by using the acoustic-kinematic recordings of a subject who may have a voice disorder, for example, to automatically build, index, and reproduce speech from a database of phonemes, words, phrases, and commands based on that user's specific acoustic-kinematic recordings.
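One way to picture the silent-speech mode described above is as template matching: a recorded 3D tracer trajectory is compared against a per-user database of phoneme templates, and the closest template is selected. The sketch below is a minimal, hypothetical illustration of that idea; the function names, the linear resampling, and the Euclidean distance measure are illustrative assumptions, not the claimed method.

```python
# Hypothetical sketch of indexing a tongue-trajectory recording against a
# per-user phoneme template database. All names and the distance measure
# are illustrative assumptions, not the patented implementation.
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def resample(traj: List[Point], n: int = 20) -> List[Point]:
    """Linearly resample a 3D tracer trajectory to a fixed number of points
    so recordings of different durations can be compared."""
    if len(traj) == 1:
        return traj * n
    out = []
    for i in range(n):
        pos = i * (len(traj) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(traj) - 1)
        frac = pos - lo
        out.append(tuple(a + frac * (b - a) for a, b in zip(traj[lo], traj[hi])))
    return out

def distance(a: List[Point], b: List[Point]) -> float:
    """Mean Euclidean distance between two equal-length trajectories."""
    return sum(sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
               for p, q in zip(a, b)) / len(a)

def classify(recording: List[Point], templates: Dict[str, List[Point]]) -> str:
    """Return the phoneme label whose stored template is closest to the recording."""
    probe = resample(recording)
    return min(templates, key=lambda k: distance(probe, resample(templates[k])))

# Toy per-user template database: straight-line tongue paths for two phonemes.
templates = {
    "/t/": [(0.0, 0.0, 0.0), (0.0, 1.0, 1.0)],   # tip rises toward alveolar ridge
    "/k/": [(0.0, 0.0, 0.0), (-1.0, 0.5, 0.0)],  # body retracts toward velum
}
print(classify([(0.0, 0.1, 0.1), (0.0, 0.9, 0.95)], templates))  # → /t/
```

In a practical system, a richer distance such as dynamic time warping and trajectories built from the actual tracer-sensor data would replace the toy values, but the indexing structure would be the same.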
Speech is a complex and intricately timed task that requires the coordination of numerous muscle groups and physiological systems. While most children acquire speech with relative ease, it is one of the most complex of all patterned movements accomplished by humans, and thus susceptible to impairment. Additionally, the ability to communicate is one of the most important requirements of independent living and a key factor in the quality of life.[1] This motivates the development of both novel speech impairment diagnostic and therapy systems, and new alternative and augmentative communication (AAC) methods.
Speech and language therapy has a long history that includes many different technologies and techniques for both experimental and clinical settings. Traditional speech intervention relies on a trained speech and language pathologist (SLP) providing directive feedback to the patient about proper placement of the articulators and manner of articulation.[2] Common elements of speech therapy can include repeated practice, visual feedback via a mirror, and mimicking the clinician's accurate production. While these practices are generally effective for readily viewable speech sounds (visemes, such as /b/p/m/, any of several speech sounds that look the same on the face when produced), they are often largely unsuccessful for sounds produced inside the mouth. Each viseme can represent more than one phoneme, because multiple phonemes can be produced with the same mouth posture. The TTS system can allow clinicians to easily demonstrate the subtle differences between phonemes beyond what the viseme presents. Words such as pet, bell, and men can often be difficult to distinguish, as they all look alike on the face. However, there can be differences in timing and tongue gesture that constitute a visual signature of a given phoneme. A major benefit of the TTS system is to clarify and enhance these visual differences even when they are completely hidden inside the mouth.
The tongue is the primary articulator for these obstructed sounds, and its movements within the mouth can be difficult to capture. Thus, clinicians often use diagrams and other low-tech means (e.g., placing edible substances on the palate, physically manipulating the oral articulators, etc.) to show clients where to place their tongues in order to produce obstructed sounds. While sophisticated research tools exist to measure and track tongue movements during speech (e.g., electromagnetic articulography and electropalatography), they can be expensive, obtrusive, and impractical for widespread clinical use. Moreover, considering that therapy sessions fill only a fraction of the patients' weekly or monthly routines, a clinician-supervised means for practicing speech is generally not available to patients in their home and work environments, where they spend the majority of their time.
For people with voice disorders, who can have either no audible speech to correct (aphonia), such as individuals post-laryngectomy, or very weak or faint speech (dysphonia) due to, for example, old age, there are a limited number of communication modalities available. Currently, there are three possible methods for partially restoring vocal function, each with a major limitation:
A) Oesophageal speech: sound created by swallowing air and belching. This method can be difficult to learn and may not allow fluent speech.
B) Electrolarynx: vibrates the soft tissues of the throat to create sound, which can be articulated into speech, but the voice is hoarse and robotic and can be difficult to understand.
C) Voice prosthesis: a small silicone-based “tracheo-oesophageal” fistula speech valve that is currently the “gold standard”. Although these fistula speech valves work very well initially, in many patients they rapidly become colonized by biofilms and fail after only 3 to 4 months. Various modifications have been tried over the years to discourage biofilm growth, but to date none appears to provide a long-term solution for voice replacement.
A new modality that is currently under research in academia is a silent speech interface (SSI). An SSI is a device that collects data from various elements of the human speech production process (e.g., articulators, neural pathways, or the brain) using different types of sensors to detect the intended phonemes and words to produce a digital representation of speech, which can be synthesized into speech.[3] Depending on the individuals' impairments, various types of sensor and data acquisition strategies have been developed, leading to these categories:
1. Electromagnetic Articulography (EMA) sensors that capture the movement of fixed points on the articulators.[4] [5]
2. Ultrasound and optical imaging of the tongue and lips that lead to real-time characterization of the vocal tract.[6]
3. Non-audible murmur microphone for digital transformation of the signals.[7]
4. Electromagnetic and vibration sensors for analysis of glottal activity.[8]
5. Surface electromyography (sEMG) of the articulator muscles or the larynx.
6. Electro-encephalography (EEG).[9]
7. Implantation of microelectrode arrays in the speech-motor cortex.[10]
Each of the above methods has its own pros and cons for clinical translation. For instance, (1) and (2) are precise but can be quite bulky and expensive; (3), (5), and (6) may not be robust and reliable enough, and usually require intensive training on the user's part; (4) only works for users who have an intact glottis; and (7) is highly invasive and may not be attractive to the majority of end users.
None of the above systems is optimized for users who often retain a certain level of volitional tongue motion but suffer from absent or weak voices. This group can include, for example and not limitation, individuals who have undergone a laryngectomy, older adults for whom speaking at a normal volume requires substantial effort, and people who have paralyzed articulators (e.g., vocal fold paresis/paralysis) yet have retained sufficient tongue motion, e.g., in certain types of cerebral palsy, stroke, and early stages of amyotrophic lateral sclerosis (ALS). Other potential users are those who have temporarily or permanently lost their voices for various reasons, for example and not limitation, infections, gastro-esophageal reflux disease (GERD), laryngopharyngeal reflux, spasmodic dysphonia, abnormal growths due to a virus or cancer, and diseases that paralyze the vocal folds. This group is a major population that can significantly benefit from a wireless and wearable SSI system at a reasonable cost.
Users of the TTS and associated software and systems can be SLP practitioners and speech-language researchers. These groups have distinct requirements that can be met by different TTS implementations. Additionally, there are good indicators that the entire speech pathology industry and associated markets are growing at a rapid pace. As the large “baby-boom” population grows older, there will likely be further increases in instances of health conditions that can cause speech or language impairments, such as stroke and hearing loss. In addition, new federal laws guarantee special education and similar services to all children with disabilities. The increase in awareness of speech and language disorders may correspond to an increase in awareness about effective treatment modalities. Other sources of user demand for such a technology include medical advances that improve premature infant survival rates, as well as survivors of stroke and traumatic brain injury (TBI), many of whom need help from SLPs. The number of SLPs in private practice is also rising due to an increase in the use of contract services by hospitals, schools, and nursing care facilities.
What is needed, therefore, is a system that can unobtrusively and continuously track the subject's tongue movement and position in real time during speech and use that information, for example and not limitation, to measure, guide, and create speech. The system should take advantage of a plurality of highly precise sensors to track the tongue's movement and position, and translate that information into a format that can be graphically and easily understood by the user, such as an SLP, speech-language researcher, or subject. For example and not limitation, the graphical representation of the subject's tongue movement and position can be audiovisual, and can be used to provide biofeedback to the subject and/or user. Yet another major benefit of the TTS system, thanks to the quantitative and software-defined nature of its outputs, is that it can log, analyze, demonstrate, and report the patients' progress in improving their speech over the course of therapy. This can indicate to the SLP whether a certain type of therapy or exercise is working or not. It can also provide healthcare providers, insurance companies, and reimbursement mechanisms with a much more accurate and objective measure of their clients' and clinicians' performance. It is to such a system that embodiments of the present invention are primarily directed.
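The progress logging and reporting described above can be pictured as accumulating a per-session score and computing a trend across sessions. The sketch below is a minimal, hypothetical illustration under the assumption of a single scalar articulation-accuracy score per session; the class name, metric, and least-squares trend are illustrative choices, not the claimed system.

```python
# Hypothetical sketch: logging per-session articulation scores and reporting
# a trend, illustrating the kind of objective progress report a quantitative,
# software-defined output could support. The metric is an assumed scalar.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

@dataclass
class TherapyLog:
    scores: List[Tuple[date, float]] = field(default_factory=list)

    def record(self, day: date, accuracy: float) -> None:
        """Append one session's accuracy score (0..1) with its date."""
        self.scores.append((day, accuracy))

    def trend(self) -> float:
        """Least-squares slope of accuracy over session index:
        positive means the patient is improving across sessions."""
        n = len(self.scores)
        if n < 2:
            return 0.0
        xs = range(n)
        ys = [accuracy for _, accuracy in self.scores]
        mx = sum(xs) / n
        my = sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x - mx) ** 2 for x in xs)
        return cov / var

log = TherapyLog()
log.record(date(2023, 1, 3), 0.55)
log.record(date(2023, 1, 10), 0.62)
log.record(date(2023, 1, 17), 0.71)
print(f"improving: {log.trend() > 0}")  # → improving: True
```

A report built on such a slope (or any richer statistic over the logged scores) is the kind of objective evidence that could indicate to an SLP, or to a payer, whether a given exercise regimen is working.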