Pathological speech affects millions of Americans. It is characterized by variable speech rate; segmental production errors, including reduced vowel articulation space; reduced respiratory function for speaking; voice problems (hoarseness, breathiness, tremulousness, amplitude perturbation/shimmer, irregular pitch/jitter); reduced prosodic variability; and reduced amplitude/loudness.
Management and treatment of pathological speech typically include face-to-face therapy from a speech-language pathologist or a similar clinician. Many people with pathological speech issues do not receive treatment due to cost or other barriers to services. As a result, reductions in quality of life have been well documented, affecting the ability to maintain and be involved in conversations and limiting the ability to communicate wants and needs. These limitations lead to restricted social and personal lives, contributing to depression, reduced access to healthcare and healthcare professionals, and reduced access to independent living, especially for seniors.
Regaining vocal presence has been shown to be pivotal in regaining the ability to communicate effectively as part of a healthy lifestyle. There are two principal problems with the current state of treatment for pathological voice problems. First, there is a poor understanding of the natural, systematic mechanisms underlying speech production in situ. Scientific knowledge of vocal pathology is almost entirely based on clinical and anecdotal descriptions. Modern (clinical) science is ever more firmly rooted in empirical evidence as the basis for intervention. For vocal pathology, that empirical basis is currently lacking.
Second, intervention or therapy (regardless of whether it addresses an accurate description of the problem) has many barriers to real-world application including financial burden, availability of technology, ease of use, demonstrated real-time and generalized effectiveness, and ecological portability (here, wearability).
Known devices for training voice patterns include those disclosed in US Patent Publication No. 20140330557 A1 and U.S. Pat. No. 9,381,110 B2. Prior devices rely on an accelerometer and a manually set threshold range to indicate vocal activity. Sensor data from the accelerometer is sent to a preamplifier, which applies a gain of 2000, and to a bandpass filter, which limits the frequency output of the sensor data to a specific frequency range. The modified data is then sent to a comparator with manually preset reference levels. These reference levels are used to identify the start of vocalization (when the sensor data exceeds one reference level) and to identify pathological voice problems (when the sensor data drops below another reference level), at which point a feedback signal is sent. By relying on a single sensor type and preset thresholds, these devices may trigger a feedback response for certain non-vocal inputs and may be unable to adjust for calibration issues commonly associated with the use of a particular sensor (e.g., accelerometer drift).
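The prior-art signal chain described above can be illustrated with a minimal sketch. It is not taken from the cited patents. It assumes the input values are already band-limited magnitude samples (the source does not give the filter's frequency range, so that stage is not modeled), and the two reference levels, the names Comparator and run, and the reset-after-feedback behavior are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

GAIN = 2000.0  # preamplifier gain stated in the text


@dataclass
class Comparator:
    """Two-level comparator with manually preset reference levels (hypothetical values)."""
    onset_level: float    # above this, vocalization is taken to have started
    low_level: float      # below this while voicing, a feedback signal is sent
    voicing: bool = False

    def step(self, amplified: float) -> str:
        level = abs(amplified)
        if not self.voicing:
            if level > self.onset_level:
                self.voicing = True
                return "onset"
            return "idle"
        if level < self.low_level:
            # Assumption: the comparator returns to idle after cueing feedback.
            self.voicing = False
            return "feedback"
        return "voicing"


def run(samples: Iterable[float], onset_level: float, low_level: float) -> List[str]:
    """Apply the gain, then feed each sample to the comparator."""
    comp = Comparator(onset_level, low_level)
    return [comp.step(GAIN * s) for s in samples]


events = run([0.0001, 0.001, 0.0008, 0.0002, 0.0001], onset_level=1.5, low_level=0.5)
print(events)
```

Because the comparator sees only signal magnitude, any input that crosses the onset level (for example, a bump to the sensor) is treated the same way as voicing, and a slow offset in the sensor output shifts every sample relative to the fixed reference levels. These are the two weaknesses noted above.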
Prior art devices initiate feedback in the form of an audible alert whose level is adjusted relative to the ambient acoustic noise by a set dB value, in order to induce a Lombard effect response in the patient. This may prove inadequate for treating patients suffering from partial hearing loss, as their response to the Lombard effect would be reduced.
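As a rough illustration of the fixed-offset level adjustment described above, the following sketch estimates the ambient level of a block of samples and adds a set dB value to obtain the alert level. The use of dB relative to full scale, the function names, and the 6 dB offset are assumptions for illustration only. The source does not specify how the prior devices measure ambient noise or what offset they use.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Level of a block of samples in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence


def alert_level_db(ambient_db: float, offset_db: float) -> float:
    """Alert level set a fixed number of dB above the measured ambient level."""
    return ambient_db + offset_db


ambient = rms_dbfs([0.1, -0.1, 0.1, -0.1])       # ambient noise block (made-up samples)
alert = alert_level_db(ambient, offset_db=6.0)   # hypothetical fixed offset
print(round(ambient, 1), round(alert, 1))
```

The offset is the same for every listener, so the sketch also makes the stated limitation visible: nothing in the calculation accounts for a patient whose hearing loss reduces the perceived level of the alert.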
Other relevant background art is as follows:
a. Hess, W., 1983. Pitch Determination of Speech Signals. Springer Series in Information Sciences, vol. 3. Springer-Verlag, Berlin, Heidelberg, New York.
b. Ishi, C. T., Ishiguro, H. and Hagita, N., 2005. Proposal of acoustic measures for automatic detection of vocal fry. In: Proceedings of Interspeech 2005, Lisbon, Portugal, pp. 481-484.
c. Schädler, M. R., Meyer, B. T. and Kollmeier, B., 2012. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition. The Journal of the Acoustical Society of America, 131(5), pp. 4134-4151.
d. Young, V. and Mihailidis, A., 2010. Difficulties in automatic speech recognition of dysarthric speakers and implications for speech-based applications used by the elderly: A literature review. Assistive Technology, 22(2), pp. 99-112.
e. Van Doremalen, J., Cucchiarini, C. and Strik, H., 2009. Optimizing automatic speech recognition for low-proficient non-native speakers. EURASIP Journal on Audio, Speech, and Music Processing, 2010(1), p. 1.
f. Junqua, J. C. and Haton, J. P., 2012. Robustness in automatic speech recognition: Fundamentals and applications (Vol. 341). Springer Science & Business Media.
g. Sharma, P. and Rajpoot, A. K., 2013. Automatic identification of silence, unvoiced and voiced chunks in speech. Journal of Computer Science & Information Technology (CS & IT), 3(5), pp. 87-96.