Computerized assessment systems may be used to determine how well a human subject performs on a specific task or how a subject's function compares with that of normal subjects. Such assessment can be used to determine eligibility for medical benefits, to guide medical treatment, and to guide training in areas of human performance, such as voice and speech production, fitness and sports. A computerized system is advantageous for assessment because it can facilitate the measurement process and the management of data.
Although it would clearly be desirable to do so, previously known voice and speech assessment systems have not been able to fully automate the measurement analysis process. Products such as the Multi-Dimensional Voice Program (MDVP) marketed by KayPENTAX (a division of PENTAX Corporation of Japan), and Praat, developed by the Institute of Phonetic Sciences at the University of Amsterdam, compute clinical voice quality metrics, such as jitter and shimmer, on sound samples provided to them. However, they do not automatically select an optimal segment of a sound sample to use for metric computation. Such sound samples are obtained from an analog-to-digital converter that digitizes voiced sound produced by a subject. The subject may be instructed, for example, to voice an “ah” sound for several seconds. The digital samples produced by the analog-to-digital converter estimate the acoustic pressure (amplitude) of the resulting sound wave as it varies over time. A corresponding sound pressure level (SPL) can then be derived from the root mean square of the digital samples, calculated over short time intervals.
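By way of illustration only, the derivation of a level from the root mean square of the digital samples might be sketched as follows. The function name, the 50 ms window length, and the `ref_amplitude` parameter are assumptions made for this example and are not taken from any of the products named above. The result is a level relative to `ref_amplitude`; it corresponds to an absolute SPL only if that reference has been calibrated against a known acoustic pressure.

```python
import math


def short_time_level_db(samples, sample_rate, window_s=0.05, ref_amplitude=1.0):
    """Return one level in dB (relative to ref_amplitude) per window.

    Windows are consecutive and non-overlapping; a trailing partial
    window is ignored. All parameter names and defaults are illustrative.
    """
    window = max(1, int(round(window_s * sample_rate)))
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        frame = samples[start:start + window]
        # Root mean square of the samples in this short time interval.
        rms = math.sqrt(sum(x * x for x in frame) / window)
        if rms > 0:
            levels.append(20.0 * math.log10(rms / ref_amplitude))
        else:
            # Digital silence has no finite level in dB.
            levels.append(float("-inf"))
    return levels


# Example input: one second of a unit-amplitude 100 Hz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
levels = short_time_level_db(tone, rate)
```

For a steady tone such as this one, every window yields nearly the same level, which is the kind of stability a sustained vowel is expected to approximate.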
The quality and diagnostic utility of measurements made on the sound sample depend heavily on the quality of the digital samples used to make them. It is important to use a stable portion of a vowel sound to make a measurement, and also to avoid transients and locations where the voice is “strained”, as described in papers such as Recasens, D. (1999), “Acoustic analysis” in W. J. Hardcastle & N. Hewlett, “Coarticulation: Theory, Data and Techniques”, pp. 322-336, Cambridge University Press, UK and in Kent, R. D., Vorperian, H. K., Kent, J. F., & Duffy, J. R. (2003), Journal of Communication Disorders, 36, 281-306, which are hereby incorporated by reference. If the signal-to-noise ratio is not sufficiently high, the results may not be useful. If the analog-to-digital converter clips too high a percentage of the values used to make measurements, or if there are voice breaks in the sample so that too little of the sound sample has an estimable frequency, then the measurements may also not be useful. Other key factors limiting the quality and usefulness of measurements are excessive variance in the sound pressure levels and in the fundamental frequency. While existing products provide mechanisms for a diagnostician to assess such issues, they require the diagnostician to make the assessment and to choose the segment of a sound sample to be used for analysis. Automation is highly desirable for many reasons, including increasing the ease of use of the assessment system and improving repeatability and compliance with analysis guidelines.
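The quality factors discussed above (clipping, signal-to-noise ratio, and variability of the fundamental frequency) could be checked programmatically along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, a 16-bit converter is assumed for the full-scale value, and the thresholds (1% clipped samples, 30 dB signal-to-noise ratio, 5% coefficient of variation) are placeholders chosen for the example, not values drawn from any analysis guideline.

```python
import math


def clipped_fraction(samples, full_scale=32767):
    """Fraction of samples at or beyond the converter's full-scale value."""
    if not samples:
        return 0.0
    return sum(1 for x in samples if abs(x) >= full_scale) / len(samples)


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from two RMS amplitudes (both > 0)."""
    return 20.0 * math.log10(signal_rms / noise_rms)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be nonzero)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


def segment_is_usable(samples, signal_rms, noise_rms, f0_track,
                      max_clipped=0.01, min_snr_db=30.0, max_f0_cv=0.05):
    """Apply the three illustrative checks; all thresholds are assumptions."""
    return (clipped_fraction(samples) <= max_clipped
            and snr_db(signal_rms, noise_rms) >= min_snr_db
            and coefficient_of_variation(f0_track) <= max_f0_cv)
```

Here `f0_track` stands for a list of fundamental frequency estimates in hertz, one per analysis frame; how those estimates are obtained is outside the scope of the sketch. The same coefficient-of-variation helper could be applied to a series of sound pressure levels.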
Various methods have been disclosed in the prior art to automatically segment speech or sound into a sequence of segments, where each segment has relatively consistent properties, such as in U.S. Pat. Nos. 6,907,367 and 6,208,967. However, such segmentation is driven by the need to break up the speech for purposes of recognition and is generally not suitable for identifying segments within a fairly uniform sample that are optimal for measurement of metrics.
Also, previously known voice and speech assessment systems have not been able to calibrate the absolute sound pressure level or to provide feedback to the user as to whether the subject's sound pressure level is sufficiently above the noise level to obtain an accurate measurement. It would therefore also be desirable to provide calibration functions accessible from within an assessment system.