Suicide among adolescents, which for the past 10 years has been the second most frequent cause of death among 12- to 17-year-olds, is a serious issue.
Suicide risk factors may include family history, demographics, mental illness co-morbidities, and nonverbal behavior and cues. There are, however, no standardized approaches for analyzing these nonverbal behaviors, which traditionally include gestures, facial expressions, and voice characteristics.
Several researchers have investigated the correlation between severe depression, suicide, and the characteristics of speech. In one example, the speech of 10 suicidal, 10 depressed, and 10 control subjects was analyzed in great detail. All subjects were males between the ages of 25 and 65. The data for the suicidal subjects were obtained from a wide range of recording setups, including, for example, suicide notes recorded on tape. The other two groups were recorded under more controlled conditions at Vanderbilt University. For each subject, the researchers concatenated the speech into 30-second clips of uninterrupted speech (i.e., removing pauses longer than 500 ms). They then analyzed jitter in the voiced parts of the signal as well as glottal flow spectral slope estimates. Using simple Gaussian mixture model-based classifiers, both features helped to discriminate the classes in binary problems with accuracies well above chance (e.g., control vs. suicidal 85% correct, depressed vs. suicidal 75% correct, control vs. depressed 90% correct). A holdout validation was employed. However, the fact that the recordings were made across such a wide variety of recording setups, as acknowledged by the authors themselves, makes it difficult to assess “the accuracy about the extracted speech features and, therefore, the meaningfulness of the classification results.” Nevertheless, the fact that the researchers analyzed real-world data with speech recorded from subjects shortly before they attempted suicide is remarkable and needs to be acknowledged.
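For illustration only, the jitter measure mentioned above can be sketched as follows. The study's exact jitter definition is not given here, so this sketch assumes the commonly used "local jitter" (mean absolute difference between consecutive pitch periods, divided by the mean period). The function name `local_jitter` and the sample period values are hypothetical.

```python
from typing import Sequence


def local_jitter(periods: Sequence[float]) -> float:
    """Return local jitter for a sequence of pitch periods (in seconds).

    Assumed definition (not confirmed by the cited study): the mean absolute
    difference between consecutive periods, divided by the mean period.
    """
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    # Cycle-to-cycle variation of the period length.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Hypothetical period sequences for a voiced segment near 100 Hz.
steady = [0.010, 0.010, 0.010, 0.010, 0.010]
perturbed = [0.010, 0.011, 0.009, 0.011, 0.009]

steady_jitter = local_jitter(steady)        # no cycle-to-cycle variation
perturbed_jitter = local_jitter(perturbed)  # larger value, more irregular voicing
```

In practice, the pitch periods would come from a pitch tracker applied only to the voiced parts of the signal; that step is outside the scope of this sketch.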
Further, a similar approach was utilized to assess the suicide risk of subjects in the same categories. In another example, spectral density features were again used to classify the three classes in three separate binary problems. The data utilized comprised both interview data and read speech. In those cases, a cross-validation approach appears to have been utilized, and it is not clear whether the analysis was entirely speaker-independent, as the authors report using randomized sets of 75% of the data for training and 25% of the data for testing. The observed accuracies are quite high: control vs. suicidal 90.25%, depressed vs. suicidal 88.5%, and control vs. depressed 92.0%.
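The concern about speaker independence can be made concrete with a minimal sketch. It contrasts a randomized 75%/25% split over utterances with a split made over speakers. The data set below (20 speakers with 10 utterances each) and all function names are hypothetical and are not taken from the cited study.

```python
import random
from typing import List, Set, Tuple

Utterance = Tuple[str, str]  # (speaker_id, utterance_id)


def random_split(utts: List[Utterance], train_frac: float = 0.75,
                 seed: int = 0) -> Tuple[List[Utterance], List[Utterance]]:
    """Shuffle utterances and split them, ignoring who spoke them."""
    shuffled = list(utts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def speaker_independent_split(utts: List[Utterance], train_frac: float = 0.75,
                              seed: int = 0) -> Tuple[List[Utterance], List[Utterance]]:
    """Split by speaker so that no speaker appears in both sets."""
    speakers = sorted({spk for spk, _ in utts})
    random.Random(seed).shuffle(speakers)
    cut = int(len(speakers) * train_frac)
    train_speakers = set(speakers[:cut])
    train = [u for u in utts if u[0] in train_speakers]
    test = [u for u in utts if u[0] not in train_speakers]
    return train, test


def shared_speakers(train: List[Utterance], test: List[Utterance]) -> Set[str]:
    """Speakers whose utterances occur in both the training and the test set."""
    return {spk for spk, _ in train} & {spk for spk, _ in test}


# Hypothetical corpus: 20 speakers, 10 utterances each.
utterances = [(f"spk{s:02d}", f"utt{u}") for s in range(20) for u in range(10)]

rand_train, rand_test = random_split(utterances)
ind_train, ind_test = speaker_independent_split(utterances)
```

With a randomized split over utterances, `shared_speakers(rand_train, rand_test)` is almost always non-empty, so a classifier may partly learn to recognize speakers. With the split over speakers, the same call returns an empty set by construction.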
Another study involved the analysis of glottal flow features as well as prosodic features for detecting depression in read speech from 15 male (nine controls and six depressed subjects, ages 33-50) and 18 female (nine controls and nine depressed subjects, ages 19-57) speakers. In total, 65 sentences were recorded per speaker. The extracted glottal flow features included, for example, the minimal point in the glottal derivative, the maximum glottal opening, the start point of glottal opening, and the start point of glottal closing. The prosodic features extracted consisted of fundamental frequency, energy, and speaking rate. The classification was performed using a leave-one-observation-out paradigm, which has the disadvantage of rendering the analysis highly speaker-dependent. Accordingly, strong classification results were observed, with accuracies well above 85% for male speakers and above 90% for female speakers.
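Why a leave-one-observation-out paradigm is speaker-dependent can be sketched as follows, using the corpus size given above (33 speakers, 65 sentences each). The leave-one-speaker-out alternative is shown only for contrast and is not described in the cited study. All names are hypothetical.

```python
from typing import Iterator, List, Tuple

Observation = Tuple[str, int]  # (speaker_id, sentence_index)
Fold = Tuple[List[Observation], List[Observation]]  # (train, test)


def leave_one_observation_out(data: List[Observation]) -> Iterator[Fold]:
    """Hold out a single sentence per fold; train on everything else."""
    for i in range(len(data)):
        yield data[:i] + data[i + 1:], [data[i]]


def leave_one_speaker_out(data: List[Observation]) -> Iterator[Fold]:
    """Hold out all sentences of one speaker per fold."""
    for speaker in sorted({spk for spk, _ in data}):
        train = [obs for obs in data if obs[0] != speaker]
        test = [obs for obs in data if obs[0] == speaker]
        yield train, test


def same_speaker_in_train(train: List[Observation], test: List[Observation]) -> int:
    """Count training observations spoken by a speaker who is also in the test set."""
    test_speakers = {spk for spk, _ in test}
    return sum(1 for spk, _ in train if spk in test_speakers)


# 33 speakers with 65 sentences each, matching the counts reported above.
data = [(f"spk{s:02d}", k) for s in range(33) for k in range(65)]

# First leave-one-observation-out fold: the held-out sentence's speaker
# still contributes the remaining 64 sentences to the training set.
train, test = next(leave_one_observation_out(data))
leaked = same_speaker_in_train(train, test)
```

Under leave-one-speaker-out, the same count is zero for every fold, which is the property a speaker-independent evaluation requires.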
Improved methods for classifying nonverbal characteristics of human speech are needed for the clinical assessment of suicide risk in human subjects. The systems and methods of the present invention address this need by providing excellent discrimination between suicidal and non-suicidal subjects.