Physiological indicators of psychological stress and biofeedback are employed by virtually all health care disciplines, spanning such diverse areas as psychology, psychophysiology, psychiatry and many subspecialties of medicine, dentistry and the behavioral sciences. Psychological stress is a part of healthy human growth yet is implicated in many physical and mental disorders. What may overwhelm the resources of one person may be within the resources of another person who is capable of coping with such stress. What may distress one person may be an exciting challenge to another. What may be within one person's capacities, in a particular situation and moment, may overstrain another person.
Psychological stress is conceptually defined as a state of psychological strain, from external or internal sources, which imposes demands or adjustments upon an individual that are appraised by the individual as being excessive to available resources and endangering the individual's personal well-being such that some breakdown of organized functioning occurs. One common way of measuring psychological stress is through physiological indicators. A primary class of such indicators is the psychophysiological responses of the autonomic nervous system (ANS). In general, measurements of end organ responses are used as physiological indicators. For example, commonly measured physiological indicators include the electrical activity of the skin, heart rate, heart rate variability, blood pressure, blood volume pulse, finger temperature, respiration, muscle tension, brain wave activity and the like.
The most common current modalities of biofeedback instruments measure muscle tension, skin temperature, electrical properties of the skin, respiration, heart rate related measurements and various brain wave activities. Many modalities for measuring psychological stress, including the aforementioned common modalities, involve devices that reflect either arousal in the ANS or arousal in other biological processes.
The measurement of the sound in a human speech sample is another physiological indicator measured by biofeedback and psychological stress instruments. Sound in the human voice is initially a product of the vibration of vocal "cords" or folds in the larynx. Vocal fold vibrations result from partially closing the glottis so that air is forced through the glottis by contraction of the lung cavity.
The term vocal "cords" is imprecise. In actuality, the vocal "cords" consist of lips or folds of muscle (the thyro-arytenoid) and an elastic ligament, placed symmetrically to the left and right of the median line of the larynx. The vocal folds are attached at one end to an inner projection of two small cartilages, the arytenoids, and at the other end to the front angle of the thyroid cartilage, more commonly known as the Adam's apple. A system of muscles enables the cartilages to glide, pivot or seesaw. The term "glottis" is defined as the generally triangular space enclosed by the two vocal folds and their connection to the thyroid cartilage. The glottis can be closed by the muscular movement of the arytenoid cartilages, which brings the vocal folds together. During normal respiration and also during the articulation of voiceless consonants, such as p, f, t and k, the glottis is open. Consonants that are pure noises without the periodic, resonant, musical sounds of vowels are termed "voiceless consonants." Consonants that are a combination of noise and laryngeal tones, such as b, v, voiced s (z), etc., are termed "voiced consonants."
When the glottis is completely closed, the glottis is ready to begin vibrating, provided that the tension of the thyro-arytenoid muscle is that required for a particular register. Contrary to former belief, this tension is not essentially produced by the stretching of the vocal folds, but rather by an internal muscular contraction. The rate of vocal fold vibration, or the fundamental frequency of the voice, depends on a number of factors including the sex and age of the speaker, the speaker's intonations and, in particular, the vocal fold length, size, mass and tension. For example, the vocal folds are thick for a low register and, for higher registers, the vocal folds are thin and shaped more or less like a ribbon. Additionally, a portion of the vocal fold, instead of the entire vocal fold, may vibrate. The vibrating body or vocal fold is thus correspondingly shortened in length to produce higher tones. The rate of vibration of the vocal folds ranges from 60 to 70 cycles per second (Hz) for the lowest male voices to an upper limit of 1200 to 1300 Hz for soprano voices. The average rate of vibration is from 100 to 150 Hz for a man and from 200 to 300 Hz for a woman.
Vocal fold vibrations are modified by the effect of resonance of the vibrations throughout various cavities in the chest and head. Resonance is a phenomenon in which sound vibrations or waves tend to set in motion elastic bodies that are in the path of the sound waves. For example, if the particular resonating frequency of the body in the path of the sound wave is the same as that for the sound wave, the body begins to vibrate. Vocal fold vibrations are typically modified by resonance in the chest, throat, mouth (including the area formed by projection and rounding of the lips), nose and sinus cavities. By moving the tongue and jaw, the cavity of the mouth can change almost endlessly in shape and volume to result in variations in the resonance of vocal fold vibrations. The great mobility of the lips further contributes to the resonance of the mouth cavity.
Voiced sound signals have complex frequency content that is based on the various resonance frequencies of the relevant cavities and on the harmonics, or overtones, which are whole-number multiples of the fundamental frequency of the sound signal. Resonating overtones are termed "formant sound" and appear in distinct frequency bands corresponding to each of the particular cavities. The first, or lowest frequency, formant is created by the resonance in the mouth and throat cavities and is noted for frequent frequency shifts as the mouth changes dimensions and volume during the formation of various sounds, particularly vowel sounds. The highest frequency formant involves resonance in the nose and sinus cavities and is more constant than formant sound in the lower frequency bands because such cavities tend to have more constant volumes and shapes than the mouth. Resonant voiced sounds are characterized by these formants. For example, most vowels are recognized by the sound of the first two formants together, but vowels sound fuller when the first three formants are heard. The higher fourth, fifth and sixth formants are generally present, but tend to be more characteristic of individual voice quality than of a particular vowel sound. Harmonics are produced in human voices up to 4000 or 5000 Hz and, in some cases, even higher frequencies.
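The whole-number relationship between the fundamental frequency and its harmonics can be illustrated with a short sketch. The function name `harmonics`, the 5000 Hz ceiling used as a default, and the example fundamentals of 125 Hz and 250 Hz are assumptions chosen for illustration from the ranges stated above; the sketch lists harmonic frequencies only and does not model formants or cavity resonance.

```python
def harmonics(fundamental_hz, max_hz=5000.0):
    """Return the whole-number multiples of a fundamental up to max_hz.

    The first entry (n = 1) is the fundamental itself.
    """
    if fundamental_hz <= 0:
        raise ValueError("fundamental_hz must be positive")
    count = int(max_hz // fundamental_hz)
    return [n * fundamental_hz for n in range(1, count + 1)]


# Illustrative fundamentals taken from the typical ranges quoted in the text.
lower_voice = harmonics(125.0)
higher_voice = harmonics(250.0)
print(len(lower_voice), lower_voice[:4])
print(len(higher_voice), higher_voice[:4])
```

Under these assumptions, a lower fundamental places more harmonics below the same ceiling than a higher fundamental does.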
The vocal folds and much of the structure of the major sound resonating cavities are made of flexible tissue that is immediately responsive to muscular control. For example, the muscular control of the vocal folds and ligament tissue in cooperation with the mechanical linkage of bone and cartilage allows for a purposeful production of voiced sound and variation in voice pitch. Similarly, the muscles of the tongue and throat permit purposeful sound variation. Other cavities are similarly affected, but nasal and sinus cavities are affected to a more limited degree.
A. D. Bell, C. R. McQuiston and W. H. Ford designed instrumentation in the late 1960s and early 1970s intended to indicate emotional arousal or stress from voice. U.S. Pat. No. 3,971,034 ("Pat. '034") to Bell et al. teaches a method and apparatus for detecting psychological stress by evaluating manifestations of physiological change in the human voice. According to Pat. '034, muscle microtremor causes a slight variation in vocal cord or fold tension, resulting in shifts in voice pitch. The oscillation or microtremor also slightly varies the volumes and shapes of the resonant cavities, thereby shifting the formant frequencies. These shifts around a central carrier frequency of the voiced sound constitute a frequency modulation of the central carrier frequency.
In Pat. '034, the microtremors have a physiological effect of very slightly modifying speech sounds to an extent corresponding to the magnitude of the movement caused by the microtremor. The microtremors occur at a rate of approximately 8 to 12 Hz and are at a maximum when the muscles are in a relatively relaxed state, such as during nonstressful conversational speech. The microtremors are very small, and their rate is far below the typical fundamental frequency ranges of the human voice. The microtremors very slightly modify the tension of the vocal cords, tongue, lips, throat, etc., as well as the volumes and shapes of the corresponding resonating cavities during speech. This modification has the effect of modulating speech sound frequency at the changing frequency of the microtremor, creating inaudible voice changes that the apparatus of Pat. '034 could detect.
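The frequency modulation described above can be sketched numerically by letting a slow tremor nudge the instantaneous frequency of a steady tone. This is an illustrative model only, not the circuitry of Pat. '034: the function name `tremor_modulated_tone`, the 120 Hz carrier, the 10 Hz tremor rate, the 2 Hz deviation and the 8000 Hz sample rate are all assumed values, and a single sine tone stands in for a real voiced sound.

```python
import math


def tremor_modulated_tone(carrier_hz=120.0, tremor_hz=10.0, deviation_hz=2.0,
                          duration_s=1.0, sample_rate=8000):
    """Return (samples, inst_freq) for a sine tone frequency modulated by a tremor.

    inst_freq[n] is the instantaneous frequency in Hz at sample n:
    carrier_hz + deviation_hz * sin(2*pi*tremor_hz*t).
    """
    samples, inst_freq = [], []
    phase = 0.0
    dt = 1.0 / sample_rate
    for n in range(int(duration_s * sample_rate)):
        t = n * dt
        f = carrier_hz + deviation_hz * math.sin(2.0 * math.pi * tremor_hz * t)
        inst_freq.append(f)
        samples.append(math.sin(phase))
        # Integrate the instantaneous frequency to advance the phase.
        phase += 2.0 * math.pi * f * dt
    return samples, inst_freq


samples, inst_freq = tremor_modulated_tone()
print(min(inst_freq), max(inst_freq))
```

In this sketch the tone's frequency swings by only a few hertz around the carrier, at a rate far below the carrier itself, which is the sense in which the microtremor acts as a frequency modulation of the voiced sound.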
In Pat. '034, the microtremors are suppressed under stress. The amplitude or extent of the microtremor is a function of psychological stress. The microtremors are at a maximum under normal states of relaxation and diminish under higher levels of stress in direct response to ANS influence. Thus, the frequency modulation is inversely proportional to the stress experienced by the speaker at the time of utterance.
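The inverse relationship between microtremor and stress can be sketched as a measurement on a pitch track: the smaller the 8 to 12 Hz component of the pitch variation, the higher the inferred stress. The sketch below is not the method of Pat. '034, which is an analog apparatus. It assumes a pitch track (fundamental frequency per frame) has already been extracted by some other means, and the function name `tremor_amplitude`, the 100 frames per second rate and the synthetic example tracks are hypothetical.

```python
import math


def tremor_amplitude(pitch_track_hz, frame_rate_hz, band=(8.0, 12.0), step_hz=0.5):
    """Estimate the peak amplitude (Hz) of pitch variation within a tremor band.

    Correlates the mean-removed pitch track with sinusoids at each probe
    frequency in the band and returns the largest amplitude found.
    """
    n = len(pitch_track_hz)
    if n == 0:
        raise ValueError("pitch_track_hz must not be empty")
    mean = sum(pitch_track_hz) / n
    centered = [p - mean for p in pitch_track_hz]
    best = 0.0
    f = band[0]
    while f <= band[1] + 1e-9:
        w = 2.0 * math.pi * f / frame_rate_hz
        re = sum(c * math.cos(w * k) for k, c in enumerate(centered))
        im = sum(c * math.sin(w * k) for k, c in enumerate(centered))
        best = max(best, 2.0 * math.hypot(re, im) / n)
        f += step_hz
    return best


def synthetic_track(deviation_hz, frames=200, frame_rate_hz=100.0):
    """Hypothetical pitch track: 120 Hz with a 10 Hz tremor of the given depth."""
    return [120.0 + deviation_hz * math.sin(2.0 * math.pi * 10.0 * k / frame_rate_hz)
            for k in range(frames)]


relaxed = tremor_amplitude(synthetic_track(2.0), 100.0)
stressed = tremor_amplitude(synthetic_track(0.5), 100.0)
print(relaxed > stressed)
```

Under these assumptions, the track with the suppressed tremor yields the smaller amplitude, consistent with the stated inverse relationship. Mapping that amplitude to an actual stress level would require calibration that this sketch does not attempt.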
Voice microtremor measurements are made electronically by a variety of voice stress analysis instruments. Dektor Counterintelligence and Security Company manufactured a psychological stress evaluator (PSE), which incorporates the apparatus of Pat. '034, to indicate psychological stress in speech sound. The electronic circuitry of the PSE records the utterances of voice and transduces the utterances using a microphone into electrical signals. The electrical signals are processed to emphasize selected characteristics of low frequency elements or representations of the recorded voice. The electronic circuitry of the PSE functions as a low frequency filter slowing down audio frequencies so that such audio frequencies match the fixed response range of the strip chart generator. The PSE is capable of processing speech samples of about one second or less.
The Computer Voice Stress Analyzer (CVSA) was introduced in 1988 by Computer Voice Stress Associates, the original manufacturer, and is currently manufactured by the National Institute for Truth Verification. The CVSA has some simplified operational features of the PSE and provides a more responsive strip chart apparatus than the PSE, one that is better matched in range of frequency response to the recorded, filtered voice signals. The CVSA processes only very short speech samples and is used primarily for one-word answers, e.g., "yes" or "no," used in deception detection protocols. However, both the CVSA and the PSE generate "blocking," which is speculated to be an artifact of the match of the strip chart apparatus response range to the range of received electronically filtered voice signals. Blocking is also affected by the momentum of the heated stylus and friction on the strip chart.
Another voice stress analyzing instrument that has received significant attention, both in deception detection studies and in a variety of other uses such as pre-employment tests, vocational assessment personality inventories and screening phone calls for alleged sexual abusers, is the Mark II Voice Analyzer. The Mark II electronically measures and counts spikes of roughness, or "tremolo", in electronically filtered speech instead of charting pattern changes as do the PSE and CVSA. The Mark II provides a numerical measure, i.e., a count of tremolo spikes, that is related to psychological stress. The Mark II was designed for analyzing brief speech samples obtained in deception detection protocols. However, all of the previously mentioned voice stress analyzers are capable of analyzing only very brief speech samples. Additionally, the previously mentioned voice stress analyzers provide analysis of voice stress in terms of deception detection protocols and do not analyze speech samples for biofeedback information.
What is needed is an improved method and apparatus to measure and analyze dynamic levels of psychological stress in people. In particular, what is needed is a method and apparatus for detecting physiological indicators of psychological stress that can process long speech samples. Further needed is a method and apparatus for detecting physiological indicators of psychological stress that provides biofeedback and allows voice stress research to go beyond typical deception detection protocols into wider use as a biofeedback instrument.