The invention pertains to speech signal processing and, more particularly, to methods and apparatus for maximizing speech intelligibility in quiet or noisy backgrounds. The invention has applicability, for example, in hearing aids and cochlear implants, assistive listening devices, personal music delivery systems, public-address systems, telephony, speech delivery systems, speech generating systems, or other devices or mediums that produce, project, transfer or assist in the detection, transmission, or recognition of speech.
Hearing and, more specifically, the reception of speech involves complex physical, physiological and cognitive processes. Typically, speech sound pressure waves, generated by the action of the speaker's vocal tract, travel through air to the listener's ear. En route, the waves may be converted to and from electrical, optical or other signals, e.g., by microphones, transmitters and receivers that facilitate their storage and/or transmission. At the ear, sound waves impinge on the eardrum to effect sympathetic vibrations. The vibrations are carried by several small bones to a fluid-filled chamber called the cochlea. In the cochlea, the wave action induces motion of the ribbon-like basilar membrane, whose mechanical properties are such that the wave is broken into a spectrum of component frequencies. Certain sensory hair cells on the basilar membrane, known as outer hair cells, have a motor function that actively sharpens the patterns of basilar membrane motion to increase sensitivity and resolution. Other sensory cells, called inner hair cells, convert the enhanced spectral patterns into electrical impulses that are then carried by nerves to the brain. In the brain, the voices of individual talkers and the words they carry are distinguished from one another and from interfering sounds.
The mechanisms of speech transmission and recognition are such that background noise, irregular or limiting frequency responses, reverberation and/or other distortions may garble transmission, rendering speech partially or completely unintelligible. It is well known to those skilled in the art that these same distortions are even more ruinous for individuals with hearing impairment. Physiological damage to the eardrum or the bones of the middle ear acts to attenuate incoming sounds, much like an earplug, but this type of damage is usually repairable with surgery. Damage to the cochlea caused by aging, noise exposure, toxicity or various disease processes is not repairable. Cochlear damage not only impedes sound detection, but also smears the sound spectrally and temporally, which makes speech less distinct and increases the masking effectiveness of background noise.
The first significant effort to understand the impact of various distortions on speech reception was made by Fletcher, who served as director of the acoustics research group at AT&T's Western Electric Research (renamed Bell Telephone Laboratories in 1925) from 1916 to 1948. Fletcher developed a metric called the articulation index, AI, which is “. . . a quantitative measure of the merit of the system for transmitting the speech sound.” Fletcher and Galt, infra, at p. 95. The AI calculation requires as input a simple acoustical description of the listening condition (i.e., speech intensity level, noise spectrum, frequency-gain characteristic) and yields the AI metric, a number that ranges from 0 to 1, whose value predicts performance on speech intelligibility tests. The AI metric first appeared in a 1921 internal report as part of the telephone company's effort to improve the clarity of telephone speech. A finely tuned version of the calculation, upon which the present invention builds, was published in 1950, nearly three decades later.
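The input-output shape described above can be illustrated with a deliberately simplified band-audibility sketch. It is not Fletcher's calculation. The per-band inputs, the audibility floor `floor_db`, the 30 dB mapping range with a 12 dB offset, and the band weights are all assumptions chosen for illustration, in the spirit of textbook simplified articulation-index formulas.

```python
from typing import Sequence


def simplified_articulation_index(
    speech_db: Sequence[float],    # speech level per band, dB (assumed input)
    noise_db: Sequence[float],     # noise level per band, dB (assumed input)
    gain_db: Sequence[float],      # frequency-gain characteristic per band, dB
    floor_db: Sequence[float],     # hypothetical audibility floor per band, dB
    band_weights: Sequence[float], # hypothetical band-importance weights
) -> float:
    """Return an illustrative index in [0, 1]; higher means more audible speech."""
    n_bands = len(speech_db)
    inputs = (noise_db, gain_db, floor_db, band_weights)
    if any(len(x) != n_bands for x in inputs):
        raise ValueError("every input needs exactly one value per band")
    total_weight = sum(band_weights)
    if total_weight <= 0:
        raise ValueError("band weights must sum to a positive value")

    index = 0.0
    for s, n, g, f, w in zip(speech_db, noise_db, gain_db, floor_db, band_weights):
        # Gain is applied to speech and external noise alike; the floor is not amplified.
        effective_noise = max(n + g, f)
        snr = (s + g) - effective_noise
        # Assumed mapping: -12 dB SNR gives 0, +18 dB SNR gives 1, clipped outside.
        audibility = min(1.0, max(0.0, (snr + 12.0) / 30.0))
        index += (w / total_weight) * audibility
    return index
```

For example, with two equally weighted bands, one at +30 dB signal-to-noise ratio and one at -10 dB, the sketch yields roughly 0.53: the first band contributes fully and the second contributes very little.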
Simplified versions of the AI calculation (e.g., ANSI S3.5-1969, 1997) have been used to test the capacity of various devices for transmitting intelligible speech. These versions originate from an easy-to-use AI calculation provided by Fletcher's staff to the military to improve aircraft communication during World War II. Those skilled in the art are aware that simplified AI metrics can rank communication systems that differ grossly in acoustical terms, but are insensitive to smaller yet significant differences. They also fail in comparisons of different distortion types (e.g., speech in noise versus filtered speech) and in cases of hearing impairment. Although Fletcher's 1950 finely tuned AI metric is superior, those skilled in the art dismiss it, presumably because it features concepts that are difficult and at odds with current research trends. Nevertheless, as discovered by the inventor hereof and evident in the discussion that follows, these concepts, taken together with the predictive power of the AI metric, have proven fertile ground for the development of signal processing methods and apparatus that maximize speech intelligibility.