Testing the intelligibility of speech via telephony is an important aspect of the communications industry, since one of the primary goals of a speech communications system is to enable a speech message to be understood and comprehended by the receiver of the message. The ultimate goal of a speech intelligibility test is to obtain a measure indicating how much of an incoming speech signal a listener is able to understand in normal conversation using, for example, a particular telephone. Many new technologies such as digital transmissions, speech coders and Internet telephony suffer from audio impairments not present in traditional analog systems, thus increasing the necessity for a reliable speech intelligibility test.
One manner in which speech intelligibility is tested is by testing the relative intelligibility of individual speech sounds. An individual speech sound can be represented by a phonetic symbol (hereinafter, each speech sound will be referred to by the phonetic symbol which represents it; for example, the speech sound represented by the phonetic symbol [t] will simply be referred to as speech sound [t]).
FIG. 2(a) is a chart showing phonetic symbols for various international consonant speech sounds, while FIG. 2(b) is a chart showing phonetic symbols for various international vowel speech sounds. FIG. 3(a), on the other hand, is a chart listing phonetic symbols for various English consonant speech sounds, while FIG. 3(b) is a chart listing phonetic symbols for various English vowel speech sounds. Each chart also describes the manner of articulation and place of articulation for each speech sound, as is well known in the prior art and as will be further discussed below. For instance, referring to FIG. 3(a), the speech sound [m] is a bilabial (place of articulation) nasal stop (manner of articulation). FIGS. 4(a) and 4(b) list the consonant and vowel phonetic symbols, respectively, along with a word or words that employ each speech sound. These figures, as well as FIG. 5(a), which will be introduced and discussed later, are reprinted from P. Ladefoged, A Course in Phonetics, Harcourt Brace Jovanovich (1993), which is incorporated by reference herein.
The relative intelligibility of individual speech sounds is commonly tested in a two-item forced choice format, one example of which is illustrated in FIG. 1. In FIG. 1, sound device 10, which can be any device able to convey sound to a listener, transmits stimulus word 12 to test subject 14. After hearing stimulus word 12, test subject 14 will see two response options, 18a and 18b, appear on word display device 16. Response options 18a and 18b are words which, as will be further explained later, have pronunciations which are similar to each other. One of the two response options is the English equivalent of stimulus word 12, while the other is not. The task of test subject 14 is to distinguish which of the two response options, 18a or 18b, was heard, and to indicate his or her selection by using a selection device (not shown).
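The two-item forced choice format described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names Trial, is_correct and proportion_correct, the example words, and the use of proportion correct as the score are assumptions made for illustration and are not taken from FIG. 1 or from any particular prior art test.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Trial:
    """One two-item forced choice trial (hypothetical structure)."""
    stimulus: str              # word conveyed to the test subject
    options: Tuple[str, str]   # the two response options shown on the display

    def is_correct(self, choice: str) -> bool:
        # The subject must pick one of the two displayed options.
        if choice not in self.options:
            raise ValueError(f"{choice!r} is not one of {self.options}")
        return choice == self.stimulus


def proportion_correct(trials: Sequence[Trial], choices: Sequence[str]) -> float:
    """Fraction of trials in which the subject selected the stimulus word."""
    if len(trials) != len(choices) or not trials:
        raise ValueError("need one choice per trial and at least one trial")
    correct = sum(t.is_correct(c) for t, c in zip(trials, choices))
    return correct / len(trials)


# Illustrative word pairs only; a real test would use a designed word list.
trials = [Trial("make", ("make", "bake")), Trial("slim", ("swim", "slim"))]
score = proportion_correct(trials, ["make", "swim"])
```

In this sketch the subject identifies the first stimulus correctly and confuses the second, so half of the trials are scored as correct.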
One prior art test which uses a two-item forced choice format is Voiers' Diagnostic Rhyme Test (hereinafter "DRT"). This test is described in W. Voiers, Evaluation of Processed Speech Using the Diagnostic Rhyme Test, Speech Technology, Jan/Feb, pp. 30-39, (1983). The DRT tests subjects using pairs of words (comprising real words, proper names and non-words) that differ by one speech sound. The differing, or contrasting, speech sounds in this test are generated by varying +/- feature values within a theory of perceptual distinctive features, as is well known in the art and as will be described in greater detail below.
As described in M. Kenstowicz and C. Kisseberth, Generative Phonology, Academic Press (1979), which is incorporated by reference herein in its entirety, features are units of phonological structure (phonology is the science of speech sounds). A feature system can be either a perceptual feature system or an articulatory feature system. Generally, perceptual feature systems concern the acoustical qualities of a speech sound, while articulatory feature systems concern particular human activities, e.g., lip rounding, tongue positioning, etc., which produce speech sounds when coordinated. These feature systems are described in Preliminaries to Speech Analysis, MIT Press, Cambridge Mass.; The Sound Pattern of English, Harper & Row, New York; M. Halle, Phonology, (1990); D. Osherson and H. Lasnik, Language, Volume I, MIT Press, Cambridge Mass.; and A Survey of Distinctive Feature Values, UCLA Working Papers in Phonetics 66, pp. 124-150, all of which are incorporated herein in their entirety.
In both types of feature systems, a particular speech sound can be represented by a matrix of [+] or [-] feature values. A particular set of feature values is used to uniquely describe a speech sound and distinguish it from all other speech sounds. FIG. 5(a) is a chart showing some of the features required for classifying English speech sounds. For instance, the figure shows that the voicing feature can be classified as [+voice] or [-voice], and lists the speech sounds that have each classification. As another example, to pronounce the English consonant [m] as in make, the velum is lowered to allow air to pass through the nose. Therefore, [m] has a [+] value for the feature [nasal]. The English consonant [b] has feature values almost identical to those of [m]. However, to pronounce [b] as in bake, the velum is raised, thus preventing air from flowing through the nose. Therefore, [b] has a [-] value for the feature [nasal].
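The [m] versus [b] comparison can be made concrete with a small Python sketch. The [nasal] values follow the text above; the remaining features, their names, and the inclusion of [p] are simplifying assumptions chosen for illustration and are not the feature chart of FIG. 5(a).

```python
# Toy feature matrix: True stands for a [+] value, False for a [-] value.
# Only the [nasal] values for [m] and [b] come from the text; the other
# entries are illustrative assumptions.
FEATURES = {
    "m": {"voice": True,  "nasal": True,  "labial": True},
    "b": {"voice": True,  "nasal": False, "labial": True},
    "p": {"voice": False, "nasal": False, "labial": True},
}


def differing_features(a: str, b: str) -> list:
    """Return the names of the features whose values differ between two sounds."""
    return sorted(f for f in FEATURES[a] if FEATURES[a][f] != FEATURES[b][f])


print(differing_features("m", "b"))  # ['nasal']
print(differing_features("m", "p"))  # ['nasal', 'voice']
```

Under these assumed values, [m] and [b] contrast in a single feature, whereas [m] and [p] contrast in two.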
Similarly, FIG. 5(b) is a chart showing a feature matrix for various English vowels. For example, for the dorsal feature tenseness, the figure shows speech sounds that are tense as having a [+] value and speech sounds that are lax (the opposite of tense) as having a [-] value.
Thus, returning to the DRT prior art testing system, the DRT generates sets of word pairs to be presented to the test subject as response options, such that, for the contrasting speech sounds, the value of only one perceptual feature for the first word differs from the value of the same perceptual feature for the second word. Specifically, and as is well known in the art, the DRT utilizes six different perceptual features (voicing, nasality, sustention, sibilation, graveness and compactness), which are referred to as perceptual distinctive features, and includes sixteen word pairs representing a [+/-] contrast for each of the six features. However, contrasts generated in this manner do not accurately reflect the consonant inventory of American English. For instance, despite the fact that there exist only three pairs of contrasting speech sounds which fit the above criteria for [nasal] (i.e., each speech sound of the pair has the same feature values as the other speech sound of the pair except for having an opposite nasality feature value), the DRT tests the [nasal] feature contrasts sixteen times. Furthermore, the DRT tests contrasts for consonants only; no vowel contrasts are tested, and consonants are tested only in the initial position in a word.
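The idea of a pair of speech sounds that differs in the value of exactly one feature can be illustrated with the following Python sketch. It is not the DRT's actual procedure or feature system: the nine-consonant inventory, the articulatory-style feature names, and the helper functions sound and single_feature_pairs are all assumptions made for illustration.

```python
from itertools import combinations

PLACES = ("labial", "coronal", "dorsal")


def sound(voice: bool, nasal: bool, place: str) -> dict:
    """Build a toy feature matrix; True is a [+] value, False a [-] value."""
    feats = {"voice": voice, "nasal": nasal}
    feats.update({p: (p == place) for p in PLACES})
    return feats


# Hypothetical mini-inventory of English stops; "ng" stands for the velar nasal.
INVENTORY = {
    "p": sound(False, False, "labial"),
    "b": sound(True, False, "labial"),
    "m": sound(True, True, "labial"),
    "t": sound(False, False, "coronal"),
    "d": sound(True, False, "coronal"),
    "n": sound(True, True, "coronal"),
    "k": sound(False, False, "dorsal"),
    "g": sound(True, False, "dorsal"),
    "ng": sound(True, True, "dorsal"),
}


def single_feature_pairs(feature: str) -> list:
    """Pairs of sounds whose feature values differ only in `feature`."""
    pairs = []
    for a, b in combinations(sorted(INVENTORY), 2):
        diff = [f for f in INVENTORY[a] if INVENTORY[a][f] != INVENTORY[b][f]]
        if diff == [feature]:
            pairs.append((a, b))
    return pairs


print(single_feature_pairs("nasal"))  # [('b', 'm'), ('d', 'n'), ('g', 'ng')]
```

In this toy inventory only three pairs contrast solely in [nasal], which illustrates why testing that single contrast sixteen times over-represents it relative to the number of distinct contrasts available.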
The DRT, by selecting contrasting speech sounds to test as it does, yields intelligibility test results which may be unreliable. For instance, some contrasts which may be tested are not highly likely to be perceptually confused by a listener, despite the fact that they differ in the +/- values of one of the distinctive perceptual features, e.g., the sound represented by the phonetic symbol [k] as in back, as compared to the sound represented by the phonetic symbol [tʃ] as in batch. Similarly, some contrasts which are likely to be perceptually confused by a listener are not tested because they differ in the +/- values of more than one distinctive feature, e.g., the sound represented by the phonetic symbol [w] as in swim, as compared to the sound represented by the phonetic symbol [l] as in slim.
Another prior art test, which uses a similar method of testing subjects with words which are generated by varying +/- feature values, is van Santen's Minimal Pairs Intelligibility Test (hereinafter "MPI"). This test is described in J. van Santen, Perceptual Experiments for Diagnostic Testing of Text-to-Speech Systems, Computer Speech and Language 7, p.49-100, (1993). Like the DRT, the MPI test presents subjects with pairs of words (including numerous multi-syllabic words such as "divergences" and "intransigence") having contrasting speech sounds, generated solely by varying +/- feature values.
Thus, there exists a need for an intelligibility testing system which reliably measures the speech intelligibility of a communication system.