Traditionally, communications devices such as speakerphones, personal communicators and the like have been evaluated with live human conversation in uncontrolled acoustic environments. End-user groups or experienced listeners, commonly called "golden ears," would evaluate the audio performance of a device during live conversation and would also execute various tasks designed to stress, or "exercise," the device across its intended performance range. However, there are several disadvantages to using live conversation in uncontrolled acoustic environments to evaluate such a device.
First, live conversation is not reproducible. For instance, if two experimenters or evaluators hear a problem while evaluating a communications device, it is difficult to recreate the exact circumstances under which the communications device failed. Neither person may know exactly what he or she was saying at that particular point in time, or be able to say it in quite the same way again. Complex communications devices also often employ dynamically varying internal parameters and apply non-linear processes, making live conversation even more difficult to use for testing. To complicate matters further, communications device performance depends on what is happening at both ends of the telephone line or other connection, so both ends must coordinate the identity of the speaker(s), the identity of the listener(s), and the content and timing of what is being said in order to reproduce a particular event. Uncontrolled acoustic environments (e.g., dynamic ambient noise) can also add variability to speakerphone performance.
If a communications device problem cannot be easily reproduced, it is difficult to determine the root cause of the failure and how to fix it.
Second, when evaluating more than one communications device or device type, or the same communications device in more than one condition or environment, it is sometimes difficult to determine whether differences in performance should be attributed to the communications device or environmental factor itself, or to variability in the conversation or acoustic environment. When performance differences are robust, this does not present much of a problem. However, when differences in performance are small, there is a danger of a confound: concluding that one communications device is better than another simply because the conversation (or any task) held over one device stressed it more than the other. For example, the conversation over communications device A may have contained twice as much double-talk (where people at both ends are talking at the same time) as the conversation over communications device B, meaning that differences in performance between A and B may be due to differences in the verbal exchanges held over them rather than to differences between the devices themselves. Similarly, a spike in background noise could have occurred at the moment one person began to speak during one evaluation but not during the other.
Third, experimenters or evaluators do not have consistent control over the volume and sound quality of live speech, whereas the level (dB) and sound quality of recorded speech can be precisely controlled. Live speech makes it difficult to investigate the effects of different speech levels at each end of the telephone line or other connection. Furthermore, even if an experimenter or evaluator were able to speak at a particular level, there would still be the problem of saying what was said before in exactly the same way.
Fourth, ambient noise or other background sound is not controlled. This normally is not a major problem if the noise is steady-state. However, most real-life ambient noise is dynamic (e.g., traffic noise, people talking in the background, etc.). This dynamic noise can cause variability in communications device performance because spikes in the ambient noise will occur at different times during the verbal interactions. Therefore, for reliable testing, it is not sufficient merely to make recordings of dynamic ambient noise. Rather, the recorded noise must be synchronized with the verbal interactions over the communications device so that, upon playback, spikes in the noise are introduced at the same points in the verbal interactions.
Finally, recent advances in communications device technology, such as full-duplex operation, echo cancellation, noise reduction and the like, together with the exponential growth in the inclusion of communications capabilities in a variety of non-traditional devices (e.g., personal communicators and computers), have made traditional live-conversation methodologies for testing perceived acoustic performance obsolete. This is because the older methods are unable to detect the new impairments (echo, variable attenuation, etc.) that these technologies can introduce.
Thus, there is a need to make the device testing and evaluation process more efficient, the perceived problems more reproducible, and even small differences in device performance more detectable.