Advances in telecommunications technology are continuously improving the ways in which people carry out both business and personal communications. Such advances include improvements in video conferencing, increased availability of ISDN links and computer networks, and improvements in ordinary telephone service. These technological advances create many design challenges. For example, many telecommunication systems require a solution for distinguishing speech from noise in an audio signal; a device which performs this function has been referred to as a voice activity detector (VAD).
One application for a VAD is in a half-duplex audio communication system used in "open audio", or speakerphone, teleconferencing. Half-duplex transmission is transmission which takes place in only one direction at a given point in time. It is therefore common practice in such a system to temporarily deactivate the microphone at a given site while that site is receiving a transmission, and to mute the speaker at either site so that audio feedback is not received by the remote site. Consequently, in order to implement these functions, a VAD may be necessary to detect the presence of speech both in the audio signal received from a remote site and in the audio signal to be transmitted to the remote site. A VAD may also be used to signal an echo suppression algorithm, to distinguish "voiced" speech from "unvoiced" speech, and in various other aspects of audio communications.
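By way of illustration only, the half-duplex switching described above might be sketched as follows. This sketch is not the VAD of any particular system. It assumes a simple fixed-threshold energy detector, and the function names (frame_energy_db, is_speech, half_duplex_state), the -40 dB threshold, and the rule giving priority to remote speech are all hypothetical choices made for the example.

```python
import math

def frame_energy_db(samples):
    """Mean-square energy of one frame in dB (samples are floats in [-1, 1])."""
    if not samples:
        return -120.0
    mean_sq = sum(s * s for s in samples) / len(samples)
    # Clamp to avoid log10(0) on a perfectly silent frame.
    return 10.0 * math.log10(max(mean_sq, 1e-12))

def is_speech(samples, threshold_db=-40.0):
    """Hypothetical fixed-threshold VAD: True when frame energy exceeds the threshold."""
    return frame_energy_db(samples) > threshold_db

def half_duplex_state(remote_frame, local_frame):
    """Return (mic_enabled, speaker_enabled) for the local site.

    Remote speech is given priority here, an arbitrary choice for this sketch.
    """
    if is_speech(remote_frame):
        return (False, True)   # receiving: deactivate the local microphone
    if is_speech(local_frame):
        return (True, False)   # transmitting: mute the local speaker
    return (True, True)        # idle: no speech detected in either direction
```

A fixed energy threshold of this kind does not adapt to a changing noise level, which is one reason the misclassification and adaptation issues discussed in this section arise in practice.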
Some existing VADs make use of the communication link itself in detecting speech activity. For example, certain data may be provided to a VAD at one end of the link by "piggybacking" the data on other audio data transmitted from the other end. For various reasons, however, it is not desirable to have a VAD which is dependent upon a remote site in detecting speech. In addition, some existing VADs have undesirably slow response times, frequently misclassify speech, or require excessive processing time.
Another design issue relates to the use of headsets to implement closed audio microphone and speakers in video conferencing. Video conferencing software applications are available which, in general, permit both audio and visual communication between the user of one personal computer and the user of another personal computer via ISDN lines, a LAN, or other channels. One such application is the ProShare™ Personal Conferencing Video System, created by Intel Corporation of Santa Clara, California. Some video conferencing applications are sold precalibrated to support one or more particular models of headsets. This precalibration may be accomplished by including data in the software code relating to the appropriate hardware settings, such as the microphone input gain. However, if the user wishes to use a non-supported headset, he or she must generally go outside of the video conferencing application to the operating system in order to adjust the hardware settings. In doing so, the user essentially must guess at the best hardware settings, often having to readjust the settings by trial and error in order to achieve the optimum settings. Hence, existing hardware calibration solutions provide little flexibility in terms of ability to support multiple different headsets.
In view of these and other design issues, therefore, it is desirable to have a VAD which operates independently of the remote site. It is further desirable that such a VAD provide high accuracy (infrequent misclassifications), fast response time, adaptation to the remote site's fluctuating signal-to-noise ratio, and consistent half-duplex performance when the remote user transitions between open and closed audio modes. In addition, it is desirable to provide a VAD which can be directly used by a hardware calibration solution. Finally, it is desirable to have a hardware calibration solution which automatically adjusts the hardware settings to be appropriate for any headset a user wishes to employ.