Although telephone technology has been with us for some time and, through a steady flow of innovations over the past century, has matured into a relatively effective, reliable means of communication, the technology is not flawless. Great strides have been made in signal processing and transmission of telephone signals and in digital networks and data transmission. Nevertheless, the basic telephone remains largely unchanged, with a user employing a handset that includes a microphone located near and directed towards the user's mouth and an acoustic transducer positioned near and directed towards the user's ear. This arrangement can be rather awkward and inconvenient. In spite of the inconvenience associated with holding a handset, this arrangement has survived for many years, and for good reason. The now familiar, and inconvenient, telephone handset provides a means of limiting the inclusion of unwanted acoustic signals that might otherwise be directed toward a receiver at the "other end" of the telephone line. With the telephone's microphone held close to and directed toward a speaker's mouth, other acoustic signals in the speaker's immediate vicinity are overpowered by the desired speech signal.
However, there are many situations in which the use of a telephone handset is simply impractical, whether because the telephone user's hands must be free for activities other than holding a handset or because several speakers have gathered for a telephone conference. "Hands free" telephone sets of various designs, including various speaker-phones and telephone conferencing systems, have been developed for just such applications. Unfortunately, speaker-phones and telephone conferencing systems in general tend to exhibit annoying artifacts of their acoustic environments. In addition to the desired acoustic signal from a speaker, echoes, reverberations, and background noise are often combined in a telephone transmission signal.
In audio telephony systems, it is important to accurately reproduce the desired sound in the local environment, i.e., the space in the immediate vicinity of a speaker, while minimizing background noise and reverberation. This selective reproduction of sound from the local environment and exclusion of sound outside the local environment is the function at which a handset is particularly adept. The handset's particular facility for this function is the primary reason that, in spite of their inconvenience, handsets nevertheless remain in widespread use. For teleconferencing applications, handsets are impractical, yet it is particularly advantageous to capture the desired acoustic signals with a minimum of background noise and reverberation in order to provide clear and understandable audio at the receiving end of the telephone line.
A number of technologies have been developed to acquire sound in the local environment. Some teleconferencing systems employ directional microphones, i.e., microphones having a fixed directional pickup pattern most responsive to sounds along the microphone's direct axis, in an attempt to reproduce the selectivity of a telephone handset. If speakers are arranged within a room at predetermined locations, chosen to suit the responsivity of microphones situated about the room, acceptable speech reproduction may be achieved. The directional selectivity of the directional microphones accentuates speech that is directed toward a microphone and suppresses other acoustic signals such as echo, reverberations, and other off-axis room sounds. Of course, if these undesirable acoustic signals are directed on-axis toward one of the microphones, they too will be selected for reproduction. In order to accommodate various speakers within a room, such systems typically gate signals from the corresponding microphones on or off, depending upon who happens to be actively speaking. It is generally assumed that the microphone receiving the loudest acoustic signal is the microphone corresponding to the active speaker. However, this assumption can lead to undesirable results, such as acoustic interference, which is discussed in greater detail below.
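The gating scheme described above can be illustrated with a minimal sketch. It assumes that each microphone delivers a short frame of samples and that "loudest" is judged by root-mean-square level; the function names and the frame layout are hypothetical and are not taken from any particular system.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def select_loudest_mic(frames):
    """Return the index of the microphone whose frame has the highest RMS level.

    frames: list of sample lists, one per microphone (assumed layout).
    """
    levels = [rms(f) for f in frames]
    return max(range(len(levels)), key=levels.__getitem__)

def gate(frames):
    """Pass the loudest microphone's frame unchanged and mute all others."""
    active = select_loudest_mic(frames)
    return [f if i == active else [0.0] * len(f) for i, f in enumerate(frames)]

# Three microphones; the second carries the strongest signal in this frame.
frames = [
    [0.01, -0.02, 0.01, -0.01],
    [0.40, -0.35, 0.38, -0.42],
    [0.05, -0.04, 0.06, -0.05],
]
active_index = select_loudest_mic(frames)
gated = gate(frames)
```

Because the selection depends only on level, any loud sound near a microphone, whether or not it is speech, would win the gate in this sketch.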
Moreover, it is unnatural and uncomfortable to force a speaker to constantly "speak into the microphone" in order to be heard. More recently, attempts have been made to accommodate speakers as they change positions in their seats, as they move about a conference room, and as various participants in a conference become active speakers. One approach to accommodating a multiplicity of active speakers within a conference room involves combining signals from two directional microphones to develop additional sensitivity patterns, or "virtual microphones", associated with the combined microphone signals. To track an active speaker as the speaker moves around the conference room, the signal from the directional microphone or virtual directional microphone having the greatest response is chosen as the system's output signal. In this manner, the system acts, to some extent, as a directional microphone that is rotated around a room to follow an active speaker.
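The "virtual microphone" idea can be sketched numerically. The example below assumes idealized first-order cardioid microphones aimed at 0 and 90 degrees, an equal-weight sum as the combining rule, and normalization of each beam to unit peak gain; these are illustrative assumptions, and the function and beam names are hypothetical.

```python
import math

def cardioid(theta_deg, axis_deg):
    """Idealized cardioid gain for a source at theta_deg, microphone aimed at axis_deg."""
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg - axis_deg)))

def combined(theta_deg, axes_deg):
    """Equal-weight sum of the cardioids aimed along axes_deg."""
    return sum(cardioid(theta_deg, a) for a in axes_deg)

def peak_direction(axes_deg):
    """Scan whole degrees and return the angle of greatest combined response."""
    return max(range(360), key=lambda t: combined(t, axes_deg))

def normalized(theta_deg, axes_deg):
    """Combined response scaled so that the beam's peak gain is 1."""
    return combined(theta_deg, axes_deg) / combined(peak_direction(axes_deg), axes_deg)

def pick_beam(source_deg, beams):
    """Choose the real or virtual beam that responds most strongly to the source."""
    return max(beams, key=lambda name: normalized(source_deg, beams[name]))

beams = {
    "mic_a": [0.0],              # real microphone aimed at 0 degrees
    "mic_b": [90.0],             # real microphone aimed at 90 degrees
    "virtual_ab": [0.0, 90.0],   # virtual microphone formed from both
}
virtual_peak = peak_direction(beams["virtual_ab"])
chosen_between = pick_beam(45.0, beams)
chosen_near_a = pick_beam(10.0, beams)
```

Under these assumptions the combined pattern peaks midway between the two physical axes, so a speaker standing between the microphones is served by the virtual beam, while a speaker close to one axis is served by the corresponding real microphone.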
However, such systems only provide a limited number of directions of peak sensitivity and the beamwidth is typically identical for all combinations. Some systems employ microphone arrangements which produce only dipole reception patterns. Although useful in some contexts, dipole patterns tend to pick up noise and unwanted reverberations. For example, if two speakers are seated across a table from one another, a dipole reception pattern could be employed to receive speech from either speaker, without switching back and forth between the speakers. This provides a significant advantage, in that the switching of microphones can sometimes be distracting, either because the speech signal changes too abruptly or because the background noise level shifts too dramatically. On the other hand, if a speaker has no counterpart directly across the table, a dipole pattern will, unfortunately, pick up the background noise across the table from the speaker, as well as that in the immediate vicinity of the speaker. Additionally, with their relatively narrow reception patterns, or beams, dipole arrangements are not particularly suited for wide-area reception, as may be useful when two speakers, although seated on the same side of a conference table, are separated by some distance. Consequently, systems which employ dipole arrangements tend to switch between microphones with annoying frequency in such a situation. This is also true when speakers are widely scattered about the microphone array.
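The two drawbacks of a dipole pattern noted above, rear pickup and a narrow beam, can be made concrete with idealized patterns. The sketch below assumes a textbook dipole gain of |cos θ| and, for comparison, a cardioid gain of (1 + cos θ)/2; the half-power (-3 dB) beamwidth is used as the measure of beam narrowness. Real microphone arrangements would deviate from these idealizations.

```python
import math

def dipole(theta_deg):
    """Idealized dipole (figure-eight) gain magnitude."""
    return abs(math.cos(math.radians(theta_deg)))

def cardioid(theta_deg):
    """Idealized cardioid gain, shown only for comparison."""
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg)))

HALF_POWER = 1.0 / math.sqrt(2.0)  # amplitude gain at the -3 dB point

# Full half-power beamwidths, solved analytically from each pattern:
#   dipole:   cos(t) = 1/sqrt(2)
#   cardioid: (1 + cos(t)) / 2 = 1/sqrt(2)
dipole_beamwidth = 2.0 * math.degrees(math.acos(HALF_POWER))
cardioid_beamwidth = 2.0 * math.degrees(math.acos(2.0 * HALF_POWER - 1.0))

# Gain toward the far side of the table (180 degrees off the front axis).
dipole_rear = dipole(180.0)
cardioid_rear = cardioid(180.0)
```

With these idealized patterns, the dipole responds to sound from directly behind at full gain and has a main lobe roughly 90 degrees wide, whereas the cardioid rejects the rear direction and spans roughly 131 degrees.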
One particularly annoying form of acoustic interference that crops up in the context of a telephone conference, particularly in those systems which select signals from among a plurality of microphones, is a result of the fact that the energy of an acoustic signal declines rapidly with distance. A relatively small acoustic signal originating close to a microphone may provide a much more energetic signal to a microphone than a large signal that originates far away from a microphone. For example, rustling papers or drumming fingers on a conference table could easily dominate the signal from an active speaker pacing back and forth at some distance from the conference table. As a result, the receiving party may hear the drumbeat of "Sing, Sing, Sing" pounded out by fingertips on the conference table, rather than the considered opinion of a chief executive officer in the throes of a takeover battle. Oftentimes people engage in such otherwise innocuous activities without even knowing they are doing so. Without being told by an irritated conferee that they are disrupting the meeting, there is no way for them to know that they have done so, and they continue to "drown out" the desired speech. At the same time, the active speaker has no way of knowing that their speech has been suppressed by this noise unless a party on the receiving end of the conversation asks them to repeat a statement.
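The effect of distance described above can be estimated with a short calculation. The sketch assumes an idealized point source in a free field, for which the sound level falls by 20·log10(d) decibels relative to its level at 1 meter; room reflections are ignored, and the source levels and distances are illustrative figures, not measurements.

```python
import math

def level_at_mic_db(level_at_1m_db, distance_m):
    """Free-field level at the microphone for a point source (inverse-square law)."""
    return level_at_1m_db - 20.0 * math.log10(distance_m)

# Assumed figures: finger drumming is 15 dB quieter than speech at equal distance,
# but it occurs 0.1 m from the microphone while the talker stands 4 m away.
tapping_db = level_at_mic_db(45.0, 0.1)
talker_db = level_at_mic_db(60.0, 4.0)
tapping_dominates = tapping_db > talker_db
```

Under these assumptions the drumming arrives at about 65 dB and the speech at about 48 dB, so a system that selects the most energetic microphone signal would favor the drumming.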