A video teleconferencing system including a microphone having a sensitivity pattern independent of the microphone's azimuth angle is provided. Specifically, a microphone that maximizes sensitivity in the direction of a sound source of interest, and minimizes sensitivity to sound from other directions is described.
Video teleconferencing systems create virtual meetings between two or more parties that are separately located in, for example, different rooms. The rooms may be within a same building or in different buildings, and the different buildings can be located in different cities, countries, continents, etc. Thus, video teleconferencing systems create meetings that would otherwise require travel of potentially large distances.
To create virtual meetings, video teleconferencing systems transmit audio data along with video data, and thus include one or more microphones at each location to capture sound waves. The microphones convert sound waves generated in one video teleconferencing room into electrical impulses for transmission to another video teleconferencing room. Audio quality is therefore directly dependent on the positioning of the microphone within the room, the acoustics of the room, and particularly the characteristics of the microphone itself.
For example, a conventional microphone used to capture sound from a sound source of interest, such as a person speaking, receives direct sound waves, reflected sound waves and reverberant sound waves from the source. Direct sound waves travel directly to the microphone without reflection, and are the sound waves intended to be captured by microphones. Direct sound wave levels are inversely proportional to the distance between the sound source of interest and the microphone receiving the sound.
Reflected sound waves do not travel directly to the microphone. Instead, they are reflected multiple times by objects in the room, or the room itself, before reaching the microphone. For example, sound waves from a sound source of interest may be reflected by walls, floors, ceilings, chairs, etc. Reflected sound waves that propagate for less than 50 to 80 ms (corresponding to a propagation distance of 17 to 27 meters) before reaching the microphone are known as “early reflections”, and have pressure levels approximately equal to those of direct sound waves, but are delayed in time.
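The correspondence between propagation time and propagation distance stated above can be checked with a short calculation. The following is a minimal sketch only; the speed of sound of 343 m/s (air at roughly room temperature), the function names, and the 50 ms default threshold are assumptions chosen for illustration and are not taken from the description.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def propagation_distance_m(delay_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Distance a sound wave travels during delay_s seconds."""
    return delay_s * speed_m_per_s


def classify_reflection(delay_s, threshold_s=0.05):
    """Label a reflection by its propagation time.

    threshold_s is an illustrative value picked from the 50 to 80 ms range.
    """
    if delay_s < threshold_s:
        return "early reflection"
    return "reverberant sound"


if __name__ == "__main__":
    for delay_ms in (50, 80):
        distance = propagation_distance_m(delay_ms / 1000.0)
        print(delay_ms, "ms ->", round(distance, 1), "m")
```

Under the assumed speed of sound, delays of 50 ms and 80 ms round to the 17 meter and 27 meter figures given above.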
Early reflections from the sound source of interest may positively contribute to the audio received by the microphone. However, they may also distort the audio. The time delay causes a phase difference between the early reflections and the direct sound waves, which may result in cancellation of some of the frequency components of the direct sound waves. This phenomenon is known as “comb filtering”, and has a negative impact on sound quality.
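The cancellation described above can be illustrated with an idealized model in which the microphone receives the direct sound plus a single reflection that arrives a fixed time later. The sketch below is illustrative only; the single-reflection model, the function names, and the `reflection_gain` parameter are assumptions. In this model the combined magnitude at frequency f is |1 + a·exp(−j2πfτ)|, where τ is the extra delay of the reflection, so for an equal-level reflection complete cancellation occurs at odd multiples of 1/(2τ).

```python
import cmath
import math


def comb_gain(freq_hz, delay_s, reflection_gain=1.0):
    """Magnitude of direct sound plus one delayed reflection at freq_hz.

    delay_s is the extra delay of the reflection relative to the direct sound.
    """
    phase = -2j * math.pi * freq_hz * delay_s
    return abs(1.0 + reflection_gain * cmath.exp(phase))


def notch_frequencies(delay_s, max_freq_hz):
    """Frequencies up to max_freq_hz at which an equal-level reflection cancels."""
    notches = []
    k = 0
    while True:
        freq = (2 * k + 1) / (2.0 * delay_s)
        if freq > max_freq_hz:
            break
        notches.append(freq)
        k += 1
    return notches


if __name__ == "__main__":
    # Hypothetical 1 ms extra delay (about 0.34 m of additional path length).
    print(notch_frequencies(0.001, 4000.0))
    print(comb_gain(500.0, 0.001), comb_gain(1000.0, 0.001))
```

The regularly spaced notches produced by this model give the frequency response the comb-like shape from which the phenomenon takes its name.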
Reflections that propagate for more than 50 to 80 ms (17 to 27 meters) are known as “reverberant sound”. Reverberant sound arrives at the microphone from nearly every direction because these sound waves have reflected many times within the room. Also, their pressure level is largely independent of microphone-sound-source distance. Unlike early reflections, reverberant sound always contributes negatively to audio quality by creating a “distant”, “hollow”, and/or “muffled” characteristic.
The level of distortion caused by reverberant sound is determined by a ratio of a level of direct sound to a level of reverberant sound. For example, if the sound source of interest is very close to the microphone the ratio of direct sound to reverberant sound is large, and distortion is small. As the sound source of interest moves away from the microphone the ratio of direct sound to reverberant sound will decrease, increasing distortion.
A distance at which the level of the direct sound equals the level of the reverberant sound is known as the “room radius”, which can be determined for every room. As a sound source of interest moves outside of the room radius, reverberant sound dominates and distortion increases. Conversely, as the sound source moves within the room radius the direct sound dominates, and distortion decreases. Therefore, for conventional microphone systems, the sound source of interest should remain within the room radius to avoid significant audio distortion.
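The relationship between source distance, room radius, and the direct-to-reverberant ratio can be sketched numerically. The example below assumes, consistent with the description above, that the direct sound pressure level is inversely proportional to distance and that the reverberant level does not depend on distance. The function name and the 2 meter room radius are hypothetical values used only for illustration.

```python
import math


def direct_to_reverberant_db(distance_m, room_radius_m):
    """Direct-to-reverberant ratio in dB at a given source distance.

    Assumes direct pressure falls off as 1/distance and the reverberant
    level is constant, so the ratio is 0 dB exactly at the room radius.
    """
    return 20.0 * math.log10(room_radius_m / distance_m)


if __name__ == "__main__":
    room_radius_m = 2.0  # hypothetical value for illustration
    for distance_m in (0.5, 1.0, 2.0, 4.0):
        ratio_db = direct_to_reverberant_db(distance_m, room_radius_m)
        print(distance_m, "m ->", round(ratio_db, 1), "dB")
```

Under these assumptions, each halving of the source distance raises the ratio by about 6 dB, and each doubling lowers it by the same amount.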
Moreover, direct sound, reflected sound, and reverberant sound are not limited to the sound source of interest, and may also be present for noise sources in a video teleconferencing room. Noise sources include, for example, fan noise from ventilation systems, cooling fan noise from electronic equipment (e.g. ceiling mounted projectors), noises from outside of the video teleconferencing room, sound from loudspeakers, moving chairs, and any other sound emitting from sources other than the sound source of interest. Conventional video teleconferencing system microphones receive direct, reflected and reverberant sound waves from these noise sources as well, further deteriorating audio quality.
In addition, each noise source has a different dominant component. For example, cooling fans installed on electrical equipment and noise originating outside of the video teleconferencing room primarily contribute noise in the form of reverberant sound waves, while ventilation systems contribute both direct and reverberant sound waves.
Conventional microphones also contribute noise in the form of an echo. An echo occurs when sound from a loudspeaker used to reproduce audio transmitted from remote parties to the video teleconference is captured by the microphone and retransmitted to the remote party. Echoes also have direct, reflected and reverberant sound components, but dominance of one component over the others is determined by a loudspeaker-to-microphone distance, which is not always constant.
Echoes are conventionally attenuated with echo cancellers, which are adaptive filters that adapt to a loudspeaker-microphone channel response. However, echo cancellers cannot prevent a microphone from receiving an echo. Instead, echo cancellers merely attenuate echoes already present in an audio signal.
Because of their adaptive nature, echo cancellers require time to adapt to a given response, making time-invariant loudspeaker-microphone channel responses desirable. In practice, however, microphones may be repositioned during a video teleconference in order to capture audio from several different sound sources, and time-invariant loudspeaker-to-microphone channels are difficult to achieve. Thus, a conventional video teleconferencing system's echo cancellers are typically required to adapt multiple times. Moreover, echo cancellers have difficulty attenuating reverberant sound components, resulting in increased computational complexity as the level of reverberant echoes increases.
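The adaptive behavior of an echo canceller can be illustrated with a normalized least-mean-squares (NLMS) filter, which is one common adaptive filtering technique and is not necessarily the one used in any particular conventional system. The sketch below is a simplified illustration: the three-tap channel response, the filter length, the step size, and all function names are assumptions, and the simulated microphone signal contains only the echo.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Estimate the loudspeaker-microphone response and subtract the echo.

    Returns the residual signal and the final filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def simulate_echo(num_samples=2000, seed=0):
    """Create a far-end signal and its echo through a hypothetical channel."""
    rng = random.Random(seed)
    channel = [0.6, -0.3, 0.1]  # hypothetical loudspeaker-microphone response
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    mic = [sum(c * far_end[n - k] for k, c in enumerate(channel) if n - k >= 0)
           for n in range(num_samples)]
    return far_end, mic, channel


if __name__ == "__main__":
    far_end, mic, channel = simulate_echo()
    residual, weights = nlms_echo_canceller(far_end, mic)
    print([round(w, 3) for w in weights[:3]])
```

In this sketch the residual echo is largest at the start and shrinks as the weights approach the channel response. If the channel changes, for example because the microphone is moved, the weights must converge again, which corresponds to the re-adaptation described above.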
The issues described above are exacerbated when omni directional microphones are used in video teleconferencing systems. An omni directional microphone receives audio from all directions with equal sensitivity, and thus receives direct, reflected and reverberant sounds from every sound source within the room, including noise sources. In fact, only noise sources below a table on which the microphone is placed will be attenuated because the table functions as a barrier to sound pressure waves. Though omni directional microphones are capable of capturing audio from all sound sources of interest without being repositioned, the resulting audio quality is poor because of captured noise source sound.
One way to improve the quality of audio transmitted by a video teleconferencing system is to use directional microphones. Unlike omni directional microphones, a directional microphone has higher sensitivity with respect to certain directions over others, and inherently filters sound from at least some noise sources. This improves audio quality relative to an omni directional microphone, but also requires that a directional microphone be oriented to align its direction of highest sensitivity (“main axis”) toward the sound source of interest. Therefore, the directional microphone requires repositioning every time the sound source of interest changes position.
Directional microphones having a cardioid sensitivity pattern or a bidirectional sensitivity pattern are typically used in video teleconferencing. A microphone having cardioid sensitivity has a directivity function given by:
g(α)=1/2+(1/2)cos(α), where α is the azimuth angle of a main axis with respect to horizontal. A typical cardioid microphone has a maximum sensitivity at α=0° and a minimum sensitivity at α=180°.
A bidirectional microphone has a directivity function given by: g(α)=cos(α), where α is also the azimuth angle of a main axis with respect to horizontal. This microphone has a maximum sensitivity for α=0° and α=180°, and a minimum sensitivity when α=90° and α=270°. Because both the cardioid and bidirectional sensitivity patterns depend on the azimuth angle of the microphone, sensitivity for these microphones varies horizontally and vertically.
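The cardioid and bidirectional directivity functions can be evaluated directly to confirm the stated maxima and minima. The following is a minimal sketch; the function names are chosen for illustration, and angles are taken in degrees.

```python
import math


def cardioid(alpha_deg):
    """Cardioid directivity: 1/2 + (1/2)cos(alpha)."""
    return 0.5 + 0.5 * math.cos(math.radians(alpha_deg))


def bidirectional(alpha_deg):
    """Bidirectional directivity: cos(alpha)."""
    return math.cos(math.radians(alpha_deg))


if __name__ == "__main__":
    for alpha_deg in (0, 90, 180, 270):
        print(alpha_deg,
              round(cardioid(alpha_deg), 3),
              round(abs(bidirectional(alpha_deg)), 3))
```

The bidirectional function is negative at α=180°, so the rear lobe has the same sensitivity magnitude as the front lobe but opposite polarity. The sketch therefore prints the absolute value for the bidirectional pattern.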
As in the case of an omni directional microphone, placing the cardioid or bidirectional microphone on a table improves audio quality because the table acts as a barrier to sound waves originating below the table surface, improving the direct to reverberant sound ratio.
Microphone sensitivity may also be improved by placing the microphone directly on the table-top surface to receive both direct sound and early reflections. The direct sound waves and early reflections reflected by the table remain in phase, and combine to form a pressure wave that is double that of the direct sound wave. This effectively increases the microphone sensitivity by six decibels compared to a microphone in a free field, and is referred to as the “boundary principle”.
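The six decibel figure follows from expressing a doubling of sound pressure as a level change of 20·log10(2). A minimal sketch of that arithmetic, with an illustrative function name, is:

```python
import math


def pressure_gain_db(pressure_ratio):
    """Level change in dB for a given ratio of sound pressures."""
    return 20.0 * math.log10(pressure_ratio)


if __name__ == "__main__":
    # In-phase sum of direct sound and table reflection doubles the pressure.
    print(round(pressure_gain_db(2.0), 2))
```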
Directional microphones still have the drawback of requiring the sound source of interest to remain located near the main sensitivity direction of the microphone. Thus, when several people take part in a meeting, the microphone must be continually readjusted to avoid diminished audio quality as each person speaks. Therefore, cardioid and bidirectional microphones require that people taking part in the video teleconference be aware of the sensitivity patterns of the microphone in order to make position adjustments, making these directional microphones difficult to use.
Some conventional microphone systems use several directional microphones to avoid microphone repositioning. For example, one conventional microphone uses four cardioid elements rotated at 90° relative to each other, and selects audio from the microphone element having a main axis closest to the active sound source of interest. Another conventional microphone system uses two bidirectional microphone elements placed at 90° relative to each other, and audio processing to create a virtual microphone sensitivity pattern. For example, if the physical bidirectional patterns of the two bidirectional microphones exist at main axes 0° and 90°, virtual patterns may be created in the range of 45° to 135°.
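The creation of a virtual sensitivity pattern from two orthogonal bidirectional elements can be illustrated with the identity cos(α−φ)=cos(φ)cos(α)+sin(φ)sin(α). The sketch below shows only this general weighting principle and is not a description of the audio processing used in the conventional system mentioned above. The function names and the steering angle φ (`steer_deg`) are assumptions.

```python
import math


def bidirectional(alpha_deg, axis_deg):
    """Response of a bidirectional element whose main axis is at axis_deg."""
    return math.cos(math.radians(alpha_deg - axis_deg))


def virtual_bidirectional(alpha_deg, steer_deg):
    """Weighted sum of elements at 0 and 90 degrees, steered to steer_deg."""
    w0 = math.cos(math.radians(steer_deg))
    w90 = math.sin(math.radians(steer_deg))
    return (w0 * bidirectional(alpha_deg, 0.0)
            + w90 * bidirectional(alpha_deg, 90.0))


if __name__ == "__main__":
    steer_deg = 60.0  # hypothetical steering angle
    for alpha_deg in (60.0, 150.0):
        print(alpha_deg, round(virtual_bidirectional(alpha_deg, steer_deg), 3))
```

Because the weights change whenever the virtual pattern is re-steered, the resulting loudspeaker-microphone response also changes over time.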
However, these conventional microphone systems create time-varying loudspeaker-microphone channel responses that increase the complexity of echo canceling, and force echo cancellers to adapt more frequently. Optimal echo cancellation may therefore be prevented by the frequent echo canceller adaptation. These conventional microphone systems also require more complex hardware, increasing the difficulty of installation.
To avoid increasing system complexity and difficult installations, fixed-sensitivity-pattern microphones are preferred in video teleconferencing systems. The omni directional microphone discussed above has a fixed sensitivity pattern, but lacks the ability to suppress reverberant sound. Directional microphones also have fixed sensitivity patterns and suppress reverberant sound, but require frequent repositioning.
A third conventional microphone that has a fixed sensitivity pattern is a toroidal microphone. A toroidal microphone's sensitivity pattern is in the shape of a toroid and is given by: g(θ)=sin(θ). One such conventional toroidal microphone may be constructed with two orthogonal, horizontally coincident bidirectional microphone elements whose output signals are added in quadrature phase. Alternatively, a second order toroidal microphone may be constructed to have a sensitivity pattern given by: g(θ)=sin²(θ), from four orthogonal, horizontally coincident bidirectional microphones whose signals are added in phase. Each bidirectional microphone can in turn be constructed by subtracting the outputs of two omni directional elements, and one of the elements in each pair may be shared by all four pairs. For example, a second order toroidal microphone can be constructed using five omni directional microphones.
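The first order toroidal pattern obtained from two orthogonal bidirectional elements added in quadrature phase can be verified numerically. The sketch below assumes that θ is measured from the vertical axis, that φ denotes the azimuth of the arriving sound, and that the two horizontal elements respond as sin(θ)cos(φ) and sin(θ)sin(φ). These conventions and the function name are assumptions made for illustration. Adding the second output with a 90° phase shift gives sin(θ)cos(φ)+j·sin(θ)sin(φ), whose magnitude is sin(θ) for every azimuth.

```python
import math


def toroid_from_quadrature_pair(theta_deg, phi_deg):
    """Magnitude of two orthogonal bidirectional outputs added in quadrature.

    theta_deg is measured from the vertical axis (assumed convention);
    phi_deg is the azimuth of the incoming sound.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    element_x = math.sin(theta) * math.cos(phi)  # main axis at azimuth 0
    element_y = math.sin(theta) * math.sin(phi)  # main axis at azimuth 90
    # Quadrature addition: the second signal is shifted by 90 degrees (times j).
    return abs(complex(element_x, element_y))


if __name__ == "__main__":
    for phi_deg in (0, 45, 90, 200):
        print(phi_deg, round(toroid_from_quadrature_pair(90.0, phi_deg), 3))
```

The printed magnitude is the same for every azimuth, which illustrates why a toroidal pattern does not require reorientation toward individual sound sources in the horizontal plane.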
Because the sensitivity pattern of a toroidal microphone depends on the elevation angle of the microphone, not the azimuth, its sensitivity only varies in a vertical direction. Therefore, toroidal microphones may capture sound from sources at different positions throughout a room without the need for frequent repositioning. Using a microphone with toroidal, doughnut-shaped, directivity in teleconferencing was suggested by Sessler et al. in the 1960s [G. M. Sessler, J. E. West, and M. R. Schroeder, “Toroidal microphones,” The Journal of the Acoustical Society of America, vol. 46, no. 1A, pp. 28-36, 1969]. A toroidal microphone can be placed above a round table to receive sound from all parties to the call with maximal and equal sensitivity, while attenuating the reverberant noise field and suppressing the acoustic echo path from the loudspeaker.
The above-described second order toroid is created using five omni directional microphone elements placed in a horizontal plane or, alternatively, by sampling the sound field with tubes. However, the implementation using tubes is difficult to balance acoustically and is limited by problems resulting from tube resonances. Another implementation using four bidirectional elements together with a plastic cylinder is also known [G. M. Sessler and J. E. West, “A simple second-order toroid microphone,” Acustica, vol. 57, no. 4-5, pp. 193-199, 1985]. However, this implementation also relies on a large number of microphone elements to create the toroid directivity pattern, and is plagued with phase and sensitivity matching problems among the microphone elements. Hence, conventional toroid microphones are large, costly and very difficult to implement.