There are many software-based video conferencing applications designed to run on commodity personal computing platforms (i.e., “soft codecs”). Examples of such soft codecs have been developed by SKYPE®, MIRIAL® ClearSea, and RADVISION SCOPIA®. In the past, personal computers were not powerful enough to provide high definition video (720p or 1080p) at 30 frames per second or higher. As a result, there was a noticeable difference between the quality achieved on personal computing hardware and the quality achieved by dedicated room video conferencing systems (i.e., “hard codecs”) available from companies such as POLYCOM® and CISCO®. With advances in central processor and graphics processor capabilities, inexpensive commodity personal computing hardware may be used to provide high definition video that is substantially equivalent to the video available from hard codecs that are dedicated to a particular room.
Teleconferencing systems, such as all-in-one (AIO) video conferencing units (also referred to herein as “AIO displays”), may be based on personal computing platforms. AIO video conferencing units may include personal computer (PC) hardware, speakers, a microphone (e.g., single microphone, microphone array, etc.), and a camera that are built into (or mounted to) the electronic display. As a result, low cost platforms may be deployed that provide video of similar or equal quality compared with relatively expensive hard codecs dedicated to videoconferencing.
The AIO display may also be relatively simple to set up compared with traditional hard codecs. For example, the AIO display may simply be placed in a conference room (e.g., mounted on a conference room wall), and then connected to power and a network (e.g., Internet, private intranet, cloud, etc.). Although AIO displays may provide high quality video, achieving high quality audio for a group conference may be difficult using conventional AIO displays.
For example, in a video conference made up of eight to twelve participants, a soft codec running on a conventional AIO display does not, by itself, provide an optimal audio conferencing environment. There are at least two reasons for this: conventional AIO displays often (1) use single omnidirectional microphones, and (2) have independent audio subsystems.
Conventional personal computer motherboards and sound cards used in AIO displays provide a single microphone input. This microphone input may be connected to an omnidirectional microphone in order to pick up audio in the local room. The single omnidirectional microphone may be placed in the middle of a conference table. A problem that may arise with this configuration is that omnidirectional microphones may pick up a significant amount of noise from directions other than the direction from which any given person is speaking. As a result, the signal-to-noise ratio (SNR) of the audio signal captured by the microphone may be relatively low. Beyond this SNR issue, an omnidirectional microphone may pick up speech energy that is reflected from various surfaces in the conference room as well as the direct path speech from the talker. This may contribute to a “hollow” sound reproduced at the far end (i.e., remote conferencing room) for the remote participants of the video conference.
In order to address the issues of using a single omnidirectional microphone, some AIO displays include a microphone array in the bezel of the display. While this configuration may provide an improvement over the conventional use of single omnidirectional microphones placed in the middle of a large conference table, the microphone array will tend to provide a better SNR for speech from local participants who are sitting closest to the microphone array and a worse SNR for speech from local participants sitting farthest away.
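By way of a non-limiting illustration, the dependence of SNR on seating distance may be sketched as follows. The sketch assumes free-field inverse-square spreading of the direct-path speech and a constant room noise level at the microphone. Both are simplifying assumptions that ignore reverberation and array beamforming, and the function names and example levels are hypothetical rather than taken from any particular product.

```python
import math


def level_change_db(near_m: float, far_m: float) -> float:
    """Change in direct-path speech level (dB) when the talker moves
    from near_m to far_m from the microphone.

    Assumes free-field inverse-square spreading (about -6 dB per
    doubling of distance); real rooms add reflections.
    """
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return -20.0 * math.log10(far_m / near_m)


def snr_db(speech_db_at_1m: float, noise_db: float, distance_m: float) -> float:
    """Estimated SNR (dB) at the microphone for a talker at distance_m.

    speech_db_at_1m and noise_db are hypothetical example levels in
    the same dB reference.
    """
    speech_db = speech_db_at_1m + level_change_db(1.0, distance_m)
    return speech_db - noise_db


if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0):
        print(f"{d:.0f} m: {snr_db(60.0, 40.0, d):.1f} dB SNR")
```

Under these assumptions, a participant seated 4 m from the array would be captured with roughly 12 dB less SNR than a participant seated 1 m away, which is consistent with the near/far disparity described above.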
In addition, when developing an AIO display, one method currently being employed is to simply combine a personal computer subsystem with a display subsystem in a single enclosure. This configuration may cause a problem with the audio portion of the conference if the display subsystem supports the ability to accept audio inputs that are independent of the personal computer subsystem's audio inputs. The audio from a video conference will typically play through the PC's audio outputs. The display controller for the electronic display will also typically have an independent audio amplifier so that users can control the display's volume using a handheld remote control. A conventional method to integrate these two audio subsystems (i.e., audio from the PC subsystem and audio from the display subsystem) is to connect the PC's analog audio output to one of the analog audio inputs on the electronic display. During an audio conference, an acoustic echo canceller (AEC) may be employed to prevent coupling of local playback audio into the microphone transmit signal. If the PC's audio output level is independent of the display's output level, the user may inadvertently set the independent volume controls so that the PC output level is relatively low and the display controller's volume level is relatively high to compensate.
The AEC may be designed to expect the acoustic power level of an echo signal to be lower than the acoustic power level of the received signal. This is because usually there is an attenuation of signal power between the local speakers and the microphone. This attenuation is referred to as the Echo Return Loss (ERL). If the echo power that the AEC detects at the microphone is much larger than is expected (e.g., due to large external amplification), the AEC may mistake the echo power for local speech. As a result, the AEC may enter a half-duplex mode if its adaptive filter has not yet converged. When the AEC is in a half-duplex mode, the AEC may mute playback audio in order to let the local microphone audio through. As a result of muting playback audio, the echo is no longer present and the microphone signal may be attenuated to zero. The AEC may detect this attenuation of the microphone signal as the end of the double talk state and allow the received audio to play into the room again. Due to the large external gain, the AEC may immediately (erroneously) detect the onset of local speech and again mute the playback audio. This cycle may continue indefinitely, which may result in choppy, unintelligible playback audio.
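The mute/unmute cycle described above may be illustrated with a minimal block-based sketch. It assumes far-end single talk at a constant level, an unconverged adaptive filter, and a simple level-comparison detector that declares local speech whenever the microphone level exceeds the received level plus a fixed threshold. The threshold value, the function name, and the example levels are hypothetical; practical AECs use adaptive filtering, hangover timers, and more elaborate double-talk detection.

```python
def simulate_playback(received_db: float,
                      erl_db: float,
                      external_gain_db: float,
                      blocks: int,
                      threshold_db: float = -6.0) -> list[bool]:
    """Return, per audio block, whether playback is unmuted.

    Hypothetical model:
      echo level at mic = received level + external gain - ERL
      (ERL being the speaker-to-microphone attenuation in dB).
    The detector assumes the echo sits at least |threshold_db| below
    the received level; anything louder is treated as local speech.
    """
    playing = True
    history = []
    for _ in range(blocks):
        history.append(playing)
        if playing:
            # Echo reaches the microphone only while playback is unmuted.
            mic_db = received_db + external_gain_db - erl_db
        else:
            # No playback and no local talker: microphone is silent.
            mic_db = float("-inf")
        local_speech_detected = mic_db > received_db + threshold_db
        # Half-duplex behavior: mute playback when local speech is detected.
        playing = not local_speech_detected
    return history


if __name__ == "__main__":
    print(simulate_playback(-20.0, 12.0, 0.0, 6))   # no external gain
    print(simulate_playback(-20.0, 12.0, 20.0, 6))  # large external gain
```

In this sketch, with no external gain the echo stays below the detector threshold and playback remains unmuted in every block. With 20 dB of external gain that is unknown to the detector, playback alternates between unmuted and muted on successive blocks, which corresponds to the choppy playback described above.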
Another problem with audio processing with conventional AIO displays is that even if the PC audio level and the display controller amplifier levels are appropriately configured to begin with, a user may increase the analog gain on the display controller at a later time. Because this gain change may not be reflected in the AEC reference signal, the AEC may erroneously decide during far-end single talk that the loud signal suddenly being picked up by the microphone is local speech, when in fact the signal is just echo, and acoustic echo may result. In addition to the problems described above, if the analog audio level coming from the PC is relatively low compared to the noise floor, and then a large amplification is applied in the display controller, the playback audio may sound noisy.
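The noise penalty of this gain arrangement may be illustrated with simple decibel arithmetic. The sketch below assumes a fixed noise floor on the analog link between the PC output and the display input, and assumes that the display amplifier applies the same gain to signal and noise. The function name and the example levels are hypothetical and are expressed relative to an arbitrary common reference.

```python
def playback_levels_db(pc_signal_db: float,
                       link_noise_db: float,
                       display_gain_db: float) -> tuple[float, float, float]:
    """Return (output signal level, output noise level, SNR) in dB.

    Hypothetical model: the display amplifier raises the PC signal and
    the analog link noise by the same display_gain_db, so the SNR is
    fixed at the PC output and cannot be recovered downstream.
    """
    out_signal_db = pc_signal_db + display_gain_db
    out_noise_db = link_noise_db + display_gain_db
    return out_signal_db, out_noise_db, out_signal_db - out_noise_db


if __name__ == "__main__":
    noise_floor_db = -80.0
    # Two settings that produce the same playback loudness (-10 dB):
    balanced = playback_levels_db(-10.0, noise_floor_db, 0.0)
    compensated = playback_levels_db(-40.0, noise_floor_db, 30.0)
    print("PC high, display gain low: ", balanced)
    print("PC low, display gain high:", compensated)
```

Both settings in this example yield the same playback level, but the setting with a low PC level and a high display gain raises the noise by 30 dB and reduces the SNR from 70 dB to 40 dB.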