In recent years, the increased availability and use of the internet has been accompanied by increased use of IP-based (Internet Protocol) audio and video communication tools, such as VOIP (Voice Over Internet Protocol) calling and “webcams,” i.e., cameras for online use. These tools allow users to conveniently talk with each other and hold video conferences online, often more economically than with traditional land-line and mobile phones or other video conferencing systems.
Many of the available IP-based audio and video communication systems either use specialized (and often expensive) hardware or require specialized software that must be downloaded and installed on a computing device. However, an increasing number of audio and video communication applications are available that operate entirely within a web browser (such as Internet Explorer from Microsoft, Safari from Apple, Chrome from Google, Firefox from the Mozilla Foundation, etc.) without the need for specialized hardware or software. Since most computing devices can use a browser, this increases the availability of such audio and video communications for many users.
While the wide distribution and convenience of such web browser-only applications make it easier for users to communicate with each other, the use of such applications generally comes with certain audio artifacts and quality problems. The most common and important among these defects is that of “acoustic echo,” which often makes conversation difficult.
An acoustic echo occurs because, on most computing devices, the built-in speaker and microphone are too close to each other. In a call between parties A and B, when A speaks, the speech comes out of a speaker on B's device, is captured (and perhaps also amplified) by B's microphone, and is transmitted back to A and possibly to other computing devices that are participating in the communication session. The sound is then similarly played on the speakers of the other parties, and the process may repeat, thus forming a feedback loop and generating a gradually amplifying echo effect.
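The feedback loop described above can be illustrated with a small numerical sketch. It is only a simplified model, not a description of any particular system: the function name `echo_levels` and the single `loop_gain` parameter (the combined effect of speaker output, microphone pickup, and any amplification over one round trip) are assumptions introduced here for illustration.

```python
from typing import List


def echo_levels(initial_level: float, loop_gain: float, round_trips: int) -> List[float]:
    """Return the echo level after each round trip through the loop.

    Hypothetical model: every pass from speaker to microphone and back
    multiplies the signal level by a constant loop_gain.
    """
    levels = [initial_level]
    for _ in range(round_trips):
        # One more pass through speaker -> microphone -> network.
        levels.append(levels[-1] * loop_gain)
    return levels


# A loop gain above 1 makes each repetition louder than the last;
# a loop gain below 1 makes the echo die away.
growing = echo_levels(1.0, 2.0, 3)
fading = echo_levels(1.0, 0.5, 3)
```

In this model the echo keeps growing only while the loop gain exceeds 1, which is why breaking or attenuating the loop at any point is enough to stop the effect.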
Prior systems have dealt with acoustic echo by “echo cancellation,” performed by the specialized hardware or software used for communication. Hardware echo cancellation generally removes the echo from the audio signal, but doing so requires significant computation, which is often an expensive operation to perform in software in real time. Accordingly, many software applications do not “cancel” the echo so much as suppress it, by simply turning off any signal containing an echo. In either case, this type of echo cancellation or echo suppression is typically done in the user's computing or telephony device.
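As a rough illustration of suppression by turning off the signal, the sketch below mutes any microphone frame captured while the far-end (speaker) signal is active, on the assumption that such a frame is likely to contain an echo. The names `rms` and `suppress_echo`, the frame layout (lists of float samples), and the `threshold` value are all hypothetical choices made for this example, not part of any system described here.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def suppress_echo(
    mic_frames: Sequence[Sequence[float]],
    far_end_frames: Sequence[Sequence[float]],
    threshold: float = 0.1,
) -> List[List[float]]:
    """Silence microphone frames captured while the far end is audible.

    Assumes the two sequences are time-aligned frame by frame; a real
    device would also have to account for playback delay.
    """
    output = []
    for mic, far in zip(mic_frames, far_end_frames):
        if rms(far) > threshold:
            # Far end is playing through the speaker: drop the mic frame.
            output.append([0.0] * len(mic))
        else:
            # Far end is quiet: pass the mic frame through unchanged.
            output.append(list(mic))
    return output
```

Note that this approach needs direct access to both the captured and the played-back samples, which a dedicated device or installed application typically has.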
It is very difficult to suppress an audio echo in browser-only applications. Unlike many devices, which have direct access to the audio and video signals that pass through them, applications that operate through a browser generally have only a very limited ability to manipulate the audio and video data that is captured by a microphone and camera of the computing device; i.e., the application typically merely transmits the audio and video data as encoded by the software for the microphone or camera, without modifying the data. This restriction results from limitations imposed by the web browser to ensure security when accessing content delivered through the web.
This causes some users to avoid browser-only applications and instead use applications with specialized software installed on the computing device, since some such applications are able to access the audio signal and apply an echo cancellation or suppression algorithm directly on the user's computing device.
A common type of echo suppression is the use of a “half-duplex” approach in which only one participant can transmit audio at any one time, thus preventing the feedback loop. Half-duplex is used widely in many of the analog and digital telephones used in the public switched telephone network (PSTN), in VOIP networks, and in many cellular telephones. Even some of the IP-based specialized software applications cause the computing device to act as a “soft phone,” thus using half-duplex as well as other techniques for echo suppression.
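The half-duplex idea can be sketched as a simple “floor” arbiter that lets only one participant's audio through at a time. This is a minimal illustration under stated assumptions, not how any particular telephone or soft phone implements it: the class name `HalfDuplexFloor`, the `release_after` parameter, and the alphabetical tie-break are all hypothetical.

```python
from __future__ import annotations

from typing import Optional, Set


class HalfDuplexFloor:
    """Allow at most one participant to transmit in any frame."""

    def __init__(self, release_after: int = 3) -> None:
        self.holder: Optional[str] = None   # participant currently transmitting
        self.release_after = release_after  # silent frames before the floor frees
        self._silent_frames = 0

    def admit(self, active: Set[str]) -> Optional[str]:
        """Return the one participant allowed to transmit this frame, if any.

        `active` is the set of participants with voice activity right now.
        """
        if self.holder is not None:
            if self.holder in active:
                self._silent_frames = 0
                return self.holder
            self._silent_frames += 1
            if self._silent_frames < self.release_after:
                return None  # floor still held; everyone else is blocked
            self.holder = None
            self._silent_frames = 0
        if active:
            # Hypothetical tie-break: pick the alphabetically first speaker.
            self.holder = min(active)
            return self.holder
        return None


floor = HalfDuplexFloor(release_after=2)
decisions = [floor.admit(s) for s in ({"A"}, {"A", "B"}, {"B"}, {"B"})]
```

Here B is blocked while A holds the floor and is admitted only after A has been silent for `release_after` frames, so the two audio paths are never open at the same time.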
However, while half-duplex is very effective for telephone devices, it is difficult to implement in browser-only applications. This is due to the fact that, as stated above, there are limitations on the ability of the browser-based applications to modify the audio data to remove the effect of an echo.
Alternatively, users who still prefer to use the browser-only applications may use other approaches to overcome the difficulties caused by acoustic echo. One such approach is to use a headset that plugs into the computing device, so that the microphone does not pick up the speaker output, which goes directly to the user's ear. However, this is inconvenient if there is more than one person at a given location.
Another approach is to ask all participants except the speaker to mute their microphones; however, this too is not only inconvenient when switching speakers, but may still cause echoing if one participant activates a microphone to speak before the current speaker is finished. Alternatively, rather than having to mute the microphone, it is possible to introduce a “push-to-talk” button on the web page to let users control who is talking, similar to the use of a walkie-talkie, although this suffers from the same problems as muting.
For these reasons, the solutions that rely on specialized hardware or software at the user location, or on performing echo cancellation or suppression in the computing device, fail to provide optimal echo cancellation or suppression in browser-based audio or video conferences.