1. Field of the Invention
This invention relates to audio systems, and more specifically to microphone and loudspeaker systems that provide echo cancellation, wireless connection and data compression. The audio system of the present invention is useful in any system that uses audio, such as telephones, video conferencing systems, public address (PA) systems, and sound systems for computer communication.
2. Description of the Related Art
Teleconferencing has long been an essential tool for communication in business, government and educational institutions. Teleconferencing equipment comes in many types and can be characterized in many ways. One type of teleconferencing unit is a video conferencing unit, which transmits real-time video images as well as real-time audio signals. A video conferencing unit typically comprises a video processing component and an audio processing component. The video processing component may include a camera to pick up live images of conference participants and a video display for showing real-time video images of conference participants or images of documents. The audio portion of a video conferencing unit typically includes one or more microphones to pick up voice signals of conference participants, and loudspeakers to reproduce voices of the participants at the far end. Audio-only conferencing systems also exist, and these are often configured in a similar manner. There are many ways to connect video and/or audio conferencing units. At the low end, the link may be an analog plain old telephone service (POTS) line. It may be a digital service line such as an integrated services digital network (ISDN) line, or a digital interface to a private branch exchange (PBX), which may use a T1 or PRI line. More recently, video conferencing units and speakerphones may be linked by digital networks using the Internet Protocol (IP), including the Internet. Satellite, cellular and other wireless communication protocols may also be used.
A teleconference unit typically has one or more loudspeakers for reproducing voices of participants at a far-end site and one or more microphones for picking up voices of participants at the near-end site. To make the conference more life-like, there may be multiple loudspeakers reproducing one or more audio channels. In a larger conference room, there may also be multiple microphones to pick up the speech of participants seated around the room. Wired microphones and loudspeakers are unsightly and frequently cause wire-tangling problems. Wireless microphones and loudspeakers, which eliminate the connecting wires, are therefore preferred.
Audio conferencing systems are commonly described as either half-duplex or full-duplex. In a half-duplex system, only one side can speak at a time; while one side is speaking, the other side is blocked out. These systems are easier to build than full-duplex systems, but result in unnatural conversations. In a full-duplex system, both sides can speak at once. To make this possible, such a system requires some method of keeping loudspeaker audio out of the audio signal picked up by the microphone. A common way of achieving this is an echo canceller, more particularly an acoustic echo canceller, or AEC. A typical full-duplex audio system with a single audio channel is illustrated in FIG. 1. The system 100 has one microphone 12 and one loudspeaker 52. The microphone 12 generates an audio signal 62 and sends it to an AEC 22. The AEC 22 has two input signals and one output signal. The first input signal is the microphone signal. The second input signal is the loudspeaker signal. As shown in FIG. 1, the second input to the AEC 22 is a loudspeaker signal 76, which is the same signal 74 that is fed to the loudspeaker 52 as signal 72. The audio signal 62 coming from microphone 12 contains not only the desired audio signal from a target source, for example a teleconference participant's speech or the sound of music played by a musician, but also the sound (feedback) from loudspeaker 52 and from room reflections of the loudspeaker sound. Assuming the amplification system of the audio system produces high-fidelity sound through the loudspeaker 52, the feedback picked up by the microphone 12 should correspond to the loudspeaker signal 76 input to the AEC 22, as modified by the direct acoustic path and the room reflections of the loudspeaker sound. Therefore, the AEC 22 can estimate the feedback due to loudspeaker 52 from signal 76 and subtract it from the signal 62, so that a substantially echoless signal 64 leaves the AEC 22 and feeds into the system interface 30.
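The subtraction performed by the AEC 22 is commonly realized with an adaptive filter that learns the loudspeaker-to-microphone echo path from the loudspeaker signal. The following is a minimal sketch using the normalized least-mean-squares (NLMS) algorithm, one common choice that the text above does not specify. The function names, the echo path `h`, the filter length, the step size and the noise level are all hypothetical values chosen for illustration.

```python
import math
import random


def nlms_echo_canceller(mic, ref, taps=32, mu=0.5, eps=1e-6):
    """Remove the echo of `ref` (loudspeaker signal) from `mic` with an NLMS filter.

    Returns the echo-cancelled output, i.e. mic minus the estimated echo.
    """
    w = [0.0] * taps  # adaptive estimate of the loudspeaker-to-microphone echo path
    x = [0.0] * taps  # most recent loudspeaker samples, newest first
    out = []
    for d, r in zip(mic, ref):
        x = [r] + x[:-1]                               # shift in the newest reference sample
        y = sum(wi * xi for wi, xi in zip(w, x))       # estimated echo
        e = d - y                                      # echo-cancelled sample
        norm = eps + sum(xi * xi for xi in x)          # reference power normalizes the step
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out


def erle_db(mic, out):
    """Echo return loss enhancement: microphone power over output power, in dB."""
    return 10 * math.log10(sum(v * v for v in mic) / sum(v * v for v in out))


random.seed(0)
h = [0.6, 0.0, 0.3, -0.1]  # hypothetical echo path: direct sound plus two reflections
ref = [random.gauss(0.0, 1.0) for _ in range(4000)]  # loudspeaker signal (signal 76)
mic = [sum(h[k] * ref[n - k] for k in range(len(h)) if n >= k)
       + random.gauss(0.0, 1e-3)  # small near-end background noise
       for n in range(len(ref))]  # microphone signal (signal 62), echo only
out = nlms_echo_canceller(mic, ref)  # substantially echoless signal (signal 64)
print(f"ERLE after convergence: {erle_db(mic[-500:], out[-500:]):.1f} dB")
```

A practical AEC also needs double-talk detection and must handle echo paths far longer than four taps; those details are omitted here.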
The interface 30 is connected to the rest of the audio system 100 through two signal lines 64 and 74. The signal 64 is an audio output signal, which is the substantially echoless microphone signal. The signal 74 is an audio input signal, which is a loudspeaker signal.
FIG. 2 shows another audio system 200, which is similar to the audio system 100 illustrated in FIG. 1 except that audio system 200 has multiple microphones 212, 214 and 216, each of which has an independent AEC 222, 224 and 226. With multiple microphones 212, 214 and 216, the speech or sounds of different talkers or participants in the conference can be received more accurately or uniformly by the audio system 200. Different talkers need not take turns speaking into a single microphone in order to be heard. Each AEC 222, 224 or 226 operates in the same way as the AEC 22 shown in FIG. 1. Each AEC still has one microphone input signal, one loudspeaker input signal and one output signal. The microphone input signals to the AECs come from different microphones and are therefore different. However, the loudspeaker input signals 279, 278 and 276 to the AECs 222, 224 and 226 are the same: they all come from the same loudspeaker signal 274, which is also sent to the loudspeaker 252 as loudspeaker signal 272. The output signals 264, 263 and 265 from the AECs 222, 224 and 226 are fed to a mixer 240. The mixer 240 combines the multiple microphone signals 264, 263 and 265 into a single microphone signal 266, which is sent to the interface 230. The interface 230 is connected to the rest of system 200 through signal lines 266 and 274. In system 200, all connections are wired.
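The structure of FIG. 2, in which every AEC receives its own microphone signal but the same loudspeaker reference, can be sketched as follows. To keep the example short, each AEC is idealized as already knowing its echo path (a real AEC must adapt to it), and the mixer 240 is assumed to take an equal-weight average, since the text does not specify a mixing rule. All function names, echo paths and signal values are hypothetical.

```python
from typing import List, Optional, Sequence

Signal = List[float]


def fir(x: Sequence[float], h: Sequence[float]) -> Signal:
    """Causal FIR filtering of x by h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]


def ideal_aec(mic: Sequence[float], ref: Sequence[float], echo_path: Sequence[float]) -> Signal:
    """Idealized AEC: subtracts the exact echo of the shared loudspeaker reference."""
    echo = fir(ref, echo_path)
    return [m - e for m, e in zip(mic, echo)]


def mixer(signals: Sequence[Sequence[float]], gains: Optional[Sequence[float]] = None) -> Signal:
    """Combine several echo-cancelled microphone signals into one (equal weights by default)."""
    if gains is None:
        gains = [1.0 / len(signals)] * len(signals)
    return [sum(g * s[n] for g, s in zip(gains, signals)) for n in range(len(signals[0]))]


ref = [1.0, -1.0, 0.5, 0.0, 0.25]                # loudspeaker signal 274, shared by all AECs
paths = [[0.5], [0.2, 0.1], [0.0, 0.3, 0.05]]    # hypothetical echo path for each microphone
talker = [0.1, 0.2, 0.3, 0.4, 0.5]               # near-end speech, assumed identical at each mic
mics = [[t + e for t, e in zip(talker, fir(ref, h))] for h in paths]
cleaned = [ideal_aec(m, ref, h) for m, h in zip(mics, paths)]  # one AEC per microphone
combined = mixer(cleaned)                        # single microphone signal 266
print([round(v, 6) for v in combined])
```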
FIG. 3 shows an audio system 300, which comprises a microphone module 310, a loudspeaker module 350, a base station 320 and an interface 330. In this system 300, the microphone module 310 has a microphone 312 and a transmitter 332. The microphone 312 generates audio signal 362 and feeds it to transmitter 332, which transmits it wirelessly to the base station 320. The base station 320 has a receiver 334, a transmitter 338 and an AEC 322. The base station 320 is coupled to the interface 330 through two signal lines 364 and 374. Receiver 334 regenerates the audio signal 362 as an audio signal 368, which is the microphone input to AEC 322. On the loudspeaker side, a loudspeaker signal 374 from the interface 330 is split into two paths. One goes into AEC 322 as the loudspeaker input signal 379. The other, signal 378, is fed into the transmitter 338. The loudspeaker receiver 336 in the loudspeaker module 350 receives the radio signal from transmitter 338, regenerates the loudspeaker audio signal 372 and feeds it into the loudspeaker 352 to be reproduced. In this system 300, the wireless connections between receivers and transmitters are essentially lossless, being either a high-fidelity analog system or a digital wireless connection. Therefore the audio signal 368 is the same as the audio signal 362 generated by the microphone 312, and the loudspeaker signal 379 is the same as the loudspeaker signal 372 that feeds into the loudspeaker 352. As a result, the AEC 322 works essentially the same way as the AEC 22 shown in FIG. 1. The major benefit of this system is that the microphone module 310 and the loudspeaker module 350 are wirelessly connected to the base station 320 and the interface 330. Therefore, the microphone module 310 and the loudspeaker module 350 may be placed at any location in the conference room or lecture hall that is within the radio range of the base station 320. However, for the AEC 322 to work properly, the audio signals 368 and 379 must be of high quality.
This requirement, in turn, demands high bandwidth on the wireless links between the transmitters 332 and 338 and the receivers 334 and 336 of the microphone module 310, the loudspeaker module 350 and the base station 320. This high-bandwidth demand limits the number of microphone modules that can be used in this system.
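The bandwidth limit can be made concrete with simple arithmetic. The sketch below computes the bit rate of an uncompressed pulse-code modulation (PCM) stream and how many microphone streams fit on a shared wireless link once one loudspeaker stream is reserved. The 4 Mbit/s link capacity, the 48 kHz, 16-bit mono format and the 4:1 compression ratio are hypothetical figures, and protocol overhead is ignored.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of an uncompressed PCM audio stream."""
    return sample_rate_hz * bits_per_sample * channels


def max_microphones(link_bps: int, mic_bps: int, loudspeaker_bps: int) -> int:
    """Microphone streams that fit on the link after reserving one loudspeaker stream."""
    spare = link_bps - loudspeaker_bps
    return max(0, spare // mic_bps)


LINK_BPS = 4_000_000                 # hypothetical aggregate radio capacity
raw = pcm_bitrate_bps(48_000, 16)    # 768,000 bit/s per mono stream
compressed = raw // 4                # hypothetical 4:1 codec: 192,000 bit/s

print(max_microphones(LINK_BPS, raw, raw))                # → 4
print(max_microphones(LINK_BPS, compressed, compressed))  # → 19
```

Under these assumptions, 4:1 compression of both the microphone and loudspeaker streams raises the number of microphone streams from 4 to 19.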
To reduce the bandwidth requirement, signals may be compressed before transmission and decompressed after reception. In signal processing, compression and decompression are also called encoding and decoding, which is the function of a “codec.” Codecs are broadly classed as lossy or lossless. A lossless codec perfectly reproduces at its output what was put into its input. A lossy codec produces an output that differs slightly from its input. The art of codec design is to achieve as much compression as possible (the fewest bits sent over the channel) while meeting the other goals of the codec. In audio signal processing, lossy codecs work by exploiting weaknesses of the human ear, introducing distortions that the ear cannot detect. Although the ear does not detect these changes with a good codec, such as MP3, an AEC will. A lossy codec can achieve much more compression (typically four to sixteen times) than a lossless codec (typically two to three times), which is why lossy codecs are more frequently used. However, when a lossy codec is used in an audio signal processing system, the AEC in the same system does not work properly. To make the audio system work, the AEC may have to be disabled.
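The difference between lossless and lossy coding can be shown with two toy codecs: a lossless one built on the standard-library `zlib` compressor, and a lossy one that simply keeps the top 8 bits of each 16-bit sample (a crude 2:1 requantizer, not a perceptual codec like MP3). Both are illustrative stand-ins chosen for this sketch, not codecs described in the text, and the test tone is hypothetical.

```python
import array
import math
import zlib


def lossless_roundtrip(samples):
    """Encode 16-bit samples with zlib and decode them again; the output equals the input."""
    raw = array.array("h", samples).tobytes()
    packed = zlib.compress(raw, 9)
    restored = array.array("h")
    restored.frombytes(zlib.decompress(packed))
    return restored.tolist(), len(raw) / len(packed)


def lossy_roundtrip(samples):
    """Keep only the top 8 bits of each sample (2:1), then reconstruct at the bin midpoint."""
    coded = array.array("b", (s >> 8 for s in samples))  # 1 byte per sample instead of 2
    decoded = [(q << 8) + 128 for q in coded]
    return decoded, 2.0


signal = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]

lossless, ratio_a = lossless_roundtrip(signal)
lossy, ratio_b = lossy_roundtrip(signal)
print("lossless exact:", lossless == signal, "ratio:", round(ratio_a, 2))
print("lossy exact:", lossy == signal, "max error:", max(abs(a - b) for a, b in zip(lossy, signal)))
```

The lossy decoder is off by up to 128 counts per sample. A listener may perceive this only as low-level noise, but to an AEC it is a mismatch between the microphone signal it receives and the echo it predicts.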
Depending on the configuration of the interfaces 30, 230 or 330, the systems 100, 200 or 300 may be used in various applications, such as an audio amplification system or a site of a teleconferencing system.
Some interfaces used in the systems shown in FIGS. 1-3 are illustrated in more detail in FIGS. 10a and 10b. FIG. 10a shows a simple interface 1030, which may be a direct connection between the microphone signal 1024 and the loudspeaker signal 1034. With this type of interface, the audio system 100 shown in FIG. 1 is essentially a simple audio amplification system. The interface 1030 may also perform simple signal processing, such as pre-amplification, buffering, etc.
FIG. 10b shows a more complicated interface 1035, which is typically used in a teleconferencing system. The interface 1035 may contain several components or network interfaces, for example a plain old telephone service (POTS) interface 1045, a digital IP interface 1055, an audio interface 1075 and a processor 1065. These components are logical components; their functions may be performed by one or more circuits, such as IC chips. The audio interface 1075 is coupled to the rest of an audio system with one audio input line 1044 and one audio output line 1048. The processor 1065 is used to process microphone signals, loudspeaker signals or other intermediate signals. Depending on the network employed in the conference system, other network interfaces may be installed in the interface 1035, instead of or in addition to the POTS interface 1045 or the IP interface 1055.
In one example, where an IP network is used, a microphone signal 1044 is fed into the processor 1065 through the audio interface 1075. The microphone signal 1044 is processed and/or converted to a proper format, then sent out as signal 1054 via the digital interface 1055 to a far-end site of the teleconferencing system. Similarly, an audio signal 1058 from the far-end site of the teleconference is received by the digital interface 1055 and fed into the processor 1065, where it is processed and/or converted into a loudspeaker signal 1048. The loudspeaker signal 1048 is fed via the audio interface 1075 into the audio system at the near-end site of the teleconferencing system, for example the system 100 shown in FIG. 1. If the teleconferencing system is connected through a POTS network, the POTS interface 1045 is used instead of the IP interface 1055.
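The signal flow through interface 1035 can be summarized as a processor sitting between the audio interface and whichever network interface is installed. The sketch below assumes, for illustration only, that the processor's format conversion is a float-to-16-bit PCM conversion and that a network interface exposes `send` and `receive` methods. `ConferenceInterface` and `QueueNetwork` are hypothetical names; `QueueNetwork` is an in-memory stand-in for the IP interface 1055, and a POTS-backed object with the same two methods could replace it.

```python
from array import array
from dataclasses import dataclass, field
from typing import List, Protocol, Sequence


class NetworkInterface(Protocol):
    """Anything that carries audio frames to and from the far end (POTS, IP, ...)."""

    def send(self, frame: bytes) -> None: ...

    def receive(self) -> bytes: ...


@dataclass
class QueueNetwork:
    """Hypothetical stand-in for the IP interface 1055: frames are only queued in memory."""

    outbox: List[bytes] = field(default_factory=list)
    inbox: List[bytes] = field(default_factory=list)

    def send(self, frame: bytes) -> None:
        self.outbox.append(frame)

    def receive(self) -> bytes:
        return self.inbox.pop(0)


class ConferenceInterface:
    """Sketch of interface 1035; the processor converts float audio <-> 16-bit PCM (assumed)."""

    def __init__(self, network: NetworkInterface) -> None:
        self.network = network  # IP or POTS backend, depending on the installed interface

    def send_microphone(self, samples: Sequence[float]) -> None:
        """Microphone signal 1044 -> outgoing signal 1054 (clipped to the 16-bit range)."""
        pcm = array("h", (max(-32768, min(32767, round(s * 32767))) for s in samples))
        self.network.send(pcm.tobytes())

    def receive_loudspeaker(self) -> List[float]:
        """Incoming far-end signal 1058 -> loudspeaker signal 1048."""
        pcm = array("h")
        pcm.frombytes(self.network.receive())
        return [v / 32767 for v in pcm]


net = QueueNetwork()
iface = ConferenceInterface(net)
iface.send_microphone([0.0, 0.5, -1.0, 2.0])  # 2.0 is out of range and gets clipped
net.inbox.append(net.outbox.pop())            # loop the frame back as if from the far end
looped = iface.receive_loudspeaker()
print(looped)
```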
In teleconferencing systems, as in many other audio systems, full-duplex operation is a necessity. In a full-duplex system, such as a speakerphone system or an audio system used in lecture halls or theaters, it is desirable to have one or more microphones connected by remote links, such as wireless connections. In these situations there are many constraints. These include, for example, the data bandwidth available between the microphones and the network interface, the aggregate data bandwidth available to all microphones together, the high signal quality required by the AEC, battery life in the wireless accessories, the desire for wide audio bandwidth, and system cost. To meet these goals, it is desirable to compress the audio in the wireless unit prior to transmission, because fewer transmitted bits result in lower power consumption and lower use of data bandwidth. Unfortunately, compression causes distortion. Audio compressors are optimized to hide this distortion from the ear, but the distortion renders echo cancellation by the AEC inoperable because, as noted above, the AEC requires an accurate representation of the microphone and loudspeaker signals.
In a full-duplex audio system, an AEC is a necessity. As discussed above, an active AEC can subtract from the microphone signals the room echoes caused by the loudspeakers in the conference room. This requires an accurate representation of the microphone signals, which is often not available over a remote link.
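Why codec distortion defeats echo cancellation can be illustrated by giving the AEC the best possible conditions: it is assumed to know the echo exactly, so the only thing left after subtraction is the codec error introduced on the remote link. In the sketch below a uniform quantizer stands in for the lossy codec (real perceptual codecs shape their noise differently), and the echo level is a hypothetical value. Under this model the achievable echo return loss enhancement (ERLE) drops by roughly 6 dB for every bit removed.

```python
import math
import random


def quantize(x, bits):
    """Stand-in lossy codec: uniform mid-rise quantizer on [-1, 1) with the given bit depth."""
    step = 2.0 / (2 ** bits)
    return [step * (math.floor(v / step) + 0.5) for v in x]


def erle_with_perfect_aec(echo, bits):
    """ERLE in dB when a perfect echo estimate is subtracted from the codec-decoded signal."""
    decoded = quantize(echo, bits)                     # microphone signal after the remote link
    residual = [d - e for d, e in zip(decoded, echo)]  # only the codec error remains
    p_echo = sum(v * v for v in echo)
    p_res = sum(v * v for v in residual)
    return 10 * math.log10(p_echo / p_res)


random.seed(1)
echo = [0.3 * random.uniform(-1.0, 1.0) for _ in range(5000)]  # hypothetical echo at the mic

for bits in (16, 8, 4):
    print(bits, "bits:", round(erle_with_perfect_aec(echo, bits), 1), "dB")
```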
It is desirable to overcome the constraints discussed above.