One purpose of a speakerphone system is to allow a user to conduct a phone call without having to hold a conventional handset. Thus, a speakerphone may allow the user's hands to be free, the user to move freely about the room while participating in the call, and multiple people to participate in the phone call from one location, such as a conference room.
A conventional speakerphone system 100, such as shown in FIG. 1, may include a speakerphone 101 and a far-side phone 102. The speakerphone 101 may include a speakerphone housing 103, having a microphone 104 and a loudspeaker 105 within the speakerphone housing 103 or supported by the speakerphone housing 103. The speakerphone 101 may also include an acoustic echo cancellation (AEC) filter 106 and a non-linear processing (NLP) module 107.
In the context of speakerphone systems, the party speaking and listening through the speakerphone is typically called the near side, while the party calling into the speakerphone is typically called the far side. Hence the far-side party calls in through the far-side phone 102. Additionally, a signal received from the far-side phone 102 propagates through a receive path (Rx-path) and is called an Rx-path signal 108, while the signal received by the microphone 104 propagates through a transmit path (Tx-path) and is called a Tx-path signal 109.
Also, there are two common modes for a conventional speakerphone. In full-duplex mode, the Rx-path and the Tx-path are each fully active, or open, at any given time during the phone call. In half-duplex mode, however, only one of the two paths is open at a time. Thus, for example, if the far-side party is talking, the Rx-path is active and the Tx-path is muted. This helps to avoid echo at the far side. Yet, it also means that the inactive side, which is the side that is not speaking, cannot interrupt the active side, which is the side that is speaking, because the inactive side is muted. Accordingly, the half-duplex mode may lead to an unnatural experience for the parties, making it difficult to hold a conversation.
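By way of illustration only, the difference between the two modes can be sketched as a gate applied to each audio frame. The function names, the frame representation as lists of samples, and the `far_side_talking` flag are hypothetical; how a real speakerphone decides which side is talking is not described here and is simply assumed to be given.

```python
def full_duplex_gate(rx_frame, tx_frame):
    """Full-duplex mode: both paths stay open, so both frames pass through."""
    return list(rx_frame), list(tx_frame)


def half_duplex_gate(rx_frame, tx_frame, far_side_talking):
    """Half-duplex mode: only one path is open at a time.

    far_side_talking is an assumed input; the decision logic is out of scope.
    """
    if far_side_talking:
        # Rx-path is active, Tx-path is muted (avoids echo at the far side).
        return list(rx_frame), [0.0] * len(tx_frame)
    # Tx-path is active, Rx-path is muted.
    return [0.0] * len(rx_frame), list(tx_frame)


rx_out, tx_out = half_duplex_gate([0.2, -0.1], [0.05, 0.07], far_side_talking=True)
```

In this sketch, near-side speech arriving while the far side is talking is discarded, which is why the muted side cannot interrupt.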
One fundamental problem of conventional speakerphones is a loudspeaker-to-microphone bypass signal 110 on the near side. This bypass signal 110 is also called the acoustic echo path, and the far-side party may experience the bypass signal 110 as an echo. In other words, the far-side party may hear his or her own voice signal coming back, usually after a short delay.
To overcome this problem, many conventional speakerphones implement an acoustic echo cancellation (AEC) signal-processing algorithm, for example, through the AEC filter 106. In general, the AEC algorithm compares the incoming, receive-path signal 108 with the outgoing, transmit-path signal 109, estimates how much of the incoming signal 108 appears in the outgoing signal 109, and then subtracts that estimated portion from the outgoing signal 109. As a result, the processed transmit-path signal contains content from the near side, but little or no content received from the far side. Accordingly, the acoustic-echo-path signal 110 may be reduced or eliminated.
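For illustration, one common way to realize such a subtraction is a normalized least-mean-squares (NLMS) adaptive filter, which estimates the echo from the Rx-path signal and subtracts that estimate from the Tx-path signal. The following is a minimal sketch under stated assumptions, not the algorithm of any particular speakerphone: the function names, the tap count, the step size `mu`, and the simulated echo path are all hypothetical, and only the far side is talking in the simulation.

```python
import math
import random


def nlms_echo_cancel(rx, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of rx from mic (NLMS sketch)."""
    w = [0.0] * taps        # estimated echo-path impulse response
    history = [0.0] * taps  # most recent Rx samples, newest first
    out = []
    for x, d in zip(rx, mic):
        history = [x] + history[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_est    # processed Tx sample: microphone minus estimated echo
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out


def power(signal):
    """Mean square value of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


# Simulated far-side-only scenario (assumption: white noise stands in for speech).
random.seed(0)
rx = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.0, 0.6, 0.3, -0.1]  # hypothetical loudspeaker-to-microphone response
mic = [
    sum(echo_path[k] * rx[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(rx))
]

tx = nlms_echo_cancel(rx, mic)

# Echo reduction over the last 1000 samples, after the filter has adapted.
erle_db = 10.0 * math.log10(power(mic[-1000:]) / max(power(tx[-1000:]), 1e-30))
```

The sketch models only a linear echo path. Near-side speech and non-linear echo components, which the NLP module 107 is described as addressing, are left out.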
The NLP module 107 may provide additional suppression of any remaining acoustic echo, particularly of any component of the acoustic echo that is non-linear. This is generally done by destructively removing a portion of the outgoing, transmit-path signal 109, although this may damage the signal.
While AEC algorithms generally work well, one challenge with implementing them is the close proximity of the microphone 104 to the loudspeaker 105 in a conventional speakerphone system 100. That is, as the microphone 104 is positioned closer to the loudspeaker 105, the incoming signal picked up by the microphone 104 becomes louder, or stronger. The desired signal from the party talking at the near side, however, typically originates much farther from the microphone 104 than the loudspeaker 105 is from the microphone 104. Hence, the desired signal presents a significantly quieter, or weaker, signal to the microphone 104 relative to the Rx-path signal 108 rendered by the loudspeaker 105.
The ratio between the desired signal from the party talking at the near side and the acoustic echo path signal 110 can be quantified as a signal-to-echo ratio. As a conventional rule of thumb, a signal-to-echo ratio as low as about −20 or −25 dB can be managed by a conventional AEC algorithm. This means that the AEC algorithm is effective at canceling the acoustic echo down to about that ratio. For ratios lower than about −20 or −25 dB, echo cancellation may be much less effective, meaning that the far-side party may perceive an echo, or a partial echo, of that party's own voice because all or some of the acoustic echo may bleed through the AEC filter 106. A high-quality system is one that provides full-duplex support and no echo at the far side. At a signal-to-echo ratio of less than about −25 dB, however, that goal generally cannot be achieved.
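As an illustrative sketch, the signal-to-echo ratio can be expressed in decibels from the power of the near-side speech and the power of the acoustic echo, both as sensed at the microphone. The function names and the example power values are hypothetical, and the default limit of −20 dB is taken from the rule of thumb above rather than from any measured system.

```python
import math


def signal_to_echo_ratio_db(speech_power, echo_power):
    """Signal-to-echo ratio in dB from two power values in the same units."""
    return 10.0 * math.log10(speech_power / echo_power)


def within_rule_of_thumb(ser_db, limit_db=-20.0):
    """True if the ratio is at or above the assumed rule-of-thumb limit."""
    return ser_db >= limit_db


# Hypothetical example: the echo is 100 times more powerful than the speech.
ser_db = signal_to_echo_ratio_db(speech_power=1.0, echo_power=100.0)
manageable = within_rule_of_thumb(ser_db)
```

Here the echo power is 100 times the speech power, which corresponds to a ratio of about −20 dB, at the edge of the range described above.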
Additionally, the signal-to-echo ratio can be a significant problem in small speakerphones, where the smaller size means that the loudspeaker 105 must be closer to the microphone 104. Mathematically, halving the distance between the loudspeaker 105 and the microphone 104 results in a 6 dB decrease in the signal-to-echo ratio. For example, if the signal-to-echo ratio is −15 dB at a distance of 30 mm, halving the distance between the loudspeaker 105 and the microphone 104 to 15 mm results in a signal-to-echo ratio of −21 dB. Likewise, doubling the distance between the loudspeaker 105 and the microphone 104 results in a 6 dB increase in the signal-to-echo ratio. Thus, high-quality speakerphone systems tend to be relatively large to provide a favorable distance between the microphone 104 and the loudspeaker 105.
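The 6 dB figure can be reproduced with a short calculation, sketched here under the assumption that the echo level at the microphone follows the inverse-distance law for sound pressure (a 20·log10 dependence on distance) and that the level of the near-side talker is unchanged. The function name is hypothetical.

```python
import math


def ser_after_moving_db(ser_db, old_distance_mm, new_distance_mm):
    """Signal-to-echo ratio after changing the loudspeaker-to-microphone distance.

    Assumes echo pressure scales as 1/distance and the talker level is constant.
    """
    return ser_db + 20.0 * math.log10(new_distance_mm / old_distance_mm)


halved = ser_after_moving_db(-15.0, old_distance_mm=30.0, new_distance_mm=15.0)
doubled = ser_after_moving_db(-15.0, old_distance_mm=30.0, new_distance_mm=60.0)
```

Under these assumptions, halving the distance from 30 mm to 15 mm lowers the ratio from −15 dB to about −21 dB, and doubling it to 60 mm raises the ratio to about −9 dB. The exact change per halving or doubling is 20·log10(2), or roughly 6.02 dB.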
Furthermore, while the main portion of the acoustic echo path signal 110 travels through the air, a portion of the acoustic echo path signal 110 may be conducted structurally, through the coupling between the loudspeaker 105 and the microphone 104. For example, a plastic housing component may rattle at the loudspeaker's frequency, and the rattling may be transmitted through structural conduction to the microphone 104 where it is sensed. Additionally, this structurally conducted sound has a transfer function that is typically non-linear. Conventional speakerphones may address such unwanted structural sound by including suspension mechanisms, such as rubber sleeves or springs, to isolate the mechanical vibration. Those solutions, however, increase the cost and complexity of the speakerphone system. Also, those solutions might not effectively reduce the structural sound at some frequencies.
Embodiments of the invention address these and other issues in the prior art.