1. Field of the Invention
The present invention relates generally to a conference terminal for a voice conference system, particularly to a conference terminal with echo reduction for a digital voice conference system.
2. Description of the Related Art
Voice conference systems are used wherever direct communication within a group of persons is not possible because of large distances, a high noise level or other reasons. A distinction has to be made between two configurations. In the first, the communicating persons are so far apart that there is no direct acoustic coupling between them, and they can hear the other conference participants only via the voice conference system. In the second, the persons participating in the conference are in such close proximity that they can hear one another both in a direct acoustic way and via the voice conference system.
The first type of voice conference system corresponds to a teleconference according to the prior art. In a teleconference, delay times occur on the transmission path, which can cause spurious echoes. In these systems, however, there is no uniform conference signal; instead, every participant obtains a specific conference signal, which does not include his own signal. Additionally, there is no direct acoustic coupling of two participants, since they are at totally different locations.
The second type of voice conference system, where persons can hear other conference participants both in a direct acoustic way and via the voice conference system, is used, for example, in meeting rooms, auditoriums, congress halls or the like. Here, a plurality of participants is to have the opportunity to provide a voice contribution, and all participants are to hear the voice contributions of the other participants. The contribution of an individual participant to the discussion is not audible in a direct acoustic way in the whole room, but can be heard directly by a person close to the speaker.
A distinction is also to be made between voice conference systems with wire transmission and systems where the transmission of the voice signal is wireless. Wire systems according to the prior art have the advantage that the transmission capacity of a wire transmission path is very high. In such systems, the audio signals can be transmitted in analog form when high quality cable is used. However, it is more advantageous to use an uncompressed digital audio signal transmission. The propagation times of the audio signals on such a cable are so short in a spatially limited conference that no audible remote echoes occur. Only a local echo has to be compensated to avoid feedback noise in a terminal.
However, nowadays, there is a trend to design voice conference systems for meeting rooms, auditoriums, conference halls or the like in a wireless way. This increases the flexibility in configuring a system and reduces the installation costs significantly, since no further wiring is required apart from a power supply. Moreover, a wireless implementation of a voice conference system allows a voice conference to be held even when the participating persons are not tied to one location and move, for example, around the room. However, it has to be noted that the technical requirements are significantly higher in a radio based voice conference system than in a wire system.
According to the prior art, mostly analog systems or digital systems without compression are used. The signal propagation time during the transmission of an acoustic signal via such a system is typically low. Particularly, there is only a low delay between the voice signal of a speaking participant and the composite conference signal provided by the voice conference system via a uniform return channel, which contains the contribution of the speaking participant and is output at the loudspeaker of the speaking participant. Due to the low delay, the own voice signal transmitted back to the speaker is not perceived as spurious echo.
However, the radio resources available for a radio-based voice conference system are very limited. In many cases, this requires digital transmission of the voice signal, which is encoded prior to transmission to reduce the amount of data and to protect it from transmission errors. If the voice conference system uses digital radio transmission and digital audio encoding, a delay of the signal in the two-digit millisecond range results on the transmission path. Further, in contrast to conventional teleconference systems, due to the limited radio resources, it is not possible in wireless voice conference systems that every participant receives a specific conference signal, which does not include his own signal, from the central unit via an exclusive return channel. Rather, in many cases, a uniform conference signal is broadcast to all participants. Thus, the uniform conference signal has a significant time delay in the two-digit millisecond range compared to the voice signal of a speaker. As a consequence of this delay, the speaker perceives a clearly spurious echo of his own comments. This currently limits the applicability of digital signal transmission in connection with encoding for voice conference systems with delays.
However, currently, there is the tendency to implement voice conference systems with digital wireless transmission.
For a better understanding of the occurring problems, the mode of operation of a voice conference system will now be discussed in more detail. All persons are to be able to participate in the conference via terminals connected in a wireless way. All participants can hear the composite signal of all other speaking participants at all times via a loudspeaker incorporated in their terminal. The composite signal is formed in a central unit and is constantly transmitted to all terminals. When a talk key is operated, the talk signal of the microphone of a participant is transmitted from the terminal to the central unit and fed into the composite signal there.
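The formation of the composite signal in the central unit can be illustrated with a minimal sketch. It assumes that every terminal in talk operation delivers a block of samples of the same length and that mixing is a plain sample-wise sum. The function names `uniform_mix` and `participant_specific_mix` and the terminal labels are hypothetical and serve illustration only. The first function models a uniform conference signal transmitted to all terminals, the second a participant-specific signal as used in a conventional teleconference.

```python
def uniform_mix(talk_signals, block_length):
    """Sum the talk signals of all terminals in talk operation.

    talk_signals maps a (hypothetical) terminal label to a list of samples.
    Every list is assumed to hold at least block_length samples.
    """
    mix = [0.0] * block_length
    for samples in talk_signals.values():
        for i in range(block_length):
            mix[i] += samples[i]
    return mix


def participant_specific_mix(talk_signals, listener, block_length):
    """Sum all talk signals except the listener's own contribution."""
    others = {label: samples
              for label, samples in talk_signals.items()
              if label != listener}
    return uniform_mix(others, block_length)


# Illustrative values only: two terminals in talk operation.
talk = {"terminal_14": [1.0, 2.0], "terminal_16": [0.5, 0.5]}
broadcast = uniform_mix(talk, 2)                                 # includes terminal 14
own_free = participant_specific_mix(talk, "terminal_14", 2)      # excludes terminal 14
```

In the uniform case, the signal returned to terminal 14 still contains the contribution of terminal 14 itself, which is the precondition for the echoes discussed in the following.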
In such a system, different types of echoes are produced. A local echo (feedback) occurs by acoustic and/or electromagnetic coupling from the loudspeaker to the microphone of a terminal. Such an echo can be reduced with known methods for echo compensation.
Particular problems occur when the propagation time of a transmitted voice signal from one terminal to another terminal is more than about 20 milliseconds. This is particularly the case in digital voice conference systems where a voice signal is transmitted via a transmission channel in an encoded way. During encoding, large delays in a two-digit millisecond range occur. This causes distinct echoes.
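The role of this threshold can be illustrated with a simple delay budget. The stage names and millisecond values below are purely illustrative assumptions and not measurements of any real codec or radio link; only the threshold of about 20 milliseconds is taken from the text above.

```python
ECHO_THRESHOLD_MS = 20  # "about 20 milliseconds", as stated in the text

# Hypothetical delay contributions in milliseconds (illustrative only).
ASSUMED_STAGE_DELAYS_MS = {
    "audio encoding": 15,
    "radio transmission to central unit": 5,
    "mixing in central unit": 2,
    "radio transmission to terminal": 5,
    "audio decoding": 10,
}


def total_delay_ms(stage_delays_ms):
    """Add up the delay contributions of all stages on the signal path."""
    return sum(stage_delays_ms.values())


def exceeds_echo_threshold(stage_delays_ms, threshold_ms=ECHO_THRESHOLD_MS):
    """Return True if the summed delay is above the threshold."""
    return total_delay_ms(stage_delays_ms) > threshold_ms
```

With the assumed values, the stages add up to 37 milliseconds, which lies in the two-digit millisecond range and above the threshold.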
The formation of echoes will be discussed below in more detail with reference to FIG. 4 and FIG. 5. FIG. 4 shows a portion of a block diagram of a conventional voice conference system. A first conference participant 10, a second conference participant 12 as well as two associated conference terminals 14 or 16, respectively, are shown. Further, the voice conference system comprises a conference central unit 18. The two conference terminals 14 or 16, respectively, each comprise a microphone 20, 22 as well as a loudspeaker 24, 26. The first conference terminal 14 of the first conference participant 10 as well as the second conference terminal 16 of the second participant 12 are connected to the conference central unit 18 via a bidirectional connection 28 or 30, respectively.
Starting from the structure of a voice conference system, the formation of echoes will be discussed in more detail below. Local echoes, i.e. the feedback from a loudspeaker 24, 26 to a microphone 20, 22 of the same terminal, will not be described here, since they can easily be suppressed or reduced, respectively, with known technical measures. First, the echo is considered which a first conference participant 10, who is speaking himself, perceives. It has to be considered that the first conference participant 10 perceives the acoustic voice signal generated by himself directly with his ears. Further, the acoustic signal generated by the speaker follows a signal path designated by 36. The voice signal of the first conference participant is received by the microphone 20 of the first conference terminal 14. Then, the first conference terminal 14 provides the same to the conference central unit 18. There, it is incorporated into the conference composite signal. Due to digital radio transmission and particularly digital signal processing and encoding and decoding of the digitally transmitted voice signal, a significant delay of the signal in the two-digit millisecond range results on the transmission paths. The voice signal of the first conference participant is then transmitted again from the conference central unit 18 to the first conference terminal 14. There, it is output on the loudspeaker 24 and perceived by the first conference participant 10. Thus, the first conference participant 10 does not only perceive his own acoustic voice signal, but also an echo signal transmitted via the voice conference system, which is heavily delayed in time. Thus, if a signal is transmitted back to its source after conference formation as part of the conference signal, an audible echo perceived as spurious is formed, which is referred to as remote own echo.
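The remote own echo on the signal path 36 can be modeled, in a strongly simplified way, as an ideal delay line. The following sketch assumes that the path through the voice conference system changes nothing but the timing of the signal. The function names are hypothetical, and the sampling rate and delay mentioned in the usage note are example values rather than values from the text.

```python
def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert a delay in whole milliseconds to a whole number of samples."""
    return delay_ms * sample_rate_hz // 1000


def loudspeaker_signal(microphone_signal, delay_samples):
    """Return the microphone signal as it arrives back at the loudspeaker.

    The signal path through the conference system is modeled as an ideal
    delay: leading zeros followed by an unchanged copy of the input,
    truncated to the length of the input.
    """
    delayed = [0.0] * delay_samples + list(microphone_signal)
    return delayed[:len(microphone_signal)]
```

For an assumed sampling rate of 8000 Hz and an assumed path delay of 40 milliseconds, `delay_in_samples(40, 8000)` yields 320 samples. The speaker hears his own voice directly and, 320 samples later, the copy returned by `loudspeaker_signal`.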
Further, it is also possible that the conference terminal 16 of a second conference participant 12 is in talk operation, while a first conference participant 10 is speaking. In this case, a further signal path 38 exists, which causes an echo. The voice signal of the first conference participant 10 is received at the second conference terminal 16 of a second conference participant 12 via the microphone 22 and transmitted to a conference central unit 18 with time delay. There, it is incorporated in a conference composite signal. As part of the conference composite signal, it is then supplied to the first conference terminal 14 of the first conference participant 10 and output at a loudspeaker 24. Thus, the first conference participant 10 perceives a delayed echo of his own voice signal. This effect is also referred to as remote own echo.
A further remote own echo by the simultaneous talk operation of another terminal is avoided by the local echo reduction in the other terminal.
FIG. 5 shows a further portion of a block diagram of a conventional voice conference system. The structure of the voice conference system is identical to the one described with regard to FIG. 4 and will not be described again here. Particularly, the same reference numbers indicate the same means. Here, the interest lies in the examination of signal paths from a first conference participant 10, here acting as speaker, to a second conference participant 12, here acting as listener. If the first conference participant 10 and the second conference participant 12 are near enough to each other, the second conference participant 12 can hear the voice contribution of the first conference participant 10 on the direct acoustic path 46. Further, a second signal path exists for the voice signal of the first conference participant 10 to the second conference participant 12, here indicated by 48. Here, the voice signal of the first conference participant is transmitted to the conference central unit 18 via the conference terminal 14 of the first conference participant, and from there passed on to the second conference terminal 16 of the second conference participant 12. Thereby, in the case of digital encoded audio signal transmission, a significant delay in the two-digit millisecond range results. Correspondingly, the voice signal which the second conference participant receives from the voice conference system via the signal path 48 has a significant delay compared to the voice signal received on the direct acoustic path 46. If the second conference participant hears the voice contribution of another participant both on the direct acoustic path and with time delay via the voice conference system, this is also perceived as spurious echo. This is referred to as remote foreign echo.
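The relation between the two paths can be made concrete with a short calculation. The sketch below assumes a speed of sound of approximately 343 m/s, which holds for air at about 20 degrees Celsius. The function names are hypothetical, and the distance between the participants and the delay on the signal path 48 used in the usage note are example values rather than values from the text.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for air at about 20 degrees Celsius


def acoustic_delay_ms(distance_m):
    """Propagation time on the direct acoustic path 46 in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def foreign_echo_lag_ms(system_delay_ms, distance_m):
    """Lag of the signal path 48 behind the direct acoustic path 46.

    system_delay_ms is the assumed delay through the voice conference
    system; distance_m is the assumed distance between the participants.
    """
    return system_delay_ms - acoustic_delay_ms(distance_m)
```

For an assumed distance of 3.43 m, the direct acoustic path takes about 10 milliseconds. With an assumed system delay of 40 milliseconds, the signal from the loudspeaker therefore lags the direct sound by about 30 milliseconds. Since the lag depends on the distance between the participants, it changes whenever a participant moves.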
Further, it has to be noted that the different types of echoes need to be counteracted with different degrees of difficulty. As has already been mentioned, it is easily possible to reduce a local echo.
A remote own echo, which results from the fact that the speaker's own conference terminal receives his voice signal, passes the same on to the central unit and receives it again from there, cannot be easily suppressed. In conventional systems, this poses no problem only because they use analog or unencoded digital transmission, where hardly noticeable signal delays occur. However, a digital encoded voice signal transmission, where delays occur inevitably, is made more difficult or even impossible by the echo. Such an echo results on the signal path 36 shown in FIG. 4. The propagation time on the signal path 36 is known with sufficient accuracy, since here mainly the known delay times of the encoding and decoding means are introduced. The suppression of an echo is more problematic when the propagation time of the signal or the difference of the propagation times, respectively, on two different transmission paths is not known. This is for example the case in the suppression of a remote foreign echo. The delay time on the direct acoustic path 46 is not known, while the delay on the signal path 48 can be well estimated by the voice conference system. Thus, the propagation time difference between the direct acoustic path 46 and the signal path 48 through the voice conference system is not known. In this case, echo suppression is very difficult to accomplish and is thus not performed in voice conference systems according to the prior art. The same applies to a remote own echo which occurs due to the fact that the speaker's own voice signal is received by the microphone of the conference terminal of a neighboring conference participant in talk operation, and is distributed in the voice conference system.
Rather, in voice conference systems of the prior art, the conference participants are asked to speak quietly to reduce a remote foreign echo. Thus, every conference participant is to speak so quietly that a neighboring participant can hardly perceive him on the direct acoustic path, and communication even between neighboring participants takes place substantially via the voice conference system. However, such a measure is not satisfactory, since it does not correspond to the natural way of expression of the speakers. Thus, it is very awkward for the speaker to use a conventional voice conference system, particularly when neighboring conference participants are very close to each other.