1. Field of the Invention
The present invention relates to a voice processing apparatus, a voice processing system, and a voice processing program that process voices collected in environments where a plurality of persons speak, such as conference rooms, so as to suppress the influence of echo and howling.
2. Description of the Related Art
In order to allow a conference held between distant places to proceed smoothly, a video conference system, for example, has been installed in the conference rooms at the different places (hereinafter referred to as first and second conference rooms) so that people can speak to each other while their appearances are displayed. Such a video conference system (hereinafter also referred to as a “sound-reinforced communication system”) includes a plurality of image/voice processing apparatus that display the situation in each of the other conference rooms so that the participants can see each other, and that emit sounds representing the contents of the participants' speech. In the following description, it is assumed that one image/voice processing apparatus is provided in each of the first and second conference rooms.
The image/voice processing apparatus includes a microphone collecting voices during the conference, a camera imaging the participants, a signal processing section performing a predetermined process on the voices of the participants collected by the microphone, a display section displaying a view of the participant speaking in the other conference room, and a speaker emitting a sound representing the contents of that participant's speech.
The image/voice processing apparatus provided in the conference rooms are connected to each other through a communication network. The apparatus record image/voice data and transmit and receive the data to and from each other to display the situation in each conference room and to emit sounds representing the contents of speech of the participants.
In such a video conference system, a sound emitted by a speaker is reflected by a wall or the like and input to a microphone. When no processing is carried out on such an input sound, the sound data is transmitted back to the image/voice processing apparatus in the other conference room. As a result, a person in the second conference room may hear his or her own voice from the speaker with some delay, just like an echo. Such a phenomenon is referred to as “echo”. When the echo is significant, a sound emitted by a speaker is repeatedly input to a microphone, and the sound therefore circulates through the sound-reinforced communication system, causing howling.
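The loop described above can be illustrated with a deliberately simplified model. As an assumption made only for this sketch, the whole round trip (speaker, room, microphone, network, far-end speaker, far-end room, far-end microphone, and back) is collapsed into a single delay of `delay` samples and a single gain `loop_gain`; both names are hypothetical. Under this model each pass around the loop multiplies the circulating sound by `loop_gain`, so repeated echoes die away when the loop gain is below 1 and are sustained or grow, which is heard as howling, when it reaches or exceeds 1.

```python
import numpy as np

def simulate_loop(x, loop_gain, delay):
    """Feed x through a hypothetical closed acoustic loop modelled as
    one round-trip delay (in samples) and one round-trip gain:
        y[n] = x[n] + loop_gain * y[n - delay]
    """
    y = np.zeros(len(x))
    for n in range(len(x)):
        # Sound that has travelled once around the loop and re-enters the microphone.
        fed_back = loop_gain * y[n - delay] if n >= delay else 0.0
        y[n] = x[n] + fed_back
    return y

# A single click entering the loop once.
impulse = np.zeros(1000)
impulse[0] = 1.0

decaying = simulate_loop(impulse, loop_gain=0.5, delay=100)  # echoes die away
howling = simulate_loop(impulse, loop_gain=1.2, delay=100)   # echoes keep growing
print(decaying[900], howling[900])
```

After k trips around the loop the click has amplitude `loop_gain ** k`, which is why reducing the energy fed back into the microphone, as echo cancellation does, keeps the system out of the howling regime.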
A technique referred to as echo cancellation has been used to prevent echo and howling. In general, an echo canceller first estimates the impulse response of the path between a speaker and a microphone using an adaptive filter. When sounds emitted from the speaker are input to the microphone, the echo canceller generates a pseudo echo by convolving the estimated impulse response with a reference signal, that is, the signal supplied to the speaker. The pseudo echo is then subtracted from the sounds input to the microphone. By subtracting the pseudo echo in this way, unnecessary sounds that can cause echo or howling can be eliminated.
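The echo cancellation steps above can be sketched with a normalized least-mean-squares (NLMS) adaptive FIR filter. NLMS is an assumption here: it is a common choice for acoustic echo cancellers, but the text does not name a particular adaptation algorithm. In this sketch the filter weights `w` play the role of the estimated impulse response, `w @ buf` is the convolution that produces the pseudo echo, and the error signal is the microphone input with the pseudo echo subtracted. The function name, tap count, step size, and the synthetic room response are all hypothetical.

```python
import numpy as np

def nlms_echo_canceller(reference, mic, num_taps=64, step=0.5, eps=1e-8):
    """Cancel the echo of `reference` (the signal supplied to the speaker)
    from `mic` with a normalized-LMS adaptive FIR filter (a sketch)."""
    w = np.zeros(num_taps)    # running estimate of the speaker-to-microphone impulse response
    buf = np.zeros(num_taps)  # latest reference samples, newest first
    out = np.zeros(len(mic))  # microphone signal after pseudo-echo subtraction
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        pseudo_echo = w @ buf        # convolve estimated impulse response with the reference
        err = mic[n] - pseudo_echo   # subtract the pseudo echo from the microphone input
        # NLMS update: move the estimate toward the true response, normalized by input power.
        w += step * err * buf / (buf @ buf + eps)
        out[n] = err
    return out, w

# Hypothetical test scene: white-noise reference, decaying 32-tap room response,
# and no near-end talker, so the microphone picks up echo only.
rng = np.random.default_rng(0)
reference = rng.standard_normal(20000)
room_ir = rng.standard_normal(32) * np.exp(-np.arange(32) / 8.0)
mic = np.convolve(reference, room_ir)[: len(reference)]

out, w_est = nlms_echo_canceller(reference, mic)
print("echo power before:", np.mean(mic[-2000:] ** 2))
print("echo power after: ", np.mean(out[-2000:] ** 2))
```

Because estimation and subtraction run sample by sample, the filter keeps tracking the impulse response if the acoustic path changes. A practical canceller would also need double-talk detection so that a near-end talker's voice does not disturb the adaptation, which this sketch omits.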
JP-A-2003-271167 (Patent Document 1) discloses a technique for separating a stereo signal, which is a mixture of signals collected in different channels, into the signals of the original channels at a low signal-to-noise ratio and with a small amount of calculation.