1—Field of the Invention
The present invention relates to voice teleconferences, in particular telephone conference calls, set up between at least two groups of remote participants.
More particularly, it relates to recognizing the voice of a contributor during a teleconference, independently of the content of the contribution, so that the other participants in the teleconference can identify that contributor. In other words, the aim is to answer the following question during a teleconference: “Which of all the participants is speaking or has just spoken?”
2—Description of the Prior Art
Voice identification, also known as speaker indexing, which consists of recognizing the identity of a speaker in a signal made up of the mixed speech channels of several participants, is disclosed in the paper “REAL TIME SPEAKER INDEXING BASED ON SUBSPACE METHOD - APPLICATION TO TV NEWS ARTICLES AND DEBATE” by M. Nishida and Y. Ariki, Labs - 5th ICSLP, Sydney, Australia, December 1998.
According to that paper, a reference voice model is first created for each speaker. Then, to recognize the voice of a speaker, for example a participant in a televised debate, the mixed-channel signal carrying the participants' voices is divided periodically into voice sections. The distance between each voice section and each voice model is computed, and only the lowest distance is retained, provided that it is below a threshold. The speaker whose model yields this lowest distance is then identified as the contributor for the voice section concerned.
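The identification step described above amounts to nearest-model classification with a rejection threshold. The following is a minimal Python sketch under stated assumptions, not the method of the cited paper: the paper uses a subspace method, whereas here each reference model is simply a feature vector compared by Euclidean distance. The toy features (mean absolute amplitude and zero-crossing rate), the function names, and the rule that a match is accepted only when the nearest distance does not exceed the threshold are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def split_into_sections(signal: Sequence[float], section_len: int) -> List[List[float]]:
    """Divide the mixed-channel signal at a fixed period into voice sections.

    A trailing partial section shorter than section_len is dropped.
    """
    return [list(signal[i:i + section_len])
            for i in range(0, len(signal) - section_len + 1, section_len)]


def extract_features(section: Sequence[float]) -> List[float]:
    """Toy feature vector (an assumption, not the paper's features):
    mean absolute amplitude and zero-crossing rate."""
    mean_abs = sum(abs(x) for x in section) / len(section)
    crossings = sum(1 for a, b in zip(section, section[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(section) - 1) if len(section) > 1 else 0.0
    return [mean_abs, zcr]


def identify(features: Sequence[float],
             models: Dict[str, Sequence[float]],
             threshold: float) -> Optional[str]:
    """Return the speaker whose reference model is nearest, or None when even
    the nearest model is farther away than the rejection threshold."""
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, model in models.items():
        dist = math.dist(features, model)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


def index_signal(signal: Sequence[float], section_len: int,
                 models: Dict[str, Sequence[float]],
                 threshold: float) -> List[Optional[str]]:
    """Label each voice section with the identified contributor (or None)."""
    return [identify(extract_features(s), models, threshold)
            for s in split_into_sections(signal, section_len)]


if __name__ == "__main__":
    # Hypothetical reference models, one per speaker, created beforehand.
    models = {"alice": [1.0, 1.0], "bob": [0.1, 0.0]}
    signal = [1, -1, 1, -1, 0.1, 0.1, 0.1, 0.1]
    print(index_signal(signal, 4, models, threshold=0.5))
```

Returning None for a rejected section keeps "no confident match" distinct from any enrolled speaker; a real system would replace the toy features and Euclidean distance with proper speaker models.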
However, in the configuration described in that paper, a contributor can be recognized only among the mixed voices of a small number of people who are physically close together. As the number of participants increases, the performance of this contributor identification method deteriorates.
U.S. Pat. No. 5,668,863 describes a system for recording and reproducing an audio conference in which the participants first gather at telephones, one participant per telephone. The system records audio data blocks, each approximately 4 seconds long, so that it can identify the speakers and add each one to a list of speakers if this has not already been done. The speaker is identified during subsequent playback of the audio conference, not immediately and in real time during the conference itself: to identify the participant who is speaking, the system identifies the source of the speech, more precisely the line interface of the system that serves that participant's telephone, and then transmits the participant's spoken name.
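The behaviour attributed to U.S. Pat. No. 5,668,863 can be illustrated by keying each recorded block on the line interface it came from. This is a hypothetical Python sketch, not the patented implementation: the class and function names, the integer line-interface IDs and the use of plain strings in place of recorded spoken names are all assumptions; only the block length of roughly 4 seconds comes from the description.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SECONDS = 4.0  # approximate block duration given in the description


def block_starts(duration_s: float) -> List[float]:
    """Start times of the ~4 s blocks covering a recording of duration_s seconds."""
    return [i * BLOCK_SECONDS for i in range(math.ceil(duration_s / BLOCK_SECONDS))]


@dataclass
class AudioBlock:
    start_s: float
    line_interface: int  # hypothetical ID of the line interface serving the talker's telephone


@dataclass
class ConferenceIndex:
    # Hypothetical stand-in for the recorded spoken names, keyed by line interface.
    names_by_line: Dict[int, str]
    speakers: List[int] = field(default_factory=list)

    def record_block(self, block: AudioBlock) -> None:
        """While recording: add the block's source to the speaker list if it is new."""
        if block.line_interface not in self.speakers:
            self.speakers.append(block.line_interface)

    def announce(self, block: AudioBlock) -> str:
        """During playback: return the spoken name tied to the block's line interface."""
        return self.names_by_line.get(block.line_interface, "unknown participant")


if __name__ == "__main__":
    index = ConferenceIndex({1: "Alice", 2: "Bob"})
    blocks = [AudioBlock(t, line) for t, line in zip(block_starts(12.0), [1, 2, 1])]
    for b in blocks:
        index.record_block(b)
    print(index.speakers)
    print([index.announce(b) for b in blocks])
```

Because the lookup key is the line interface rather than the voice itself, two participants sharing one telephone would be announced under the same name, and nothing is announced until the recording is played back.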
This recording and playback system cannot distinguish between the voices of several participants grouped around the same telephone terminal. Moreover, it operates only after the audio conference has been recorded, so no contributor is identified during the audio conference itself.