1. Field of the Invention
The present invention relates to a voice processing apparatus for recognizing a command given by voice uttered, for example, by a plurality of conference participants, and for processing the command.
2. Description of the Related Art
A voice processing apparatus having a function of recognizing a voiceprint in human voice and processing the same (voiceprint recognition) has been premised on a microphone connected to a control apparatus, such as a telephone or a personal computer (PC), as the voice input means. Voice processing apparatuses provided with such a microphone have been applied to personal identification, etc., for example, in call centers and in the networks of financial institutions.
However, such a voice processing apparatus of the related art assumes recognition of an individual's voice as its use environment. Therefore, when such a voice processing apparatus is used in group work involving a plurality of people, for example, in a conference, a plurality of voices are erroneously detected and erroneously recognized, so the apparatus has not been usable.
Namely, when two or more participants speak at the same time, a plurality of voices are mixed and input to the voice processing apparatus through the microphone, so it is impossible to specify the conference participant who is the main speaker among the plurality of speakers and to obtain an accurate voiceprint recognition result.
On the other hand, in a conference or other group work, conference content has conventionally been recorded in a recording medium by a recording apparatus, etc., and minutes have been written after the conference by confirming each speaker. Although there has been a method of recording the conference content as it is in a recording medium, organizing the data of each speaker takes time and the work is demanding.
Thus, a method of performing personal identification by the above voiceprint recognition technique and arranging speech data for each speaker can be considered. However, when the speech of a plurality of participants overlaps in a conference, etc., it is difficult to specify who is speaking, so personal identification of the speakers is impossible if the above voiceprint recognition technique is applied as it is.
Furthermore, there is a disadvantage that attribute data (name, role, etc.) of a speaker cannot be output by real-time processing while the speaker is speaking in a conference. Therefore, participants have to refer to distributed documents, etc. and cannot concentrate on the speech.
Also, the chairperson of a conference has to operate a PC to show data while adding a spoken explanation, which is very demanding and unfavorable in terms of the efficiency of the conference.
Thus, there has been a demand for accurately specifying a main speaker, even when a plurality of participants speak at the same time, and for outputting attribute data of the speaker on a screen or by voice so as to notify all conference participants at once.