With the spread of small recording devices such as IC recorders, opportunities to record the voices of a plurality of speakers who take turns irregularly, as in a meeting or a round-table discussion, are increasing. To utilize the recorded voice data efficiently, a technique has been developed that identifies, within the voice data, who made each utterance and when the utterance was made (for example, Patent Literature 1: Unexamined Japanese Patent Application Kokai Publication No. 2004-145161). This technique is called speaker diarization.
The technique disclosed in Patent Literature 1 identifies a speaker by comparing a feature quantity of a voice section in the recorded data with a feature quantity of prerecorded voices of that speaker.
In the technique of Patent Literature 1, a feature quantity of the voices of each subject speaker needs to be recorded in advance in order to identify that speaker. In other words, an unknown speaker who has not been registered cannot be a processing object.
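The registration-based approach described above can be illustrated by the following minimal sketch. It is not the method of Patent Literature 1 itself: the cosine similarity measure, the threshold value, the three-element feature vectors, and the names `cosine_similarity`, `identify_speaker`, and `enrolled` are all assumptions introduced here for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(section_feature, enrolled, threshold=0.9):
    """Return the registered speaker whose prerecorded feature quantity best
    matches the voice section, or None if no match reaches the threshold.

    The threshold of 0.9 is an arbitrary illustrative value (assumption).
    """
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(section_feature, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical feature quantities registered in advance for two speakers.
enrolled = {
    "speaker_a": [1.0, 0.0, 0.2],
    "speaker_b": [0.0, 1.0, 0.1],
}

# A voice section close to speaker_a's registered feature quantity.
result_known = identify_speaker([0.9, 0.1, 0.2], enrolled)

# A voice section from a speaker who was never registered.
result_unknown = identify_speaker([0.1, 0.1, 1.0], enrolled)
```

In this sketch, the voice section of the unregistered speaker matches no stored feature quantity and therefore yields no speaker label, which corresponds to the limitation noted above.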
The present disclosure was devised in consideration of the above problem, and aims to provide a voice processing device, a voice processing method, and a program that easily carry out speaker diarization without prior registration of a speaker.