Telephone fraud and other malicious solicitations aimed at defrauding people of money have become a social problem in recent years. To address this, techniques have been proposed for estimating a speaker's state of mind by analyzing the speaker's voice in conversations such as telephone calls (see Japanese Laid-open Patent Publication No. 2011-242755, for example).
These techniques analyze a voice signal that contains the voice of only the one speaker whose state of mind is to be estimated. However, a voice signal of a recorded conversation contains the voices of two or more speakers. To accurately estimate the state of mind of a particular speaker from such a recording, the utterance periods of that speaker must first be identified in the voice signal. For this purpose, speaker indexing techniques have been proposed that take a monaural voice signal containing the voices of a plurality of speakers and assign, to each period in which a speaker has spoken, identification information identifying that speaker (for example, Japanese Laid-open Patent Publication No. 2008-175955 and D. Liu and F. Kubala, “Online speaker clustering”, In Proceedings of ICASSP 2004, vol. 1, pp. 333-337, 2004 (hereinafter referred to as Non-Patent Literature 1)).