Speaker clustering is a method of distinguishing the speech of each of a plurality of human speakers. Speaker clustering is often implemented in an apparatus that supports the preparation of conference minutes.
Some speaker clustering methods attempt to recognize the speech of the plurality of speakers accurately based on the directions of the speakers and the acoustic features of the acquired speech. The directions of the speakers are estimated by using a microphone array that includes a plurality of microphones.
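As an illustration of how a direction can be estimated with a microphone array, the following is a minimal sketch that assumes the simplest possible array: two microphones and a far-field source. It finds the inter-microphone delay by brute-force cross-correlation and converts that delay to an angle. The function names, the speed-of-sound constant, and the sign convention are assumptions made for this example and are not taken from any particular apparatus.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) of sig_b relative to sig_a that
    maximizes the cross-correlation, searched over [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(sig_a)):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(sig_a, sig_b, sample_rate, mic_distance):
    """Estimate the arrival angle in degrees relative to the array broadside.

    Assumed convention: a positive angle means the sound reached
    microphone A first, i.e. sig_b is a delayed copy of sig_a.
    """
    # The physical delay can never exceed the travel time between the microphones.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate))
    lag = estimate_delay_samples(sig_a, sig_b, max_lag)
    tau = lag / sample_rate
    # Clamp to the valid asin domain to guard against rounding of the lag.
    sine = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.degrees(math.asin(sine))
```

Because the delay is quantized to whole samples, the angular resolution of this sketch is coarse for closely spaced microphones, which is one way in which direction estimates can become inaccurate.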
One speaker clustering method using a microphone array separates acquired speech into a plurality of clusters based on direction-of-arrival estimates obtained within a limited period starting from a previous time, builds a speech model from the speech in each cluster, and recognizes presently acquired speech by using the built speech models.
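The three steps of such a method (clustering by direction, model building, and recognition) can be sketched as follows. This is only an illustrative simplification under stated assumptions: each utterance is a hypothetical dictionary with a time stamp, an estimated direction of arrival in degrees, and a feature vector; the limited period is taken to be the last `window_s` seconds before `now`; clusters are formed greedily with a fixed angular tolerance; and each "speech model" is merely the mean feature vector of a cluster rather than a real statistical speech model.

```python
import math


def cluster_by_direction(utterances, now, window_s, tol_deg):
    """Group utterances observed in [now - window_s, now] whose direction of
    arrival lies within tol_deg of an existing cluster's mean direction."""
    clusters = []  # each cluster: {"doa": mean direction, "members": [...]}
    for utt in utterances:
        if not (now - window_s <= utt["time"] <= now):
            continue  # outside the limited period
        for cluster in clusters:
            if abs(utt["doa"] - cluster["doa"]) <= tol_deg:
                cluster["members"].append(utt)
                members = cluster["members"]
                cluster["doa"] = sum(m["doa"] for m in members) / len(members)
                break
        else:
            # No cluster is close enough in direction, so start a new one.
            clusters.append({"doa": utt["doa"], "members": [utt]})
    return clusters


def build_models(clusters):
    """Build one stand-in model per cluster: the mean of its feature vectors."""
    models = []
    for cluster in clusters:
        members = cluster["members"]
        dim = len(members[0]["features"])
        models.append(
            [sum(m["features"][k] for m in members) / len(members) for k in range(dim)]
        )
    return models


def recognize(features, models):
    """Return the index of the model nearest to features (Euclidean distance)."""
    distances = [math.dist(features, model) for model in models]
    return distances.index(min(distances))
```

In this sketch, any error in the estimated direction propagates directly into the clusters and therefore into the models built from them.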
However, such a speaker clustering method cannot always recognize speakers accurately, because of, for example, limited accuracy in estimating the direction of arrival of speech and the position of a speaker.