Two known audio or speech comparison processes are speaker diarization and speaker search. Given an unlabeled continuous recording of one or more unknown persons speaking, “speaker diarization” (sometimes referred to as “speaker clustering” or “speaker indexing”) refers to the process of segmenting the continuous recording into substantially speech-only segments (termed herein “utterances”) and labeling which utterances were spoken by the same person. A group of utterances spoken by the same person is termed a “cluster”. In this manner, a database of recorded conversations can be converted into a database of clusters (or speakers). Given a query (search input) utterance and a database of target or potential speakers, “speaker search” refers to the process of determining which speakers in the database most likely (based on a similarity measure) spoke the query utterance. Speaker search is conceptually similar to “speaker identification” or “speaker recognition,” but the term “search” is used herein to emphasize that matches are made against a very large population of speakers (tens of thousands or more).
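The labeling step of speaker diarization can be illustrated with a minimal sketch. It assumes that the recording has already been segmented into utterances and that each utterance has been reduced to a fixed-length feature vector. Both of these are assumptions made for illustration only. The function names `cosine_similarity` and `diarize`, the cosine similarity measure, and the `threshold` parameter are likewise hypothetical. The grouping shown here is a simple online clustering scheme, chosen for brevity; it is not the method of this disclosure. Each utterance joins the most similar existing cluster if the similarity reaches the threshold, and otherwise starts a new cluster (a new speaker).

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical similarity measure between two utterance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diarize(utterances: List[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Assign a cluster label to each utterance (assumed already segmented).

    Utterances that share a label are deemed to be spoken by the same person.
    Each cluster is represented by the running mean (centroid) of its members.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for vec in utterances:
        # Find the existing cluster whose centroid is most similar to this utterance.
        best_label, best_score = -1, float("-inf")
        for label, centroid in enumerate(centroids):
            score = cosine_similarity(vec, centroid)
            if score > best_score:
                best_label, best_score = label, score
        if best_label >= 0 and best_score >= threshold:
            # Join the cluster and update its centroid incrementally.
            n = counts[best_label]
            centroids[best_label] = [
                (c * n + v) / (n + 1) for c, v in zip(centroids[best_label], vec)
            ]
            counts[best_label] = n + 1
            labels.append(best_label)
        else:
            # No sufficiently similar cluster exists, so start a new one.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    # Two toy "speakers" whose utterance vectors point in different directions.
    print(diarize([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]))
```

With the toy vectors above, the first two utterances fall into cluster 0 and the last two fall into cluster 1. Because the clustering is greedy, the result can depend on the order in which utterances are presented. That order dependence is one reason practical systems often use other clustering strategies instead.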
Most speaker diarization and speaker search methods include an utterance comparison stage that uses an utterance comparison equation to compare two utterances and output a similarity score indicating the likelihood that the two utterances were spoken by the same person. Both the speed and the accuracy of the speaker diarization and speaker search processes depend in large part on this utterance comparison stage.
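The role of the utterance comparison stage in speaker search can be illustrated with a minimal sketch. It assumes that the query utterance and each speaker in the database are represented by a single fixed-length feature vector, for example a cluster centroid. That representation is an assumption made for illustration. The names `compare_utterances` and `speaker_search` are hypothetical. Cosine similarity stands in for the utterance comparison equation here; this disclosure does not specify that measure. The search compares the query against every speaker and returns the highest-scoring candidates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def compare_utterances(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical comparison stage: cosine similarity of two feature vectors.

    A higher score indicates that the two inputs are more likely to come from
    the same speaker.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def speaker_search(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank database speakers by similarity to the query utterance.

    Every speaker is scored with the comparison stage, so the cost of one
    search grows linearly with the number of speakers in the database.
    """
    scored = [(speaker, compare_utterances(query, vec)) for speaker, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    db = {
        "alice": [1.0, 0.0, 0.0],
        "bob": [0.0, 1.0, 0.0],
        "carol": [0.7, 0.7, 0.0],
    }
    for speaker, score in speaker_search([0.9, 0.1, 0.0], db, top_k=2):
        print(f"{speaker}: {score:.3f}")
```

With the toy database above, the two best matches are "alice" and then "carol". The sketch also shows why the comparison stage dominates both speed and accuracy: it runs once per database speaker, and its scores alone determine the ranking.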
As speech comparison becomes more widely used and the number of recorded voice conversations used to populate a database of clusters continues to rise, there exists a need for speaker diarization and speaker search methods that continue to provide acceptable speed and accuracy as the database grows.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve understanding of various embodiments. In addition, the description and drawings do not necessarily require the order illustrated. It will be further appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. Apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the various embodiments so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein. Thus, it will be appreciated that for simplicity and clarity of illustration, common and well-understood elements that are useful or necessary in a commercially feasible embodiment may not be depicted in order to facilitate a less obstructed view of these various embodiments.