This invention relates to speaker diarization. In particular, the invention relates to compensation of intra-speaker variability in speaker diarization.
Speaker diarization is the process of segmenting and labelling audio input according to speakers' identities. A speaker diarization system usually consists of a speech/non-speech segmentation component, a speaker segmentation component, and a speaker clustering component.
Speaker segmentation is the process of identifying change points in an audio input where the identity of the speaker changes. Speaker segmentation is usually done by modeling a speaker with a multivariate normal distribution or with a Gaussian mixture model (GMM) and assuming frame independence. Deciding whether two consecutive segments share the same speaker identity is usually done by applying a Bayesian-motivated approach such as the Generalized Likelihood Ratio (GLR) or the Bayesian Information Criterion (BIC).
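By way of illustration only, the BIC-based decision described above may be sketched as follows. The sketch assumes each segment is an array of feature frames (one row per frame, at least two frames more than the feature dimension) and models each segment with a single full-covariance multivariate normal distribution. The function name `delta_bic`, the parameter `penalty_weight`, and the sign convention (a positive value suggests a speaker change) are assumptions made for this example and are not taken from the description.

```python
import numpy as np


def _logdet_cov(frames):
    # Log-determinant of the maximum-likelihood covariance of the frames.
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between modeling two segments separately or jointly.

    Positive values favor two separate Gaussians (a speaker change);
    negative values favor one shared Gaussian (same speaker).
    """
    seg_a = np.asarray(seg_a, dtype=float)
    seg_b = np.asarray(seg_b, dtype=float)
    n_a, dim = seg_a.shape
    n_b = seg_b.shape[0]
    n = n_a + n_b
    pooled = np.vstack([seg_a, seg_b])

    # Log generalized likelihood ratio under the Gaussian assumption.
    log_glr = 0.5 * (
        n * _logdet_cov(pooled)
        - n_a * _logdet_cov(seg_a)
        - n_b * _logdet_cov(seg_b)
    )

    # Penalty for the extra mean vector and covariance matrix.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return log_glr - penalty
```

Under these assumptions, a candidate change point between two consecutive segments would be accepted when `delta_bic` returns a positive value. Setting the penalty term to zero leaves only the GLR term, which would then be compared against a separately chosen threshold.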
Speaker clustering is the process of grouping segments according to speaker identity. Speaker clustering is usually based on either the BIC or the Cross Likelihood Ratio (CLR).
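As a non-limiting sketch, BIC-based clustering may be illustrated with a greedy agglomerative procedure that repeatedly merges the pair of clusters with the lowest delta-BIC until no pair appears to share a speaker. The agglomerative strategy, the single-Gaussian cluster model, and the names `delta_bic`, `bic_cluster`, and `penalty_weight` are assumptions made for this example; the description above does not prescribe them, and a CLR-based variant is not shown.

```python
import numpy as np


def _logdet_cov(frames):
    # Log-determinant of the maximum-likelihood covariance of the frames.
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    # Positive: two Gaussians fit better (different speakers).
    # Negative: one shared Gaussian suffices (same speaker).
    n_a, dim = seg_a.shape
    n_b = seg_b.shape[0]
    n = n_a + n_b
    pooled = np.vstack([seg_a, seg_b])
    log_glr = 0.5 * (
        n * _logdet_cov(pooled)
        - n_a * _logdet_cov(seg_a)
        - n_b * _logdet_cov(seg_b)
    )
    n_params = dim + dim * (dim + 1) / 2
    return log_glr - 0.5 * penalty_weight * n_params * np.log(n)


def bic_cluster(segments, penalty_weight=1.0):
    """Greedy agglomerative clustering; returns one label per segment."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]

    while len(clusters) > 1:
        # Find the pair of clusters with the lowest delta-BIC.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # No remaining pair looks like the same speaker.
        # Merge cluster j into cluster i.
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i].extend(members[j])
        del clusters[j]
        del members[j]

    labels = [0] * len(segments)
    for cluster_id, segment_ids in enumerate(members):
        for segment_id in segment_ids:
            labels[segment_id] = cluster_id
    return labels
```

In this sketch the stopping rule and the merge criterion are the same quantity, so no separate threshold is needed beyond `penalty_weight`. The cost grows quadratically with the number of clusters at each pass, which is acceptable for illustration but may need pruning for long recordings.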
Intra-speaker variability is the variation of characteristics in a single speaker's output. Compensating for intra-speaker variability can enable more accurate speaker segmentation and clustering.