1. Field of Invention
This invention is directed to a multimedia search apparatus and methods for searching multimedia content using speaker detection to segment the multimedia content.
2. Description of Related Art
In one known method for speaker identification and verification, Gaussian Mixture Models (GMMs) are used to model the spectral shapes of the speaker's voice. This method is described in "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," Douglas A. Reynolds, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, January 1995 (Reynolds), which is incorporated herein by reference. This method uses Gaussian Mixture Models to verify the identity of a speaker, for example when conducting financial transactions. However, the above-described speaker identification and verification method assumes that only one speaker is the source of the audio input for all samples. Thus, this method is only practical for identifying a single speaker. Therefore, there is a need for new technology to provide more reliable speaker detection when more than one speaker may be present in multimedia information.
This invention provides multimedia search apparatus and methods for searching multimedia content using speaker detection to segment the multimedia content. The multimedia search apparatus and methods may aid in browsing multimedia content and may be used in conjunction with known browsing techniques such as word spotting, topic spotting, image classification, and the like.
The multimedia search apparatus receives a search request from a user device. The search request includes information regarding the target speaker for which the search is to be conducted. Based on the search request, the multimedia search apparatus retrieves the multimedia content from a multimedia database.
In one embodiment of the invention, the multimedia search apparatus retrieves Gaussian Mixture Models (GMMs) corresponding to the target speaker and background data from a Gaussian Mixture Model storage device. Based on the retrieved Gaussian Mixture Models, the multimedia search device searches the multimedia data of the multimedia content and segments the multimedia data. The segments are identified by determining an average normalized score for blocks of frames of the multimedia data and determining if the average normalized score exceeds one or more predetermined thresholds. If the average normalized score exceeds the one or more thresholds, the frame may be part of a target speaker segment. If the average normalized score falls below one or more of the thresholds, the frame may be considered to be in a background segment.
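The block-averaging and thresholding step described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that per-frame log-likelihoods under the target speaker GMM and the background GMM have already been computed, that the normalized score of a frame is the difference between those two log-likelihoods, and that a single threshold is used. The function name segment_by_score and the parameters block_size and threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start frame, end frame exclusive, label)


def segment_by_score(
    target_ll: Sequence[float],
    background_ll: Sequence[float],
    block_size: int = 3,
    threshold: float = 0.0,
) -> List[Segment]:
    """Label blocks of frames as 'target' or 'background'.

    Assumption: the normalized score of a frame is the target-model
    log-likelihood minus the background-model log-likelihood.
    """
    if len(target_ll) != len(background_ll):
        raise ValueError("both score sequences must cover the same frames")
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    # Per-frame normalized score (assumed log-likelihood ratio).
    scores = [t - b for t, b in zip(target_ll, background_ll)]

    segments: List[Segment] = []
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        end = start + len(block)
        average = sum(block) / len(block)
        label = "target" if average > threshold else "background"

        # Merge with the previous segment when the label is unchanged.
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


if __name__ == "__main__":
    # Hypothetical log-likelihoods for eight frames.
    target = [-1.0, -1.0, -1.0, -5.0, -5.0, -5.0, -1.0, -1.0]
    background = [-3.0] * 8
    for segment in segment_by_score(target, background):
        print(segment)
```

Averaging over a block before comparing against the threshold smooths frame-to-frame fluctuations in the score, so a single noisy frame is less likely to split a segment. A variant with more than one threshold, as contemplated above, could for example use separate thresholds for entering and leaving a target speaker segment.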
Once the segments are identified by the multimedia search device, the segments may be provided to the user device as results of the search. Accordingly, the user device may choose from the identified multimedia content and multimedia segments for playback.