In a sound recognition system, an acoustic model or sound model is often used as a statistical representation of the sounds that make up each recognition target word or target sound class. For example, the target sound class may include a speech sound class such as a voice command/instruction as well as a non-speech sound class such as the characteristic ambient sound of a certain place. In addition, to improve the recognition accuracy, an anti-model, which is a statistical representation of non-target sound classes, i.e., various sound classes other than the target sound class, may be used. In this type of sound recognition system, an input sound to be detected is compared with a sound model of a target sound class to determine a likelihood L(TS) that the input sound corresponds to the target sound class TS. The input sound is further compared with an anti-model associated with the target sound class to determine another likelihood L(˜TS) that the input sound corresponds to non-target sound classes ˜TS. Based on both likelihoods, a final determination is made on whether the input sound belongs to the target sound class. For example, the input sound is more likely to correspond to the target sound class TS as the likelihood L(TS) becomes greater and the likelihood L(˜TS) becomes smaller.
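The comparison of L(TS) against L(˜TS) described above can be illustrated with a log-likelihood ratio test, which is one common way to combine two such likelihoods. The text does not specify how the final determination is made, so the following is only a minimal sketch under stated assumptions: the `DiagonalGaussian` class, the two-dimensional feature vectors, and the threshold of 0.0 are all hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagonalGaussian:
    """Hypothetical stand-in for a sound model or an anti-model."""
    mean: Sequence[float]
    var: Sequence[float]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Sum of per-dimension Gaussian log densities.
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def log_likelihood_ratio(x, sound_model, anti_model) -> float:
    # log L(TS) - log L(~TS): grows as L(TS) rises or L(~TS) falls.
    return sound_model.log_likelihood(x) - anti_model.log_likelihood(x)


def is_target(x, sound_model, anti_model, threshold: float = 0.0) -> bool:
    # Assumed decision rule: accept the target class when the ratio
    # exceeds a tunable threshold.
    return log_likelihood_ratio(x, sound_model, anti_model) > threshold


# Toy models (assumed values, not taken from the text).
sound_model = DiagonalGaussian(mean=[1.0, 1.0], var=[1.0, 1.0])
anti_model = DiagonalGaussian(mean=[-1.0, -1.0], var=[1.0, 1.0])

print(is_target([0.9, 1.2], sound_model, anti_model))
print(is_target([-1.0, -0.8], sound_model, anti_model))
```

Working in the log domain avoids numerical underflow when likelihoods are very small, and the threshold lets a system trade false accepts against false rejects.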
A sound model of a target sound class may be generated based on training sound samples belonging to the target sound class. For example, a sound model of a speech sound class is generated based on various speech sound samples. Further, an anti-model of the speech sound class may be generated based on sound samples of various non-target sound classes that are distinguishable from the target sound class. For example, the anti-model of the speech sound class is generated based on sound samples of various non-speech sound classes. As a greater number of sound samples are used, more accurate sound models and anti-models may be generated.
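As a rough illustration of generating a sound model and an anti-model from training samples, the sketch below fits one simple statistical model to target samples and another to non-target samples. The text does not name a model family, so the per-dimension Gaussian used here is an assumption, as are the function name `fit_diagonal_gaussian`, the variance floor, and the toy feature vectors; practical systems may use richer models.

```python
from statistics import fmean, pvariance


def fit_diagonal_gaussian(samples, var_floor=1e-6):
    """Estimate per-dimension mean and variance from feature vectors.

    `samples` is a list of equal-length feature vectors. The variance
    floor (an assumed safeguard) keeps a dimension with no spread from
    yielding a zero variance.
    """
    columns = list(zip(*samples))  # transpose: one tuple per dimension
    means = [fmean(col) for col in columns]
    variances = [
        max(pvariance(col, mu), var_floor)
        for col, mu in zip(columns, means)
    ]
    return means, variances


# Toy feature vectors (hypothetical, for illustration only).
speech_samples = [[1.0, 2.0], [3.0, 4.0]]        # target sound class
non_speech_samples = [[-1.0, 0.0], [-3.0, 0.0]]  # non-target sound classes

# Sound model from target samples, anti-model from non-target samples.
sound_model = fit_diagonal_gaussian(speech_samples)
anti_model = fit_diagonal_gaussian(non_speech_samples)

print(sound_model)
print(anti_model)
```

With more samples per class, the estimated means and variances become more reliable, which matches the observation that a greater number of sound samples yields more accurate models.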
Conventionally, sound models and anti-models for use in a sound recognition system are prepared and updated by a developer of sound recognition software or applications running on the system. For example, a developer of a sound recognition application for installation on a mobile device also prepares sound models and anti-models dedicated to that application. Preparation and update of the sound models and anti-models are performed based on sound samples collected by the developer. Since the developer is generally more interested in the target sound classes for the sound recognition application, and the scope of those classes may be clearly defined, it is relatively easy to collect sound samples for preparing sound models of the target sound classes. Unfortunately, it is difficult for the developer to collect sound samples of various non-target sound classes for preparing and updating anti-models, because the scope of the non-target sound classes may be broader and rather indefinite compared to that of the target sound classes. Thus, it is challenging for the developer to generate accurate anti-models by collecting various non-target sound samples.
More than one sound recognition application is often installed in a device, and each accesses various sound models and anti-models to detect its respective target sound classes from an input sound. In this operation setting, each application operates individually to detect its target sound classes by accessing a subset of the sound models and anti-models. However, running a plurality of applications on the device simultaneously may increase the computational load and require more data storage, and thus deteriorate the performance of the device.