Audio speech recognition provides a convenient interface for users to communicate with machines. Customer-facing businesses use speech recognition to interact with customers who call by phone. Audio content identification is a useful tool for managing and accessing content and entertainment. Humans increasingly interact with machines (including computers and robots), and audio is a favored interface. Human interaction with computers and remote machines can benefit from improved speech identification. Applications related to social networking, entertainment, and advertising can take advantage of identifying the precise program, and the exact time within that program, as it is played on the consumer device. This enables useful solutions for the user that benefit advertisers and content owners as well.
Robust audio identification in the presence of significant ambient and interfering sounds, together with tracking of the identified content, enables providers to bring various applications directly to smartphones and mobile devices such as tablets. These applications enable widespread use and take advantage of network and media convergence.
Interfering sounds from the environment and interfering speakers reduce accuracy and increase the cost of identifying content in the captured audio or of performing speech recognition. For example, with 0 dB sound interference, accuracy can drop by 20% while the cost of identification can increase 5 times, compared to a query with 0 dB noise interference only. With 6 dB speech interference, accuracy can drop by 30% and the cost of identification can increase 10 times, again compared to a query with 0 dB noise interference only. Hence it is very desirable to reduce or cancel out the impact of interference.
The effects of distortion and interference are even more damaging to speech recognition. Although speech recognition products have made great progress, their accuracy currently deteriorates significantly below a signal-to-noise ratio (SNR) of 10 dB.
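The decibel figures quoted above can be related to signal and interference power with the standard definition of SNR, 10·log10(P_signal / P_noise). The following is a minimal sketch of that arithmetic only; the function names `mean_power` and `snr_db` are hypothetical and are not part of any described system.

```python
import math


def mean_power(samples):
    """Average power of a sequence of audio samples (mean of squares)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels for two positive power values."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


# Illustrative values only (assumed, not measured data).
equal_power = snr_db(1.0, 1.0)    # interference as strong as the signal
tenfold_power = snr_db(10.0, 1.0) # signal ten times stronger than the noise
```

Under this definition, 0 dB corresponds to interference with the same power as the signal, and 10 dB corresponds to a signal with ten times the power of the noise.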