1. Field of the Invention
The present invention relates to speech recognition and, more specifically, to automatic pronunciation modeling for speech recognition.
2. Introduction
Accurate pronunciation modeling is an important part of successful voice search applications. A typical voice search application, such as a corporate telephone directory or a yellow pages search, involves speech recognition of a list of named entities such as people, businesses, cities, movies, and music. Although speech recognition technology has matured significantly over the past decade, variations in how different individuals pronounce named entities pose a tremendous challenge for speech recognition systems. As a result, most voice search applications depend on expensive human experts to listen to examples of different pronunciations and tune speech recognition systems manually. This process is not only laborious, slow, and expensive, but also impractical due to the unavailability of consistent audio data for each named entity. Currently, no stochastic methods have been demonstrated to work automatically and successfully.
In addition, although human experts can carefully craft name pronunciations, the resultant baseforms do not necessarily work well for automatic speech recognition systems. What humans recognize well is not necessarily easy for machines to recognize. Accordingly, what is needed in the art is an improved way to generate pronunciation models for use with speech recognition.