1. Technical Field
The present disclosure relates to searching media content and, more specifically, to generating improved speech recognition grammars for searching media content.
2. Introduction
Current technology allows for massive media libraries of thousands of videos, movies, songs, and other media on even a modest personal computer. In an online, on-demand video application, the amount of media can be orders of magnitude higher. With such large quantities of media, traditional user interfaces are inadequate. Natural language interfaces provide one way for users to quickly locate a particular piece of media or a particular group of media. For example, a user can search for a particular movie by saying “I want to see that one war movie with Tom Hanks” or “What is the movie series that has ‘Clear and Present Danger’?”
However, approaches known in the art typically train speech recognition systems on large amounts of training data related to the media in the media library. One problem with this approach is that insufficient data is available for training before such systems are widely deployed. Another problem is that as new actors, movie titles, song titles, directors, genres, etc. enter the vernacular of searchers, the trained speech recognition model is ill-equipped to handle these new terms. Terms associated with the new items are not appropriately represented in the model and therefore appear to be less popular than they are. For example, many users may search for a new, previously unknown actress who appears in a recent movie but whose name is similar to that of another, more established actress. The prior art speech recognition model would be trained to recognize the more established actress's name, when in fact users are saying the new actress's name. The rapid pace at which new actors, movies, songs, and other media are introduced only exacerbates this problem.