Speech recognition is at the forefront of computing technology because people often find speech a familiar and convenient way to communicate information. With computerized applications handling many aspects of daily activities, from word processing to controlling appliances, providing speech-recognition-based interfaces for such applications is a high research and development priority for many companies. Even web site operators and other content providers are deploying voice-driven interfaces that allow users to browse their content. These voice interfaces commonly include “grammars” that define the valid utterances (words, terms, phrases, etc.) that can occur at a given state within an application's execution. The grammars are fed to a speech recognition system and used to interpret the user's voice input.
During the development of a speech recognition system, and in some cases during the actual speech recognition process, text needs to be converted from display form (e.g., 11) to spoken form (e.g., eleven) and vice versa. A text-to-speech (TTS) system converts normal language text into speech. To perform this task, conventional TTS systems need a process that converts symbols, such as the digit “1” or the bracket “(”, into an appropriate spoken form. They also need to recognize larger constructs, such as a date written as “5/14/2000”, and convert them into a spoken form that users generally accept, such as “may fourteenth two thousand”.
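The display-to-spoken conversion described above can be illustrated with a minimal sketch. It is not taken from any particular TTS system: the function names (`cardinal`, `ordinal`, `to_spoken`), the month/day/year reading of dates, and the limit of four-digit numbers are all assumptions made for illustration.

```python
import re

# Hypothetical lookup tables for this sketch (English only).
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
MONTHS = ("january february march april may june july august "
          "september october november december").split()
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def cardinal(n):
    """Spell out an integer in the range 0..9999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else " " + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + cardinal(rest))
    thousands, rest = divmod(n, 1000)
    return ONES[thousands] + " thousand" + ("" if rest == 0 else " " + cardinal(rest))


def ordinal(n):
    """Turn the last word of the cardinal form into an ordinal."""
    words = cardinal(n).split(" ")
    last = words[-1]
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last += "th"                # fourteen -> fourteenth
    return " ".join(words[:-1] + [last])


def _speak_date(match):
    # Assumes month/day/year order; implausible dates are left untouched.
    month, day, year = (int(g) for g in match.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return match.group(0)
    return f"{MONTHS[month - 1]} {ordinal(day)} {cardinal(year)}"


def _speak_number(match):
    digits = match.group(0)
    if len(digits) <= 4:
        return cardinal(int(digits))
    # Longer digit runs are read digit by digit in this sketch.
    return " ".join(ONES[int(d)] for d in digits)


def to_spoken(text):
    """Apply the date rule first, then the plain-number rule."""
    text = re.sub(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b", _speak_date, text)
    return re.sub(r"\d+", _speak_number, text)


print(to_spoken("11"))          # eleven
print(to_spoken("5/14/2000"))   # may fourteenth two thousand
```

The date rule runs before the number rule so that the larger construct is recognized as a whole rather than being read as three separate numbers. A real normalizer would also need rules for brackets, currency, times, and locale-specific date orders, none of which are covered here.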
One way of converting text from display form to spoken form is a process called text normalization, which uses a rule-based system to convert from one form to the other. Providing these rules typically requires a linguist with a high level of knowledge about the language, as well as technical knowledge about how to structure the rules. Historically, this has made the authoring process very costly in time, resources, and money. Furthermore, a spoken form (e.g., eleven) needs to be converted back to semantic properties (i.e., the value 11) so that the speech application can take the appropriate action. Rules for common cases such as dates, times, and phone numbers are defined in a grammar library. The grammar library and the normalization maps typically cover a shared set of areas, such as dates, but are normally authored separately, which leads to duplication of effort and differences in coverage.
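The reverse direction, from a spoken form back to a semantic value, can be sketched in the same spirit. This is an illustrative assumption rather than the format of any actual grammar library: the tables and the function name `spoken_to_value` are hypothetical, and only English numbers below ten thousand are handled.

```python
# Hypothetical word tables for this sketch.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_to_value(phrase):
    """Map a spoken number such as 'two thousand' to its integer value.

    The parser is deliberately permissive: it does not reject odd
    sequences such as 'one two', which a real grammar would exclude.
    """
    words = phrase.lower().replace("-", " ").split()
    if not words:
        raise ValueError("empty phrase")
    total = 0      # value of completed 'thousand' groups
    current = 0    # value of the group being built
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        elif word == "and":
            continue   # 'one hundred and five'
        else:
            raise ValueError(f"unknown word: {word!r}")
    return total + current


print(spoken_to_value("eleven"))        # 11
print(spoken_to_value("two thousand"))  # 2000
```

Note that the word tables here duplicate the ones a display-to-spoken normalizer needs, which is a small-scale instance of the duplicated authoring effort described above.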