This specification relates to speech and language recognition and understanding systems.
Speech recognizers and synthesizers require systems to convert non-standard words, such as numbers (e.g., 97), dates (e.g., 3/23), times (e.g., 8:50 pm), currency expressions (e.g., $1.50), measure phrases (e.g., 10 kg), abbreviations (e.g., St.) and the like to pronounceable versions (ninety-seven, March twenty-third, etc.). Such a pronounceable version expressed in terms of ordinary words is referred to as a “verbalization.”
For many languages, non-standard words may be grouped into classes, such as cardinal numbers, dates, times, and so on. Each such class is referred to as a semiotic class.
The process of converting instances of semiotic classes to verbalizations is referred to as text normalization. Text normalization is largely accomplished by hand-built grammars written by native-speaker linguists, and these grammars require many man-hours for each language. Training or writing a selector that selects a verbalization is also labor intensive and requires a large amount of training data, which, in turn, imposes a commensurate demand on computer resources.
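By way of illustration only, a hand-written verbalizer for two semiotic classes (cardinal numbers and dates) might be sketched as follows in Python. The function names, the regular expressions, and the restriction to English, to cardinals 0 through 99, and to month/day dates are all assumptions made for this sketch; they are not taken from any particular grammar or system.

```python
import re

# Hypothetical word tables for English; a real grammar would cover far more.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}

DATE_RE = re.compile(r"(\d{1,2})/(\d{1,2})")   # assumed month/day order
CARDINAL_RE = re.compile(r"\d{1,2}")


def verbalize_cardinal(n: int) -> str:
    """Verbalize a cardinal number in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def verbalize_ordinal(n: int) -> str:
    """Turn the last word of the cardinal verbalization into an ordinal."""
    head, sep, last = verbalize_cardinal(n).rpartition("-")
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return head + sep + last


def verbalize(token: str) -> str:
    """Pick a semiotic class for the token and return its verbalization."""
    date = DATE_RE.fullmatch(token)
    if date:
        month, day = int(date.group(1)), int(date.group(2))
        if 1 <= month <= 12 and 1 <= day <= 31:
            return f"{MONTHS[month - 1]} {verbalize_ordinal(day)}"
    if CARDINAL_RE.fullmatch(token):
        return verbalize_cardinal(int(token))
    return token  # not a recognized non-standard word; leave unchanged


print(verbalize("97"))    # ninety-seven
print(verbalize("3/23"))  # March twenty-third
```

Even this small sketch needs separate tables and exception lists for each class, and none of it carries over to another language, which suggests why hand-built grammars consume so much effort.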