1. Field of the Invention
This invention relates to the field of phonetics. In particular, the invention relates to technologies for transforming pronunciations appropriate for American English into pronunciations appropriate for British English.
2. Description of the Related Art
A. Notation
Before turning to definitions, some notational concerns will be addressed. A standard notational alphabet, the International Phonetic Alphabet (IPA), can be used to represent the pronunciation of words using phonemes. However, the IPA uses symbols that are difficult to represent in ASCII systems, and many of the symbols lack appropriate glyphs in standard computer fonts. (Newer systems that handle Unicode can represent IPA symbols directly and frequently include newer fonts with appropriate glyphs for IPA symbols.) Accordingly, it is more convenient, and has become industry standard practice, to use the Computer Phonetic Alphabet (CPA) in computer speech recognition and pronunciation generation tools such as “autopron”, from Nuance Communications, Menlo Park, Calif., and “namepro”, from E-Speech Corporation, Princeton, N.J.
The CPA has the advantage that it can be represented with standard ASCII characters, using the glyphs in commonly available fonts. The following tables show the correspondence between CPA and IPA symbols for American English and British English.
Throughout the remainder of this document, CPA symbols will be used to represent phonemes in transcriptions. Where it is relevant to understanding the material and not clear from the context, a transcription written in CPA symbols will be identified as corresponding to British English (UK) or American English (US). To minimize confusion, US English conventions for spelling and style will be used throughout the body of this specification, except in examples and rules. The UK CPA forms are also used for Australian and New Zealand pronunciations.
The sounds that a human being can produce by moving the lips, tongue, and other speech organs are called phones. These sounds are generally grouped into logically related groups, each a phoneme. In a given language only certain sounds are distinguished (or distinguishable) by speakers of the language, i.e. they conceptualize them as different sounds. These distinguishable sounds are phonemes. In fact, a phoneme may be defined as a group of related phones that are regarded as the same sound by speakers. The different sounds that are part of the same phoneme are called allophones (or allophonic variants).
Returning to notation issues, the phonemic transcription of a word will be shown between slashes (“/ /”). For clarity, the glyph “·” will be placed between adjacent phonemes in the transcription to represent the space character visibly, e.g. /k·O·r·n·*r/ for “corner” (US). In many computer programs a space character is used to represent the boundary between phonemes; however, in a printed publication, using the standard glyph for the space character, “ ”, might lead to ambiguities, e.g. between /*r/ and /*·r/ (US).
If used, phonetic transcriptions will be shown in brackets (“[ ]”). Phonetic transcriptions distinguish between the different phones that are allophones of the phoneme.
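The phonemic transcription convention described above can be illustrated with a minimal sketch. It assumes phonemes are held as a list of CPA symbol strings; the function names are hypothetical and are not part of any Nuance or E-Speech tool.

```python
# Minimal sketch of the printed phonemic notation: slashes around the
# transcription and a middle dot between adjacent phonemes.
# All names here are illustrative assumptions, not an existing API.

SEPARATOR = "\u00b7"  # the middle dot glyph used in place of a space


def format_phonemic(phonemes):
    """Render a list of CPA symbols in the printed /a.b.c/ style."""
    return "/" + SEPARATOR.join(phonemes) + "/"


def parse_phonemic(text):
    """Split a printed transcription back into its CPA symbols."""
    text = text.strip()
    if len(text) < 2 or not (text.startswith("/") and text.endswith("/")):
        raise ValueError("phonemic transcriptions are written between slashes")
    inner = text[1:-1]
    return inner.split(SEPARATOR) if inner else []


def from_space_separated(transcription):
    """Convert the space-delimited form used by many programs."""
    return format_phonemic(transcription.split())
```

With this representation, the single phoneme `["*r"]` and the two-phoneme sequence `["*", "r"]` render differently, which is the ambiguity the visible separator is meant to avoid.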
B. Role of Phonemic Transcriptions in Speech Software
Speech recognizers (both speaker independent and speaker dependent varieties) rely on pronunciations to perform recognition. For example, in order for the Nuance™ speech recognition software from Nuance Communications to recognize a word in a recognition grammar, a pronunciation (e.g. phonemic transcription) must be available. To support recognition, Nuance provides a large phonemic dictionary that includes pronunciations for many American English words. The content of the dictionary typically excludes proper nouns and made-up words, e.g. “Kodak”; however, there may be extensions for particular purposes, e.g. for US equity issues (stocks).
Additionally, Nuance provides an automated tool, “autopron”, that attempts to generate (simply from the spelling of the word) a usable pronunciation. Other companies, e.g. E-Speech, specialize in providing software that they claim can do a better job at generating such pronunciations.
Symmetrically, a good pronunciation is also important to producing good synthesized speech (or in the case where a human is reading a script, providing the human with extra guidance about the correct pronunciation). Thus, a useful phonemic transcription is important to many aspects of computer speech technology.
C. British English and American English
Although American English and British English share a common origin, there are significant differences in grammar (word choice, vocabulary, spelling, etc.), pronunciation, and text normalization (e.g. time formats, data formats, etc.). One can typically purchase an electronic dictionary of British English, e.g. for use in spell checking, or even a phonetic one for use with products such as the Nuance speech recognition system. However, such a pronunciation dictionary assumes that materials have already been prepared in British English form.
For example, a particular word like “attorney” in a production script for a voice application (e.g. yellow pages) that was prepared for American English speakers presents several problems. First, if presented in a list of options, “attorney” will sound awkward to a British native, since they expect the term “solicitor” (or perhaps, if trying to get out of gaol, a “barrister”). Similarly, the native British speaker is unlikely to provide the verbal command “attorney” to the speech recognition system. Lastly, even if the British speaker did provide the word “attorney”, the pronunciation would be different from the one used by Americans. This also has an impact on the recording of the program script, where prompts for categories such as “attorneys” would need to be re-recorded.
These problems may be further exacerbated in the realm of proper nouns, e.g. names and places, as well as made-up words, e.g. company names, movie/book titles, etc., where, even if a British English dictionary were provided, the term would not likely be present.
D. Noting Stress in CPA
Presently (as seen above in Tables 1 and 2), the CPA does not support the representation of stress within a word. This limits its usefulness (as compared to IPA representations) in designating differences in pronunciation. For example, “advertisement” is pronounced in US English with the stress on the penultimate syllable of the word, whereas UK English places the stress on the second syllable of the word. Shifting the stress changes the pronunciation.
Although present generation speech recognition systems (e.g. Nuance) do not make use of stress (see absence of the same from CPA, above), the stress information is essential for a voice talent performing a script and may potentially be useful in enhanced speech recognition.
E. Conclusion
Prior techniques for converting US English to UK English have required humans to perform textual normalization and pronunciation transformations. Accordingly, what is needed is a method and apparatus for automating the transformation.
Prior techniques for representing word pronunciations in ASCII characters have not supported indicating word stress. Accordingly, what is needed is a method and apparatus for indicating word stress in a fashion compatible with both US and UK CPA representations, as well as a method and apparatus for presenting a version of the pronunciations without word stress to incompatible speech synthesis and recognition systems.
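One way to picture the two needs stated above is to carry stress as a removable annotation on an ASCII transcription. The following is only a sketch under stated assumptions: the apostrophe prefix used as the stress mark is hypothetical (this section does not define the actual stress notation), and the function names are illustrative.

```python
# Sketch: stress carried as a removable ASCII prefix on a phoneme.
# ASSUMPTION: the "'" prefix is a hypothetical stress mark chosen for
# illustration; the source does not specify the notation at this point.

STRESS_MARK = "'"


def mark_stress(phonemes, index, mark=STRESS_MARK):
    """Return a copy of the transcription with stress on one phoneme."""
    if not 0 <= index < len(phonemes):
        raise IndexError("stress index out of range")
    return [mark + p if i == index else p for i, p in enumerate(phonemes)]


def strip_stress(phonemes, mark=STRESS_MARK):
    """Return the stress-free form for systems that cannot accept stress."""
    return [p[len(mark):] if p.startswith(mark) else p for p in phonemes]


def stressed_positions(phonemes, mark=STRESS_MARK):
    """List the positions of phonemes that carry the stress mark."""
    return [i for i, p in enumerate(phonemes) if p.startswith(mark)]
```

For example, marking the vowel in the US transcription of “corner” gives `["k", "'O", "r", "n", "*r"]`, and `strip_stress` recovers the plain CPA form that a stress-unaware recognizer would accept.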
Prior techniques for preparing voice application programs do not easily allow a script initially prepared for US English to be automatically converted to UK English. Accordingly, what is needed is a method and apparatus for textually normalizing a document from US English to UK English and for refining a US English phonemic transcription using one or more well-defined rules to produce more accurate transcriptions for UK English.
A naive assumption might be made that US and UK English are similar enough to allow a program designed for one market to simply be used in the other. Typically, due to its size, an application might be first prepared for the US market with later use in the UK (and possibly continental Europe where the UK variety of English is used). For a voice application, it is necessary to ensure that UK English pronunciations for all words are available to enable speech recognition and speech generation (both by text-to-speech and human voice talent).
Accordingly, a method of transforming a voice application program designed for US English speakers to a voice application program for UK English speakers using a computer system is described. The method allows what would otherwise be a tedious manual process to be highly automated and focuses human intervention (when needed) on correcting very specific points.
In one embodiment, scripts and grammars associated with the voice application program are converted from US to UK English. Three primary tasks can be completed in this process: spelling normalization, lexical normalization, and pronunciation conversion (including, where appropriate, accounting for stress shifts).
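The first two of these tasks can be sketched as simple table-driven substitutions over a script. This is a minimal illustration, not the described embodiment: the tables are tiny hypothetical samples (only the “attorney”/“solicitor” pair comes from the discussion above), the function names are assumptions, and pronunciation conversion is omitted because its rules are not given in this section.

```python
import re

# ASSUMPTION: small illustrative tables; a real system would use much
# larger, curated word lists.
SPELLING = {"color": "colour", "center": "centre"}
LEXICAL = {"attorney": "solicitor", "attorneys": "solicitors"}


def _match_case(source, target):
    """Carry the capitalization of the source word over to the target."""
    if source.isupper():
        return target.upper()
    if source[:1].isupper():
        return target[:1].upper() + target[1:]
    return target


def _substitute(text, table):
    """Replace whole words found in the table, preserving capitalization."""
    def replace(match):
        word = match.group(0)
        new = table.get(word.lower())
        return _match_case(word, new) if new else word
    return re.sub(r"[A-Za-z]+", replace, text)


def convert_script(text):
    """Apply spelling normalization, then lexical normalization."""
    text = _substitute(text, SPELLING)
    text = _substitute(text, LEXICAL)
    return text
```

Matching on whole words keeps a table entry from rewriting part of a longer word, and any word absent from both tables passes through unchanged.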
The converted script can be generated using the method and apparatus described herein and a preferred pronunciation from such effort inserted into the script in appropriate locations to assist the voice talent. For example, in one embodiment words with stress differences from the US English forms have their pronunciations listed in the script.
Similarly, the method and apparatus can be integrated into a remotely hosted development environment. This can allow developers who would otherwise be unlikely to have the resources and skill to convert their program independently to do so in a highly automated fashion. Additionally, any manual intervention can be focused on answering specific questions, e.g. what part of speech a given word is. Further, such questions are much more easily answered by a non-professional (in linguistics/phonetics) than the broader conversion questions.