1. Field of the Invention
The present invention relates to text to speech conversion systems for translating words represented by character codes into their spoken equivalents and more specifically to apparatus used to create and modify entries in the dictionaries employed in such systems to store phonemic representations of words.
2. Description of the Prior Art: FIGS. 1-3
In recent years, systems have become available by means of which text inputs may be converted into spoken outputs. Typically, these systems include microprocessors and software for converting the text inputs into a phonemic form and software and hardware for converting the phonemic form into sound waves representing the text string. The technology involved in such systems is explained in Geoff Bristow, editor, Electronic Speech Synthesis, Granada Publishing Ltd., 1984. A commercial example of such a system is the PROSE2000 (TM) text-to-speech converter made by Telesensory Systems, Inc. Operation of the PROSE2000 converter is set out in the PROSE2000 Text-to-Speech Converter User's Manual, Telesensory Systems, Inc., which is hereby incorporated into the present specification by reference.
FIG. 1 is a block diagram of a prior-art text to speech converter. In this figure and the ones that follow, functional components are represented by blocks and the flow of data between the functional components is represented by labelled arrows. Converter 101 receives text input and produces speech waveforms as output. The two main components of converter 101 are text-phonemic converter 103 and phonemic-speech converter 121. As shown by the arrows labelled TEXT and PR, text-phonemic converter 103 receives the text input and produces from it a phonemic representation (PR) of the text. The phonemic representation contains codes indicating the phonemes for the spoken equivalent of the text. Included in the phonemes are indicators for word divisions, grammatical function, stress, and the pauses and intonation indicated by means of punctuation marks in the text. Phonemic-speech converter 121 receives the phonemic representation and produces therefrom speech waveforms for the spoken equivalent. The waveforms may then be output to audio devices such as amplifiers and loudspeakers. In the discussion and figures, these waveforms are termed speech output.
Of these components, only text-phonemic converter 103 is relevant to the invention disclosed herein, and consequently, internal details of only that component are shown in FIG. 1. In overview, text-phonemic converter 103 works as follows. It first normalizes the text in text normalizer (TN) 105, then determines in word look up (WL) 107 whether each word in the text is one of a set of exceptions to the normal phonetic rules of the language which are listed in dictionary (DICT) 109. If the word is not an exception, it may be converted directly into the corresponding phonemic representations in rule converter (RC) 119; if it is, the phonemic representation is obtained from DICT 109.
Continuing in greater detail, TN 105 is software which receives the text and normalizes it by separating it into words, replacing abbreviations and numbers with their full word equivalents, and dealing with punctuation and other non-alphanumeric characters. TN 105 produces two outputs, the normalized text (NT), which goes directly to RC 119, and the words (W) from the text, which go to WL 107. If a word is an exception to the normal phonemic conversion rules, it will have a dictionary entry (DE) 111 in DICT 109. Thus, WL 107 can determine whether a word is an exception by looking it up in DICT 109. If the word is in DICT 109, RC 119 obtains some or all of the information it needs to produce the phonemic representation from DE 111 for the word; otherwise, RC 119 produces the phonemic representation solely from the normalized text.
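The flow just described (normalize, look up, then fall back to rules) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `EXCEPTIONS`, `ABBREVIATIONS`, `normalize`, `rule_convert`, and `text_to_phonemic` are hypothetical, the phonemic strings are placeholders rather than real phoneme codes, and number expansion is omitted for brevity.

```python
import re

# Hypothetical stand-in for DICT 109: maps a normalized word to a placeholder
# phonemic string. The values are illustrative, not real phoneme codes.
EXCEPTIONS = {"already": "<dict:already>"}

# Hypothetical abbreviation table used by the normalizer.
ABBREVIATIONS = {"dr.": "doctor"}


def normalize(text):
    """Rough analogue of TN 105: split into words and expand abbreviations."""
    words = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token)
        # Drop punctuation and other non-alphanumeric characters.
        token = re.sub(r"[^a-z0-9]", "", token)
        if token:
            words.append(token)
    return words


def rule_convert(word):
    """Placeholder for the letter-to-sound rules applied by RC 119."""
    return "<rules:" + word + ">"


def text_to_phonemic(text):
    """Look each word up first (as WL 107 does); otherwise use the rules."""
    result = []
    for word in normalize(text):
        if word in EXCEPTIONS:
            result.append(EXCEPTIONS[word])
        else:
            result.append(rule_convert(word))
    return result
```

In this sketch, calling `text_to_phonemic("Dr. Smith already left.")` takes the dictionary path only for "already"; every other word goes through the placeholder rule converter.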
DEs 111 are arranged in a fashion permitting quick and efficient search. The contents of a DE 111 are implementation dependent. One version is shown in FIG. 2. In that version, there are two kinds of DEs 111, stress DE (SDE) 201 and phonemic DE (PDE) 207. Both contain text form (TF) field 203, which contains the normalized text form of the word corresponding to the DE 111. SDE 201 additionally contains only stress information (SI) 205. SDE 201 is used for words whose phonemic representation is regular except for the manner in which they are stressed; PDE 207 is used for all other words whose phonemic representation is irregular. It contains phoneme form (PF) 209 of the word, indicating what phonemes it is made up of.
When WL 107 locates a DE 111 for a word, it indicates to RC 119 whether the DE 111 is a SDE 201. In that case, RC 119 fetches SI 205 and combines it with the phonemic form it derives using its rules to produce the phonemic representation. Otherwise, DE 111 is a PDE 207 and RC 119 uses PF 209 to produce the phonemic representation.
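The two kinds of dictionary entry, and the way the rule converter treats each, might be modelled as follows. This is a minimal sketch and not the layout of FIG. 2: the class names, the string encodings of the stress and phoneme fields, and the `+` used to combine a rule-derived form with stress information are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StressEntry:
    """Analogue of SDE 201: text form plus stress information only."""
    text_form: str    # corresponds to TF 203
    stress_info: str  # corresponds to SI 205 (encoding is hypothetical)


@dataclass
class PhonemicEntry:
    """Analogue of PDE 207: text form plus a full phoneme form."""
    text_form: str     # corresponds to TF 203
    phoneme_form: str  # corresponds to PF 209 (encoding is hypothetical)


def derive_by_rules(word):
    """Placeholder for the phonemic form RC 119 derives from its rules."""
    return "<rules:" + word + ">"


def phonemic_representation(word, entry=None):
    """Choose the conversion path according to the kind of entry found."""
    if entry is None:
        # No dictionary entry: the word is regular, so use the rules alone.
        return derive_by_rules(word)
    if isinstance(entry, StressEntry):
        # Stress entry: rule-derived phonemes combined with stored stress.
        return derive_by_rules(word) + "+" + entry.stress_info
    # Phonemic entry: the stored phoneme form is used directly.
    return entry.phoneme_form
```

The point of the split is that a stress entry stores less than a phonemic entry, since the phonemes themselves are still derived by rule.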
In the prior art, a DICT 109 is produced in the manner shown in FIG. 3. First, the person producing DICT 109 uses an ordinary text editor to produce a text source dictionary (SRCDICT) 301 in a text file. SRCDICT 301 contains a number of source dictionary entries (SRCDEs) 303. Each SRCDE 303 contains at least a SRCTF 305, which is a text string representing the word for which the entry is being made, and SRCPF 307, which is a text representation of the phonemes representing the word. The forms and formats of the information in SRCTF 305 and SRCPF 307 are prescribed by the manufacturer of the text to speech converter for which the dictionary is being made. For example, a SRCDE 303 for the word "already" in the PROSE2000 text to speech converter must have SRCTF 305 and SRCPF 307 fields as follows: ALREADY/wLR1eDE/
Once SRCDICT 301 is finished, the user runs a program, DICT MAKER 309, on SRCDICT 301. DICT MAKER 309 is analogous to a compiler and analyzes and compacts the information contained in SRCDICT 301 to produce DICT 109. When DICT 109 is made available to text-to-speech converter 101, the correctness of the phonemic representations in SRCPFs 307 in SRCDICT 301 may be tested by inputting text containing the words to text to speech converter 101 and listening to the results. If any of the words in DICT 109 is not satisfactorily pronounced by converter 101, the user must edit the corresponding SRCDE 303, run DICT MAKER 309 on SRCDICT 301, and again input text to converter 101 to test the result.
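The compile step can be illustrated with a toy analogue of DICT MAKER 309. The sketch assumes one source entry per line in the TEXT/PHONEMES/ form of the "already" example quoted above; the actual PROSE2000 source format and the compaction performed by the real program are not described here, so the function names and the in-memory table are hypothetical.

```python
def parse_source_entry(line):
    """Split a 'TEXT/PHONEMES/' line into (text form, phoneme form)."""
    parts = line.strip().split("/")
    # A well-formed entry splits into exactly: text, phonemes, empty tail.
    if len(parts) != 3 or not parts[0] or not parts[1] or parts[2]:
        raise ValueError("malformed source entry: " + repr(line))
    return parts[0], parts[1]


def make_dictionary(source_text):
    """Toy analogue of DICT MAKER 309: build a lookup table from source lines."""
    dictionary = {}
    for line in source_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the source file
        text_form, phoneme_form = parse_source_entry(line)
        # Key on the lower-cased text form (an assumption of this sketch).
        dictionary[text_form.lower()] = phoneme_form
    return dictionary
```

Even in this simplified form, every correction to a pronunciation requires editing the source text and rebuilding the whole table before the result can be heard, which is the edit, compile, and listen cycle described above.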
The above method of producing DICT 109 is difficult and time-consuming and requires special skills for determining the correct phonemic representation, but is adequate as long as DICT 109 rarely, if ever, changes. However, there are many possible applications for text to speech converter 101 in which the exceptions in DICT 109 may change frequently. For example, a person's name is one type of word which is frequently pronounced in a manner which is not completely regular. If a converter 101 is used in an application where it must pronounce names, many of the names will necessarily be included in DICT 109; further, if the names which converter 101 must pronounce are those of a group whose members fluctuate, the names may change frequently. Since it is important in such an application that DICT 109 contain the relevant names, and that converter 101 pronounce them correctly, a skilled and therefore expensive person will frequently need to alter DICT 109 by editing SRCDICT 301, running DICT MAKER 309 on it, and testing the new DICT 109 as just described.
As may be seen from the above discussion of the problems presented by names, what is needed in many potential applications for converter 101 is a means of adding and modifying DEs 111 which is faster, easier to use, and requires less skill than those presently available. The invention described herein provides such a means.