The use of text to speech (TTS) converters is well known. TTS converters have been used to improve access to computer stored information by visually impaired persons, and for uses such as in "interactive voice response" (IVR) systems in which a remotely located user accesses digitally stored information in a database via a telephone.
The present invention is directed to computer applications in which voice access to information stored in a computer is desirable, and more particularly to systems that provide voice access to information that by its very nature tends to include a wide variety of formats and individual writing styles, including conventional and nonstandard abbreviations, acronyms, initialisms, numbers, dates, times and telephone numbers in many formats, fractions, inappropriate spacing, emotion indicators (e.g., smiley faces, asterisks and underlines), and the like.
Present text to speech converters can produce intelligible speech only from text which conforms perfectly to the spelling and grammatical conventions of a language. Even the highest quality text to speech converters cannot read typical electronic mail (e-mail) messages intelligibly. Unlike carefully edited text, e-mail messages frequently contain sloppy, misspelled text with random use of case, spacing, fonts, punctuation, emotion indicators and a preponderance of industry-specific abbreviations and acronyms. In order for text to speech conversion to be useful for such applications, it must implement flexible, sophisticated rules for intelligent interpretation of even the most ill-formed text messages.
The difference between an acronym and an initialism is as follows. An acronym is formed from pronounceable syllables, even if it represents a "made up word," while an initialism is not pronounceable except as a sequence of letters and numbers. Thus, "IBM" and "YMCA" are initialisms, while "NASA" and "UNICEF" are acronyms. Some words, such as "MSDOS" and "PCNET," are a mix of acronym and initialism components.
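The distinction drawn above can be illustrated with a short sketch. The rule used here (a term is treated as an acronym only if it contains a vowel and never has two non-vowel characters in a row) is an assumption made purely for illustration; it is not the method of the present invention, and it will misclassify many real terms. The names `is_acronym` and `spoken_form` are hypothetical.

```python
VOWELS = set("AEIOU")

def is_acronym(term):
    """Crude, illustrative guess at whether a term can be read as a word."""
    run = 0            # length of the current run of non-vowel characters
    has_vowel = False
    for ch in term.upper():
        if ch in VOWELS:
            has_vowel = True
            run = 0
        else:
            run += 1
            if run > 1:        # two non-vowels in a row: assume unpronounceable
                return False
    return has_vowel

def spoken_form(term):
    """Return the term unchanged if word-like, else spelled out letter by letter."""
    return term if is_acronym(term) else " ".join(term.upper())
```

Under this assumed rule, `spoken_form("NASA")` leaves the term intact for the TTS converter to read as a word, while `spoken_form("IBM")` yields the letters separated by spaces so that each is spoken individually.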
An application that provides telephone access to the user's electronic mail (e-mail) messages using a standard TTS converter will provide inadequate service to end users because a standard TTS converter (A) will attempt to read portions of the e-mail message that the end user does not want to hear, such as the e-mail addresses of the sender and recipient and the trail of network nodes through which the message was transmitted, and (B) will not properly handle many of the abbreviations, acronyms and initialisms typically found in such messages. Moreover, when something is mispronounced, it confuses the listener and makes it difficult to understand even those portions of the message that are pronounced correctly.
Other voice access applications which are inadequately handled by standard TTS converters include applications for telephone access to a user's personal telephone and address directory or an organization's telephone and address directory, applications for verbal access to information in a spreadsheet, proofreading, and access to computer stored documents by a blind person, and applications which provide audio feedback in addition to visual display.
It is a fact of life that written text in the English language, as well as written text in most other languages, includes a rather large number of abbreviations, measurement values, times and dates and other symbols that the average well educated person knows how to read, but which do not follow the phonetic pronunciation rules applicable to most standard words in the applicable language. Furthermore, many common abbreviations are ambiguous until the context of the abbreviation is determined. For instance, the string "SF" might mean "San Francisco," "Sioux Falls," or "Santa Fe." Another example is that the text:
the 20 ft wayne jumped at the ft wayne indiana meet.
is 1 ft worth buying? is ft worth ahead? dallas 27, ft worth 13 :-)
i believe dr jones lives on oak dr.
i live at 6e maple st and my mother lives at 6 e st.

is correctly spoken by the present invention as if it were written:

The 20 feet Wayne jumped at the Fort Wayne Indiana meet.
Is 1 foot worth buying? Is Fort Worth ahead? Dallas 27, Fort Worth 13, ha ha.
I believe Doctor Jones lives on Oak Drive.
I live at 6 East Maple Street and my mother lives at 6 E Street.
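The two readings of "ft" in this example (a unit of length versus "Fort") can be sketched with a pair of context rules. This is a minimal illustration under stated assumptions, not the invention's actual rule set: the function name `expand_ft`, the small `FORT_PLACES` table, and the choice to apply the numeric rule first are all hypothetical. The sketch handles only "ft" and leaves capitalization, "dr", "st" and the emotion indicator untouched.

```python
import re

# Hypothetical table of words that can follow "ft" in a place name.
FORT_PLACES = {"wayne": "Wayne", "worth": "Worth"}

def expand_ft(text):
    """Expand 'ft' to 'foot'/'feet' after a bare number, else to 'Fort' before a place word."""
    # Rule 1: a number immediately before "ft" marks a unit of length.
    def unit(match):
        n = match.group(1)
        return f"{n} {'foot' if n == '1' else 'feet'}"
    text = re.sub(r"\b(\d+)\s+ft\b", unit, text, flags=re.IGNORECASE)

    # Rule 2: any remaining "ft" before a known place word reads as "Fort".
    def fort(match):
        return "Fort " + FORT_PLACES[match.group(1).lower()]
    places = "|".join(FORT_PLACES)
    return re.sub(rf"\bft\s+({places})\b", fort, text, flags=re.IGNORECASE)
```

Applying the numeric rule first is what keeps "20 ft wayne" from being read as "Fort Wayne"; the comma in "27, ft worth" prevents the numeric rule from firing there, so the place-name rule applies instead.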
Identical symbols can have different meanings and pronunciations in different contexts. For instance, a quotation mark can delimit quoted text, serve as the inch unit identifier, or denote seconds of time or seconds of angular measurement:
______________________________________
Text                 Description
______________________________________
"Four score and"     Quoted text
3.5"                 Measurement in inch units
27'35"               Angular or directional measurement
______________________________________
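The three uses of the quotation mark listed above can be told apart by the characters surrounding it. The following sketch is an illustration only, assuming the token has already been isolated from the surrounding text; the function name `classify_quote_use` and its return labels are hypothetical and are not taken from the invention.

```python
import re

def classify_quote_use(token):
    """Label how the double-quote character is being used in a single token."""
    # Digits, an apostrophe, more digits, then a quote: minutes and seconds.
    if re.fullmatch(r"\d+'\d+(?:\.\d+)?\"", token):
        return "angular or directional measurement"
    # A plain number followed by a quote: inches.
    if re.fullmatch(r"\d+(?:\.\d+)?\"", token):
        return "measurement in inch units"
    # Text enclosed in a pair of quotes.
    if re.fullmatch(r"\".*\"", token):
        return "quoted text"
    return "unknown"
```

The patterns are tried from most to least specific, so a token such as 27'35" is recognized as a minutes-and-seconds value before the inch rule is considered.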
Another large category of troublesome text concerns the differences between acronyms and initialisms. For instance, the term "NASA" is an acronym that is pronounced like a word, while the term "IBM" is an initialism that is pronounced as a sequence of three letters. No product or literature known to the inventors discloses a systematic method of distinguishing between acronyms and initialisms for purposes of text to speech conversion.
In summary, the number of examples of text that are problematic for standard TTS converters is large.
It is a primary object of the present invention to provide a system and method for systematically pre-processing text that would not be properly converted into spoken words by a conventional TTS converter so as to produce substituted text that, when processed by a conventional TTS converter, will represent the words that would be spoken by a human reader of the same text.