The present invention relates to systems and methods for voice and text messaging, as well as systems and methods for language recognition. More particularly, the present invention is a communications system that automatically identifies a language associated with a text message, and performs an appropriate text-to-speech conversion.
Computer-based techniques for converting text into speech have become well-known in recent years. Via such techniques, textual data is translated to audio information by a text-to-speech conversion "engine," which most commonly comprises software. Examples of text-to-speech software include Apple Computer's Speech Manager (Apple Computer Corporation, Cupertino, Calif.), and Digital Equipment Corporation's DECTalk (Digital Equipment Corporation, Cambridge, Mass.). In addition to converting textual data into speech, such software is responsive to user commands for controlling volume, pitch, rate, and other speech-related parameters.
A text-to-speech engine generally comprises a text analyzer, a syntax and context analyzer, and a synthesis module. The text analyzer, in conjunction with the syntax and context analyzer, utilizes a rule-based index to identify fundamental grammatical units within textual data. The fundamental grammatical units are typically word and/or phoneme-based, and the rule-based index is correspondingly referred to as a phoneme library. Those skilled in the art will understand that the phoneme library typically includes a word-based dictionary for the conversion of orthographic data into a phonemic representation. The synthesis module either assembles or generates speech sequences corresponding to the identified fundamental grammatical units, and plays the speech sequences to a listener.
Text-to-speech conversion can be very useful within the context of unified or integrated messaging systems. In such messaging systems, a voice processing server is coupled to an electronic mail system, such that a user's e-mail in-box provides message notification as well as access to messaging services for e-mail messages, voice messages, and possibly other types of messages such as faxes. An example of a unified messaging system is Octel's Unified Messenger (Octel Communications Corporation, Milpitas, Calif.). Such systems selectively translate an e-mail message into speech through the use of text-to-speech conversion. A user calling from a remote telephone can therefore readily listen to both voice and e-mail messages. Thus, a unified messaging system employing text-to-speech conversion eliminates the need for a user to have direct access to their computer during message retrieval operations.
In many situations, messaging system users can expect to receive textual messages written in different languages. For example, a person conducting business in Europe might receive e-mail messages written in English, French, or German. To successfully convert text into speech within the context of a particular language requires a text-to-speech engine designed for that language. Thus, to successfully convert French text into spoken French requires a text-to-speech engine designed for the French language, including a French-specific phoneme library. Attempting to convert French text into spoken language through the use of an English text-to-speech engine would likely produce a large amount of unintelligible output.
In the prior art, messaging systems rely upon a human reader to specify a given text-to-speech engine to be used in converting a message into speech. Alternatively, some systems enable a message originator to specify a language identification code that is sent with the message. Both approaches are inefficient and inconvenient. What is needed is a messaging system providing automatic written language identification as a prelude to text-to-speech conversion.
The present invention is a unified messaging system providing automatic language identification for the conversion of textual messages into speech. The unified messaging system comprises a voice gateway server coupled to a computer network and a Private Branch Exchange (PBX). The computer network includes a plurality of computers coupled to a file server, through which computer users identified in an electronic mail (e-mail) directory exchange messages. The voice gateway server facilitates the exchange of messages between computer users and a telephone system, and additionally provides voice messaging services to subscribers, each of whom is preferably a computer user identified in the e-mail directory.
The voice gateway server preferably comprises a voice board, a network interface unit, a processing unit, a data storage unit, and a memory. Residing within the memory are a set of voice messaging application units, a message buffer, a plurality of text-to-speech engines and corresponding phoneme libraries, a trigraph analyzer, and a set of corecurrence libraries. Each voice messaging application unit comprises program instructions for providing voice messaging functions such as call answering, automated attendant, and message store/forward operations to voice messaging subscribers.
A message inquiry unit directs message playback operations. In response to a subscriber's issuance of a voice message review request, the message inquiry unit plays the subscriber's voice messages in a conventional manner. In response to a text message review request, the message inquiry unit initiates automatic language identification operations, followed by a text-to-speech conversion performed in accordance with the results of the language identification operations.
The trigraph analyzer examines a text sequence, and performs language identification operations by first determining the occurrence frequencies of sequential 3-character combinations within the text, and then comparing the determined occurrence frequencies with reference occurrence statistics for various languages. The set of reference occurrence statistics associated with a given language is stored together as a corecurrence library. The trigraph analyzer determines a closest match between the determined occurrence frequencies and a particular corecurrence library, and returns a corresponding language identifier and likelihood value to the message inquiry unit.
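By way of illustration only, the trigraph analysis described above might be sketched as follows. The function names, the use of cosine similarity as the "closest match" measure, and the toy reference libraries built from single sentences are all assumptions made for this sketch; they are not taken from the disclosure, and a practical corecurrence library would be derived from a much larger body of text.

```python
from collections import Counter
import math


def trigraph_frequencies(text):
    """Relative frequencies of sequential 3-character combinations."""
    # Lowercase and collapse whitespace so spacing differences do not matter.
    normalized = " ".join(text.lower().split())
    counts = Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


def similarity(observed, reference):
    """Cosine similarity between two frequency tables (an assumed metric)."""
    dot = sum(freq * reference.get(tri, 0.0) for tri, freq in observed.items())
    norm_obs = math.sqrt(sum(v * v for v in observed.values()))
    norm_ref = math.sqrt(sum(v * v for v in reference.values()))
    if norm_obs == 0 or norm_ref == 0:
        return 0.0
    return dot / (norm_obs * norm_ref)


def identify_language(text, libraries):
    """Return (language identifier, likelihood value) for the closest match."""
    observed = trigraph_frequencies(text)
    scores = {lang: similarity(observed, ref) for lang, ref in libraries.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical, deliberately tiny reference libraries for demonstration.
LIBRARIES = {
    "en": trigraph_frequencies(
        "the quick brown fox jumps over the lazy dog and the cat"),
    "fr": trigraph_frequencies(
        "le renard brun saute par dessus le chien paresseux et le chat"),
    "de": trigraph_frequencies(
        "der schnelle braune fuchs springt und der faule hund sieht die katze"),
}

language, likelihood = identify_language("the dog and the fox", LIBRARIES)
print(language, round(likelihood, 3))
```

In this sketch the likelihood value is simply the similarity score of the best-matching library; other measures of closeness could be substituted without changing the overall flow.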
The message inquiry unit subsequently selects a text-to-speech engine and an associated phoneme library, and initiates the conversion of the text message into computer-generated speech that is played to the subscriber in a conventional manner.
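The selection step performed by the message inquiry unit might be sketched, under stated assumptions, as a lookup from language identifier to a text-to-speech engine and its phoneme library. The `TtsEngine` record, the engine and library names, and the fallback to a default language when the likelihood value is low are hypothetical details introduced for illustration; the description above states only that an engine and an associated phoneme library are selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsEngine:
    """Hypothetical record pairing an engine with its phoneme library."""
    name: str
    phoneme_library: str


# Hypothetical registry keyed by language identifier.
ENGINES = {
    "en": TtsEngine("english-tts", "english-phonemes"),
    "fr": TtsEngine("french-tts", "french-phonemes"),
    "de": TtsEngine("german-tts", "german-phonemes"),
}


def select_engine(language_id, likelihood, engines,
                  default_language="en", min_likelihood=0.5):
    """Pick the engine for the identified language.

    Falling back to a default language for an unknown identifier or a
    low likelihood is an assumption of this sketch.
    """
    if language_id not in engines or likelihood < min_likelihood:
        return engines[default_language]
    return engines[language_id]


engine = select_engine("fr", 0.9, ENGINES)
print(engine.name, engine.phoneme_library)
```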