1. Field of the Invention
The present invention generally relates to syllable parsing, and more particularly, it relates to a method and system for converting text into phonetic syllables.
2. Related Art
Many devices currently use computer-generated speech for users' convenience. Automatically generating speech devices range from large computers to small, electronic devices. For example, an automatic telephone answering system, such as voicemail, can interact with a caller through synthesized voice prompts. A computer banking system can report account information via speech. On a smaller scale, a talking clock can announce the time. The use of talking devices is increasingly expanding and will continue to expand as innovation and technology progresses.
Often, for ease-of-use, synthesized speech is generated from text inputted to a speech generating device. These devices receive text, translate it, and output sound in the form of speech through a speaker. However, when translating and reciting the text, these devices do not always speak as clearly and naturally as a human does, therefore synthesized speech is recognizably artificial.
Making a computer or electronic device produce natural sounding speech requires a keen understanding of the nuances of the language and can be difficult for programmers. Computer-generated speech often seems unnatural for a variety of reasons. Some systems pre-record verbal responses in audio files, but when the words are played back in a different order than they were recorded, the response can sound extremely unnatural. One key aspect in the production of natural sounding, computer-generated speech is the ability to recognize boundaries between syllables. The recognition of syllable boundaries allows a speech-generating computer to speak in a more natural manner. The production of more natural sounding synthesized speech would further integrate computers into society and make them seem more user-friendly.
Automatic speech recognition ("ASR") devices perform the reverse function of text-to-speech devices. Computers and other electronic devices are increasingly using ASR as a form of input from a user. ASR applications range from word processing to controlling basic functions of electronic devices, such as automatically dialing a telephone number associated with a spoken name. ASR functions are implemented using computationally intensive programs and algorithms. A thorough understanding of boundaries between syllables in a language also makes the precise recognition of speech easier. Greater understanding of the segmentation of a speech signal improves the recognition of the speech signal.
Accordingly, to improve computer speech production and recognition, it is desirable to provide a system that recognizes syllable boundaries.