1. Field of the Invention
The present invention relates to Text-to-Speech Synthesis (TTS), and more particularly, to a method and apparatus for smoothed concatenation of speech units.
2. Description of the Related Art
Speech synthesis is performed using a corpus-based speech database (hereinafter referred to as a DB or speech DB). Recent speech synthesis systems perform speech synthesis suited to their system specifications, such as DB size. For example, since large-size speech synthesis systems contain a large-size DB, they can perform speech synthesis without pruning speech data. However, not every speech synthesis system can use a large-size DB. In fact, mobile phones, personal digital assistants (PDAs), and the like can only use a small-size DB. Hence, these apparatuses focus on how to implement good-quality speech synthesis while using a small-size DB.
When two adjacent speech units are concatenated during speech synthesis, reducing the acoustical mismatch between them is the primary objective. The following conventional art addresses this issue.
U.S. Pat. No. 5,490,234, entitled “Waveform Blending Technique for Text-to-Speech System”, relates to systems for determining an optimum concatenation point and performing a smooth concatenation of two adjacent pitches with reference to the concatenation point.
U.S. Patent Application No. 2002/0099547, entitled “Method and Apparatus for Speech Synthesis without Prosody Modification”, relates to speech synthesis suitable for both large-size DB and limited-size DB (namely, from middle- to small-size DB), and more particularly, to a concatenation using a large-size speech DB without a smoothing process.
U.S. Patent Application No. 2002/0143526, entitled “Fast Waveform Synchronization for Concatenation and Timescale Modification of Speech”, relates to limited smoothing performed over one pitch interval, and more particularly, to an adjustment of the concatenating boundary between a left speech unit and a right speech unit without accurate pitch marking.
When two adjacent voiced speech units are concatenated during speech synthesis, it is important to reduce acoustical mismatch so as to create natural speech from an input text, and to perform speech synthesis adaptively according to the hardware resources available for speech synthesis.
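As a rough illustration of what reducing acoustical mismatch at a unit boundary can mean, the following Python sketch compares a hard join of two sample sequences with a simple linear cross-fade over a short overlap. It is not the method of any of the references cited above, nor of the present invention; the function names `crossfade_concat` and `max_jump`, the linear weighting, and the toy sample values are all assumptions introduced only for illustration.

```python
def crossfade_concat(left, right, overlap):
    """Join two sample lists, linearly cross-fading over `overlap` samples.

    Hypothetical helper for illustration only: the last `overlap` samples of
    `left` are blended with the first `overlap` samples of `right`.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    if overlap == 0:
        return list(left) + list(right)

    start = len(left) - overlap
    head = list(left[:start])        # part of the left unit that is kept as-is
    tail = list(right[overlap:])     # part of the right unit that is kept as-is

    blended = []
    for i in range(overlap):
        # Weight of the right unit rises linearly across the overlap region.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * left[start + i] + w * right[i])
    return head + blended + tail


def max_jump(samples):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


# Toy units with a deliberate level mismatch at the boundary.
left_unit = [1.0] * 8
right_unit = [-1.0] * 8

hard_join = left_unit + right_unit
smooth_join = crossfade_concat(left_unit, right_unit, overlap=4)

print(max_jump(hard_join))    # discontinuity of the unsmoothed join
print(max_jump(smooth_join))  # smaller step sizes after cross-fading
```

In this toy case the cross-fade spreads one large discontinuity over several smaller steps, at the cost of shortening the output by the overlap length. A longer overlap smooths more but needs more computation and alters more of the original units, which is the kind of trade-off a resource-constrained device has to weigh.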