Pre-recorded message prompts are widely used in interactive voice response (IVR) telecommunications applications. Message prompts of this nature provide users with instructions and navigation guidance using natural and rich speech. In many instances, it is desirable to change the rate at which recorded speech is played back. Playing back speech at a different rate poses a challenging problem, and many techniques have been considered.
One known technique involves playing recorded messages back at a clock rate that is faster than the clock rate used during recording of the messages. Unfortunately, doing so raises the pitch of the played-back messages, resulting in an undesirable decrease in intelligibility.
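The trade-off can be shown with simple arithmetic: playing samples back faster than they were recorded shortens the message and scales every frequency, including the pitch, by the same ratio. The sketch below is illustrative only; the function name `fast_clock_playback` and its parameters are hypothetical.

```python
def fast_clock_playback(num_samples, record_rate_hz, playback_rate_hz, tone_hz):
    """Return (duration in seconds, perceived frequency in Hz) when samples
    recorded at record_rate_hz are clocked out at playback_rate_hz."""
    speedup = playback_rate_hz / record_rate_hz
    duration_s = num_samples / playback_rate_hz   # message gets shorter...
    perceived_tone_hz = tone_hz * speedup         # ...but pitch rises by the same factor
    return duration_s, perceived_tone_hz


# One second of speech recorded at 8 kHz, clocked out at 10 kHz:
print(fast_clock_playback(8000, 8000, 10000, 200))  # → (0.8, 250.0)
```

A 25% increase in playback rate therefore also raises a 200 Hz voice component to 250 Hz, which is the pitch shift described above.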
Another known technique involves dropping short segments from recorded messages at regular intervals. Unfortunately, this technique introduces distortion in the played-back messages and thus requires complicated methods to smooth the transitions between adjacent speech segments to make the messages intelligible.
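A minimal sketch of regular segment dropping is given below, assuming speech is held as a list of samples and that a hypothetical pattern of `keep` retained segments followed by `drop` discarded segments is repeated. No smoothing is applied, so each splice point is a potential discontinuity of the kind described above.

```python
def drop_segments(samples, segment_len, keep, drop):
    """Keep `keep` segments, discard the next `drop`, and repeat.
    The splices are left unsmoothed, which is the source of the distortion."""
    period = keep + drop  # pattern length, measured in segments
    out = []
    for idx, start in enumerate(range(0, len(samples), segment_len)):
        if idx % period < keep:
            out.extend(samples[start:start + segment_len])
    return out


shortened = drop_segments(list(range(100)), segment_len=10, keep=3, drop=1)
print(len(shortened))  # → 80
```

With three of every four segments kept, the message plays in roughly 75% of its original time, but sample 29 is followed directly by sample 40 at the first splice.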
Time compression can also be used to increase the rate at which recorded speech is played back, and many time compression techniques have been considered. One time compression technique involves removing pauses from recorded speech. Although the resulting played-back speech sounds natural, many users find it exhausting to listen to because of the absence of pauses. It has been found that pauses are necessary for listeners to understand and keep pace with recorded messages.
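Pause removal is commonly implemented by discarding low-energy frames. The sketch below assumes a simple mean-energy test per fixed-length frame; the function name, frame length and threshold are hypothetical choices, not taken from any cited reference.

```python
def remove_pauses(samples, frame_len, threshold):
    """Drop every frame whose mean energy falls below `threshold`.
    All silent stretches are removed, which shortens playback but also
    removes the pauses listeners rely on."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept


speech_with_pause = [0.5] * 100 + [0.0] * 100 + [0.5] * 100
print(len(remove_pauses(speech_with_pause, frame_len=50, threshold=0.01)))  # → 200
```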
U.S. Pat. No. 5,341,432 to Suzuki et al. discloses a popular time compression technique commonly referred to as the synchronized overlap add (SOLA) method. In this method, redundant information in recorded speech is detected and removed. Specifically, the beginning of a new speech segment is shifted over the end of the preceding speech segment to find the point of highest cross-correlation (i.e., maximum similarity). The overlapping speech segments are then averaged or smoothed together. Although this method produces good-quality speech, it is suitable only for clearly voiced portions of speech.
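The general SOLA idea can be sketched as follows. This is a generic textbook-style illustration, not the implementation disclosed by Suzuki et al.; the function `sola_compress` and its parameters (`frame_len`, `analysis_hop`, `synthesis_hop`, `max_shift`) are hypothetical. Frames are read from the input every `analysis_hop` samples, and each frame is placed in the output near every `synthesis_hop` samples, shifted by up to `max_shift` samples to the position of maximum normalized cross-correlation with the output already produced. The overlap is then smoothed with a linear cross-fade.

```python
import math


def _ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def sola_compress(x, frame_len, analysis_hop, synthesis_hop, max_shift):
    """Time-compress x by roughly analysis_hop / synthesis_hop."""
    y = list(x[:frame_len])
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        frame = x[m * analysis_hop:m * analysis_hop + frame_len]
        nominal = m * synthesis_hop
        best_pos, best_score = None, -math.inf
        # Slide the new frame over the end of the output to find maximum similarity.
        for k in range(-max_shift, max_shift + 1):
            pos = nominal + k
            overlap = len(y) - pos
            if pos < 0 or overlap <= 0 or overlap > frame_len:
                continue
            score = _ncc(y[pos:], frame[:overlap])
            if score > best_score:
                best_pos, best_score = pos, score
        if best_pos is None:
            best_pos = len(y)  # no valid overlap: append the frame unchanged
        overlap = len(y) - best_pos
        # Average the overlapping region with a linear cross-fade.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            y[best_pos + i] = (1 - w) * y[best_pos + i] + w * frame[i]
        y.extend(frame[overlap:])
        m += 1
    return y


tone = [math.sin(2 * math.pi * n / 40) for n in range(4000)]
fast = sola_compress(tone, frame_len=400, analysis_hop=200, synthesis_hop=100, max_shift=40)
print(len(tone), len(fast))
```

On a strongly periodic (voiced-like) input such as this tone, the correlation search finds a well-aligned splice, and the output is about half the input length. On noise-like unvoiced sounds no shift yields a high correlation, which is consistent with the limitation noted above.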
Other techniques for changing the playback rate of recorded speech have also been considered. For example, U.S. Pat. No. 6,205,420 to Takagi et al. discloses a method and device for instantly changing the speed of speech data, allowing the speed to be adjusted to suit the user's listening capability. A block data splitter splits the input speech data into blocks having block lengths dependent on respective attributes. A connection data generator generates connection data that is used to connect adjacent blocks of speech data.
U.S. Pat. No. 6,009,386 to Cruikshank et al. discloses a method for changing the playback rate of speech using sub-band wavelet coding. Digitized speech is transformed into a wavelet-coded audio signal. Periodic frames in the wavelet-coded audio signal are identified, and adjacent periodic frames are dropped.
U.S. Pat. No. 5,493,608 to O'Sullivan et al. discloses a system for adaptively selecting the speaking rate of a given message prompt based on the measured response time of a user. The system selects a message prompt with an appropriate speaking rate from a plurality of pre-recorded message prompts that have been recorded at various speaking rates.
U.S. Pat. No. 5,828,994 to Covell et al. discloses a system for compressing speech in which different portions of speech are classified into three broad categories: pauses; unstressed syllables, words and phrases; and stressed syllables, words and phrases. When a speech signal is compressed, pauses are accelerated the most, unstressed sounds are compressed an intermediate amount, and stressed sounds are compressed the least.
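The effect of category-dependent compression on overall duration can be illustrated with a short sketch. The speed-up factors below are hypothetical values chosen only to respect the ordering described above (pauses fastest, stressed speech slowest); they are not taken from Covell et al., and the classification of each segment is assumed to have been done already.

```python
# Hypothetical per-category speed-up factors (illustrative, not from Covell et al.).
CATEGORY_SPEEDUP = {"pause": 3.0, "unstressed": 1.6, "stressed": 1.2}


def compressed_durations(segments, speedup=CATEGORY_SPEEDUP):
    """Map (category, duration_s) pairs to their compressed durations."""
    return [(label, duration / speedup[label]) for label, duration in segments]


segments = [("stressed", 0.6), ("pause", 0.3), ("unstressed", 0.4)]
result = compressed_durations(segments)
original_total = sum(d for _, d in segments)
compressed_total = sum(d for _, d in result)
print(result)
print(f"overall speed-up: {original_total / compressed_total:.2f}x")
```

Because the pause is shortened rather than removed, it survives in the compressed output, unlike in the pause-removal technique described earlier.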
Although the above-identified prior art references disclose techniques that allow the playback rate of recorded speech to be changed, improvements are desired. It is therefore an object of the present invention to provide a novel apparatus and method for changing the playback rate of recorded speech.