The modulation of an audio stream indicative of speech data with another audio stream indicative of a periodic tone has been used to create synthetic music and certain sound effects. This modulation technique is usually referred to as vocoding, and the apparatus for vocoding speech is referred to as a vocoder or a phase vocoder. The term vocoding is derived from VOice CODING. The phase vocoder was originally developed to reduce the amount of data required to transmit speech over telephone lines or other speech signal transmission media. For that purpose, vocoders extract pitch and voice information in order to time-compress the speech. A phase vocoder may be considered as a series of bandpass filters, each having its own center frequency. Through the bandpass filtering process, the speech signal is reduced to a series of signal segments carrying the center frequencies.
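The filter-bank view of vocoding described above can be illustrated with a channel vocoder. The speech (modulator) and the periodic tone (carrier) pass through the same bank of band-pass filters, the amplitude envelope of each speech band scales the matching carrier band, and the scaled bands are summed. This is a minimal pure-Python sketch under stated assumptions: it is a channel vocoder, not an FFT-based phase vocoder; the band-pass filters follow the RBJ Audio EQ Cookbook biquad; and the Q of 4, the 50 Hz envelope smoothing, and all function names are illustrative choices rather than parameters of any particular vocoder.

```python
import math

def bandpass_coeffs(fc, q, fs):
    """Band-pass biquad (RBJ cookbook, 0 dB peak gain), normalized by a0.

    Returns (b0, b1, b2) and (a1, a2) for the difference equation
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
    """
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(x, b, a):
    """Filter the sample list x with one biquad section (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def envelope(x, fs, cutoff_hz=50.0):
    """Amplitude envelope: full-wave rectification plus a one-pole low-pass."""
    k = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    state, env = 0.0, []
    for xn in x:
        state = (1.0 - k) * abs(xn) + k * state
        env.append(state)
    return env

def channel_vocode(speech, carrier, fs, centers_hz, q=4.0):
    """Impose the per-band envelopes of `speech` onto `carrier`.

    Both inputs are equal-length lists of samples at sample rate fs.
    """
    if len(speech) != len(carrier):
        raise ValueError("speech and carrier must have the same length")
    out = [0.0] * len(carrier)
    for fc in centers_hz:
        b, a = bandpass_coeffs(fc, q, fs)
        env = envelope(biquad(speech, b, a), fs)  # speech energy in this band
        band = biquad(carrier, b, a)              # same band of the tone
        for n, (e, c) in enumerate(zip(env, band)):
            out[n] += e * c
    return out
```

With a sawtooth carrier at a musical pitch and a recorded phrase as the speech input, the output is a tone whose spectral envelope follows the speech; logarithmically spaced band centers (for example 250 Hz up to about 3 kHz) are one reasonable choice.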
In an old-style telephone set, the ringing tone that signals an incoming telephone call is usually produced by a ringer repeatedly striking one or two bells. In a mobile phone, the ringing tone is produced by an electronic buzzer, which produces a pitch of a given frequency according to a value in a data stream representative of a series of musical tones. Likewise, in an electronic organizer or a personal digital assistant, such as a Palm Pilot, a beeping sound is used to remind the user of a scheduled event or of the completion of a task requested by the user.
U.S. Pat. No. 5,452,354 (Kyronlahti et al.) discloses a ringing tone apparatus wherein subscriber identification information is used to generate the ringing tone. As disclosed in Kyronlahti et al., a ringing tone can be generated based on two or more binary digits of a subscriber identification number, such as the mobile station identification number (MSIN) or the mobile identification number (MIN). For example, if the lowest 11 bits of the MSIN are written as the binary string D10-D9-D8-D7-D6-D5-D4-D3-D2-D1-D0, these digits can specify the parameters for generating a ringing tone as follows: D1 and D0 determine the duration of each ringing tone pulse; D3 and D2 determine the frequency of the ringing tone pulses; D5 and D4 determine the number of pulses in one pulse sequence; D7 and D6 determine the number of pulse sequences repeated in the ringing tone; and D10, D9 and D8 determine the silence period between pulse sequences. While this tone generation method is useful for producing different ringing tones for different subscribers, the ringing tones bear no relation to speech data, synthetic or natural.

Japanese patent No. JP05346787 (Nakae Tetsukazu) discloses a method of extracting pitch data from a digital speech signal and generating a digital musical sound according to the pitch data. The digital speech signal and the digital musical sound are conveyed to a vocoder in order to generate a musical sound signal and a voice signal from which an envelope signal is produced. Finally, the musical sound signal is modulated with the envelope signal in order to add the nuance of a human voice to the musical sound. Because the pitch of speech in most languages varies over a narrow range, the so-called musical sound derived from that pitch variation is typically confined to one or two notes.
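Why a pitch-derived melody collapses to so few notes can be seen by mapping a pitch (F0) track onto the nearest equal-tempered notes. This sketch assumes nearest-semitone quantization with A4 = 440 Hz and marks unvoiced frames with an F0 of 0; the actual mapping from pitch data to musical sound in the Nakae publication may differ, and the helper names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(f0_hz):
    """Nearest equal-tempered MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(f0_hz / 440.0))

def midi_to_name(note):
    """MIDI note number to scientific pitch name, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

def pitch_track_to_notes(f0_track_hz):
    """Quantize voiced frames to notes; frames with f0 <= 0 are unvoiced."""
    return [midi_to_name(hz_to_midi(f)) for f in f0_track_hz if f > 0]

def note_span(f0_track_hz):
    """Number of distinct notes the pitch track visits."""
    return len(set(pitch_track_to_notes(f0_track_hz)))
```

A hypothetical track that stays between 118 Hz and 125 Hz spans less than one semitone, so `pitch_track_to_notes([120, 118, 0, 122, 125, 119])` visits only the two adjacent notes A#2 and B2.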
With the Nakae method, for example, a phrase like “I am Bond, James Bond” has little pitch variation, and the resulting musical sound signal may sound like EEE_EE.

U.S. Pat. No. 5,826,064 (Loring et al.) discloses a user-configurable earcon event engine, wherein auditory cues are provided in response to command messages issued by tasks executed on a computer system. As disclosed, each command message includes an index to an earcon data file, which in turn includes a reference to an audio file and audio parameter data for manipulating the acoustic parameters of an audio wave. However, the audio wave does not carry speech content.
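The indirection described for Loring et al. (command message, then earcon index, then earcon data holding an audio file reference and acoustic parameters) can be sketched as a small lookup followed by parameter application. The record layout, the two parameters (`gain` and playback `rate`), the linear-interpolation resampling, and all names below are hypothetical illustrations, not the data format disclosed in that patent.

```python
from dataclasses import dataclass

@dataclass
class EarconData:
    audio_file: str    # reference to an audio file (hypothetical path)
    gain: float = 1.0  # amplitude scale applied to the wave
    rate: float = 1.0  # playback-rate factor; > 1 raises pitch and shortens

@dataclass
class CommandMessage:
    task: str          # name of the issuing task
    earcon_index: int  # index into the earcon table

def resolve(msg, table):
    """Follow the command message's index to its earcon data."""
    return table[msg.earcon_index]

def render(samples, earcon):
    """Apply gain and resample by linear interpolation at earcon.rate."""
    if earcon.rate <= 0:
        raise ValueError("rate must be positive")
    out, pos = [], 0.0
    while pos <= len(samples) - 1:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(earcon.gain * ((1.0 - frac) * samples[i] + frac * nxt))
        pos += earcon.rate
    return out
```

Keeping the parameters in the earcon record rather than in the audio file lets one stored wave produce several distinct cues, which matches the user-configurable character described above.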
It is advantageous and desirable to provide a method and apparatus for modifying a carrier stream indicative of musical tones with a speech signal, wherein a broad range of musical tones can be exploited, regardless of the pitch variation in the speech signal.