1. Field of the Invention
The present invention relates to a pitch extraction apparatus for extracting a pitch (i.e., a pitch period, pitch frequency, or pitch time) of an acoustic wave, e.g., a musical instrument sound or a voice.
2. Prior Art
Most acoustic waveforms of musical sounds or voices are periodically repetitive, except for noise-like acoustic waves such as voiceless sounds, and the way the period changes over time, i.e., the behavior of the pitch period, serves as an important parameter in acoustic analysis, synthesis, or recognition. For example, in an acoustic analysis/synthesis system, the pitch extracted by the analysis unit strongly influences the quality of the sound synthesized by the synthesis unit.
Various methods of extracting a pitch period of an acoustic signal waveform are known, e.g., a method of calculating an autocorrelation function on each frame having a time duration almost equal to a pitch period and extracting the pitch period on the basis of the autocorrelation function (see, e.g., Japanese Patent Laid-Open (Kokai) Sho. No. 23200; W. Hess, "Pitch Determination of Speech Signal", Springer-Verlag Corp., 1983; Fujisaki et al., "A Novel Method for Pitch Extraction of Speech based on Running Analysis of the Waveform", Reference of Society for the Study of Speech, SP86-95; and the like).
The pitch extraction method based on calculating the autocorrelation function is widely used, since the autocorrelation function can be calculated by processing in the time domain, and the influence of the phase relationship between the waveform to be analyzed and the frame is relatively small.
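The autocorrelation approach described above can be illustrated with a minimal sketch. It is not the method of any cited reference: the function name `autocorr_pitch`, the default 80 to 300 Hz search range, the 8 kHz sampling rate, and the 50 ms frame (longer than a single pitch period, chosen only so the peak is easy to see) are assumptions made for illustration.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=80.0, f_max=300.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Hypothetical helper: searches only lags that correspond to the
    assumed pitch range [f_min, f_max]. Returns None if no positive
    correlation is found (e.g., for a silent frame).
    """
    n = len(frame)
    lag_min = max(1, int(sample_rate / f_max))      # shortest period searched
    lag_max = min(n - 1, int(sample_rate / f_min))  # longest period searched
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Time-domain autocorrelation at this lag (no FFT needed).
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else None

# Example: a 200 Hz sine sampled at 8 kHz, 400 samples (50 ms).
sample_rate = 8000.0
sine = [math.sin(2.0 * math.pi * 200.0 * i / sample_rate) for i in range(400)]
estimate = autocorr_pitch(sine, sample_rate)
silent = autocorr_pitch([0.0] * 400, sample_rate)
```

Because only products of time-shifted samples are summed, the starting phase of the frame relative to the waveform has little effect on where the peak falls, which is the property noted above.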
Pitch extraction is an important topic in music recognition, and various apparatuses for pitch extraction are already commercially available (e.g., IVL Corp., Pitch Rider series; FairLight Corp., VoiceTracker; Roland Corp., Voice Processor and MIDI Guitar; Casio Corp., MIDI Guitar; and the like). In these pitch extraction apparatuses, pitch information and intensity information obtained by a pitch extraction unit are converted to Note ON/OFF information, pitch bend information, and the like for MIDI (Musical Instrument Digital Interface), and a MIDI sound source is connected to the output of the apparatus.
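The conversion from an extracted pitch to MIDI note and pitch bend data can be sketched as follows. This is not taken from any of the commercial apparatuses named above; it only uses the common conventions that note number 69 corresponds to 440 Hz and that the 14-bit pitch bend value is centered at 8192. The function name `freq_to_midi` and the bend range of plus or minus 2 semitones are assumptions.

```python
import math

def freq_to_midi(freq_hz, bend_range_semitones=2.0):
    """Map a frequency to (MIDI note number, 14-bit pitch-bend value).

    Hypothetical helper. Assumes equal temperament with note 69 = 440 Hz
    and a receiver whose bend range is +/- bend_range_semitones.
    Frequencies outside the MIDI note range are not handled here.
    """
    exact = 69.0 + 12.0 * math.log2(freq_hz / 440.0)  # fractional note number
    note = int(round(exact))                          # nearest semitone
    offset = exact - note                             # residual, in semitones
    bend = 8192 + int(round(offset / bend_range_semitones * 8192))
    bend = max(0, min(16383, bend))                   # clamp to 14 bits
    return note, bend

a4 = freq_to_midi(440.0)         # exactly on a note: bend stays centered
a5 = freq_to_midi(880.0)         # one octave up: 12 note numbers higher
sharp_a4 = freq_to_midi(452.0)   # slightly sharp: same note, bend above center
```

A Note ON would be issued with the note number when the intensity information indicates an onset, and the bend value would follow subsequent small deviations of the pitch track.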
In a conventional pitch extraction apparatus, an overtone component and a double-pitch component of a pitch, a harmonic component other than a pitch, and the like cause erroneous extraction, thus posing a problem. In order to prevent such erroneous extraction, a pitch search range is limited (placing great weight on smoothness) or an unnecessary frequency component is removed prior to pitch extraction.
However, many conventional pitch extraction apparatuses operate within the pitch range (80 to 300 Hz) of speech (voice). In these apparatuses, a filtering operation is performed prior to pitch extraction to remove unnecessary harmonic components, and a smooth pitch track is then extracted. On the other hand, a musical instrument sound has a pitch range as wide as about 40 to 1200 Hz. If the abovementioned conventional extraction technique is employed, a high-pitch portion cannot be extracted. Therefore, in order to extract a pitch of a musical instrument sound, a pitch extraction apparatus needs countermeasures against a sound whose pitch changes abruptly and which, unlike normal voice, contains a high-pitch sound.
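The difference between the two pitch ranges can be made concrete by expressing each as a range of pitch periods in samples. The sketch below is only arithmetic; the function name `period_range` and the 48 kHz sampling rate are assumptions, since no sampling rate is stated here.

```python
def period_range(f_low_hz, f_high_hz, sample_rate_hz):
    """Return (shortest, longest) pitch period in samples for a pitch range."""
    return sample_rate_hz / f_high_hz, sample_rate_hz / f_low_hz

SAMPLE_RATE_HZ = 48000.0  # assumed value, for illustration only

speech = period_range(80.0, 300.0, SAMPLE_RATE_HZ)       # speech range
instrument = period_range(40.0, 1200.0, SAMPLE_RATE_HZ)  # instrument range

# Ratio of longest to shortest period that the extractor must cover.
speech_span = speech[1] / speech[0]
instrument_span = instrument[1] / instrument[0]
```

Under this assumption the speech range covers periods of 160 to 600 samples (a ratio of 3.75), whereas the instrument range covers 40 to 1200 samples (a ratio of 30), so a search range or pre-filter tuned to speech excludes the short periods of high-pitch sounds.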
In a small-amplitude portion of a signal wave, pitch excitation tends to be unstable, and hence, pitch estimation also becomes unstable.
Conventionally, in order to remove an irregular pitch variation and to obtain a smooth pitch track, estimated values for several frames are often buffered to correct the variation. However, since this technique prolongs the response time, it cannot be used in a real-time system. More specifically, when an apparatus is designed so that previously extracted pitch data is never looked up, it is important to improve the reliability of the estimated value at each timing.
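The latency cost of buffering can be seen in a small sketch. A median filter is used here only as one common example of multi-frame correction; it is not stated to be the technique of any particular conventional apparatus, and the name `median_smooth` and the window half-width of two frames are assumptions.

```python
from statistics import median

def median_smooth(track, half_width=2):
    """Median-filter a pitch track over a window of 2 * half_width + 1 frames.

    Hypothetical helper. In a streaming system, the output for frame i
    cannot be produced until frame i + half_width has arrived, so the
    response is delayed by half_width frames.
    """
    out = []
    for i in range(len(track)):
        lo = max(0, i - half_width)                # window shrinks at the edges
        hi = min(len(track), i + half_width + 1)
        out.append(median(track[lo:hi]))
    return out

# Pitch estimates in Hz; 400.0 is a double-pitch (octave) error.
track = [200.0, 201.0, 400.0, 202.0, 203.0]
smoothed = median_smooth(track)
```

The outlier is removed, but only because the two frames following it were available, which is the added response time referred to above.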
In pitch extraction processing, since discrimination of durations where a pitch structure may or may not be present largely influences the final result, discrimination of a voiced/voiceless sound must be performed. The voiced/voiceless sound discrimination is performed using various feature parameters. For example, a typical technique using a parameter such as a zero-crossing count, a zero-crossing distance, a first-order LPC coefficient, or the like is known. The conventional voiced/voiceless sound discrimination is performed in parallel with, and separately from, pitch extraction processing. Therefore, the processing volume increases, and the logic becomes complicated.
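As an illustration of one of the feature parameters mentioned above, the following sketch discriminates frames by zero-crossing count. The function names, the threshold of 50 crossings, and the test signals are assumptions chosen for the example; practical systems typically combine several parameters.

```python
import math
import random

def zero_crossing_count(frame):
    """Count sign changes between consecutive samples of a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))

def is_voiced(frame, max_crossings=50):
    """Hypothetical rule: few zero crossings suggests a periodic (voiced) frame."""
    return zero_crossing_count(frame) <= max_crossings

# A 200 Hz sine (periodic) and uniform white noise (noise-like),
# each 400 samples at an assumed 8 kHz sampling rate.
sine = [math.sin(2.0 * math.pi * 200.0 * i / 8000.0) for i in range(400)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]

sine_count = zero_crossing_count(sine)
noise_count = zero_crossing_count(noise)
```

The periodic signal crosses zero roughly twice per pitch period, while the noise-like signal changes sign on about half of its samples, so the two counts differ by an order of magnitude.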
The present invention has been made in consideration of the conventional problems, and has as its first object to provide a pitch extraction apparatus which can more stably extract a pitch of an acoustic wave over a wide range.
It is a second object of the present invention to provide a pitch extraction apparatus which can extract a pitch of an acoustic wave over a wide range in real time.
It is a third object of the present invention to provide a pitch extraction apparatus which can perform voiced/voiceless sound discrimination with a small processing volume and simple logic, and which, when extracting a pitch from an input acoustic signal in real time, can use said discrimination result to extract a pitch only in a voiced sound duration.