The invention relates to the analysis of speech signals, and more particularly to a process for detecting the pitch frequency of voiced sounds in a speech signal and to a device for implementing this process.
In speech, voiced sounds consist of vowels and liquid or voiced consonants. They have very specific spectral properties that are not found in unvoiced sounds, which are formed by breathed consonants. Voiced sounds generally have a greater amplitude than unvoiced sounds and show a very marked periodicity in the speech signal. The frequency of this periodicity, which is related to the vibration of the vocal cords, is the pitch frequency; depending on the speaker, it lies between 60 and 300 Hz.
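The pitch period is the reciprocal of the pitch frequency: about 16.7 ms at 60 Hz and 3.3 ms at 300 Hz. The short sketch below converts the 60 to 300 Hz range into lag bounds expressed in samples, which is the search range any period detector has to cover. The 8 kHz sampling rate (typical of telephone-band speech) is an assumption, not a value from the text, and `pitch_lag_range` is a hypothetical helper.

```python
import math

def pitch_lag_range(fs_hz, f_min_hz=60.0, f_max_hz=300.0):
    """Lag bounds (in samples) covering pitch frequencies f_min_hz..f_max_hz.

    The highest pitch has the shortest period (smallest lag) and the lowest
    pitch the longest; floor/ceil widen the range so both ends are included.
    """
    min_lag = math.floor(fs_hz / f_max_hz)
    max_lag = math.ceil(fs_hz / f_min_hz)
    return min_lag, max_lag

for f in (60.0, 300.0):
    print(f"{f:.0f} Hz -> period {1000.0 / f:.2f} ms")
print(pitch_lag_range(8000.0))  # (26, 134) at an assumed 8 kHz
```

At the assumed 8 kHz rate, a detector must therefore consider periods from roughly 26 to 134 samples.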
The pitch frequency is a fundamental speech parameter that is evaluated in most vocoders, and the quality of its detection directly affects the quality of the speech restored after decoding.
An analysis of the state of the art distinguishes two classes of processes and devices for detecting the pitch frequency:
The first class proceeds by systematic analysis of the speech signal, such as spectrum analysis or autocorrelation, and generally requires too large a volume of calculations to allow real-time implementation with relatively simple systems.
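To illustrate the first class and its calculation load, here is a minimal autocorrelation pitch estimator. It is a sketch under stated assumptions, not the method of any particular prior-art device: an 8 kHz sampling rate and a 50 ms frame are assumed, `autocorr_pitch` is a hypothetical helper, and the normalized-correlation-with-argmax rule is one common choice among several. For every candidate lag it performs on the order of one multiply-add per sample of the frame, which, at the assumed rate, is about 109 lags of roughly 300 products each per frame. It also has no safeguard against locking onto a multiple of the true period.

```python
import math

def autocorr_pitch(frame, fs_hz, f_min_hz=60.0, f_max_hz=300.0):
    """Estimate the pitch of one frame from its normalized autocorrelation.

    Returns fs_hz / best_lag, or None if no lag correlates positively.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    # Candidate lags: one period of the highest to one period of the lowest pitch.
    min_lag = max(1, math.floor(fs_hz / f_max_hz))
    max_lag = min(n - 1, math.ceil(fs_hz / f_min_hz))
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        head, tail = x[: n - lag], x[lag:]
        num = sum(a * b for a, b in zip(head, tail))
        den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if den == 0.0:
            continue  # silent overlap: no evidence for this lag
        r = num / den  # normalized correlation, in [-1, 1] up to rounding
        if r > best_r:  # on exact ties, strict '>' keeps the shorter lag
            best_lag, best_r = lag, r
    return None if best_lag is None else fs_hz / best_lag

# Usage: a 50 ms frame of a 100 Hz sinusoid sampled at an assumed 8 kHz.
fs = 8000.0
frame = [math.sin(2 * math.pi * 100.0 * i / fs) for i in range(400)]
print(autocorr_pitch(frame, fs))  # expected: 100.0
```

Normalizing each lag by the energy of the two overlapping segments avoids the bias of a plain sum toward shorter lags, which would otherwise pull the peak slightly below the true period.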
The second class, of the time-domain type, tries to locate a periodicity directly in the time signal. These processes generally use a reduced set of data, for example the time intervals between zero crossings (or between maxima of the signal), or the number of zero crossings of the signal counted during a given time; the decision criteria take into account properties observed in speech signals. This type of detection requires fewer calculations, but the corresponding detection devices perform poorly in the presence of noise and during transitions between voiced and unvoiced signals.

A process and a device for detecting the melody (pitch) period have also been described which use, as the set of data, measurements of the energy in the successive arches of the speech signal. Compared with the more common time-domain devices, this device offers better immunity to noise and a more selective voicing criterion, which limits false detections. However, the detection requires the signal to be chopped into frames of fixed length, so that a voiced sound can only be recognized with a delay of one frame. Furthermore, there is a risk of detecting twice the pitch frequency, because the criterion for avoiding such a detection is only effective in the middle of a voiced segment. Finally, chopping the signal into fixed-length frames that are unrelated to the content of the speech signal degrades the quality of the measurement, in particular during transitions between voiced and unvoiced signals.
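The reduced data sets used by time-domain detectors can be sketched as follows. This sketch assumes a sampled signal and takes an arch to be the run of samples between two successive zero crossings, which is an interpretation rather than a definition given in the text. `zero_crossing_pitch` and `arch_energies` are hypothetical helpers, and the median-of-intervals rule is an illustrative choice, not the decision criterion of any described device. A pure sinusoid crosses zero upward once per period, but noise or strong harmonics can add extra crossings, which is one reason such detectors are fragile.

```python
import math

def upward_crossings(x):
    """Indices i where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]

def zero_crossing_pitch(x, fs_hz):
    """Estimate pitch as fs_hz / median spacing of upward zero crossings."""
    ups = upward_crossings(x)
    if len(ups) < 2:
        return None  # fewer than one full period observed
    gaps = sorted(b - a for a, b in zip(ups, ups[1:]))
    return fs_hz / gaps[len(gaps) // 2]

def arch_energies(x):
    """Energy (sum of squares) of each complete arch between two sign changes."""
    edges = [i for i in range(1, len(x)) if (x[i - 1] < 0.0) != (x[i] < 0.0)]
    return [sum(s * s for s in x[a:b]) for a, b in zip(edges, edges[1:])]

# Usage: 50 ms of a 100 Hz sinusoid at an assumed 8 kHz; the 0.3 rad phase
# keeps samples away from exact zeros.
fs = 8000.0
sig = [math.sin(2 * math.pi * 100.0 * i / fs + 0.3) for i in range(400)]
print(zero_crossing_pitch(sig, fs))  # expected: 100.0
print([round(e, 3) for e in arch_energies(sig)])
```

For this clean test signal, each complete arch spans half a period (40 samples) and has the same energy. On real speech, the sequence of arch energies varies from arch to arch, and a detector would look for a pattern that repeats at the pitch period.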