The human voice is a complex signal. A number of parameters are used to describe significant characteristics of the voice signal. Among them are the "pitch" or fundamental frequency of the voice signal, "formant" or oral and nasal cavity resonant frequency and amplitude, and voiced/unvoiced time division of the voice signal. A voiced sound is one in which the vocal cords are active and an unvoiced sound is one in which the sound is generated without involvement of the vocal cords.
The voiced portions of the human speech signal are at a higher power level and of longer duration than the unvoiced portions. The voiced portions always have an associated pitch which is the instantaneous vibration frequency of the vocal cords. In voice signal processing it is of overriding importance to know the pitch during voiced portions of the speech signal.
The fundamental frequency, or pitch, of voiced human speech sounds will occur in the range of 80 to 300 Hertz. In general, the lower portion of this range will be male voices while the higher pitch frequencies occur in female and children's voices. Any single individual will have a limited pitch range but will also display a significant pitch variation in the voiced sections of normal speech.
The human ear senses pitch of a sound by the frequency separation of the pitch harmonics. Sound energy at the pitch frequency can be of low amplitude, or even absent, compared with the energy at the pitch harmonic frequencies.
A statistical process for obtaining voice pitch by means of a histogram concept was proposed by M. R. Schroeder (Journal of Acoust. Soc. of America, Volume 43, pp. 829-834) in Jan. 1968. One approach utilizing the concept is shown in U.S. Pat. No. 3,535,454 to Miller. The new apparatus disclosed herein is considerably different from that claimed by Miller. The prior art revealed by Miller employs a gate structure which blocks signals below noise in individual channels and an envelope detection and gating apparatus which will block desired signals in the presence of noisy envelopes. In the disclosed apparatus, by contrast, noisy channels are included in the histogram generation. Their inclusion provides signal enhancement in the presence of high noise inputs, since the noise will be essentially decorrelated while any signal component (even if below noise) will contribute to the harmonic peak.
The disclosed apparatus is designed to operate in a high noise environment and is therefore an improvement over the prior art. In such a noisy environment, the prior art does not provide means for obtaining voiced/unvoiced decisions. Harmonic energy measurement provides such a means in that the total energy of the correlated harmonic sum is measured independently of the uncorrelated noise. By well-known threshold comparing or similar techniques, the presence of a voiced signal in a high noise environment can be determined simultaneously with the pitch measurement process. Additionally, this energy measurement provides an indication of pitch signal strength which can be used to normalize signal amplitudes in speech encoding devices. The generation of such a harmonic energy measurement output for noise degraded speech processing is an improvement over the prior art. The disclosed apparatus uses a common digital clock to measure all channel periods, develop all period pulse trains and measure the time of peak-sum occurrence. This approach is an improvement over the prior art, wherein variations in simultaneous independent measurements and/or pulse generation can accumulate to degrade accuracy. Referencing all measurements to a common clock signal minimizes the relative measurement error.
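The pulse-train summation described above can be illustrated with a minimal software sketch. It is not the disclosed circuitry. The function names, the integer tick grid standing in for the common digital clock, and the example channel periods are all assumptions made for illustration. Each channel contributes a unit pulse at every multiple of its measured period, and the tick at which the sum first reaches its maximum is taken as the pitch period.

```python
def summed_pulse_trains(channel_periods, window):
    """Sum one unit pulse train per channel on a shared tick grid.

    channel_periods: measured harmonic periods in clock ticks (hypothetical values).
    window: number of ticks over which the trains are generated.
    """
    total = [0] * (window + 1)
    for period in channel_periods:
        # A pulse is emitted every `period` ticks after synchronization at tick 0.
        for t in range(period, window + 1, period):
            total[t] += 1
    return total


def pitch_period(channel_periods, window):
    """Return the tick at which the summed pulse trains first reach their maximum."""
    total = summed_pulse_trains(channel_periods, window)
    return max(range(len(total)), key=total.__getitem__)


# Harmonics 2 to 5 of a 120-tick pitch period; the fundamental itself is absent.
periods = [60, 40, 30, 24]
total = summed_pulse_trains(periods, 200)
estimate = pitch_period(periods, 200)
```

In this example all four trains coincide only at tick 120, so the estimate recovers the pitch period although no channel carries the fundamental. With an assumed 12 kHz clock, 120 ticks would correspond to a 100 Hertz pitch.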
The Miller patent and publication (Journal of Acoust. Soc. of America, Vol. 43, pp. 1593-1601) disclose a low pass filter within the period translation apparatus with the disclosed purpose of blocking beat frequency effects. Such a filter will require a maximum cutoff frequency below the fundamental frequency of interest. Given that filter criterion, and a lowest measurement of 67 to 70 Hertz as in Miller, a significant amount of low frequency noise could still be passed through the period translators, particularly in the 20 to 60 Hertz region. The disclosed improved device provides additional low pass filtering, down to the minimum compatible with normal speech dynamics. This additional and unobvious constraint leads to significant improvement in performance against noise since only those noise and signal components in the information bandwidth of interest will be passed and the noise components are subsequently decorrelated in the summation process. The digital low pass filters perform the required circuitry function as discussed in the description of the preferred embodiment. In contrast to the Miller system, which shows error removal after the summation and analysis of all channels is performed, the disclosed system provides for maximum noise suppression prior to the synchronization and summation of each channel, thereby improving the quality of signals upon which peak detection will be performed. Miller points out that in his system, "In addition to the gross-type errors . . . there are also small perturbations of the measured pitch. These run from approximately 2% at 0 dB S/N." It is just these noise induced errors that the disclosed approach addresses by (1) reducing the filter bank bandwidths to achieve improved channel period signal to noise ratios (65 Hertz instead of Miller's 75 Hertz), (2) additional filtering criteria applied to each period translation circuit as discussed above, and (3) optimization of the peak detection circuitry discussed below.
Miller does not address the problem of noise induced errors, but his disclosed error correction logic circuitry would block some of the necessary measurements needed for noise correction. An additional benefit of the present invention is that the error correction logic need only address gross-type errors introduced by peak detection discrimination errors. Significant noise removal prior to peak detection will also reduce the rate of occurrence of gross errors as a function of S/N input, since the probability of errors introduced by noise derived harmonic misalignment is reduced.
A significant reduction in gross errors is achieved by employing a new modification to the histogram concept disclosed in the prior art. The modified process is herein designated as a "bi-phase harmonic summation." The bi-phase process utilizes both positive and negative excursions of harmonically related pulse trains. Improved performance over the prior art is realized by algebraic cancellation of amplitude components when an even harmonic is summed algebraically with a harmonic of twice the period (half the frequency). This is shown in FIG. 2 for the equal weighted case, but the half frequency component need not be equal in amplitude for improvements to occur. All negative residues in the sum are discarded in the peak energy detection process. Thus a signal with strong even harmonic content will contribute a half period peak reduced by the sum of all odd harmonic amplitudes. This half period peak reduction allows improved discrimination against even harmonic (T/2) type measurement error. Such errors are a major percentage of the errors obtained in the prior art, which employs simple magnitude-sum histogram techniques. The minimization of the T/2 type error source can improve error performance in another way. The peak discrimination ratio represented by ΔA in FIG. 2 can be lowered to reduce 2T type errors, which occur when noise causes the second occurrence of the fundamental peak to be larger than the first occurrence.
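The effect of the bi-phase summation on the half period peak can be illustrated with a short sketch. The sketch rests on one reading of the description above, stated here as an assumption: each channel emits a positive pulse at every multiple of its period and a negative pulse midway between. The function name, the tick grid, and the channel amplitudes are hypothetical and are not taken from FIG. 2.

```python
def pulse_sum(channels, window, biphase):
    """Sum weighted pulse trains and discard negative residues.

    channels: (period_in_ticks, amplitude) pairs; periods are assumed even
              so that the half-period instant falls on a tick.
    biphase:  if True, add a negative pulse midway between positive pulses
              (an assumed reading of the bi-phase process).
    """
    total = [0] * (window + 1)
    for period, amplitude in channels:
        half = period // 2
        for t in range(1, window + 1):
            phase = t % period
            if phase == 0:
                total[t] += amplitude      # positive excursion
            elif biphase and phase == half:
                total[t] -= amplitude      # negative excursion
    return [max(value, 0) for value in total]  # negative residues discarded


# Harmonics 1 to 4 of a 120-tick pitch period with strong even harmonics.
channels = [(120, 1), (60, 3), (40, 1), (30, 3)]
magnitude_sum = pulse_sum(channels, 150, biphase=False)
biphase_sum = pulse_sum(channels, 150, biphase=True)
```

Under these assumptions both sums reach 8 at the true period (tick 120). At the half period (tick 60) the magnitude sum reaches 6, while the bi-phase sum reaches 4, that is, the even harmonic total of 6 reduced by the odd harmonic total of 2. The ratio of the T/2 peak to the true peak falls from 0.75 to 0.5.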
Another way of viewing the difference between the implementations is to consider noise effects of two types. These are input noise effects and processing noise effects. The concept as proposed by Schroeder has a degree of input noise suppression capability due to decorrelation of noisy channels which have no harmonic information within their passbands. The approach disclosed by Miller adds error correction logic to reduce processing noise effects. The present approach adds additional filtering requirements to further suppress input noise (i.e., within a channel containing harmonic energy) and provides a common measurement reference, a modified histogram technique and improved peak detection to reduce system noise.
The peak detection apparatus of the present invention responds to the peak value of the summed pulse generator outputs. Noise components in the outputs of the generators are summed in root-sum square fashion with the result that the narrow summation pulse observed under high S/N conditions becomes spread out under low S/N, still retaining the same total area. The optimum peak detector under this low S/N condition includes a filter whose impulse response has the same shape and duration as the spread summation pulse and which senses the peak (or zero slope) instant of the "matched" filter output. This optimum circuit is realized with a combined Bessel filter -- differentiating circuit as shown in FIG. 3 (Bessel filter -- zero slope detector). Simpler approaches, such as the peak sense and hold circuit used by Miller, will have too much filter bandwidth, resulting in excessive noise induced in the measured pitch period. This noise can originate from the input or from circuit errors prior to peak detection. The measurement errors are caused by insufficient averaging of the spread summation pulse and phase shifts associated with low pass filters not possessing the constant delay characteristics of Bessel filters.
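The difference between simple peak sensing and a matched filter followed by a zero slope detector can be illustrated with a small sketch. It does not model the Bessel filter of FIG. 3. A triangular finite impulse response kernel with the same shape and duration as the spread pulse is substituted as an assumption, and the pulse shape, the alternating disturbance, and all names are hypothetical.

```python
CENTER = 120      # true instant of the spread summation pulse, in ticks
HALF_WIDTH = 8    # assumed half-duration of the spread pulse, in ticks

# Spread triangular pulse plus an alternating disturbance that depresses the
# true peak sample and raises its neighbours (integer units, illustrative).
signal = []
for t in range(241):
    d = t - CENTER
    pulse = max(HALF_WIDTH - abs(d), 0)
    disturbance = -3 if d % 2 == 0 else 3
    signal.append(pulse + disturbance)

# Kernel "matched" to the pulse: same triangular shape and duration.
kernel = {k: HALF_WIDTH - abs(k) for k in range(-HALF_WIDTH + 1, HALF_WIDTH)}


def matched_filter(x):
    """Correlate x with the triangular kernel, treating out-of-range samples as zero."""
    out = []
    for t in range(len(x)):
        acc = 0
        for k, weight in kernel.items():
            if 0 <= t - k < len(x):
                acc += weight * x[t - k]
        out.append(acc)
    return out


def zero_slope_peak(y):
    """Return the largest-valued instant where the slope changes from positive to non-positive."""
    candidates = [t for t in range(1, len(y) - 1)
                  if y[t] - y[t - 1] > 0 and y[t + 1] - y[t] <= 0]
    return max(candidates, key=y.__getitem__)


raw_peak = max(range(len(signal)), key=signal.__getitem__)   # simple peak sensing
filtered_peak = zero_slope_peak(matched_filter(signal))      # filter plus zero slope
```

In this constructed case simple peak sensing selects tick 119, one tick early, because the disturbance is larger there than at the true peak. The filtered output averages over the full pulse duration, and its zero slope instant falls on tick 120.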
In summary, the disclosed circuitry has more noise tolerance than the prior art due to the unique combining of the following characteristics of the circuit: (1) reduced bandpass filter bandwidth to improve individual harmonic signal to noise ratios in each pass band channel; (2) additional filtering of period data to minimize input noise induced period measurement errors; (3) harmonic energy measurement (i.e., voiced signal strength) to augment the processing of noisy signals with the apparatus; (4) peak detection of zero slope/Bessel filter for improvement in noise tolerance; (5) the utilization of a common signal measurement/pulse generation reference to minimize processor induced noise effects; (6) the minimization of noise induced errors prior to summation/peak detection; and (7) the inclusion of circuits to utilize the technique of bi-phase histogram to improve performance.
The disclosed approach, utilizing digital processing techniques as well as digital word signal interfaces, also employs digital error correction, but on a noise suppressed processing product that is already in a digital word format. Although a particular error correction technique is not disclosed, as this is well-known art, it is considered obvious that the error statistics prior to error correction will be different for this system compared to the prior art.
A process to determine what pitch dynamics are produced by normal speakers is not claimed, although the disclosed apparatus has this capability. The disclosure instead indicates how such information can be used in this system by those skilled in the art. Since it is a stated objective of the claimed system to process speech signals in a high noise environment, the selection of optimum criteria to limit normal frequency changes will depend on the speaker population of interest, the noise levels (both ambient and peak), and the desired final corrected error statistics. It is believed that a user skilled in the art may wish to select his own alterable criteria, dependent on the above stated considerations. Details of the selection criteria were not deemed valid subject matter for this patent application, as sufficient disclosure of means (circuitry) is made for performing the specified function. Actions to be taken by a person skilled in the art with respect to frequency change rate criteria selection are dependent on a broad range of possible applications.
U.S. Pat. No. 3,420,955 to Noll discloses an alternative pitch measuring apparatus which does not use the harmonic summation concept. It has digital control means but relies on analog processing techniques. It is representative of the prior art in that the pitch measurement means (in this case via a spectrum thresholding technique) is not particularly noise immune.