1. Field of the Invention
The present invention relates to a method and apparatus for normalizing a speech feature vector utilizing a backward cumulative histogram, and more particularly, to a method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to a smallest value so as to estimate a noise robust histogram.
2. Description of Related Art
Generally, a speech recognition system utilized in various environments must extract a speech feature vector that is robust against noise, which is essential for stable speech recognition.
Histogram normalization based nonlinear conversion algorithms are currently being developed so that the statistical characteristics of a speech feature vector obtained in a noisy environment conform to those of clean speech data.
An example of a conventional histogram normalization method is described in an article entitled “Evaluation of quantile-based histogram normalization with filter combination on the Aurora3 and Aurora4 database” (Hilger et al., RWTH Aachen-University of Technology, Eurospeech, 2005). This method does not model a cumulative distribution function (hereinafter, CDF) using the entire histogram, but divides the CDF into four quantiles for modeling so as to compensate for a lack of data. However, this example of the conventional histogram normalization method utilizes a forward histogram estimation method which cumulates a probability distribution function (PDF) in an order from a smallest to a greatest value.
Specifically, as shown in FIG. 1, this conventional forward histogram estimation method divides a variable section of a speech vector into a predetermined number of bins, constitutes a PDF corresponding to each of the divided bins, cumulates the PDF in an order from a smallest to largest value, and thereby generates a CDF, and utilizes the generated CDF as a histogram.
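The forward estimation steps described above can be sketched as follows. This is a minimal illustration only, not the implementation of any of the cited methods: the function name `forward_cdf`, the parameter `num_bins`, and the choice of equal-width bins over the observed range of a non-empty input are assumptions made for the example.

```python
import numpy as np

def forward_cdf(features, num_bins=10):
    """Hypothetical helper: forward cumulative histogram of a 1-D feature sequence."""
    features = np.asarray(features, dtype=float)
    # Divide the variable section [min, max] into equal-width bins (assumed binning).
    counts, edges = np.histogram(features, bins=num_bins)
    # Constitute the PDF: the fraction of samples falling in each bin.
    pdf = counts / counts.sum()
    # Cumulate the PDF from the smallest-valued bin to the largest to obtain the CDF.
    cdf = np.cumsum(pdf)
    return pdf, cdf, edges

pdf, cdf, edges = forward_cdf([0.0, 1.0, 2.0, 3.0], num_bins=4)
```

Because the cumulation starts at the smallest-valued bin, every CDF value includes the contributions of all bins below it.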
Another example of a conventional histogram normalization method is described in an article entitled “Enhanced histogram normalization in the acoustic feature space” (Molau, et al., RWTH Aachen-University of Technology, ICSLP, 2002). This method divides learning data into a speech section and a silent section, obtains a histogram CDF for each section, and calculates the overall CDF by taking the ratio of the silent section into account. However, this example of the conventional histogram normalization method also proposes only the forward histogram estimation method which cumulates a PDF in an order from a smallest to largest value.
Yet another example of a conventional histogram normalization method is disclosed in U.S. Patent Publication No. 2003/0204398 entitled “Online parametric histogram normalization for noise robust speech recognition” (assigned to the Nokia Corporation). This method obtains the mean and variance of a test speech vector utilizing 38 frame buffers, and improves a histogram utilizing the mean and variance obtained from the learning data. However, this example of the conventional histogram normalization method also discusses only the forward histogram estimation method which cumulates a PDF from a smallest to largest value.
For a conventional histogram normalization method to work effectively, the histogram estimation must be robust against noise.
FIG. 2, parts (a) and (b), are diagrams illustrating a distortion of a speech feature vector by additive noise and a channel.
Referring to FIG. 2, parts (a) and (b), a distortion of a signal section where the size of a speech signal is comparatively large, i.e. a peak, is not so severe in comparison to the distortion of the signal section where the size of the speech signal is comparatively small.
However, the conventional forward histogram estimation method begins cumulating from the signal section with comparatively small values, i.e. the valley section, which is severely distorted when a speech signal is corrupted by noise, in comparison to the signal section with comparatively great values, i.e. the peak section.
As described above, when cumulating a PDF to obtain a CDF, the conventional forward histogram normalization method cumulates the PDF in an order from a smallest to largest value. Accordingly, the error in the small-value section is also cumulated, and thus the shape of the CDF may be extremely distorted, which may cause a histogram matching error. Specifically, since the conventional forward histogram normalization method is significantly affected by noise, the reliability of the histogram estimation may be decreased.
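The cumulation of error described above can be written out as a short worked derivation. The notation is introduced here for illustration and does not appear in the cited references: $p_i$ denotes the PDF value of the $i$-th bin (bins ordered from smallest to largest value), $\varepsilon_i$ the estimation error that noise introduces in that bin, and $C(k)$ the forward CDF at bin $k$.

```latex
C(k) = \sum_{i=1}^{k} p_i,
\qquad
\tilde{C}(k) = \sum_{i=1}^{k} \left( p_i + \varepsilon_i \right)
             = C(k) + \sum_{i=1}^{k} \varepsilon_i
```

Under this notation, an error $\varepsilon_i$ arising in a small-value bin is carried into $\tilde{C}(k)$ for every $k \ge i$, so errors in the valley section affect all subsequent values of the forward CDF.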
Accordingly, a method of estimating a noise robust histogram in a speech recognition system is required.