1. Field of the Invention
The present invention relates to a speech enhancement apparatus and method, and more particularly, to a speech enhancement apparatus and method that enhance the quality and naturalness of speech by efficiently removing noise from a speech signal received in a noisy environment and appropriately processing the peaks and valleys of the speech spectrum from which the noise has been removed.
2. Description of the Related Art
In general, although speech recognition apparatuses exhibit high performance in a clean environment, their performance deteriorates due to surrounding noise in the actual environments where they are used, such as in a car, in a display space, or in a telephone booth. This noise-induced deterioration has been an obstacle to the widespread adoption of speech recognition technology, and many studies have been conducted to solve the problem. A spectrum subtraction method, which removes additive noise included in a speech signal input to a speech recognition apparatus, has been widely used to make speech recognition robust to noisy environments.
The spectrum subtraction method estimates an average spectrum of noise in a speech absence section, that is, in a period of silence, and subtracts the estimated average spectrum of noise from an input speech spectrum, exploiting the fact that the frequency characteristic of noise changes relatively smoothly compared with that of speech. When an error exists in the estimated average spectrum |Ne(ω)| of noise, negative values may occur in the spectrum obtained by subtracting the estimated average spectrum |Ne(ω)| of noise from the speech spectrum |Y(ω)| input to the speech recognition apparatus.
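The subtraction described above can be illustrated with a minimal sketch. The function names, the three-bin spectra, and the numeric values are hypothetical and chosen only to show how an overestimated noise spectrum yields a negative bin; this is not the implementation of any particular apparatus.

```python
def estimate_noise_spectrum(silence_frames):
    """Average the magnitude spectra of frames taken from a speech absence section."""
    n_bins = len(silence_frames[0])
    n_frames = len(silence_frames)
    return [sum(frame[k] for frame in silence_frames) / n_frames
            for k in range(n_bins)]


def subtract_spectrum(input_mag, noise_mag):
    """Compute |Y(w)| - |Ne(w)| for each frequency bin, without any rectification."""
    return [y - n for y, n in zip(input_mag, noise_mag)]


# Hypothetical magnitude spectra (three frequency bins per frame).
silence_frames = [[2.0, 1.0, 3.0],
                  [4.0, 1.0, 1.0]]
noise_est = estimate_noise_spectrum(silence_frames)   # |Ne(w)|
input_mag = [5.0, 0.5, 2.0]                           # |Y(w)|
subtracted = subtract_spectrum(input_mag, noise_est)

print(noise_est)    # [3.0, 1.0, 2.0]
print(subtracted)   # [2.0, -0.5, 0.0]
```

In the second bin the estimated noise magnitude exceeds the input magnitude, so the subtracted value is negative and must be handled before further processing.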
To prevent negative values in the subtracted spectrum, a conventional method (hereinafter referred to as “HWR”) adjusts a portion 110 having an amplitude less than “0” in the subtracted spectrum (|Y(ω)|−|Ne(ω)|) to uniformly have “0” or a very small positive value. In this case, although the noise removal performance is superior, adjusting the portion 110 to “0” or a very small positive value increases the possibility of speech distortion, so that the quality of speech or the performance of recognition deteriorates.
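As a rough illustration of the HWR adjustment, the sketch below replaces every negative bin of a subtracted spectrum with a fixed floor. The function name `clamp_negative_bins`, the `floor` parameter, and the sample values are assumptions made for this example only.

```python
def clamp_negative_bins(subtracted, floor=0.0):
    """Replace every bin whose amplitude is below 0 with a uniform floor value.

    `floor` is 0 or a very small positive value; non-negative bins are unchanged.
    """
    return [d if d >= 0.0 else floor for d in subtracted]


# Hypothetical subtracted spectrum |Y(w)| - |Ne(w)| with two negative bins.
subtracted = [2.0, -0.5, 0.0, -1.5]

hwr_zero = clamp_negative_bins(subtracted)               # negative bins set to 0
hwr_small = clamp_negative_bins(subtracted, floor=0.01)  # negative bins set to 0.01

print(hwr_zero)    # [2.0, 0.0, 0.0, 0.0]
print(hwr_small)   # [2.0, 0.01, 0.0, 0.01]
```

Both negative bins end up with the same value regardless of how far below zero they were, which is one way to see why this adjustment can distort the speech spectrum.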
In another conventional method (hereinafter referred to as “FWR”), a portion of the subtracted spectrum (|Y(ω)|−|Ne(ω)|) having an amplitude less than “0”, for example, an amplitude value of P1, is adjusted to its absolute value, that is, an amplitude value of P2, as shown in FIG. 2. In this case, although the quality of speech can be improved, more noise may remain. In FIGS. 1 and 2, |S(ω)| denotes the spectrum of the original speech signal in which no noise is mixed.
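The FWR adjustment can be sketched in the same way. The function name `reflect_negative_bins` and the sample values are hypothetical; `p1` and `p2` stand in for the amplitude values P1 and P2 mentioned above.

```python
def reflect_negative_bins(subtracted):
    """Replace every bin of the subtracted spectrum with its absolute value."""
    return [abs(d) for d in subtracted]


# Hypothetical subtracted spectrum |Y(w)| - |Ne(w)| with two negative bins.
subtracted = [2.0, -0.5, 0.0, -1.5]
fwr = reflect_negative_bins(subtracted)

p1 = subtracted[1]   # a negative amplitude before adjustment
p2 = fwr[1]          # the same bin after adjustment, equal to |p1|

print(fwr)       # [2.0, 0.5, 0.0, 1.5]
print(p1, p2)    # -0.5 0.5
```

Unlike a uniform floor, this keeps the size of each negative bin, so bins that went negative only because the noise was overestimated still contribute energy to the output, consistent with the observation that more noise may remain.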