A speech acquisition path refers to the whole speech transmission path before the speech is actually digitized.
A typical speech acquisition path therefore includes the air between the speaker's lips and the microphone, the microphone itself, the wiring, the antialiasing filters, and the analog-to-digital converter. Together these elements determine the transfer function of the system. Noise can be introduced at each of these devices, as well as by the power supply of the analog-to-digital converter.
Practical speech acquisition lines, especially in low-cost devices, introduce both convolutive and additive noise into the input speech. This causes an additional statistical mismatch between the utterance to be recognized and the trained speech model set, and such mismatch degrades recognition performance.
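The combined effect of convolutive and additive noise can be illustrated with a minimal sketch. It assumes a common simplification that the text above does not state explicitly: the acquisition path is modeled as a linear time-invariant filter followed by an additive noise term. The names `convolve`, `acquire`, `impulse_response` and `noise` are hypothetical and chosen for illustration only.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution, truncated to the length of `signal`."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def acquire(clean, impulse_response, noise):
    """Assumed model of the acquisition path: observed = clean * h + noise.

    `impulse_response` stands in for the convolutive distortion (air,
    microphone, wiring, filters); `noise` stands in for the additive
    distortion. Both are illustrative placeholders, not measured data.
    """
    filtered = convolve(clean, impulse_response)
    return [f + v for f, v in zip(filtered, noise)]


if __name__ == "__main__":
    clean = [1.0, 0.0, 0.0, 0.0]   # unit impulse as a stand-in for speech
    h = [0.5, 0.25]                # toy channel impulse response
    noise = [0.01, -0.01, 0.01, -0.01]
    print(acquire(clean, h, noise))
```

Under this assumed model, a recognizer trained on `clean` but tested on the output of `acquire` sees exactly the kind of statistical mismatch described above.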
Previously, SNR-dependent cepstral normalization (SDCN), fixed-code-word-dependent cepstral normalization (FCDCN) [See A. Acero. Acoustical and Environmental Robustness in Automatic Speech Recognition. Kluwer Academic Publishers, 1993], multi-variate Gaussian based cepstral normalization [P. Moreno, B. Raj, and R. Stern. Multi-variate Gaussian based cepstral normalization. In Proc. of IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, Detroit, 1995] and statistical re-estimation [P. Moreno, B. Raj, and R. Stern. A unified approach to robust speech recognition. In Proceedings of European Conference on Speech Communication and Technology, Madrid, Spain, Sept. 1995] have been proposed to deal with similar problems. They all assume that the distortions can be modeled by a bias in the cepstral domain, which is clearly not the case for additive distortions. A vector Taylor series has also been used to approximate the distortion as a function of the cepstral representations of the additive and convolutive noises. See P. J. Moreno, B. Raj, and R. M. Stern. A vector Taylor series approach for environment-independent speech recognition. In Proc. of IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, Atlanta, Ga., 1996.
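Why a cepstral bias captures convolutive but not additive distortion can be seen in a small numerical sketch. It works on a single log-power-spectral bin and assumes that speech and noise are uncorrelated, so that their power spectra add; since the cepstrum is a linear transform of the log spectrum, a constant offset in one domain is a constant offset in the other. The function name and the values used are hypothetical, for illustration only.

```python
import math


def log_spectral_distortion(x_log, h_log, n_log):
    """Return y_log - x_log for one frequency bin.

    Assumed model (power spectra add, speech and noise uncorrelated):
        exp(y_log) = exp(x_log + h_log) + exp(n_log)
    x_log: log power of clean speech
    h_log: log power gain of the channel (convolutive distortion)
    n_log: log power of the additive noise
    """
    y_log = math.log(math.exp(x_log + h_log) + math.exp(n_log))
    return y_log - x_log


if __name__ == "__main__":
    no_noise = float("-inf")  # exp(-inf) == 0.0, i.e. no additive noise

    # Channel only: the distortion equals h_log whatever the speech level.
    for x in (0.0, 5.0):
        print("channel only, x =", x, "->",
              log_spectral_distortion(x, -1.0, no_noise))

    # Additive noise only: the distortion changes with the speech level,
    # so no single bias can describe it.
    for x in (0.0, 5.0):
        print("noise only,   x =", x, "->",
              log_spectral_distortion(x, 0.0, 0.0))
```

In the channel-only case the offset is the same for both speech levels, which is what a bias model expects. In the noise-only case the offset is large when the speech is weak and nearly vanishes when the speech is strong, which is the nonlinearity that the vector Taylor series approach approximates.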