Field of the Invention
The invention lies in the communications field. More specifically, the present invention relates to a method for transmitting voice data wherein the voice data are compressed before transmission and decompressed at the transmission destination. The compression is based on a decomposition of the voice data into phonemes. Phonemes are the acoustic language elements which are essential for the perception of spoken language.
It has been known in the art to compress voice data before transmission in a communications network in order to occupy as little transmission bandwidth as possible in the communications network. In these cases, when the voice is reproduced at the transmission destination, the compressed voice data are returned to their original state, or to an equivalent state, by decompression. Because the reduction in the transmission bandwidth which can be achieved by such a method depends directly on the compression rate of the compression method used, it is desirable to achieve the highest possible compression rate.
During voice transmission, the methods used for the compression are usually prediction methods which utilize the statistically unequal distribution of the data patterns occurring in voice data in order to reduce the high level of redundancy which is inherent in voice data. During the decompression process, the original voice data can be reconstructed from the compressed voice data virtually without falsification, with the exception of small losses which are inherent in the process. The compression ratio which can thereby be achieved is on the order of approximately 1:10. Methods of that type are described, for example, by Richard W. Hamming in "Information und Codierung" [Information and Coding], VCH Verlagsgesellschaft, Weinheim, 1987, pages 81-97.
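By way of illustration only, the principle of a prediction method can be sketched as follows. The sketch assumes the simplest possible predictor, in which each sample is predicted to equal the preceding sample and only the prediction error is kept; it is not the method of the cited reference, and the function names are hypothetical. As written the sketch is lossless; the small losses mentioned above would arise only if the prediction errors were additionally quantized.

```python
def predict_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    residuals = []
    previous = 0  # assumed initial predictor state
    for s in samples:
        residuals.append(s - previous)  # prediction error
        previous = s
    return residuals


def predict_decode(residuals):
    """Rebuild the samples by accumulating the prediction errors."""
    samples = []
    previous = 0  # must match the encoder's initial state
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples
```

Because neighboring voice samples are strongly correlated, the prediction errors are typically much smaller than the samples themselves and can therefore be represented with fewer bits.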
In typical voice data, information relating purely to the content forms only a small fraction of the entire voice information. The greatest part of the voice information comprises, as a rule, speaker-specific information which is expressed, for example, in nuances of the speaker's voice or the register of the speaker. When voice data are transmitted, essentially only the information relating to their content is significant, for example in the case of purely informative messages, automatic announcements or the like. For this reason it is possible, by reducing the speaker-specific information, also to achieve significantly higher compression rates than with methods which completely or virtually completely preserve the information payload of the voice data.
The smallest acoustic units in which language is formulated by the speaker and in which the information relating to the content--the spoken words--is also expressed are phonemes. U.S. Pat. No. 4,424,415 (see European patent EP 71716 B1), German patent DE 3513243 C2, and European patent EP 423800 B1 have heretofore disclosed arrangements and methods in which a stream of voice data is analyzed for the phonemes contained in it and converted into a stream of code symbols which are respectively assigned to the phonemes detected, in order to compress the voice data before transmission.
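By way of illustration only, the conversion of an already detected phoneme sequence into a stream of code symbols, and back, can be sketched as follows. The phoneme inventory, the fixed-width code, and all names are assumptions made for this sketch and are not taken from the cited patents; the detection of the phonemes themselves is not shown.

```python
import math

# Hypothetical phoneme inventory; "_" stands for a pause.
PHONEMES = ["h", "e", "l", "o", "w", "r", "d", "_"]

# Fixed number of bits needed to distinguish every phoneme in the inventory.
BITS_PER_SYMBOL = max(1, math.ceil(math.log2(len(PHONEMES))))
CODE_OF = {p: i for i, p in enumerate(PHONEMES)}
MASK = (1 << BITS_PER_SYMBOL) - 1


def encode(phonemes):
    """Pack phoneme labels into bytes; returns (symbol count, payload)."""
    value = 0
    for p in phonemes:
        value = (value << BITS_PER_SYMBOL) | CODE_OF[p]
    n_bits = len(phonemes) * BITS_PER_SYMBOL
    return len(phonemes), value.to_bytes((n_bits + 7) // 8, "big")


def decode(count, payload):
    """Recover the phoneme labels from the packed code symbols."""
    value = int.from_bytes(payload, "big")
    out = []
    for i in range(count):
        shift = (count - 1 - i) * BITS_PER_SYMBOL
        out.append(PHONEMES[(value >> shift) & MASK])
    return out
```

With the eight-entry inventory assumed here, each phoneme occupies three bits, so that a sequence of ten phonemes is carried in four bytes regardless of how the phonemes were spoken.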
A significant problem here is the reliable detection of the phonemes from which any stream of voice data to be transmitted is composed. This is made difficult in particular by the fact that the same phoneme can be realized very differently depending on the speaker and the speaker's linguistic habits. If phonemes within the stream of voice data are not detected, or are assigned to incorrect sounds, the transmission quality of the language is impaired--possibly to the point of incomprehensibility. Reliable phoneme analysis is therefore an important criterion for the quality and/or the range of application of such voice transmission methods.