Alongside the analog transmission of sound, images and video, digital transmission is becoming increasingly important. One reason is that digital signal information is easier to process, for example to copy or to compress. Compression in particular makes it possible to transmit information at high information density over signal transmission channels with limited data transmission rates.
Besides compression, another type of processing has proved successful in recent times: the embedding of “invisible” (steganographic) information into signal information. Such embedded additional information makes it possible, for example, to identify copyrights if the signal information is a piece of music or, more generally, to provide information of origin, that is to say a “digital watermark”.
Although the embedding of steganographic information in music and/or video signals has already been largely successful, embedding it into coded signal information is still associated with problems, particularly when the signal has to be transmitted in “real time”. The reason is that certain codings provide no redundancy, and thus no room for steganographic information, or that the steganographic information is lost when the encoded signal information is decoded.
A situation of this kind, in which signal information (mainly voice information in the given case) is transmitted and received via a channel and is encoded and decoded in real time, and in which transmission resources are limited, is found, for example, in mobile radio telephony. Here, the GSM network allows a maximum transmission rate of 13.0 kbit/s at best. At such a low transmission rate, uncoded voice information, i.e. uncompressed voice information in the present case, would scarcely be understandable on the receiver side. In order nevertheless to transmit comprehensible voice information, for example from one mobile radio device to another, so-called voice codecs have become established as a proven means of compressed voice signal transmission. If additional information, i.e. steganographic information, is to be embedded in such signal information, the special features resulting from the encoding must be taken into consideration.
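The size of the gap between uncompressed voice and the channel rate can be illustrated with a short calculation. The sampling rate and sample resolution below are assumptions chosen as typical telephone-band values; only the 13.0 kbit/s figure comes from the text.

```python
# Assumed (not stated in the text): telephone-band speech sampled at 8 kHz
# with 13-bit uniform PCM samples.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 13

# From the text: maximum GSM transmission rate for voice.
CHANNEL_KBIT_S = 13.0

# Bit rate of the uncompressed voice signal in kbit/s.
raw_kbit_s = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

# Factor by which a voice codec would have to reduce the data rate.
compression_ratio = raw_kbit_s / CHANNEL_KBIT_S

print(f"uncompressed: {raw_kbit_s} kbit/s, required reduction: {compression_ratio}x")
```

Under these assumptions the voice codec has to reduce the data rate by roughly a factor of eight, which leaves little redundancy in the encoded signal for additional information.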
In the field of mobile telecommunication, for example in GSM (Global System for Mobile Communications) mobile radio networks or UMTS (Universal Mobile Telecommunications System) mobile radio networks, the voice information to be transmitted is encoded by means of the familiar CELP (Code-book Excited Linear Predictive Coding) or ACELP (Algebraic Code Excited Linear Prediction) coding or, in future, AMR (Adaptive Multi-Rate) coding. These voice coding methods are all based on a model of voice generation in which, to a first approximation, the voice signal is generated by an excitation stage and a filtering stage. A signal encoder such as, for example, a CELP encoder, an ACELP encoder or an AMR encoder generates a code book entry, as a rule a vector from a so-called code book, wherein the code elements of the code book entry (as a rule, the vector components) contain information with respect to the (filter) excitation. Filter coefficients, gain factors etc. are encoded as time information by means of dedicated code books.
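The excitation-plus-filter model described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular standard: the function name `synthesize`, the single filter coefficient and the gain value are hypothetical, and a real codec uses a higher filter order and further processing stages.

```python
def synthesize(excitation, lpc_coeffs, gain=1.0):
    """All-pole synthesis filter (illustrative sketch).

    Computes s[n] = gain * e[n] + sum_k a[k] * s[n - k] for k = 1..len(a),
    where e is the excitation vector taken from a code book entry.
    """
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


# Hypothetical code book entry: a sparse excitation vector with two pulses.
excitation = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]

# Hypothetical first-order filter and gain factor.
frame = synthesize(excitation, lpc_coeffs=[0.5], gain=2.0)
print(frame)
```

The code book entry supplies only the excitation; the filter coefficients and the gain factor shape it into the reconstructed signal segment.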
As a rule, a code book for excitation coding consists of a set of vectors, each of which encodes the voice information to be conveyed for a particular duration, for example 5 milliseconds; in the case of ACELP coding according to the Enhanced Full Rate (EFR) standard, for example, each vector has 10 components. The dedicated code book comprises a very large number of vectors overall, the vectors being built up in accordance with familiar criteria. As a rule, only a subset of the code book, a sub-code book, is used, which is often sufficient for transmitting normal voice information with good quality.
To distinguish it from the complete code book specified as part of the coding, the sub-code book used in practice is called a “practical code book”.
To find a suitable code book entry rapidly, the practical code book is searched only heuristically, i.e. no exhaustive search for a suitable code book entry is carried out.
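The difference between an exhaustive and a heuristic search can be sketched with a toy code book of sparse pulse vectors. Everything here is an assumption made for illustration: the function names, the two-pulse code book structure, the greedy strategy and the unfiltered squared-error criterion. A real encoder evaluates candidates after the synthesis filter, so a heuristic search there is not guaranteed to find the best entry.

```python
from itertools import combinations, product


def error(target, code):
    """Squared error between target and candidate (no filtering, toy criterion)."""
    return sum((t - c) ** 2 for t, c in zip(target, code))


def build(length, positions, signs):
    """Sparse vector with +1/-1 pulses at the given positions."""
    v = [0.0] * length
    for p, s in zip(positions, signs):
        v[p] = float(s)
    return v


def exhaustive_search(target, n_pulses):
    """Evaluate every pulse-position and sign combination."""
    best, evaluated = None, 0
    for pos in combinations(range(len(target)), n_pulses):
        for signs in product((1, -1), repeat=n_pulses):
            cand = build(len(target), pos, signs)
            evaluated += 1
            if best is None or error(target, cand) < error(target, best):
                best = cand
    return best, evaluated


def greedy_search(target, n_pulses):
    """Heuristic: place one pulse at a time at the largest remaining magnitude."""
    code = [0.0] * len(target)
    evaluated = 0
    for _ in range(n_pulses):
        free = [i for i, c in enumerate(code) if c == 0.0]
        evaluated += len(free)
        p = max(free, key=lambda i: abs(target[i]))
        code[p] = 1.0 if target[p] >= 0 else -1.0
    return code, evaluated


target = [0.1, -0.9, 0.2, 0.0, 0.8, -0.1, 0.05, 0.3]
best_full, n_full = exhaustive_search(target, 2)
best_greedy, n_greedy = greedy_search(target, 2)
print(best_full, n_full)
print(best_greedy, n_greedy)
```

In this toy case both searches select the same entry, but the heuristic search inspects far fewer candidates, and the gap widens quickly as the vector length and the number of pulses grow.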
A method which takes into consideration the splitting-up of a fixed code book is disclosed in the article “Watermarking Combined with CELP Speech Coding for Authentication” by Zhe-Ming Lu et al. (in IEICE TRANS. INF. & SYST., Vol. E88-D, No. 2, Feb. 2005). In this method, a code book is first split up into three sub-code books from which, in turn, two code books are generated which have different characteristics. Depending on which steganographic information is to be conveyed, one code book entry is then selected from the sub-code book intended for this purpose and used for encoding the voice information to be conveyed. This voice information can be decoded on the receiver side, where the decoder can at the same time recognize from which part of the split-up code book the code book entry originates. To provide a sufficiently good encoding from one of the sub-code books, the familiar analysis-by-synthesis method is also described there. In this method, the selected code word is evaluated, i.e. the quality of the encoding is checked. This is essentially done in that, after voice information has been encoded, the encoding is decoded, i.e. synthesized, and the result of the decoding, which in turn represents voice information, is compared with the original voice information. Thus, a synthesis is carried out in advance on the transmitter side (encoder side) which, after a possible transmission, is also carried out on the receiver side (decoder side). Such an analysis-by-synthesis loop makes it possible to find a code word, i.e. as a rule a vector from a code book, which on the one hand has the desired characteristic, i.e. originates from the corresponding sub-code book, and at the same time encodes the voice information with adequate quality.
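The principle of conveying a steganographic bit through the choice of sub-code book, combined with an analysis-by-synthesis loop, can be sketched as follows. This is not the partitioning scheme of the cited article: the eight-entry code book, the split by index parity, the first-order filter and all function names are hypothetical and serve only to illustrate the idea.

```python
def synthesize(code, a=0.5):
    """First-order all-pole synthesis: s[n] = c[n] + a * s[n-1] (toy filter)."""
    out, prev = [], 0.0
    for c in code:
        prev = c + a * prev
        out.append(prev)
    return out


def sq_error(x, y):
    return sum((p - q) ** 2 for p, q in zip(x, y))


# Hypothetical code book of single-pulse excitation vectors.
CODEBOOK = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
]


def encode(target, bit):
    """Analysis by synthesis restricted to the sub-code book for `bit`.

    Assumed split: entries with even index carry bit 0, odd index bit 1.
    Each candidate is synthesized and compared with the target signal.
    """
    candidates = [i for i in range(len(CODEBOOK)) if i % 2 == bit]
    return min(candidates, key=lambda i: sq_error(synthesize(CODEBOOK[i]), target))


def decode(index):
    """Return the synthesized signal and the embedded bit."""
    return synthesize(CODEBOOK[index]), index % 2


target = synthesize(CODEBOOK[1])      # a signal that entry 1 matches exactly
idx_bit1 = encode(target, 1)          # best match lies in the bit-1 sub-code book
idx_bit0 = encode(target, 0)          # forced into the other sub-code book
print(idx_bit1, idx_bit0, decode(idx_bit0)[1])
```

When the bit to be embedded forces the encoder into the sub-code book that does not contain the best match, the loop still returns the closest available entry, but the synthesized signal deviates from the original.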
However, it is found that the fixed division of a practical code book (which is itself already a subset of a higher-level code book) into several sub-code books reduces the number of usable code words per sub-code book to such an extent that an audible reduction in voice quality cannot be ruled out.