In the present millennium, several reversible watermarking schemes for audio have been proposed, though on inspection the reversibility is often in the sense of numerical analysis, and the reconstruction of an original PCM (Pulse Code Modulation) signal is not lossless (i.e. bit-for-bit accurate) because of the quantisations that are inevitable within the algorithm. Two algorithms that we consider truly lossless are “Reversible Watermarking of Digital Signals” by M. Van Der Veen, A. Bruekers, A. Van Leest and S. Cavin, published as WO2004066272, and “Lossless Buried Data” by P. Craven and M. Law, published as WO2013061062.
WO2004066272 discloses methods for the reversible watermarking of digital signals by manipulating the histogram of the audio. According to one method, a sigmoid gain function C is applied to an original 16-bit PCM audio signal, which is then requantised to 15 bits, leaving a 1-bit hole in the least significant bit (lsb) position. Into this lsb hole is inserted data comprising the desired watermark data, overhead, and reconstruction data that allows the corresponding decoder to reverse the watermarking process and recover an exact replica of the original audio.
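To make the lsb hole concrete, the following Python sketch applies a hypothetical tanh-shaped C (the constant K and the names c_map, PREIMAGE, embed and extract are illustrative assumptions, not taken from WO2004066272), requantises to a 15-bit code and places one payload bit in the freed lsb. For simplicity it returns the reconstruction data (the index of the original value among all 16-bit values sharing the same 15-bit code) separately, rather than packing it into the lsb hole as the disclosed method does.

```python
import math
from collections import defaultdict

FULL_SCALE = 32768  # 16-bit samples lie in [-32768, 32767]
K = 3.0             # hypothetical steepness of the sigmoid

def c_map(x: int) -> int:
    """Hypothetical sigmoid C followed by requantisation to a 15-bit code."""
    y = math.tanh(K * x / FULL_SCALE) / math.tanh(K)       # monotonic, in [-1, 1]
    return max(-16384, min(16383, math.floor(y * 16384)))  # 15-bit code

# For each 15-bit code, the (contiguous, ascending) 16-bit values mapping to it.
PREIMAGE = defaultdict(list)
for x16 in range(-32768, 32768):
    PREIMAGE[c_map(x16)].append(x16)

def embed(x: int, bit: int) -> tuple[int, int]:
    """Return the watermarked 16-bit sample and the reconstruction index."""
    code = c_map(x)
    return 2 * code + bit, PREIMAGE[code].index(x)

def extract(w: int, index: int) -> tuple[int, int]:
    """Recover the original sample and the payload bit."""
    code, bit = w >> 1, w & 1   # upper 15 bits, and the lsb "hole"
    return PREIMAGE[code][index], bit
```

Near zero each code has at most one candidate original (no reconstruction data is needed), whereas near full scale dozens of 16-bit values share one code, which is where the reconstruction data comes from.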
The sigmoid gain function has a gain exceeding 1 near 0 and maps the range of audio signals onto itself. Consequently, it must have a gain less than 1 near full scale. Over any range of signal values where the gain of C is less than 2, reconstruction data is required because C maps the 16-bit values within that range onto fewer distinct 15-bit values. Where the gain of C is also greater than 1, less than one bit per sample of reconstruction data is required; where it is less than 1, more than one bit per sample is required. The scheme works because the PDF (Probability Density Function) of audio signal values is not flat: small signal values (where the sigmoid shape of C has gain greater than 1) are more common than large values (where C has gain less than 1). Thus, on average, less than 1 bit per sample of reconstruction data (usually much less) is generated, leaving sufficient space within the lsb hole for overhead and watermark.
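The bit counts above follow from a simple ratio: a span of W consecutive 16-bit values passed through C with local gain g covers about gW 16-bit units, i.e. about gW/2 distinct 15-bit values, so each 15-bit value has roughly 2/g candidate originals and needs about log2(2/g) bits to identify the right one. A minimal sketch of this estimate follows; the function names and the three-region histogram are hypothetical, chosen only to show how a peaked PDF keeps the average below 1 bit.

```python
import math

def recon_bits_per_sample(gain: float) -> float:
    """Approximate reconstruction data (bits/sample) where C has local gain
    `gain`, measured in 16-bit units, before requantisation to 15 bits.
    Each 15-bit output value has about 2/gain candidate 16-bit inputs."""
    return max(0.0, math.log2(2.0 / gain))

def mean_recon_bits(gains, probabilities) -> float:
    """Average over a histogram of signal regions (gain, probability)."""
    return sum(p * recon_bits_per_sample(g) for g, p in zip(gains, probabilities))

# Hypothetical histogram: 90% of samples are small, where C has gain 1.5.
print(round(mean_recon_bits([1.5, 1.0, 0.5], [0.9, 0.08, 0.02]), 3))  # prints 0.494
```

Even though 2% of the samples cost 2 bits each, the average stays near half a bit, leaving the rest of the lsb hole for overhead and watermark.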
Whilst this method is effective at embedding large amounts of watermark data, its transparency is in several respects less than may be desired. The watermark data is added directly to the signal, so patterns in it may be audible, and the signal modification is just as loud in the frequency regions where the ear is most sensitive as in those where it is less sensitive. The method also does not offer the flexibility to trade reduced watermark capacity for reduced noise.
WO2013061062 discloses how the sigmoid gain function may be implemented as the combination of a linear gain and a clipping unit, which generates reconstruction data when signal peaks are clipped. It also discloses how separate lossless filtering can advantageously be used in conjunction with the scheme to modify the signal's PDF and so reduce the quantity of reconstruction data generated by the clipping unit. Nevertheless, it is difficult to see how the audiophile ideal of a low and constant noise floor, uncorrelated with the audio signal and preferably spectrally shaped, may be achieved using the methods of either WO2004066272 or WO2013061062.
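A minimal sketch of the linear-gain-plus-clipping idea, assuming (for simplicity, and not as a detail of WO2013061062) a gain of exactly 2 in 16-bit units, so that requantisation to 15 bits leaves every unclipped sample unchanged and only clipped peaks produce reconstruction data. The names encode and decode are hypothetical, and the reconstruction data is returned as a plain list of (position, original value) pairs instead of being coded into the lsb hole.

```python
CODE_MIN, CODE_MAX = -16384, 16383  # 15-bit code range

def encode(samples, payload_bits):
    """Gain of 2 then requantise to 15 bits: the 15-bit code is the sample
    itself, clipped. Clipped samples generate reconstruction data."""
    out, recon = [], []
    for i, (x, bit) in enumerate(zip(samples, payload_bits)):
        code = max(CODE_MIN, min(CODE_MAX, x))   # clipping unit
        if code != x:
            recon.append((i, x))                 # peak was clipped: record it
        out.append(2 * code + bit)               # payload bit fills the lsb hole
    return out, recon

def decode(watermarked, recon):
    """Undo the gain, restore clipped peaks, and read the payload bits."""
    samples = [w >> 1 for w in watermarked]
    bits = [w & 1 for w in watermarked]
    for i, x in recon:
        samples[i] = x
    return samples, bits
```

Because only clipped samples generate reconstruction data, anything that reduces peak levels, such as the lossless filtering mentioned above, directly reduces its quantity.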
A transparent lossy watermarking scheme is described by M. Gerzon and P. Craven in “A High Rate Buried Data Channel for Audio CD”, preprint 3551, presented at the 94th AES Convention, Berlin, 1993 (hereinafter Gerzon). Watermark data comprising n binary bits per sample is randomised and then used as subtractive dither for a noise-shaped (16−n)-bit quantiser. This has the practical effect of discarding the n lsbs of the audio and replacing them with the randomised watermark, but with far less harm to the audio than plain replacement of bits. Joint quantisation of two stereo channels, which allows n to be an odd multiple of ½, is also described, as are more complicated quantisation schemes.
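The core of such a buried data channel can be sketched for integer n as follows (the joint stereo quantisation for half-integer n is not attempted). The randomised data word d acts as subtractive dither: the quantiser picks, among the values whose n lsbs equal d, the one nearest to the noise-shaped target. The keystream from Python's random module, the seed, the first-order error-feedback filter and the names bury and unbury are illustrative assumptions rather than details taken from Gerzon.

```python
import math
import random

def bury(samples, data, n, seed=1234):
    """Replace the n lsbs of 16-bit samples by randomised data, using it as
    subtractive dither for a noise-shaped (16-n)-bit quantiser."""
    step = 1 << n
    rng = random.Random(seed)                # keystream shared with the decoder
    out, err = [], 0.0
    for x, word in zip(samples, data):
        d = word ^ rng.getrandbits(n)        # randomised n-bit word
        v = x - err                          # first-order error feedback
        y = d + step * math.floor((v - d) / step + 0.5)  # nearest y with lsbs == d
        if y > 32767:                        # stay in 16-bit range, keep lsbs
            y -= step
        elif y < -32768:
            y += step
        err = y - v                          # output error is err[k] - err[k-1]
        out.append(y)
    return out

def unbury(buried, n, seed=1234):
    """Read the n lsbs and remove the randomisation."""
    step = 1 << n
    rng = random.Random(seed)
    return [(y & (step - 1)) ^ rng.getrandbits(n) for y in buried]
```

The decoder needs only the lsbs and the keystream to recover the data exactly, while the audio itself remains lossily requantised, which is why this scheme is transparent but not reversible.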
The streaming of audio material is now very popular, and raises the technical requirement that a decoder must be able to commence decoding without seeing the beginning of an encoded item or “track”. In the context of lossless reconstruction of an economically encoded stream, this requirement may present significant technical hurdles, as will become evident.