It is well known that common digital signal processing (DSP) operations such as Fast Fourier Transforms (FFTs), convolutions, Discrete Cosine Transforms (DCTs), etc., often require a large dynamic range for the variables employed in such algorithms. This leads to implementations using floating-point rather than fixed-point arithmetic, because the latter yields larger rounding noise at equal word length. Let us first recall the distinction between integer or fixed-point arithmetic on the one hand and floating-point arithmetic on the other.
Our goal is to represent numbers in the memory or registers of a computer or digital circuit in the form of binary digits ('0's and '1's). Because of their discrete nature we can only represent a finite set of numbers; all other numbers are "rounded" or "truncated" to one of the representable values, leading to quantization noise. For the sake of the argument, let us focus on numbers between −1.0 and 1.0, and say that we have 16 bits available to represent numbers in this range.
- For fixed-point numbers, all representable numbers are of the form n·2^−15, where n is an integer in the range [−2^15 … 2^15−1]. The representable numbers are uniformly spaced. The dynamic range, which is the ratio of the largest to the smallest representable number, is 2^15 ≈ 10^5.
- For floating-point numbers, all representable numbers are of the form s·(0.5+m/2^(a+1))·2^(e−b), where m is an a-bit integer, and (0.5+m/2^(a+1)) is called the "mantissa" and obviously lies between 0.5 and 1. s is a 1-bit "sign", e is called the "exponent", a (15−a)-bit number, and b is the exponent bias. As an example, take a 7-bit mantissa and an 8-bit exponent. Then the range 0.5 … 1 (set e=b) contains 128 representable numbers 1/256 apart. The range 0.25 … 0.5 (set e=b−1) also contains 128 representable numbers, now 1/512 apart, etc. We see that the representable numbers are packed closer together, the closer we get to 0, in a logarithmic fashion. The exponent bias b sets the origin of this quasi-logarithmic scale. In this example the dynamic range is about 2^256 ≈ 10^77.
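The two number grids described above can be illustrated with a short sketch. The rounding helpers below are illustrative (not part of the application): they quantize a value to the 16-bit fixed-point grid, and to the example floating-point format with an a-bit mantissa, so the two quantization errors can be compared.

```python
import math

def quantize_fixed(x, bits=16):
    """Round x to the nearest n * 2^-(bits-1), n an integer."""
    scale = 1 << (bits - 1)             # 2^15 for 16 bits
    n = round(x * scale)
    n = max(-scale, min(scale - 1, n))  # clamp n to [-2^15 ... 2^15 - 1]
    return n / scale

def quantize_float(x, a=7):
    """Round x to s * (0.5 + m / 2^(a+1)) * 2^e with an a-bit mantissa m."""
    if x == 0.0:
        return 0.0
    s = 1.0 if x > 0 else -1.0
    e = math.floor(math.log2(abs(x))) + 1   # choose e so |x| * 2^-e lies in [0.5, 1)
    frac = abs(x) * 2.0 ** (-e)
    m = round((frac - 0.5) * 2 ** (a + 1))  # nearest a-bit mantissa value
    if m == 1 << a:                         # rounding overflowed the mantissa:
        m, e = 0, e + 1                     # bump the exponent instead
    return s * (0.5 + m / 2 ** (a + 1)) * 2.0 ** e
```

For a small input such as 3·10^−4, the fixed-point grid leaves an absolute error of up to 2^−16, which is large relative to the value itself, while the floating-point grid keeps the relative error below 2^−8: the floating-point spacing tracks the magnitude of the number.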
Floating point numbers are a trade-off between a large dynamic range, and locally uniform distribution of representable numbers. This meshes nicely with the idea that in many relevant computations we need to represent small numbers with a small granularity and large numbers with a large granularity. Another way to say this: the floating point representation matches the “natural distribution of numbers”, which is roughly logarithmic, much more closely. For that reason, in practice floating point calculations almost invariably lead to much more accurate results than fixed point calculations with words of the same size (number of bits).
The major drawback of floating-point numbers is that they require more complex hardware to perform additions, multiplications, etc. For example, for a floating-point addition, both operands have to be normalized to the same exponent, followed by an ordinary addition and a final re-scaling of the exponent. In software, floating-point operations are therefore usually much slower.
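The three steps of a floating-point addition can be sketched as follows. This is a minimal illustration on (mantissa, exponent) pairs, not a description of any particular hardware; the 8-bit mantissa width is an assumption for the example.

```python
def float_add(m1, e1, m2, e2, mant_bits=8):
    """Add two numbers given as (mantissa, exponent) pairs, value = m * 2^e,
    where m is a signed integer of mant_bits magnitude bits."""
    # Step 1: normalize both operands to the larger exponent
    # (the smaller operand is shifted right; the shifted-out bits are lost).
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 >>= (e1 - e2)
    # Step 2: ordinary integer addition of the aligned mantissas.
    m = m1 + m2
    e = e1
    # Step 3: re-scale the exponent so the mantissa fits in mant_bits again.
    while abs(m) >= (1 << mant_bits):
        m >>= 1
        e += 1
    return m, e

# e.g. 1.5 (= 192 * 2^-7) plus 0.25 (= 128 * 2^-9):
# align: 128 >> 2 = 32; add: 192 + 32 = 224; result 224 * 2^-7 = 1.75
```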
In the case of watermark detection, DSP operations like FFTs must be performed accurately: a watermark is carefully hidden in the content (often in the least significant bits), and so the signal processor must proceed with care so as not to lose it. However, for watermarking in copy-protection or tracing applications, cost is a major issue: it is not a feature which can warrant a higher price in the store. A manufacturer of watermark detectors has two choices to control the accuracy:
- use a floating-point implementation, with relatively high hardware cost (or high CPU load for a software implementation); or
- use a fixed-point implementation; but considering the statements about accuracy above, one is then forced to use much longer words than for a floating-point implementation. This also drives up the cost if many memory words are needed for storage, and consequently a lot of memory bandwidth is needed.
Applicant's International Patent Application WO 99/45707 discloses a watermark embedding system (hereinafter referred to as JAWS) to which the invention is particularly applicable. A watermark, which is typically a pseudo random noise sequence or pattern, is added to a motion video signal in the spatial signal domain. For complexity reasons, the same watermark is embedded in every image (field or frame) of the video signal. To reduce the complexity even more, a small watermark pattern is tiled over the image. A typical tile size is 128×128 pixels.
As in many watermark schemes, the watermark detection method is based on correlating the suspect signal with the pseudo random noise sequence. If the correlation exceeds a given threshold, the watermark is said to be present. In the JAWS watermark detector, the tiles of a number of images are folded into a 128×128 buffer. Detection is then performed by correlating the buffer contents with the small watermark pattern.
FIG. 1 shows the signal processing steps of the watermark detection process. The contents of the 128×128 fold buffer 11 is applied to a two-dimensional FFT 12. In a Symmetrical Phase Only Matched Filtering (SPOMF) step 13, the phase of the frequency coefficients is extracted and subsequently correlated (14) with a frequency domain representation of the watermark 15 to be detected. An inverse FFT operation 16 on the results of this correlation process yields a 2-dimensional array of correlation values. If a significant peak is found in this array (17a), the contents is considered to be watermarked. Optionally and advantageously, the contents is also watermarked with a spatially shifted version of the same watermark. In that case, a further peak is searched for (17b). Their relative positions represent an embedded payload, which is decoded by a payload decoder 18.
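The detection pipeline of FIG. 1 can be sketched in a few lines of NumPy. The random watermark tile, the embedding strength and the detection threshold below are illustrative assumptions for the sketch, not the actual JAWS parameters, and the payload decoding of step 18 is omitted.

```python
import numpy as np

TILE = 128
rng = np.random.default_rng(0)
wm = rng.choice([-1.0, 1.0], size=(TILE, TILE))  # pseudo-random watermark tile
WM_F = np.fft.fft2(wm)                           # frequency-domain watermark (15)

def spomf_detect(fold_buffer, threshold=10.0):
    """Return (detected, peak_position) for the contents of the fold buffer."""
    F = np.fft.fft2(fold_buffer)                 # two-dimensional FFT (12)
    phase = F / np.maximum(np.abs(F), 1e-12)     # keep phase only (SPOMF, 13)
    corr = np.fft.ifft2(phase * np.conj(WM_F)).real  # correlate (14) + IFFT (16)
    peak = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    # a peak well above the noise floor means the watermark is present (17a)
    score = np.abs(corr[peak]) / corr.std()
    return score > threshold, peak

# watermarked content: host noise plus a spatially shifted watermark tile;
# the peak position recovers the shift
host = rng.normal(size=(TILE, TILE))
detected, pos = spomf_detect(host + np.roll(wm, (5, 9), axis=(0, 1)))
```

Note that the position of the correlation peak recovers the spatial shift of the embedded watermark, which is exactly what steps 17a/17b exploit to encode a payload in the relative positions of two peaks.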
Since for algorithms like JAWS memory is the largest cost factor, a floating-point implementation with 17-bit words (8-bit mantissa, 8-bit exponent) was initially developed. For instance, in the 2D FFT step 12, if one wanted to use integers, one would need about 20 … 24 bits (depending on the video content) to obtain accuracy similar to that of the 17-bit floating-point implementation.
From the literature, many methods are known which can help to reduce the word length for integer FFTs, e.g.:
- insertion of guard bits: shifting the input to the right by k bits if the signal processing step is expected to increase the dynamic range by k bits. An FFT on N=2^n points increases the dynamic range by a factor of N (worst case), so the required insertion of n guard bits amounts to pre-dividing the input by a factor of N;
- using block floating point. Block floating-point numbers are really (e.g. 16-bit) fixed-point numbers which represent a different range of numbers (like [−1 … 1], [−1/4 … 1/4] or [−8 … 8]) at different stages of the processing, depending on the required dynamic range. For instance, in an FFT one would choose a new range for the 16-bit variables to represent after every one of the n stages.
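The block floating-point idea above can be sketched as follows: a whole block of samples shares one exponent, which is re-chosen after each FFT stage so that the fixed-point words cover the block's current range. The details (word width, shift policy) are illustrative assumptions, not a specific prior-art implementation.

```python
def normalize_block(block, word_bits=16):
    """Shift a block of integers right, all by the same amount, until every
    value fits a signed word; return the block and the shared block exponent."""
    limit = 1 << (word_bits - 1)          # signed 16-bit magnitude limit, 2^15
    shift = 0
    while max(abs(v) for v in block) >= limit:
        block = [v >> 1 for v in block]   # one shared right-shift for the block
        shift += 1
    return block, shift

# After a butterfly stage has grown the dynamic range, the block is re-scaled
# and only the shared exponent grows, instead of widening every word:
data, exp = normalize_block([40000, -1200, 7], word_bits=16)
# true value of data[i] is data[i] * 2^exp
```

The cost of this scheme is that small values in the block lose low-order bits whenever one large value forces a shift, which is precisely the quantization-noise problem noted below.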
Although these methods are helpful, in general they still cause too much quantization noise to allow e.g. robust watermark detection.