1. Field of the Invention
This invention relates to a method for the transformation of signals from a frequency to a time representation, as well as a digital circuit arrangement for implementing the transformation.
2. Description of the Related Art
It is a common goal in the area of telecommunications to increase both information content and transmission speed. Each communications medium, however, imposes a limitation on transmission speed, as does the hardware at the transmitting and receiving end that must process the transmitted signals. A telegraph wire is, for example, typically a much faster medium for transmitting information than the mail is, even though it might be faster to type and read a mailed document than to tap it out on a telegraph key.
The method of encoding transmitted information also limits the speed at which information can be conveyed. A long-winded telegraph message will, for example, take longer to convey than a succinct message with the same information content. The greatest transmission and reception speed can therefore be obtained by compressing the data to be transmitted as much as possible, sending it over a high-speed transmission medium, and processing the data at both ends as fast as possible. This often means reducing or eliminating "bottlenecks" in the system.
One application in which it is essential to provide high-speed transmission of large amounts of data is in the field of digital television. Whereas conventional television systems use analog radio and electrical signals to control the luminance and color of picture elements ("pixels") in lines displayed on a television screen, a digital television transmission system generates a digital representation of an image by converting analog signals into binary "numbers" corresponding to luminance and color values for the pixels. Modern digital encoding schemes and hardware structures typically enable much higher information transmission rates than do conventional analog transmission systems. As such, digital televisions are able to achieve much higher resolution and much more life-like images than their conventional analog counterparts. It is anticipated that digital television systems, including so-called High-Definition TV (HDTV) systems, will replace conventional analog television technology within the next decade in much of the industrialized world. The conversion from analog to digital imaging, for both transmission and storage, will thus be similar to the change-over from analog audio records to the now ubiquitous compact discs (CDs).
In order to increase the general usefulness of digital image technology, standardized schemes for encoding and decoding digital images have been adopted. One such standardized scheme is known as the JPEG standard and is used for still pictures. For moving pictures, there are at present two standards, MPEG and H.261, both of which carry out JPEG-like procedures on each of the sequential frames of the moving picture. To gain an advantage over using JPEG repeatedly, MPEG and H.261 operate on the differences between subsequent frames, taking advantage of the well-known fact that the difference, that is, the movement, between frames is usually small. It thus typically takes less time or space to transmit or store the information corresponding to the changes than to transmit or store equivalent still-picture information for each frame as if it were completely unlike the frames closest to it in the sequence.
For convenience, all the current standards operate by breaking an image or picture into tiles or blocks, each block consisting of a piece of the picture eight pixels wide by eight pixels high. Each pixel is then represented by three (or more) digital numbers known as "components" of that pixel. There are many different ways of breaking a colored pixel into components, for example, using standard notation, YUV, YCrCb, RGB, etc. All the conventional JPEG-like methods operate on each component separately.
It is well known that the eye is insensitive to high-frequency components (or edges) in a picture. Information concerning the highest frequencies can usually be omitted altogether without the human viewer noticing any significant reduction in image quality. In order to achieve this ability to reduce the information content in a picture by eliminating high-frequency information without the eye detecting any loss of information, the 8-by-8 pixel block containing spatial information (for example the actual values for luminance) must be transformed in some manner to obtain frequency information. The JPEG, MPEG and H.261 standards all use the known Discrete Cosine Transform to operate on the 8-by-8 spatial matrix to obtain an 8-by-8 frequency matrix.
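To make the spatial-to-frequency step concrete, the following Python sketch evaluates the two-dimensional DCT of an 8-by-8 block directly from its definition, zeroes the higher-frequency coefficients, and transforms back. It assumes the orthonormal DCT-II scaling; the cutoff rule u + v >= 4 in the hypothetical helper `drop_high_frequencies` is purely illustrative and is not the quantization scheme used by JPEG, MPEG or H.261.

```python
import math

N = 8  # block size used by the JPEG-like standards

def c(k):
    # Orthonormal scaling (an assumption of this sketch): sqrt(1/N) for DC, sqrt(2/N) otherwise.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct2(block):
    """Direct 2-D DCT-II of an N-by-N block (spatial -> frequency)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def idct2(coeffs):
    """Direct 2-D inverse DCT (frequency -> spatial)."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out

def drop_high_frequencies(coeffs, cutoff):
    """Hypothetical rule: zero every coefficient with u + v >= cutoff."""
    return [[coeffs[u][v] if u + v < cutoff else 0.0 for v in range(N)]
            for u in range(N)]

# A smooth luminance ramp: most of its energy lands in low frequencies.
block = [[16 * x + 8 * y for y in range(N)] for x in range(N)]
freq = dct2(block)
approx = idct2(drop_high_frequencies(freq, cutoff=4))
max_err = max(abs(approx[x][y] - block[x][y]) for x in range(N) for y in range(N))
print(f"max reconstruction error: {max_err:.2f}")
```

For a smooth block such as this ramp, nearly all of the energy is carried by the low-frequency coefficients, so the reconstruction stays close to the original even though 54 of the 64 coefficients have been discarded.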
As described above, the input data represents a square area of the picture. In transforming the input data into the frequency representation, the transform that is applied must be two-dimensional, but such two-dimensional transforms are difficult to compute efficiently. The known, two-dimensional Discrete Cosine Transform (DCT) and the associated Inverse DCT (IDCT), however, have the property of being "separable". This means that rather than having to operate on all 64 pixels in the eight-by-eight pixel block at one time, the block can first be transformed row-by-row into intermediate values, which are then transformed column-by-column into the final transformed frequency values.
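The separability property can be checked numerically. This sketch, again assuming orthonormal DCT-II scaling and using illustrative helper names, computes the 8-by-8 transform once as a row pass followed by a column pass of a one-dimensional DCT, and once directly from the two-dimensional definition, and then compares the two results.

```python
import math

N = 8

def c(k):
    # Orthonormal DCT-II scaling factor (assumed for this sketch).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct1(vec):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    return [c(k) * sum(vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                       for n in range(N))
            for k in range(N)]

def dct2_separable(block):
    # Pass 1: transform each row into intermediate values.
    rows = [dct1(row) for row in block]
    # Pass 2: transform each column of the intermediate array.
    cols = [dct1([rows[x][v] for x in range(N)]) for v in range(N)]
    # cols[v][u] holds coefficient (u, v); transpose back to [u][v] order.
    return [[cols[v][u] for v in range(N)] for u in range(N)]

def dct2_direct(block):
    """Reference: evaluate the 2-D definition over all 64 pixels at once."""
    return [[c(u) * c(v) * sum(block[x][y]
                               * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                               * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                               for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

sample = [[(x * 7 + y * 3) % 11 for y in range(N)] for x in range(N)]
a = dct2_separable(sample)
b = dct2_direct(sample)
diff = max(abs(a[u][v] - b[u][v]) for u in range(N) for v in range(N))
print(f"max |separable - direct| = {diff:.2e}")
```

The row-column form evaluates sixteen 8-point transforms instead of one 64-point double sum per coefficient, which is what makes the two-dimensional DCT practical in hardware.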
Applying a one-dimensional DCT of order N to each of the N rows (or columns) of an N-by-N block is mathematically equivalent to multiplying two N-by-N matrices. For an eight-by-eight pixel block, this matrix multiplication requires 512 (8 x 8 x 8) multiplications and 448 (8 x 8 x 7) additions, so that 1,024 multiplications and 896 additions are needed to perform the full two-dimensional DCT, with one pass over the rows and one over the columns. These arithmetic operations, especially multiplications, are complex and slow and therefore limit the achievable transmission rate; they also require considerable area on the silicon chip used to implement the DCT.
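The operation counts above follow from a plain matrix product, in which each of the N x N outputs needs N multiplications and N - 1 additions. The sketch below is a hypothetical counting helper, not a DCT implementation; identity matrices stand in for the data block and the DCT coefficient matrix because the counts do not depend on the values.

```python
def matmul_with_counts(a, b):
    """Plain N-by-N matrix product that also counts scalar operations."""
    n = len(a)
    mults = adds = 0
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = a[i][0] * b[0][j]          # first term: one multiplication
            mults += 1
            for k in range(1, n):
                acc += a[i][k] * b[k][j]     # each further term: one mult, one add
                mults += 1
                adds += 1
            out[i][j] = acc
    return out, mults, adds

N = 8
identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
_, m1, a1 = matmul_with_counts(identity, identity)
print(m1, a1)          # one pass over the rows: 512 448
print(2 * m1, 2 * a1)  # rows plus columns: 1024 896
```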
The DCT procedure can be rearranged to reduce the computation required. There are at present two main methods for doing so, both of which use "binary decimation." The term "binary decimation" means that an N-by-N transform can be computed by using two N/2-by-N/2 transformations, plus some computational overhead needed to arrange the decomposition. Whereas the eight-by-eight transform requires 512 multiplications and 448 additions, a four-by-four transform requires only 64 multiplications and 48 additions, so that two four-by-four transforms together require 128 multiplications and 96 additions. Binary decimation thus saves 384 multiplications and 352 additions, and the overhead incurred in performing the decimation is typically insignificant compared to this reduction in computation.
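One common way to perform a single binary-decimation step on a one-dimensional DCT is the even/odd split sketched below. It is shown only to illustrate the principle and is not claimed to be the exact factorization of either the Lee or the Chen method. Mirrored input samples are first added and subtracted (the overhead); the even-indexed outputs are then an N/2-point DCT-II of the sums, and the odd-indexed outputs are an N/2-point DCT-IV of the differences. Unnormalized transforms are assumed, and the helper names are hypothetical.

```python
import math

def dct2_1d(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n in range(n_len)) for k in range(n_len)]

def dct4_1d(v):
    """Unnormalized DCT-IV: Y[k] = sum_n v[n] cos(pi (2n+1) (2k+1) / (4M))."""
    m = len(v)
    return [sum(v[n] * math.cos(math.pi * (2 * n + 1) * (2 * k + 1) / (4 * m))
                for n in range(m)) for k in range(m)]

def dct2_decimated(x):
    """One decimation step: an N-point DCT-II from two N/2-point transforms."""
    n_len = len(x)
    assert n_len % 2 == 0, "binary decimation needs an even length"
    half = n_len // 2
    # Overhead: N/2 additions and N/2 subtractions of mirrored input pairs.
    sums = [x[n] + x[n_len - 1 - n] for n in range(half)]
    diffs = [x[n] - x[n_len - 1 - n] for n in range(half)]
    out = [0.0] * n_len
    out[0::2] = dct2_1d(sums)   # X[0], X[2], X[4], ...
    out[1::2] = dct4_1d(diffs)  # X[1], X[3], X[5], ...
    return out

signal = [3, 1, 4, 1, 5, 9, 2, 6]
fast = dct2_decimated(signal)
direct = dct2_1d(signal)
print(max(abs(p - q) for p, q in zip(fast, direct)))  # floating-point rounding level
```

Fast algorithms repeat this kind of split on the smaller transforms, so the savings compound at each level.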
At present, the two main methods for binary decimation are those developed by Byeong Gi Lee ("A New Algorithm to Compute the DCT", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-32, No. 6, p. 1243, December 1984) and by Wen-Hsiung Chen ("A Fast Computational Algorithm for the DCT", Wen-Hsiung Chen, C. Harrison Smith, S. C. Fralick, IEEE Transactions on Communications, Vol. COM-25, No. 9, p. 1004, September 1977). Lee's method makes use of the symmetry inherent in the definition of the inverse DCT and, by using simple cosine identities, defines a method for recursive binary decimation. The Lee approach is only suitable for the IDCT. The Chen method uses a recursive matrix identity that reduces the matrices to purely diagonal form. This method provides easy binary decimation of the DCT using known identities for diagonal matrices.
A serious disadvantage of the Lee and Chen methods is that they are unbalanced in respect of when multiplications and additions must be performed. Essentially, both of these methods require that many additions be followed by many multiplications, or vice versa. When implementing the Lee or Chen methods in hardware, it is thus not possible to have parallel operation of adders and multipliers. This reduces their speed and efficiency, since the best utilization of hardware is when all adders and multipliers are used all the time.
An additional disadvantage of such known methods and devices for performing DCT and IDCT operations is that it is usually difficult to handle the so-called normalization coefficient, and known architectures require adding an extra multiplication at a time when all the multipliers are being used.
Certain known methods for applying the forward and inverse DCT to video data are very simple and highly efficient for a software designer who need not be concerned with the layout of the semiconductor devices that must perform the calculations. Such methods, however, are often far too slow, or far too complex in semiconductor architecture and hardware interconnections, to perform satisfactorily at the transmission rate desired for digital video.
Yet another shortcoming of existing methods and hardware structures for performing DCT and IDCT operations on video data is that they require a floating-point internal representation of numerical values. To illustrate this disadvantage, assume that one has a calculator that is only able to deal with three-digit numbers, including digits to the right of the decimal point (if any). Assume further that the calculator is to add the numbers 12.3 and 4.56. (Notice that the decimal point is not fixed relative to the position of the digits in these two numbers. In other words, the decimal point is allowed to "float".) Since the calculator is not able to store the four digits required to fully represent the answer 16.86, it must reduce the answer to three digits, either by truncating it, dropping the right-most "6" to yield 16.8, or, if it has the necessary hardware, by rounding it to the closest three-digit approximation, 16.9.
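The calculator example can be reproduced with Python's standard `decimal` module by limiting the precision to three significant digits. The two contexts below model the two options described: `ROUND_DOWN` truncates, and `ROUND_HALF_UP` rounds to the nearest value.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP

a, b = Decimal("12.3"), Decimal("4.56")

# Two "calculators" that keep only three significant digits.
truncating = Context(prec=3, rounding=ROUND_DOWN)
rounding = Context(prec=3, rounding=ROUND_HALF_UP)

print(a + b)                 # 16.86 (default 28-digit context, exact)
print(truncating.add(a, b))  # 16.8  (right-most "6" dropped)
print(rounding.add(a, b))    # 16.9  (rounded to the nearest three-digit value)
```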
As this very simple example illustrates, if floating-point arithmetic is assumed or required, one must either accept a loss of precision or include highly complicated and space-wasting circuitry to minimize rounding error. Even with efficient rounding circuitry, however, the accumulation and propagation of rounding or truncation errors may lead to unacceptable distortion in the video signals. This problem is even greater when the methods for processing the video signals require several multiplications, since floating-point rounding and truncation errors are typically even greater for multiplication than for addition.
A much more efficient DCT/IDCT method and hardware structure would ensure that the numbers used in the method could be represented with a fixed decimal point, but in such a way that the full dynamic range of each number could be used. In such a system, truncation and rounding errors would either be eliminated or at least greatly reduced.
In the example above, if the hardware could handle four digits, if no number greater than 99.99 were ever needed, and if every number had the decimal point between the second and third digits, then the presence of the decimal point would not affect calculations at all, and the arithmetic could be carried out just as if every number were an integer: the sum 1230+0456=1686 would be just as clear as 12.30+4.56=16.86, since one would always know that "1686" should have a decimal point between the "6" and the "8". Alternatively, if numbers (constant or otherwise) are selectively scaled or adjusted so that they all fall within the same range, every number in the range could also be accurately and unambiguously represented as an integer.
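A minimal sketch of the fixed-point idea, assuming non-negative values and an implied decimal point two digits from the right (a scale factor of 100): once values are stored as scaled integers, addition is ordinary integer addition. The helper names are hypothetical. The multiplication helper shows that a product carries the scale factor twice and must be rescaled, a step that itself truncates.

```python
from decimal import Decimal

SCALE = 100  # implied decimal point two digits from the right

def to_fixed(text):
    """Parse a non-negative decimal string such as '12.30' into 1230."""
    return int(Decimal(text) * SCALE)

def to_text(value):
    """Render a non-negative scaled integer with its implied decimal point."""
    assert value >= 0, "this sketch handles non-negative values only"
    return f"{value // SCALE}.{value % SCALE:02d}"

def fixed_mul(x, y):
    # x * y carries SCALE**2; dividing by SCALE rescales (and truncates).
    return (x * y) // SCALE

a = to_fixed("12.30")         # 1230
b = to_fixed("04.56")         # 456
total = a + b                 # plain integer addition
print(total, to_text(total))  # 1686 16.86
print(to_text(fixed_mul(a, b)))  # 56.08 (exact product 56.088, truncated)
```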
One way of reducing the number of multipliers needed is simply to have a single multiplier that is able to accept input data from different sources. In other words, certain architectures use a single multiplier to perform the multiplications required in different steps of the DCT or IDCT calculations. Although such "crossbar switching" may reduce the number of multipliers required, it means that large, complicated multiplexer structures must be included instead to select the inputs to the multiplier, to isolate others from the multiplier, and to switch the appropriate signals from the selected sources to the inputs of the multiplier. Additional large-scale multiplexers are then also required to switch the large number of outputs from the shared multipliers to the appropriate subsequent circuitry. Crossbar switching or multiplexing is therefore complex, is generally slow (because of the extra storage needed), and costs significant area in a final semiconductor implementation.
Yet another drawback of existing architectures, including the "crossbar switching", is that they require general purpose multipliers. In other words, existing systems require multipliers for which both inputs are variable. As is well known, implementations of digital multipliers typically include rows of adders and shifters such that, if the current bit of a multiplier word is a "one", the value of the multiplicand is added into the partial result, but not if the current bit is a "zero". Since a general purpose multiplier must be able to deal with the case in which every bit is a "1", a row of adders must be provided for every bit of the multiplier word.
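The shift-and-add principle of a general-purpose multiplier can be modelled in software as follows. This is an illustrative sketch for unsigned 8-bit operands (carry handling and signed arithmetic are ignored). The loop visits every bit position of the multiplier word because, in hardware, a row of adders must exist for each position whether or not that bit is a "1".

```python
def shift_add_multiply(multiplicand, multiplier, width=8):
    """Unsigned shift-and-add multiply; returns (product, rows that added)."""
    assert 0 <= multiplicand < (1 << width) and 0 <= multiplier < (1 << width)
    partial = 0
    rows_used = 0
    for bit in range(width):            # hardware provides a row for every bit
        if (multiplier >> bit) & 1:     # current multiplier bit is "1"
            partial += multiplicand << bit
            rows_used += 1
        # bit is "0": nothing is added, yet the row still exists in hardware
    return partial, rows_used

print(shift_add_multiply(23, 255))  # (5865, 8): all eight rows add
print(shift_add_multiply(23, 5))    # (115, 2): six rows sit idle
```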
By way of an example, assume that data words are 8 bits wide and that one wishes to multiply single inputs by 5. An 8-bit representation of the number 5 is 00000101. In other words, digital multiplication by 5 requires only that the input value be shifted to the left two places (corresponding to multiplication by 4) and then added to its un-shifted value. The other six positions of the coefficient have bit values of "0", so they would not require any shifting or addition steps.
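Multiplication by the fixed coefficient 5 then reduces to a single shift and a single addition, as in this short sketch for non-negative integer inputs (the function name is illustrative):

```python
def times_five(x):
    # 5 = 0b00000101: shift left two places (x * 4), then add the unshifted x.
    return (x << 2) + x

print(times_five(37))  # 185
```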
A fixed-coefficient multiplier, that is, in this case, a multiplier capable of multiplying only by five, would require only a single shifter and a single adder in order to perform the multiplication (disregarding circuitry needed to handle carry bits). A general purpose multiplier, in contrast, would require shifters and adders for each of the eight positions, even though six of them would never need to be used. As the example illustrates, fixed coefficients can simplify the multipliers since they allow the designer to eliminate rows of adders that correspond to zeros in the coefficient, thus saving silicon area.
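The same reasoning applies to any constant coefficient: each "1" bit contributes one shifted copy of the input, and each "0" bit removes a row that a general-purpose multiplier would have to provide. The hypothetical helpers below derive the shift amounts for a given 8-bit coefficient and count the shifters and adders a fixed-coefficient multiplier would need (carry circuitry is ignored, as in the text).

```python
def fixed_coefficient_plan(coefficient, width=8):
    """Left-shift amounts needed to multiply by a constant: one per '1' bit."""
    return [bit for bit in range(width) if (coefficient >> bit) & 1]

def multiply_by_plan(x, shifts):
    """Sum the shifted copies of x; this equals x * coefficient."""
    return sum(x << s for s in shifts)

def shifters_needed(shifts):
    # A shift of 0 places is just a wire, so it needs no shifter.
    return sum(1 for s in shifts if s > 0)

def adders_needed(shifts):
    # Combining k shifted copies takes k - 1 two-input adders.
    return max(len(shifts) - 1, 0)

def rows_eliminated(shifts, width=8):
    # Rows a general-purpose width-bit multiplier has but this constant never uses.
    return width - len(shifts)

shifts = fixed_coefficient_plan(5)
print(shifts)                                          # [0, 2]
print(shifters_needed(shifts), adders_needed(shifts))  # 1 1
print(rows_eliminated(shifts))                         # 6
```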