Linear Transforms in Multimedia Signal Processing
A common problem in the design of multimedia signal processing systems (e.g., for image, video, and audio coding) is the implementation of an approximation of a linear transformation that forms all or part of a processing block in the system.
For example, the discrete cosine transform (DCT) is used for spatial-domain signal compaction in image and video coding. Similar transformations, known as Modified DCT (MDCT) or Lapped Orthogonal Transforms, are often used in audio coding. Other well-known transformations include the Fourier Transformation, the family of transforms known as W transformations, and many others. Linear transforms may also be part of other processes.
Often the transform is designed as an approximation of a set of ideal transformation equations. A linear transformation can be expressed as follows. Given a column vector x of input samples, a linear transformation process multiplies the input vector by a transformation matrix T to form a set of output samples y=T*x, following the ordinary principles of linear algebra.
FIG. 1 illustrates an example of a two-dimensional transformation 100 applied to an n×m input data block 120 to produce an n×m output data block 130. In applications such as video coding, the input data block represents picture elements (pixels) sampled at regular spatial intervals of an image, and is therefore referred to as a representation of the data in the spatial domain. The output block is said to represent the data in the transform domain. An inverse transformation 110 reverses the transformation 100, reconstructing the original data from the output block 130.
In some cases, as in image and video coding, a two-dimensional transformation is employed and the ideal transformation is separable, meaning that it can be decomposed into independent transformations to be applied to the rows and columns of a two-dimensional input array X to result in a two-dimensional matrix of output samples Y. In such a case, the transformation can be expressed as $Y = T_C \cdot X \cdot T_R^T$, where $T_C$ is a transformation matrix applied to the columns of X, $T_R$ is a transformation matrix applied to the rows of X, and $T_R^T$ denotes the transpose of $T_R$.
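The separable form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`matmul`, `transpose`, `separable_transform`) and the 2-point sum/difference matrix `H` are hypothetical and are not taken from any of the standards discussed here.

```python
def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def separable_transform(T_C, X, T_R):
    """Compute Y = T_C * X * T_R^T (column transform, then row transform)."""
    return matmul(matmul(T_C, X), transpose(T_R))

# Hypothetical 2-point sum/difference matrix, used for both rows and columns.
H = [[1, 1],
     [1, -1]]
X = [[1, 2],
     [3, 4]]
Y = separable_transform(H, X, H)
```

Because matrix multiplication is associative, applying the row transform first and the column transform second gives the same Y in exact arithmetic.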
Transformation matrices may be either square or rectangular. A block-wise transformation refers to segmenting input data into a number of separate blocks and applying a transformation to each block. In some cases the transformation processes may be applied to overlapping sets of input data to produce what is known as an overlapped block transform.
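As a rough sketch of the segmentation step in a non-overlapping block-wise transformation, the following Python function splits a two-dimensional array into separate blocks. The function name and the dictionary keyed by block position (n, m) are assumptions made for illustration only.

```python
def split_into_blocks(data, block_rows, block_cols):
    """Split a 2-D array (list of rows) into non-overlapping blocks.

    Returns a dict mapping (n, m) to the block whose top-left sample is
    data[n * block_rows][m * block_cols].
    """
    rows, cols = len(data), len(data[0])
    if rows % block_rows or cols % block_cols:
        raise ValueError("array size must be a multiple of the block size")
    blocks = {}
    for n in range(rows // block_rows):
        for m in range(cols // block_cols):
            blocks[(n, m)] = [
                row[m * block_cols:(m + 1) * block_cols]
                for row in data[n * block_rows:(n + 1) * block_rows]
            ]
    return blocks

# Hypothetical 4x4 "picture" split into four 2x2 blocks.
picture = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = split_into_blocks(picture, 2, 2)
```

A transformation would then be applied to each entry of `blocks` independently.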
The same concepts can be applied to extend the analysis to higher dimensions, for example to perform a transformation to be applied to video data both spatially and temporally, forming a three-dimensional transformation process.
For an arbitrary matrix T containing M columns and N rows with arbitrary elements, the number of multiplications necessary to implement the transformation process would be M*N, and the number of addition or subtraction operations would be (M−1)*N. In some cases the direct application of a matrix multiplication operation for the computation of the transformation may require an excessive amount of computation processing resources.
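The operation counts can be checked with a small Python sketch that performs the direct matrix multiplication and tallies the arithmetic. The function name and the placeholder matrix of ones are hypothetical; only the shape of T matters for the counts.

```python
def direct_transform_counted(T, x):
    """Compute y = T * x directly and count the arithmetic operations used."""
    multiplications = additions = 0
    y = []
    for row in T:                        # one output sample per row of T
        acc = row[0] * x[0]
        multiplications += 1
        for t, xi in zip(row[1:], x[1:]):
            acc += t * xi
            multiplications += 1
            additions += 1
        y.append(acc)
    return y, multiplications, additions

M, N = 8, 8                              # M columns, N rows
T = [[1.0] * M for _ in range(N)]        # placeholder elements
y, mults, adds = direct_transform_counted(T, [1.0] * M)
```

For M = N = 8 this gives M*N = 64 multiplications and (M−1)*N = 56 additions.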
Decomposition of Transforms
A common approach to reducing the computation processing cost of a transformation is to decompose the transformation into a cascade of simpler transformation processes. Here, the term “simpler” is used to refer to something that is less “complex.” The term “complex” as used herein refers to the quantity of computational resources required to perform a specific task. This may include various such issues as:
- Storage requirements for data and computer instructions
- Precision requirements for arithmetic operations
- The number and type of arithmetic operations (e.g., considering additions and subtractions as less complex than multiplications, which may in turn be considered less complex than divisions, etc.)
- Quantity, latency, and speed requirements for memory accesses
- Impact on cache data flow in cache-oriented architectures for instructions and data processing
A significant amount of research has been performed toward finding low-complexity decompositions of well-known transformation processes. In some cases, the reason a particular idealized transformation is used in the system is the fact that it is known that lower-complexity decompositions exist to help compute it. For example, it is common in image and video compression applications to use a discrete cosine transform (DCT) rather than a Karhunen-Loeve transform (KLT) because it is known that the DCT can be decomposed easily while the KLT, in general, cannot.
Two-Dimensional IDCT and DCT Definitions
The ideal real-valued block-wise inverse DCT (also known as an IDCT or a type-III DCT, unitary or orthonormal formulation) for an M×N block of inverse-quantized transform coefficients $\hat{F}_{m,n}[v][u]$ at position [nN][mM] in a picture can be defined as follows:
$$\hat{f}[nN+y][mM+x] = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \left(c_u \sqrt{\frac{2}{M}}\right) \left(c_v \sqrt{\frac{2}{N}}\right) \hat{F}_{m,n}[v][u] \cdot \cos\left[\frac{(2x+1)u\pi}{2M}\right] \cdot \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$

for x = 0 . . . M−1 and y = 0 . . . N−1.
In typical applications, such as the ITU video coding standards known as H.261, H.262, and H.263 and the ISO/IEC video coding standards known as MPEG-1 video, MPEG-2 video, and MPEG-4 visual, the input to the IDCT has integer values and the decoded output samples are required to have integer values. Therefore, rather than considering the ideal IDCT result in a decoder to be the closest approximation of the above equation, the ideal result of the decoding process is considered to be the result obtained by rounding the output of the above equation to the nearest integer value (with rounding away from zero for values that are exactly half-integers).
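The rounding rule described above (nearest integer, with exact half-integers rounded away from zero) can be sketched in Python as follows. The function name is hypothetical. Note that Python's built-in `round` rounds exact half-integers to the nearest even integer, so it does not implement this rule.

```python
import math

def round_half_away_from_zero(value):
    """Round to the nearest integer; exact half-integers move away from zero."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))

samples = [2.5, -2.5, 2.4, -2.6, 0.5, -0.5]
rounded = [round_half_away_from_zero(v) for v in samples]
```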
The ideal real-valued forward DCT (also known as an FDCT or a type-II DCT, unitary or orthonormal formulation) for an M×N block of spatial-domain samples f[nN+y][mM+x] at position [nN][mM] in a picture can be defined as follows:
$$F_{m,n}[v][u] = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left(c_u \sqrt{\frac{2}{M}}\right) \left(c_v \sqrt{\frac{2}{N}}\right) f[nN+y][mM+x] \cdot \cos\left[\frac{(2x+1)u\pi}{2M}\right] \cdot \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$

for u = 0 . . . M−1 and v = 0 . . . N−1.
In applications such as the above-listed ITU and ISO/IEC video coding standards, the input to the forward DCT consists of integer values, and the representation of the transform coefficient values at the output of the inverse quantization process in the decoder also uses integer values.
The constants used in these equations are defined as follows: $c_u = 1/\sqrt{2}$ for u = 0, otherwise 1; $c_v = 1/\sqrt{2}$ for v = 0, otherwise 1.
In the ITU and ISO/IEC standards relevant to this discussion, both M and N are typically equal to 8.
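A direct, non-decomposed evaluation of the forward and inverse DCT definitions can be sketched in Python as below. This is a floating-point illustration for a single block at n = m = 0, not an implementation from any standard. The function names (`c`, `basis`, `fdct_2d`, `idct_2d`) and the test block are assumptions.

```python
import math

def c(k):
    """Normalization constant: 1/sqrt(2) for index 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def basis(k, pos, size):
    """Value of c_k * sqrt(2/size) * cos((2*pos + 1) * k * pi / (2*size))."""
    return c(k) * math.sqrt(2.0 / size) * math.cos((2 * pos + 1) * k * math.pi / (2 * size))

def fdct_2d(f):
    """Forward DCT of one block f[y][x] (N rows, M columns); returns F[v][u]."""
    N, M = len(f), len(f[0])
    return [[sum(basis(u, x, M) * basis(v, y, N) * f[y][x]
                 for x in range(M) for y in range(N))
             for u in range(M)]
            for v in range(N)]

def idct_2d(F):
    """Inverse DCT of one block F[v][u]; returns real-valued samples f[y][x]."""
    N, M = len(F), len(F[0])
    return [[sum(basis(u, x, M) * basis(v, y, N) * F[v][u]
                 for u in range(M) for v in range(N))
             for x in range(M)]
            for y in range(N)]

# Hypothetical 8x8 block of integer samples.
block = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
restored = idct_2d(fdct_2d(block))
max_error = max(abs(restored[y][x] - block[y][x])
                for y in range(8) for x in range(8))
```

Because the orthonormal formulation is used, the inverse transform recovers the input up to floating-point error, and a constant 8×8 block of value 100 produces a single nonzero coefficient F[0][0] = 800.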
The LLM Decomposition for IDCT and DCT
The signal flow diagram in FIG. 2 shows a decomposition of a one-dimensional 8-input inverse DCT known as the Loeffler, Ligtenberg, and Moschytz (LLM) decomposition. (See, C. Loeffler, A. Ligtenberg, and G. S. Moschytz, “Practical fast 1-D DCT algorithms with 11 multiplications”, Proc. IEEE Intl. Conf on Acoust., Speech, and Signal Proc. (ICASSP), vol. 2, pp. 988-991, February 1989.) A very similar form of decomposition (not shown) can also be applied to produce a forward DCT computation. There are a couple of variations of the LLM decomposition; the figure shows one particular variant in which the signal flow from left to right never involves more than a single multiplication operator. The illustrated decomposition uses 14 multiplication operators, in contrast to a straightforward application of the inverse DCT process, which would require 64 multiplications.
Note also that if the overall magnitude of the data flowing in the diagram can be scaled by a constant factor that is under the control of the designer, a scale factor of 1/sqrt(8) can be incorporated into that constant factor and the number of remaining multiplication operations can be reduced from 14 to 12.
When performing a two-dimensional IDCT or DCT operation, the scale factor of 1/sqrt(8) for each stage can be incorporated into an overall scale factor of 1/8. This is a very simple factor in the sense that it is an integer power of two. Thus, it can be represented in fixed-point arithmetic using an operator known as an arithmetic shift operator.
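A scale factor of 1/8 on integer data can be sketched in Python with a 3-bit right shift. Python's `>>` on integers behaves as an arithmetic shift, so negative values round toward negative infinity. The function names and the optional rounding offset are assumptions added for illustration.

```python
def scale_by_one_eighth(value):
    """Multiply an integer by 1/8 using a 3-bit arithmetic right shift.

    The result is floor(value / 8), including for negative values.
    """
    return value >> 3

def scale_by_one_eighth_rounded(value):
    """Same scaling, but add half of the divisor first to round to nearest."""
    return (value + 4) >> 3
```

For example, `scale_by_one_eighth(800)` gives 100, while `scale_by_one_eighth(-7)` gives -1 because the shift rounds down.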
A number of other well-known decompositions exist for DCT and IDCT computation. These include methods known as Chen-Wang, Wang, AAN (Arai, Agui, and Nakajima), etc.
Fixed Point Approximation Techniques
One way to ease the computational burden of performing a transformation is the use of fixed-point arithmetic. This consists essentially of two techniques: rounding and scaling. Sometimes clipping is also employed.
The ideal values of the multipliers that are found in a transformation matrix are often not integers. In fact, they are often not even rational numbers, so exact computations are not feasible. For example, general-purpose computer arithmetic would typically not have a way to exactly represent a number such as 1/sqrt(8) or cos(π/4). A typical technique is to scale the data by some constant value and round the result to some value equivalent to an integer. Alternatively, a fixed-length representation with a “decimal point” or “binary point” understood to be in a certain position can be used. The use of such data structures that are equivalent to using integers to process the data is referred to as the use of fixed-point arithmetic, a term well-known in the art.
For example, to represent the number 1/sqrt(8) we may scale up the number by a multiplication factor of $2^{15}$, and round the value of $2^{15}$/sqrt(8) to the nearest integer representation, which would be 11585. Then to multiply an input number x by 1/sqrt(8), we can instead multiply it by 11585. The result would then need to be scaled back down by dividing it by $2^{15}$ when interpreting the answer, resulting in the approximation value 0.353546142578125 for the actual value which is the irrational number 0.3535533905932737622 . . . . Such a computation method would not produce an exact result, but it would produce a result that is close to the correct value.
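The arithmetic in this example can be reproduced with a short Python sketch. The names `SHIFT`, `FACTOR`, and `mul_inv_sqrt8` are hypothetical, and the rounding offset added before the shift is one possible choice rather than something prescribed by the text.

```python
import math

SHIFT = 15                                     # scale factor is 2**15
FACTOR = round((1 << SHIFT) / math.sqrt(8))    # nearest integer to 2**15 / sqrt(8)
APPROXIMATION = FACTOR / (1 << SHIFT)          # value actually represented

def mul_inv_sqrt8(x):
    """Approximate x / sqrt(8) for an integer x using only integer operations."""
    # Multiply by the scaled constant, add half of 2**SHIFT to round to
    # nearest, then shift right to undo the scaling.
    return (x * FACTOR + (1 << (SHIFT - 1))) >> SHIFT

result = mul_inv_sqrt8(1000)                   # ideal value is about 353.55
```

Here `FACTOR` is 11585 and `APPROXIMATION` is 0.353546142578125, matching the figures above.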
Typically, when rounding a multiplication factor to an integer, the nearest integer to the ideal value would be used.
A typical technique for fixed point approximation of a transformation would then consist of the following:
1. Decomposing the transformation into a smaller set of transformations.
2. Approximating each multiplication factor by a fixed-point representation such as an integer by scaling up the ideal value of the multiplication factor and rounding it to the nearest integer (or fixed-point approximation).
Scaling and rounding operations may also be applied to the input data and to the computed results to adjust the number of bits of precision needed at various points in the processing.
To avoid values that go out of the range of data that can be properly represented using the implemented computer arithmetic, an operation called clipping may be performed that consists of saturating a computed value to fall within some upper and/or lower bound.
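Clipping can be sketched as a simple saturation function. The function name and the 8-bit unsigned range used in the example are assumptions chosen for illustration.

```python
def clip(value, low, high):
    """Saturate value so that it falls within the range [low, high]."""
    return max(low, min(high, value))

# Hypothetical example: saturate computed samples to the range 0..255.
clipped = [clip(v, 0, 255) for v in (-5, 17, 300)]
```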
In a typical application of a direct implementation of a non-decomposed matrix multiply transformation, the value chosen for the approximation of a multiplication factor by an integer should be the ideal value rounded to the nearest integer after scaling, as it is easily shown that this will produce the most accurate computed result for most purposes.
Application of Fixed Point Approximation to LLM IDCT
An example of this typical approach being applied is found in a publicly-available “IJG” (Independent JPEG Group) software implementation of the LLM IDCT decomposition, shown in FIG. 3.
In this diagram (and in the IJG software), the overall scale factor of 1/sqrt(8) is assumed to be applied externally to the flow diagram as shown, and an additional overall scale factor of $1/2^{13}$ is also assumed to be applied externally (conceptually or in actual fact) to compensate for the magnification of the multiplication factors that allows them to be represented as integer values. The $2^{13}$ scale factor that magnifies the multiplication factors is evident in the 13-bit left shift operations applied along the pathways shown in the top left area of the flow diagram.
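The 13-bit scaling can be sketched in Python as follows. This is an illustration only and is not a copy of the IJG source: the helper names (`fix`, `descale`), the constant name `CONST_BITS`, and the two example multiplier values are assumptions.

```python
CONST_BITS = 13                          # multipliers are magnified by 2**13

def fix(value, bits=CONST_BITS):
    """Scale a positive real multiplier by 2**bits and round to the nearest integer."""
    return int(value * (1 << bits) + 0.5)

def descale(value, bits=CONST_BITS):
    """Remove the 2**bits magnification with a rounding right shift."""
    return (value + (1 << (bits - 1))) >> bits

# Hypothetical multiplier values used only to illustrate the scaling.
K1 = fix(0.541196100)
K2 = fix(1.175875602)

x = 1000
multiplied_path = x * K1                 # path that passes through a multiplier
unmultiplied_path = x << CONST_BITS      # path with no multiplier: shift left to match scale
```

Paths that carry no multiplication are shifted left by 13 bits so that every value entering an addition is expressed at the same magnified scale. Applying `descale` afterwards removes that magnification.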