The present invention relates to the processing of data by switching between different sub-band domains, in particular, but not in any way exclusively, for the transcoding between two types of compression coding/decoding.
Recent developments in digital coding formats for multimedia signals now allow significant compression rates. Furthermore, the increase in the capacities of transport and access networks now ensures everyday use by the public at large of digital multimedia content (speech, audio, image, video and the like). This content is consumed on various types of terminals (computers, mobile terminals, personal digital assistants (PDA), television decoder terminals (“Set-Top-Box”) or the like) and via various types of network (IP, ADSL, DVB, UMTS, or the like). Access to multimedia content by the user must be transparent on these various terminals and across these various networks. One then speaks of “universal access to multimedia content” or “UMA” standing for “Universal Multimedia Access”, a diagrammatic illustration of which is represented in FIG. 1.
One of the main problems due to the heterogeneity of terminals relates to the diversity of the coding formats that they are capable of interpreting. One possible solution would be to retrieve the capabilities of the terminal before delivering the content in a compatible format. This solution may turn out to be more or less effective according to the scenario of delivery of the multimedia content considered (downloading, streaming or broadcasting). It becomes inapplicable in certain cases, such as for broadcasting or for streaming in multicast mode. The concept of transcoding (or of changing coding format) therefore turns out to be important. This operation may take place at various levels of the transmission chain. It may take place at the server level, for changing the format of content previously stored for example in a database, or else in a gateway in the network, or the like.
A direct and customary method of transcoding consists in decoding the content and in recoding it to obtain a representation in the new coding format. This method generally has the drawbacks of using significant computational power, of increasing the algorithmic delay due to the processing and sometimes of adding a supplementary degradation of the perceptual quality of the multimedia signal. These parameters are very significant in multimedia applications. Their improvement (reduction in complexity and in delay and maintaining of quality) is a significant factor for the success of these applications. This factor sometimes becomes an essential condition of implementation.
With the aim of improving these parameters, the principle of so-called “intelligent” transcoding has come into being. This type of transcoding consists in performing a partial decoding, as minimal as possible, of the initial coding format to extract the parameters allowing the reconstruction of the new coding format. The success of this method is therefore measured by its capacity to reduce algorithmic complexity and algorithmic delay while maintaining, or even increasing, perceptual quality.
In image and video coding, much work on transcoding has been carried out. We cite for example the changing of image size from CIF to QCIF, or else conversion from MPEG-2 to MPEG-4 formats. For the transcoding of speech signals, typically in telephony, work is under way for solving problems related to the coding formats. On the other hand, very little work has tackled the processing of audio signals. The existing work remains restricted to cases of reducing bit rate within one and the same format or of switching between certain coding formats of very similar structures. The main reason lies in the fact that the most widely used audio coders are of transform (or sub-band) type and, generally, these coders use transforms or filter banks that differ. It will thus be understood that embodying a system for converting between the representations of the signal in the domains of these transforms or filter banks is the first difficulty to be overcome before being able to attack any other problem related to intelligent transcoding in the field of audio.
Given below is a definition of audio transcoding and the principal problems which arise, after a brief reminder of the principles of perceptual audio sub-band coding.
There exists a great diversity of audio coders which have been designed for various types of applications and for a wide range of bit rates and qualities. These coders may be specific to the constructor (or “proprietary”), or else standardized by decisions of international bodies. Nevertheless, they all possess a common basic structure and rely on like principles.
The basic principle of perceptual frequency audio coding consists in reducing the bit rate of information by utilizing the properties of the hearing system of the human being. The nonrelevant components of the audio signal are eliminated. This operation uses the phenomenon of so-called “masking”. As the description of this masking effect is done principally in the frequency domain, the representation of the signal is carried out in the frequency domain.
More specifically, the basic schemes of a coding and decoding system are presented in FIGS. 2a and 2b. With reference to FIG. 2a, the digital audio input signal Se is firstly decomposed by a bank of analysis filters 20. The resulting spectral components are thereafter quantized and then coded by the module 22. The quantization uses the result of a perceptual model 24 so that the noise which stems from the processing is inaudible. Finally, a multiplexing of the various coded parameters is performed by the module 26 and an audio frame Sc is thus constructed.
With reference to FIG. 2b, the decoding is carried out in a dual manner. After demultiplexing of the audio frame by the module 21, the various parameters are decoded and the spectral components of the signal are dequantized by the module 23.
Finally, the temporal audio signal is reconstituted by the bank of synthesis filters 25.
The first stage of any perceptual audio coding system therefore consists of a bank of analysis filters 20, used for the time/frequency transformation. A wide variety of filter banks and transforms have been developed and utilized in audio coders. Mention may be made by way of example of pseudo-QMF filter banks, hybrid filter banks and MDCT transform banks. The MDCT transform is currently turning out to be the most effective in this context. It is the basis of the most recent and efficacious audio coding algorithms, such as those used in the MPEG-4 AAC, TwinVQ and BSAC standards, in the Dolby AC-3 standard, in the TDAC coder/decoder (standing for “Time Domain Aliasing Cancellation”) from France Telecom, and in UIT-T standard G.722.1.
Although these various transformations have been developed separately, they may be described by a similar general mathematical approach and from various points of view: modulated cosine filter banks, lapped orthogonal transforms (or “LOT”) and more generally for filter banks with maximal decimation, that is to say with critical sampling. It is recalled that the property of critical sampling for a filter bank consists in the subsampling/oversampling factor being equal to the number of sub-bands.
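By way of illustration only, the critical-sampling property recalled above can be checked numerically on an MDCT with a sinusoidal window. The following is a minimal sketch in Python using only the standard library; the function names, the frame size M = 4 and the test signal are assumptions chosen for the example and are not taken from any of the coders cited. Each hop of M new input samples produces exactly M coefficients, and the overlap-add of the inverse transforms cancels the time-domain aliasing.

```python
import math

def sine_window(M):
    """Sine window of length 2M; it satisfies w[n]^2 + w[n+M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(frame, M):
    """Windowed MDCT: 2M input samples -> M coefficients."""
    w = sine_window(M)
    return [sum(w[n] * frame[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, M):
    """Windowed inverse MDCT: M coefficients -> 2M samples that still contain aliasing."""
    w = sine_window(M)
    return [(2.0 / M) * w[n]
            * sum(coeffs[k]
                  * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]

def analyze(signal, M):
    """Cut the signal into 50%-overlapping frames (hop M) and transform each one."""
    padded = [0.0] * M + list(signal) + [0.0] * M
    return [mdct(padded[s:s + 2 * M], M)
            for s in range(0, len(padded) - M, M)]

def synthesize(frames, M):
    """Overlap-add the inverse transforms; the aliasing terms cancel pairwise."""
    out = [0.0] * ((len(frames) + 1) * M)
    for i, coeffs in enumerate(frames):
        for n, value in enumerate(imdct(coeffs, M)):
            out[i * M + n] += value
    return out[M:-M]  # drop the zero padding added by analyze()

M = 4  # number of sub-bands, which is also the hop size (critical sampling)
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(4 * M)]
frames = analyze(signal, M)
rebuilt = synthesize(frames, M)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

In this sketch the one extra frame comes from the zero padding at the signal boundaries; apart from that boundary effect, the number of coefficients equals the number of input samples.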
FIGS. 3a and 3b respectively illustrate the conventional transcoding and intelligent transcoding schemes in a communication chain, between a coder CO1 according to a first coding format and a decoder DEC2 according to a second coding format. In the case of conventional transcoding, a complete decoding operation is performed by the decoder module DEC1 according to the first format (FIG. 3a), followed by a recoding by the coder module CO2 according to the second format, so as ultimately to end up in the second coding format.
In the case of FIG. 3b, the two blocks DEC1 and CO2 of FIG. 3a are on the other hand replaced with an integrated module 31 that is called an “intelligent” transcoding module.
Represented in FIG. 4 are the details of the operations that are merged by the implementation of intelligent transcoding. This principally involves integrating the functional blocks of the synthesis filter banks BS1 and of the analysis filter banks BA2 of the conventional transcoding so as to culminate in a system for direct conversion between sub-band domains, in the module 31.
The use by the coders of various types of filter banks (of different sizes, in particular in terms of number of sub-bands, and of different structures) is the first and principal problem to be overcome. This therefore involves transposing the whole set of samples of a frame from the domain of an initial filter bank to that of a destination filter bank. This transposition is the first operation to be done in any intelligent audio transcoding system.
Table 1 below gives a summary regarding the types of filter banks used in the best known transform-based audio coders, as well as their characteristics. As may be noted, in addition to the MDCT transform which is the one most widely used, there are the pseudo-QMF banks. Additionally, they all form part of the family of maximal decimation and modulated cosine banks that exactly or almost satisfy the property of perfect reconstruction.
TABLE 1
The filter banks most widely used in audio coding and their characteristics.

| Coder | Filter bank | Characteristics |
|---|---|---|
| MPEG-1 Layer I & II | Pseudo-QMF | Number of bands M = 32 |
| MPEG-1 Layer III | Pseudo-QMF/MDCT (hybrid) | 32 bands followed by an MDCT of size 18 for each |
| MPEG-2/4 AAC | MDCT | M = 1024 bands for the long window and M = 128 for the short window. KBD (Kaiser-Bessel Derived) window with α = 4 for the stationary states and α = 6 for the transitions. |
| MPEG-4 BSAC | MDCT | M = 1024 bands for the long window and M = 128 bands for the short window. |
| MPEG-4 TwinVQ | MDCT | M = 1024 bands for the long window and M = 128 bands for the short window. Possibility of using a KBD window or a sinusoidal window. |
| Dolby AC-3 | MDCT | M = 256 bands for the long window and M = 128 for the short window. KBD window with α = 5 |
| FTR&D TDAC | MDCT | M = 320 bands. Sinusoidal window |
| G.722 | QMF | Two sub-bands |
| G.722.1 | MDCT | M = 320 sub-bands. Sinusoidal window |
It is indicated that the switch between the AAC and AC-3 formats is currently arousing much interest.
Table 2 below restates certain types of sub-band coding of table 1 while detailing a few of their applications.
TABLE 2
Examples of sub-band coders for audio signals and speech signals and a few examples of their principal applications.

| Coder | Applications | Remarks |
|---|---|---|
| MPEG-1/2 Layer I | Broadcasting | |
| MPEG-1/2 Layer II | Broadcasting | Used in Europe for DAB broadcasting (“Digital Audio Broadcasting”, ETSI ETS 300 401 standard). Used also for RF digital television broadcasting, in Europe (DVB standard) |
| MPEG-1 Layer III (MP3) | Downloading, streaming | |
| MPEG-2/4 AAC | Broadcasting, downloading, streaming | The MPEG-2 AAC audio coder (ISO/IEC 13818-7) is specified as the only audio coder for broadcasting in Japan in ISDB services (“Integrated Service Digital Broadcasting”) including: ISDB-T (terrestrial), ISDB-S (satellite), and ISDB-C (cable). DVB-IP uses the MPEG-2 AAC coder |
| MPEG-4 BSAC | Broadcasting | This coder is used in Korea for digital television broadcasting |
| Dolby AC-3 | Broadcasting | Used in the USA for digital television broadcasting |
| Sony ATRAC3 | | Used in Japan (on-line music channel of iTunes type). |
| France Telecom: TDAC | Teleconferencing | |
| UIT-T G.722 | Teleconferencing | |
| UIT-T G.722.1 | Teleconferencing, H.323 | Group communication systems (teleconferencing, audioconferencing) |
In the known prior art in audio transcoding, document U.S. Pat. No. 6,134,523 presents a process for reducing bit rate in the coded domain for audio signals coded by MPEG-1 Layer I or II. Even though this process is akin to audio transcoding processes, it does not carry out any change between coding formats and the signals of the sub-bands remain in the representation of one and the same transformed domain, namely the representation of the pseudo-QMF filter bank. Here, the signals are quite simply requantized according to a new allocation of bits.
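The principle of requantizing sub-band samples under a new bit allocation, without leaving the sub-band domain, may be sketched as follows. This is a simplified illustration under stated assumptions: the uniform mid-tread quantizer, the function names, the scale factors and the bit allocations are all hypothetical, and they reproduce neither the actual MPEG-1 Layer I/II quantizers nor the process of the cited document.

```python
def quantize(x, bits, scale):
    """Uniform mid-tread quantizer for x in [-scale, scale]; returns an integer index."""
    if bits < 2:
        return 0  # no usable bits allocated: the sub-band is dropped
    levels = 2 ** (bits - 1) - 1
    return max(-levels, min(levels, round(x / scale * levels)))

def dequantize(index, bits, scale):
    """Inverse of quantize(): map an index back to an amplitude."""
    if bits < 2:
        return 0.0
    levels = 2 ** (bits - 1) - 1
    return index / levels * scale

def requantize(indices, old_alloc, new_alloc, scales):
    """Requantize one frame sub-band by sub-band; no filter bank is involved."""
    return [quantize(dequantize(q, b_old, s), b_new, s)
            for q, b_old, b_new, s in zip(indices, old_alloc, new_alloc, scales)]

# Hypothetical frame with four sub-bands.
samples = [0.50, -0.20, 0.05, 0.80]
scales = [1.0, 1.0, 1.0, 1.0]
old_alloc = [8, 8, 6, 6]   # bits per sub-band in the incoming stream
new_alloc = [4, 4, 0, 3]   # reduced allocation for the outgoing stream
old_indices = [quantize(x, b, s) for x, b, s in zip(samples, old_alloc, scales)]
new_indices = requantize(old_indices, old_alloc, new_alloc, scales)
```

The sub-band signals are never returned to the time domain here, which is why such a process reduces the bit rate but cannot change the coding format.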
Additionally, document US-2003/0149559 proposes a process for reducing the complexity of the psycho-acoustic model during a transcoding operation. Thus, so as not to have to resort to an operation for calculating masking thresholds during transcoding, the new system uses values stored in a database of distortion templates. Even though this process deals with the problems of transcoding, it remains far from the objectives relating to switching between filter bank domains.
Document US-2003/014241 proposes a system for transcoding between the MPEG-1 Layer II and MPEG-1 Layer III audio coding formats. Specifically, the MPEG-1 Layer II format uses a pseudo-QMF analysis filter bank and the MPEG-1 Layer III format uses the same filter bank followed by an MDCT transform of size 18 applied to the output sub-band signals of said bank. One speaks of a “hybrid filter bank”. The conversion system proposed in this document consists in applying this transform after inverse quantization of the samples of the sub-bands of an MPEG-1 Layer II frame. The system therefore profits from the similarity between the two coding formats.
With respect to the sought-after aim within the sense of the present invention, the following remarks may be noted:
- This prior art technique can be applied only for this particular case of transcoding.
- This technique does not truly process a conversion in a new, different, sub-band domain. It simply involves cascading a new missing analysis filter bank, which makes it possible to increase the frequency resolution.
Multirate processing and filtering in the transformed domain are already known in another context of image and/or video data processing, especially through the reference:
“2-D Transform-Domain Resolution Translation”, J.-B. Lee and A. Eleftheriadis, IEEE Trans. on Circuit and Systems for Video Technology, Vol. 10, No. 5, August 2000.
It describes a generalization of a process of linear filtering in the transformed domain (TDF standing for “Transform-Domain Filtering”). More particularly, this generalization is established in the case where the first transform (inverse) T1, and the second transform (direct) T2, are of the same size. The generalization consists firstly in extending the process to the case where the transforms are not of the same size. This process is then termed “non-uniform TDF” (or NTDF). It is thereafter extended to the case where in addition to the filtering, multirate processing operations (subsampling and oversampling) are added in the transformed domain, this resulting in the “Multirate-TDF” (MTDF).
Proposed as an application is the changing of resolution in the transform domain (TDRT standing for “Transform-Domain Resolution Translation”), particularly for image and video applications (conversion between CIF and QCIF image formats) where the transform is a DCT (standing for “Discrete Cosine Transform”). This reference is therefore interested only in filtering in the transformed domain. The process set forth is restricted only to cases of transforms with no overlap such as a DCT, a DST, or the like, but could not typically be applied to transforms with overlap such as an MLT (standing for “Modulated Lapped Transform”) and, more generally, to any type of filter bank with maximum decimation, these filters possibly having moreover a finite or infinite impulse response.
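The basic idea of filtering in the transformed domain for transforms with no overlap may be illustrated as follows. This is a simplified sketch, not the process of the cited reference: it assumes that T1 and T2 are the same orthonormal DCT of size N, and that the linear operator H (a hypothetical smoothing matrix) acts inside a single block, whereas filtering in general also involves the adjacent blocks. Under these assumptions, the inverse transform, the filter and the direct transform merge into a single matrix G = T·H·Tᵀ that is applied directly to the transformed vector.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix of size N x N (each row is one basis vector)."""
    return [[math.sqrt((1 if k == 0 else 2) / N)
             * math.cos(math.pi * (n + 0.5) * k / N)
             for n in range(N)]
            for k in range(N)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

N = 8
T = dct_matrix(N)

# Hypothetical in-block smoothing operator (illustration only).
H = [[0.5 if i == j else 0.25 if abs(i - j) == 1 else 0.0 for j in range(N)]
     for i in range(N)]

# Merged operator: inverse transform (T^T, since T is orthonormal), filter, direct transform.
G = matmul(matmul(T, H), transpose(T))

x = [float((3 * n) % 7) for n in range(N)]      # one block of samples
X = matvec(T, x)                                # its transformed representation
Y_direct = matvec(G, X)                         # filtering in the transformed domain
Y_reference = matvec(T, matvec(H, x))           # decode, filter, re-transform
```

Since G depends only on the transforms and on the filter, it can be computed once and reused for every block. The sketch relies on each block being transformed independently, which is precisely what no longer holds for transforms with overlap.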
As regards the conversion between DCT domains of different sizes, still for the transcoding of images and video, the following reference may be cited: “Direct Transform to Transform Computation”, A. N. Skodras, IEEE Signal Processing Letters, Vol. 6, No. 8, August 1999, pages: 202-204.
This document proposes a process for switching between DCT transforms of different sizes for an image subsampling in the DCT domain. One of the applications of this process would be transcoding. However, this process is restricted to the construction of a transformed vector of size N from two adjacent transformed vectors each of size N/2.
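The construction of a transformed vector of size N from two adjacent transformed vectors of size N/2 can be written as a single matrix product. The sketch below is an illustration under stated assumptions: it uses orthonormal DCT-II matrices and a plain matrix formulation, with hypothetical function names, and does not reproduce the fast algorithm of the cited reference.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix of size N x N (each row is one basis vector)."""
    return [[math.sqrt((1 if k == 0 else 2) / N)
             * math.cos(math.pi * (n + 0.5) * k / N)
             for n in range(N)]
            for k in range(N)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def conversion_matrix(N):
    """Matrix mapping two stacked size-N/2 DCT vectors to one size-N DCT vector."""
    half = N // 2
    inverse_half = [list(row) for row in zip(*dct_matrix(half))]  # inverse = transpose
    block_diag = [[0.0] * N for _ in range(N)]
    for i in range(half):
        for j in range(half):
            block_diag[i][j] = inverse_half[i][j]                # first half-block
            block_diag[half + i][half + j] = inverse_half[i][j]  # second half-block
    return matmul(dct_matrix(N), block_diag)

N = 8
x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
X1 = matvec(dct_matrix(N // 2), x[:N // 2])   # DCT of the first half
X2 = matvec(dct_matrix(N // 2), x[N // 2:])   # DCT of the second half
A = conversion_matrix(N)
Z_direct = matvec(A, X1 + X2)                 # conversion without returning to samples
Z_reference = matvec(dct_matrix(N), x)        # DCT of size N computed from the samples
```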
A process for converting between the representations of the signal in the MDCT domain and the DFT domain (Discrete Fourier Transform) is presented in document US-2003/0093282.
It was developed with the objective of converting the audio signal into a representation in which it can be easily modified. Specifically, the TDAC filter banks are more practical and are used more in audio coders than the DFT filter banks. However, carrying out processing or modifications on the components of the signal in this transformed domain is neither adequate nor sufficiently flexible in view of the existence of spectral aliasing components. On the other hand, the DFT representation is more useful when modifications are to be made on the audio signal, such as a change of timescale or a shift of pitch. This reference therefore proposes a direct process for converting between the MDCT and DFT domains instead of applying the conventional process consisting in synthesizing the temporal signal by an inverse MDCT, then applying the DFT. This process therefore allows modifications to be made directly in the coded domain. The document also proposes the dual process for converting between the DFT and MDCT domains, which would be useful in the case where there was a need to recode the audio signal after modification.
In this reference, the comparison in terms of complexity with a conventional conversion process shows no reduction. Nevertheless, a small gain in the memory required for storing the data is demonstrated.
However:
- The process set forth in this reference deals with a particular case. It is restricted solely to the case of converting between the MDCT and DFT domains and vice versa.
- The process is restricted to the case where these two filter banks are of the same size.
The publication: “An Efficient VLSI/FPGA Architecture for Combining an Analysis Filter Bank following a Synthesis Filter Bank”, Ravindra Sande, Anantharaman Balasubramanian, IEEE International Symposium on Circuits and Systems, Vancouver, British Columbia, Canada, May 23-26, 2004, can also be cited.
This publication discloses an efficient structure for implementing a system consisting of a synthesis filter bank, with L sub-bands, followed by an analysis filter bank with M sub-bands, where M and L are multiples of one another. This structure is efficient for implementation in VLSI integrated technology (“Very Large Scale Integration”) or on FPGA (“Field Programmable Gate Array”) or on parallel processors. It requires fewer logic blocks, consumes little power and makes it possible to extend the degree of parallelism. The process proposed is applicable in situations where one processing based on sub-bands follows another sub-band processing and where the intermediate synthesized signal is unnecessary.
However:
- The process described above makes the restrictive assumption that the filter banks considered are modulated and may be decomposed into a polyphase structure.
- The process is restricted solely to the particular cases where M and L are multiples of one another.
It should also be indicated that the structure of the scheme for converting between sub-band domains exhibits a certain similarity with that of the problem of trans-multiplexing presented in particular in:
“Multirate Systems and Filter Banks”, P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, N.J., 1993, pp. 148-151.
Specifically, in trans-multiplexing from TDM to FDM (standing for “Time-Division Multiplexing” to “Frequency-Division Multiplexing”), a synthesis filter bank is used. To reconstruct the interlaced time signals (that is to say to perform the inverse trans-multiplexing operation from FDM to TDM), an analysis filter bank is used. The structure of the TDM→FDM→TDM system therefore amounts to a cascading of a synthesis filter bank and of an analysis filter bank, which corresponds well to what is also used in a conventional transcoding system. The problem generally posed in these trans-multiplexing systems is to reconstruct the original signals without distortions after the TDM→FDM→TDM operation. This principally involves eliminating the distortions due to the phenomenon of crosstalk, which results from the use of non-perfect bandpass filters in these filter banks. A judicious design of the synthesis and analysis filters, as indicated in the same reference:
“Multirate Systems and Filter Banks”, P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, N.J., 1993, pages 259-266,
makes it possible to overcome this problem. In the design proposal for these filters, a process is given for merging the synthesis and analysis filter banks, thereby amounting to proposing an intelligent conversion system.
However:
- In the multiplexing structure proposed in this document, the synthesis and analysis filter banks have the same number of bands (M=L).
- There is no aim to construct a trans-multiplexing system merging the synthesis and analysis filter banks exactly as in transcoding. These two filter banks are left cascaded, independently.
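The TDM→FDM→TDM cascade discussed above, with the same number of bands on both sides (M=L), may be sketched as follows. This is a minimal illustration under stated assumptions and not the filter design of the cited reference: the synthesis filters are taken as the rows of an orthonormal DCT matrix (length-M filters), the analysis filters are their time-reversed versions, and all function names are hypothetical. With this choice the product of the two banks is the identity, so each channel is recovered without crosstalk.

```python
import math

def dct_rows(M):
    """Rows of an orthonormal DCT-II matrix, used here as M filters of length M."""
    return [[math.sqrt((1 if k == 0 else 2) / M)
             * math.cos(math.pi * (n + 0.5) * k / M)
             for n in range(M)]
            for k in range(M)]

def synthesis_bank(channels, filters):
    """TDM -> FDM: upsample each channel by M, filter it, and sum all branches."""
    M = len(channels)
    frames = len(channels[0])
    y = [0.0] * (frames * M)
    for k in range(M):
        for m in range(frames):
            for j in range(M):
                y[m * M + j] += channels[k][m] * filters[k][j]
    return y

def analysis_bank(y, filters):
    """FDM -> TDM: filter with the time-reversed filters, then downsample by M."""
    M = len(filters)
    frames = len(y) // M
    channels = []
    for k in range(M):
        h = filters[k][::-1]  # time-reversed synthesis filter
        channel = []
        for m in range(frames):
            n = m * M + M - 1  # sampling instant kept after downsampling
            channel.append(sum(h[j] * y[n - j] for j in range(M)))
        channels.append(channel)
    return channels

M = 4
filters = dct_rows(M)
tdm_in = [[float(k + 1) * (m + 1) for m in range(3)] for k in range(M)]  # 4 channels, 3 samples
fdm = synthesis_bank(tdm_in, filters)
tdm_out = analysis_bank(fdm, filters)
max_crosstalk = max(abs(a - b)
                    for ch_in, ch_out in zip(tdm_in, tdm_out)
                    for a, b in zip(ch_in, ch_out))
```

The two banks remain separate functions here, applied one after the other, which mirrors the cascaded and independent arrangement noted in the remarks above.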