The invention relates to a method for processing and to an apparatus for encoding audio or video frame data.
For broadcasting purposes a 4-stereo-channel real time MPEG audio encoder board has been designed. Such encoders can be realised using DSPs (Digital Signal Processors), wherein the processing as well as the input and output interfaces are distributed over several DSPs. The processing is block oriented. For every new block of e.g. 1152 samples the related bitstream part is generated. As the processing time for the processing blocks is not constant, and because several processing blocks on each DSP might compete for execution at any time instant, the total processing delay can vary significantly.
A requirement may be that the encoder is able to operate with different encoding parameters because MPEG allows e.g. various sample frequencies and overall data rates.
The total burden of the real time processing is distributed by means of multi-threading to each DSP and to the software equivalent of FIFO buffers carrying one or more frames or blocks of data, i.e. the encoder in principle consists of a chain of software FIFOs alternating with processing blocks (denoted threads) between the encoder input and output. The following problems need to be solved for a real time processing system:
a) The total input to output delay should either be constant or, better, be definable (within the given memory constraints) to meet both external and internal requirements. Enforcing a minimum delay ensures that enough processing time is available even for the worst case (an input data sequence resulting in maximum processing time) without temporary underflow of the output buffer (consider a start with a low delay time followed by increased processing time).
b) A sufficiently simple and reliable way for controlling this delay is required for serviceability.
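The chain of software FIFOs alternating with processing threads described above can be illustrated by the following minimal sketch. It is not the encoder board's implementation: the names `stage`, `run_chain`, `SENTINEL` and the FIFO depth are assumptions made for illustration, and Python threads stand in for the DSP threads.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of the source


def stage(work, fifo_in, fifo_out):
    """One processing block: take frames from fifo_in, put results into fifo_out."""
    while True:
        frame = fifo_in.get()
        if frame is SENTINEL:
            fifo_out.put(SENTINEL)  # pass the end marker down the chain
            return
        fifo_out.put(work(frame))


def run_chain(frames, works, depth=2):
    """Build FIFO -> thread -> FIFO -> ... -> FIFO and pass all frames through it."""
    fifos = [queue.Queue(maxsize=depth) for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, fifos[i], fifos[i + 1]))
        for i, w in enumerate(works)
    ]

    def feed():
        # Input side: blocks whenever the first FIFO is full.
        for f in frames:
            fifos[0].put(f)
        fifos[0].put(SENTINEL)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    out = []
    while True:  # output side: drain the last FIFO until the end marker arrives
        item = fifos[-1].get()
        if item is SENTINEL:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out


# Two stand-in processing blocks applied to ten numbered "frames".
results = run_chain(range(10), [lambda x: x + 1, lambda x: x * 2])
```

Because each stage is a single thread reading from one FIFO, frame order is preserved, but the time a frame spends in the chain depends on scheduling and FIFO fill levels, which is the variable delay that problems a) and b) refer to.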
It is one object of the invention to disclose a method for processing audio or video frame data in which the delay between different processing stages as well as the overall delay can be adapted, particularly when a presentation of audio data together with the related video is required.
It is a further object of the invention to disclose an apparatus for encoding audio data which utilises the inventive method.
Normally the audio or video frames are processed in an encoder in subsequent different stages, for example conversion to frequency coefficients in a first stage and bit allocation and quantisation in a further stage. In a path in parallel to the first stage the (psychoacoustic) masking is calculated.
In the invention the overall input to output delay is controlled by applying a time stamp based mechanism related to a system time base. Upon entering into the system, input data, e.g. per frame or block of data, are stamped with the present system time, i.e. an input time stamp ITS. This time stamp is then passed along together with the main data through the processing stages, e.g. in the format of a linked data field. Somewhere during the processing the intended output time for an output frame or block of data is calculated, using in principle the calculation
OTS=ITS+DELAY,
wherein OTS is an output time stamp. OTS is then used by the output stage to ensure that the output data are output at the intended time.
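The time stamp mechanism can be illustrated by the following minimal sketch. It assumes an integer system time in arbitrary ticks; the `Frame` type and the functions `stamp_input`, `assign_output_time` and `due_frames` are hypothetical names introduced here and do not appear in the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    payload: bytes
    its: int                    # input time stamp ITS, in system-time ticks (assumed unit)
    ots: Optional[int] = None   # output time stamp OTS, set later in the chain


def stamp_input(payload: bytes, system_time: int) -> Frame:
    """Input stage: link the present system time with the frame."""
    return Frame(payload=payload, its=system_time)


def assign_output_time(frame: Frame, delay: int) -> Frame:
    """Somewhere in the chain: OTS = ITS + DELAY."""
    frame.ots = frame.its + delay
    return frame


def due_frames(pending: List[Frame], system_time: int) -> Tuple[List[Frame], List[Frame]]:
    """Output stage: split pending frames into those due now and those still waiting."""
    due = [f for f in pending if f.ots is not None and f.ots <= system_time]
    waiting = [f for f in pending if f.ots is None or f.ots > system_time]
    return due, waiting


frame = assign_output_time(stamp_input(b"frame-0", system_time=100), delay=50)
early, _ = due_frames([frame], system_time=149)   # one tick before OTS
on_time, _ = due_frames([frame], system_time=150)  # exactly at OTS
```

In this sketch a frame that finishes processing early simply waits in the output buffer until the system time reaches its OTS, so the input to output delay is set by `delay` rather than by the varying processing time.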
Additionally, encoding parameters required for a specific processing path can be added to the input streams for the audio channels by linking them with the associated audio data and by storing them in the various buffers together with their audio data, i.e. the corresponding encoding parameters are kept linked with the audio data to be encoded throughout the encoding processing in the different data streams and data paths. Preferably the original encoding parameters assigned to the processing paths are converted to a different format in order to minimise the required word length and/or to facilitate easy evaluation in the related processing stages. Thereby each data stream can be processed with the correct parameter set without waiting for the encoding of the old data stream to finish and for new parameters to be reset and loaded before the encoding of a new data stream with new parameters is started.
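The linking of encoding parameters with the audio data may be sketched as follows. The index tables, the 4-bit packing layout and the names `pack_params`, `unpack_params`, `TaggedFrame` and `bits_for_frame` are assumptions chosen for illustration only; they show one possible compact format and are not the format used by the described encoder or by the MPEG frame header.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical lookup tables (an illustrative subset of possible values).
SAMPLE_RATES = (32000, 44100, 48000)                 # Hz
BIT_RATES = (64000, 128000, 192000, 256000, 384000)  # bit/s


def pack_params(sample_rate: int, bit_rate: int) -> int:
    """Convert the original parameters into one short code word (assumed layout)."""
    return (SAMPLE_RATES.index(sample_rate) << 4) | BIT_RATES.index(bit_rate)


def unpack_params(code: int) -> Tuple[int, int]:
    """Recover (sample_rate, bit_rate) from the code word inside a processing stage."""
    return SAMPLE_RATES[code >> 4], BIT_RATES[code & 0x0F]


@dataclass
class TaggedFrame:
    samples: List[int]
    param_code: int  # stays linked with the samples in every buffer


def bits_for_frame(frame: TaggedFrame) -> int:
    """Stand-in stage: bit budget = frame duration * bit rate, read from the linked code."""
    sample_rate, bit_rate = unpack_params(frame.param_code)
    return len(frame.samples) * bit_rate // sample_rate


# Two consecutive frames with different parameter sets, processed back to back.
old = TaggedFrame([0] * 1152, pack_params(48000, 192000))
new = TaggedFrame([0] * 1152, pack_params(32000, 64000))
budgets = [bits_for_frame(f) for f in (old, new)]
```

Since every frame carries its own code word, the stage needs no global parameter state, and a frame with new parameters can follow directly behind a frame with old parameters.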
The invention can also be used for audio or video decoding with a corresponding inverse order of processing stages.
In principle, the inventive method is suited for processing audio or video frame data which are processed in succeeding different processing stages, wherein input time stamps are generated which become linked at least in one input processing stage with frames of said audio or video data to be encoded, and wherein said input time stamps or time stamps derived from said input time stamps remain linked with the correspondingly processed frame data in said different processing stages in the processing but are at least in the last processing stage replaced by output time stamps denoting the presentation time, and wherein in each of these stages the corresponding time stamp information linked with current frame data to be processed is regarded in order to control the overall delay of the processing.
In principle the inventive apparatus is suited for encoding audio frame data which are processed in succeeding different processing stages, and includes:
means for generating time stamp information including input time stamps and output time stamps;
means for linking time stamp information with frames of said audio data or processed audio data, respectively;
means for converting time domain samples into frequency domain coefficients, to the input of which means buffer means are assigned;
means for calculating masking properties from said time domain samples, to the input of which means buffer means are assigned;
means for performing bit allocation and quantisation of the coefficients under the control of the output of said masking calculating means, to the input of which bit allocation and quantisation means buffer means are assigned;
means (AES-EBU_A) for controlling presentation of the final output data at or at about a time corresponding to said output time stamps,
wherein said input time stamps become linked at least for one input processing stage with frames of said audio data to be encoded and said input time stamps or time stamps derived from said input time stamps remain linked with the correspondingly processed frame data in said conversion means, in said masking calculating means and in said bit allocation and quantisation means, but at least in the last processing stage in the encoding processing are replaced by output time stamps, and wherein in each of these stages the corresponding time stamp information linked with current frame data to be processed is regarded in order to control the overall delay of the encoding processing.