The invention relates in general to detection and concealment of errors in a signal transmitted in digital form from a transmitter to a receiver. In particular the invention relates to detection and concealment of transmission errors in an audio signal processed in the form of frames by a digital audio receiver.
Transmission of an audio signal in digital form from a transmitter to a receiver is known as such, and it is going to become more common as digital television and broadcasting systems replace older systems based on analog frequency modulation. Known telecommunications standards dealing with the transmission of digital audio signals include the ETS 300 401 standard by the European Broadcasting Union (EBU) and European Telecommunications Standards Institute (ETSI) and the ISO/IEC 11172-3 and ISO/IEC 13818-3 standards by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC). These standards specify a certain frame structure for the transmission of a digital audio signal. The ETS 300 401 standard, which is also called the DAB (Digital Audio Broadcasting) standard, specifies a frame structure which in a way is a special case of the frame structure specified in the ISO/IEC 11172-3 and ISO/IEC 13818-3 standards, as it contains additional specifications concerning frame structure particulars left open in the earlier standards. With an audio signal sampling frequency of 48 kHz the DAB standard is based on the ISO/IEC 11172-3 standard and with a sampling frequency of 24 kHz on the ISO/IEC 13818-3 standard. To illustrate the background of the invention, the structure of the audio frame according to the aforementioned standards and its processing in transmitter and receiver apparatuses is described in brief below.
FIG. 1 is a simplified block diagram of an apparatus 1 according to the ISO/IEC 11172-3 and 13818-3 Layer II standards generating DAB frames from a pulse-code-modulated (PCM) audio signal. The apparatus comprises an input port 2, output port 3, and between them, a filter bank 4, quantising and coding block 5, and a frame generating block 6, connected in series. In parallel with the filter bank 4, there is a psychoacoustic model block 7 the input signal of which is the same as the filter bank input signal. The outputs of blocks 4 and 7 are taken to a bit allocation block 8 the output of which controls quantising and coding in block 5. The apparatus also comprises a data port 9 such that digital program associated data brought thereto is taken to the frame generating block 6 which incorporates the program associated data in the frame structure.
FIG. 2 is a simplified block diagram of an apparatus 10 according to the ISO/IEC 11172-3 and 13818-3 Layer II standards decoding the frames generated by the transmitter shown in FIG. 1 into a pulse-code-modulated audio signal. It comprises an input port 11, output port 12, and between them, a frame decoding block 13, reconstructing block 14 and an inverse filter bank 15, connected in series. The frame decoding block 13 is also connected with a data port 16 to take program associated data to other circuits of the receiver apparatus.
The audio signal is transmitted as frames between apparatuses according to FIGS. 1 and 2. The amount of data in a single frame corresponds to a 24- or 48-ms-long audio signal part. In addition to audio data proper the frame contains header information, checksums, information related to the processing of audio data, and program associated data, PAD. Since transmission paths are not ideal, errors may occur in the contents of the frames which affect the operation of the receiver in different ways depending on the location of the error in the frame.
FIG. 3 shows the structure of an audio frame 17 according to the DAB standard. The frame comprises an integer number of eight-bit bytes (not shown). It starts with a 32-bit header 18, followed by a 16-bit CRC word 19. The length of the bit allocation part 20 is 26 to 176 bits depending on the audio mode (single channel, dual channel, stereo, joint stereo) and sampling frequency used as well as on the bit rate used for transmitting the audio program. An SCFSI part 21 contains instructions for the interpretation of the scale factor part 22 following it. The scale factors in the latter provide information about how the various parts of the signal were emphasised at the frame generation stage. Each scale factor is represented by a six-bit codeword (not shown), and the number of codewords in the frame varies according to how much variation there is in the different parts of the audio signal during the period represented by the frame. Part 23 contains the sampled values proper which represent the sampled audio signal. If the bits representing the sampled values do not fill the space reserved for them, the empty part is filled with padding bits 24.
At the end of the frame 17 there are, from right to left in the Figure, a fixed program associated data (F-PAD) field 25, scale factor cyclic redundancy check (SCF CRC) error protection 26 for the audio data, and an extended program associated data (X-PAD) field 27. The latter is not necessarily included in every audio frame. In accordance with the ETS 300 401 standard, the program associated data fields 25 and 27 are intended for the transmission of data that are closely related to the audio data proper included in the frame and that may have synchronisation requirements concerning the audio data. Their use is not mandatory. The F-PAD and X-PAD fields together form the program associated data (PAD) part. In particular, the F-PAD field includes a two-bit X-PAD indicator (not shown) to indicate whether the frame includes an X-PAD field and, if so, whether it is a four-byte, so-called short X-PAD field or a variable size X-PAD field.
FIG. 4 shows in more detail an audio frame header 18 the length of which is 32 bits (four bytes). The description to follow concerns both the ISO/IEC 11172-3 and ISO/IEC 13818-3 standards and the DAB standard so that the specifications required by the DAB standard are mentioned separately. The first twelve bits form a synchronisation word 29 in which all bits are ones. The next bit 30 is a so-called ID bit wherein value “1” corresponds to the application of the ISO/IEC 11172-3 standard and value “0” corresponds to the application of the ISO/IEC DIS 13818-3 standard in the audio signal processing. The length of the Layer field 31 is two bits and its value corresponds to the layer of the ISO/IEC 11172-3 standard in use. The DAB standard allows values “10” (Layer II) and “00” (reserved for future expansion). The protection bit 32 indicates whether there is a checksum in the frame, and its value according to the DAB standard is “0”, meaning a checksum is used. The next four-bit field 33 represents the bit rate of the audio program in use. The ISO/IEC 11172-3 and ISO/IEC 13818-3 standards do not allow the value “1111” in the field 33. Furthermore, the DAB standard does not allow the value “0000”. The sampling frequency field 34 includes two bits representing the sampling frequency of the original pulse-code-modulated signal. According to the DAB standard, values “00” and “10” are not allowed in this field 34. Value “01” corresponds to a 48-kHz sampling frequency if the ID bit is “1”, and to a 24-kHz sampling frequency if the ID bit is “0”. Value “11” is reserved for future expansion. A padding indicator bit 35 is “0” according to the DAB standard because there are no padding bits in the audio frame formed from a 48-kHz or 24-kHz PCM signal.
According to the ISO/IEC 11172-3 and ISO/IEC 13818-3 standards, bit 35 is “1” if there are padding bits in the audio frame. The Private bit 36, which is reserved for private use, has no significance according to the DAB, ISO/IEC 11172-3 and ISO/IEC 13818-3 standards.
A two-bit field 37 indicates the audio program's transmission mode, which can be stereo (“00”), joint stereo (“01”), dual channel (“10”) or single channel (“11”). The joint stereo mode in accordance with the DAB standard is also known as “intensity stereo”. At a sampling frequency of 48 kHz, the values of fields 37 and 33 correlate such that only the following combinations are allowed:
At the sampling frequency of 24 kHz, all modes are allowed at all bit rates specified for 24 kHz.
The mode field extension 38, the length of which is two bits as well, is significant according to the DAB standard only if the mode field value is “01”, i.e. the joint stereo mode is in use. Then the value of the extension field 38 indicates, according to a certain table, which of the 32 subbands of the signal are in the intensity stereo mode. The following copyright bit 39 is “0” if the audio program transmitted is not copyright protected, and “1” if the program is covered by copyright protection. Value “1” of the copy bit 40 indicates that the program transmitted is an original recording and value “0” indicates that the program is a copy. The value of the emphasis field 41 corresponds according to the ISO/IEC 11172-3 standard to the emphasis used in the coding of the program. The DAB standard does not allow emphasis, so according to the DAB standard, the value of the field 41 is always “00”.
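The header layout described for FIG. 4 can be illustrated with a short sketch that splits the four header bytes into fields and lists the fields whose values the description says DAB does not allow. The field names and the two function names are illustrative assumptions, and rejecting the reserved Layer value “00” is a choice made for this sketch only.

```python
from typing import Dict, List

# (name, width in bits) in transmission order, following the FIG. 4 description.
# The field names are illustrative and are not taken from the standards.
HEADER_FIELDS = [
    ("sync", 12), ("id", 1), ("layer", 2), ("protection", 1),
    ("bitrate_index", 4), ("sampling_frequency", 2), ("padding", 1),
    ("private", 1), ("mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original", 1), ("emphasis", 2),
]


def parse_header(header: bytes) -> Dict[str, int]:
    """Split the four header bytes into named fields, most significant bit first."""
    if len(header) != 4:
        raise ValueError("the header is exactly four bytes long")
    word = int.from_bytes(header, "big")
    fields = {}
    remaining = 32
    for name, width in HEADER_FIELDS:
        remaining -= width
        fields[name] = (word >> remaining) & ((1 << width) - 1)
    return fields


def dab_violations(fields: Dict[str, int]) -> List[str]:
    """Return the names of fields holding a value described as not allowed in DAB."""
    checks = {
        "sync": fields["sync"] == 0xFFF,
        # Assumption of this sketch: the reserved value "00" is rejected too.
        "layer": fields["layer"] == 0b10,
        "protection": fields["protection"] == 0,
        "bitrate_index": fields["bitrate_index"] not in (0b0000, 0b1111),
        # "01" is the value associated with 48 kHz (ID "1") or 24 kHz (ID "0").
        "sampling_frequency": fields["sampling_frequency"] == 0b01,
        "padding": fields["padding"] == 0,
        "emphasis": fields["emphasis"] == 0b00,
    }
    return [name for name, ok in checks.items() if not ok]
```

A receiver could treat a non-empty result as a sign of a transmission error in the header, in the spirit of the allowed-value checks discussed later in the description.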
For the processing of samples and generation of frames, the ISO/IEC 11172-3 or ISO/IEC 13818-3 encoder uniformly divides the original pulse-code-modulated signal into 32 subbands (cf. filter bank 4 in FIG. 1). For one frame, the encoder reads 36 samples from each subband and arranges them into three 12-sample groups. For each group the encoder determines a scale factor, or a coefficient for normalising the subbands for transmission. The mutual relationship of the magnitude of the group scale factors determines whether the encoder includes all three scale factors in the frame to be transmitted or whether it utilises the (near) identity of the scale factors by including in the frame only one or two scale factors. The number of scale factors per particular subband is represented by a subband specific SCFSI parameter, to which a reference was made above in the description of FIG. 3. For each scale factor there is in the frame scale factor part a six-bit codeword, allowing values “000000” through “111110”.
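As a rough check of these numbers, 32 subbands with 36 samples each give 1152 samples per frame and channel, which corresponds to the 24-ms and 48-ms frame lengths mentioned earlier. The sketch below works this out and splits one subband into its three 12-sample groups. The function names are assumptions, and the peak magnitude per group is only a stand-in for the scale factor selection defined in the standards.

```python
from typing import List, Sequence

SUBBANDS = 32
SAMPLES_PER_SUBBAND = 36
GROUP_SIZE = 12


def frame_duration_ms(sampling_rate_hz: int) -> float:
    """Duration of the audio represented by one frame, in milliseconds."""
    return 1000.0 * SUBBANDS * SAMPLES_PER_SUBBAND / sampling_rate_hz


def split_into_groups(subband_samples: Sequence[float]) -> List[Sequence[float]]:
    """Arrange the 36 samples of one subband into three 12-sample groups."""
    if len(subband_samples) != SAMPLES_PER_SUBBAND:
        raise ValueError("one frame carries 36 samples per subband")
    return [subband_samples[start:start + GROUP_SIZE]
            for start in range(0, SAMPLES_PER_SUBBAND, GROUP_SIZE)]


def group_peaks(subband_samples: Sequence[float]) -> List[float]:
    """Peak magnitude of each group (illustrative stand-in for a scale factor)."""
    return [max(abs(s) for s in group)
            for group in split_into_groups(subband_samples)]
```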
The encoder of the transmitting apparatus continually monitors the frequency spectrum of the audio signal being encoded and compares it with a so-called psychoacoustic model, on the basis of which it divides the limited number of bits available in each frame among the subbands. This so-called bit allocation procedure reserves the most bits for those parts of the signal that are the most important for the auditory impression. The same procedure determines the number of quantising levels for each subband. The least significant subbands are allocated no bits at all in the frame, so their number of quantising levels is zero. On other subbands, allowed numbers of quantising levels comprise 16 integers. At the sampling frequency of 48 kHz, the smallest number is 0 and the greatest 65,535, except for the low bit rate (32 or 48 kbit/s) modes, where the maximum number of levels on the two most significant subbands is 32,767 and on the following six subbands 127. In the low bit rate modes, the frame includes the samples of only the eight most significant subbands (subbands 0 to 7). In other modes, the frame includes the samples of the 27 most significant subbands (subbands 0 to 26). At the sampling frequency of 24 kHz, the maximum number of quantising levels is 16,383 on the first four subbands, 127 on the next seven subbands, 9 on the following nineteen subbands, and 0 on the two least significant subbands.
To encode the samples, each sample is divided by the scale factor associated with it and a codeword is formed from the result according to a mapping operation defined in the standards. Each codeword comprises at least 3 and at most 16 bits, depending on the number of quantising levels. On subbands to which the bit allocation procedure assigned three, five or nine quantising levels, three successive samples constitute a granule, represented by a common codeword. Its maximum allowed value in the case of three quantising levels is 26, in the case of five quantising levels 124, and in the case of nine quantising levels 728. The mapping operation used in the codeword generation is chosen such that the codeword cannot comprise ones only. This is to prevent the mixing up in the receiving apparatus of codewords and the synchronisation word “1111 1111 1111” located in the beginning of the frame.
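The maximum granule values quoted above equal n³ − 1 for n = 3, 5 and 9, which is what results when three samples with n levels each are combined as digits of a base-n number. The sketch below assumes that combination, with the first sample as the least significant digit. The digit order and the function names are assumptions, not a statement of the mapping defined in the standards.

```python
from typing import Tuple


def max_granule_code(levels: int) -> int:
    """Largest codeword for a granule of three samples with `levels` levels each."""
    return levels ** 3 - 1


def is_valid_granule(levels: int, code: int) -> bool:
    """A received granule codeword above the maximum cannot be legitimate."""
    return 0 <= code <= max_granule_code(levels)


def pack_granule(levels: int, samples: Tuple[int, int, int]) -> int:
    """Combine three quantised samples (each in 0..levels-1) into one codeword."""
    x, y, z = samples
    if not all(0 <= s < levels for s in samples):
        raise ValueError("sample outside the quantising range")
    return x + levels * y + levels * levels * z


def unpack_granule(levels: int, code: int) -> Tuple[int, int, int]:
    """Recover the three samples, rejecting codewords outside the allowed range."""
    if not is_valid_granule(levels, code):
        raise ValueError("granule codeword outside the allowed range")
    return code % levels, (code // levels) % levels, code // (levels * levels)
```

Under this assumption, a received codeword of 27 or more on a three-level subband cannot have been produced by the encoder, which is the kind of disallowed bit combination a receiver can treat as a transmission error.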
In the digital transmission of audio signal according to the prior art, detection of errors and the resulting error concealment attempts are based on the use of checksums. In accordance with the above, the audio frame according to the ISO/IEC 11172-3 and ISO/IEC 13818-3 standards has one checksum field (reference designator 19 in FIG. 3) and the audio frame according to the DAB standard has additionally a second checksum field (reference designator 26 in FIG. 3). The former is a 16-bit CRC checksum covering the third and fourth bytes in the frame header as well as the bit allocation part (reference designator 20 in FIG. 3) and the SCFSI part (reference designator 21 in FIG. 3). The polynomial generating the CRC checksum is $G_1(X) = X^{16} + X^{15} + X^{2} + 1$. The receiver uses the same polynomial to calculate the CRC checksum for the bits of the aforementioned coverage area and if it does not equal the checksum in the received frame, a transmission error is detected in the frame.
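The check performed by the receiver can be sketched as a bit-serial CRC over the covered bits. The covered area is not necessarily a whole number of bytes, so the sketch works on a sequence of bits. The generator polynomial is the one given above. The all-ones initial register value and the function names are assumptions made for illustration.

```python
from typing import Iterable, List

CRC16_POLY = 0x8005  # X^16 + X^15 + X^2 + 1 with the leading term implicit
CRC16_INIT = 0xFFFF  # assumption: shift register preset to all ones


def to_bits(value: int, width: int) -> List[int]:
    """Most-significant-bit-first list of the `width` low bits of `value`."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def crc16(bits: Iterable[int], init: int = CRC16_INIT) -> int:
    """Bit-serial CRC of a bit sequence using the generator polynomial above."""
    register = init & 0xFFFF
    for bit in bits:
        feedback = ((register >> 15) & 1) ^ (bit & 1)
        register = (register << 1) & 0xFFFF
        if feedback:
            register ^= CRC16_POLY
    return register


def crc16_matches(covered_bits: List[int], received_crc: int) -> bool:
    """True if the CRC recomputed over the covered bits equals the received one."""
    return crc16(covered_bits) == received_crc
```

In this sketch `covered_bits` would hold the third and fourth header bytes followed by the bit allocation and SCFSI parts, and `received_crc` the value of the CRC word 19.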
According to the DAB standard, the second checksum field at the end of the frame covers the most significant bits of the scale factors. At a sampling frequency of 48 kHz, modes in which the channel-specific bit rate is at least 56 kbit/s (corresponding to an overall bit rate of at least 56 kbit/s in the single channel mode and at least 112 kbit/s in the other modes) have the scale factors protected by four separate CRC checksums, the first of which (ScF-CRC0) covers subbands 0 through 3, the second (ScF-CRC1) subbands 4 through 7, the third (ScF-CRC2) subbands 8 through 15, and the fourth (ScF-CRC3) subbands 16 through 26. In modes where the channel-specific bit rate is below 56 kbit/s, the scale factors are protected by two CRC checksums, the first (ScF-CRC0) covering subbands 0 to 3 and the second (ScF-CRC1) covering subbands 4 to 7. At the sampling frequency of 24 kHz, the scale factors are always protected by four separate CRC checksums, the first of which (ScF-CRC0) covers subbands 0 through 3, the second (ScF-CRC1) subbands 4 through 7, the third (ScF-CRC2) subbands 8 through 15, and the fourth (ScF-CRC3) subbands 16 through 29. So that the positions of the first and second checksums do not change with the bit rate, the checksums are located in field 26 of FIG. 3 in reverse order: at the higher bit rates of the 48-kHz sampling frequency and at the 24-kHz sampling frequency, checksum ScF-CRC3 comes first, reading from the beginning of the frame, and checksum ScF-CRC0 comes last. At the lower bit rates of the 48-kHz sampling frequency, checksum ScF-CRC1 comes first, reading from the beginning of the frame, and checksum ScF-CRC0 comes thereafter. The polynomial generating all the CRC checksums protecting the scale factors is $G_2(X) = X^{8} + X^{4} + X^{3} + X^{2} + 1$, and each of them covers the three most significant bits of the scale factors according to the aforementioned grouping.
The receiver uses the same polynomial to calculate the CRC checksums for the most significant bits of the scale factors and if any one of them does not equal the checksum in the received frame, a transmission error is detected in the frame.
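The subband grouping and the eight-bit check can be sketched as follows. The grouping, the reverse storage order and the generator polynomial follow the description above. The zero initial register value, the order in which the protected bits are fed to the CRC, and all function names are assumptions of this sketch.

```python
from typing import Iterable, List

CRC8_POLY = 0x1D  # X^8 + X^4 + X^3 + X^2 + 1 with the leading term implicit


def scf_crc_groups(sampling_rate_hz: int, channel_bitrate_kbps: int) -> List[range]:
    """Subbands covered by ScF-CRC0, ScF-CRC1, ... in that order."""
    if sampling_rate_hz == 48000:
        if channel_bitrate_kbps >= 56:
            return [range(0, 4), range(4, 8), range(8, 16), range(16, 27)]
        return [range(0, 4), range(4, 8)]
    if sampling_rate_hz == 24000:
        return [range(0, 4), range(4, 8), range(8, 16), range(16, 30)]
    raise ValueError("only 48 kHz and 24 kHz are described")


def transmission_order(groups: List[range]) -> List[int]:
    """Checksum indices as they appear in the frame: highest index first."""
    return list(reversed(range(len(groups))))


def protected_bits(scale_factors: Iterable[int]) -> List[int]:
    """The three most significant bits of each six-bit scale factor codeword."""
    bits = []
    for codeword in scale_factors:
        bits += [(codeword >> 5) & 1, (codeword >> 4) & 1, (codeword >> 3) & 1]
    return bits


def crc8(bits: Iterable[int], init: int = 0) -> int:
    """Bit-serial CRC with the generator polynomial above (init is an assumption)."""
    register = init & 0xFF
    for bit in bits:
        feedback = ((register >> 7) & 1) ^ (bit & 1)
        register = (register << 1) & 0xFF
        if feedback:
            register ^= CRC8_POLY
    return register
```

A mismatch between `crc8(protected_bits(...))` for one group and the corresponding received checksum localises the error to the subbands of that group, which is finer than the frame-level result of the 16-bit checksum.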
The aforementioned standards ETS 300 401, ISO/IEC 11172-3 and ISO/IEC 13818-3 do not specify a mandatory model of operation according to which the receiver should respond to transmission errors it detects in received audio frames. However, various operating model alternatives are known from recommendatory parts of the standards and from other telecommunications technology. In digital mobile phone technology, where the voice signal is transmitted in frames, it is usual that a receiver will not reproduce an audio part conveyed by a frame that was detected erroneous but mutes the sound reproduction unit totally for a moment or replaces the rejected frame with noise. Another option is that instead of the erroneous frame the receiver re-plays the preceding error-free frame. Since, however, the audio technology according to this patent application aims at sound reproduction of substantially better quality than that of telephone technology, automatic muting or substitution of a whole frame would degrade the auditory impression too much.
Another disadvantage of the prior art is that checksums are not a 100% reliable method to detect all transmission errors. If several errors occur in one and the same frame, it is possible that their effect on the checksum is equal but in the opposite direction so that the checksum appears correct in spite of the errors in the frame.
An object of this invention is to provide a method and equipment with which detection and concealment of errors are performed in the reception of a digital audio signal more reliably than in the prior-art solutions. Another object of the invention is to provide a method and equipment suitable for digital audio reception with which the concealment of transmission errors distorts only a little the auditory impression of a reproduced sound.
The objects of the invention are achieved by observing in the decoding and error concealment units of the receiver several successive frames and arranging their decoding and the audio signal reconstruction in a suitable manner.
The method according to the invention is characterised in that it comprises stages wherein
several successive frames are stored in memory,
one frame stored in memory is chosen as the current frame,
the current frame is examined for errors, and
errors detected in the current frame are concealed using the contents of other stored frames.
The invention is also directed to a decoding apparatus to realise the method according to the invention. The apparatus according to the invention is characterised in that the reconstructing block in it comprises
a table for the temporary storing of frames,
read and write means to write frames to said table and read frames from it in windows,
means for verifying the integrity of a frame included in the window read, and
means for replacing erroneous values in the current frame with values obtained from other frames in the window.
The method according to the invention aims at a balanced solution in which the optimal transmission error detection and concealment level is achieved using reasonable computing capacity. The receiver receives and stores several successive frames which, when stored, form a certain frame table. To read the table, the receiver uses a certain window the magnitude of which is an integer number of frames greater than zero and which covers at least the current frame. In a preferred embodiment, the window also covers at least one frame received prior to the current frame and at least one frame received after the current frame. Decoding of frames in the window area is performed in stages. The latest frame arriving in the window area is first decoded until its scale factors are found out. Then the receiver conceals possible errors found in the scale factors of the current frame. In the concealment, it utilises scale factors of other frames in the window area. Next, the receiver continues decoding the latest frame until its samples are dequantised but not yet scaled. After that, the receiver uses frames in the window area in order to conceal errors that it may have found in the unscaled samples of the current frame. Only then are the samples of the current frame scaled and by means of inverse filtering a PCM signal is generated, which is taken to the output port of the decoder.
Having processed one frame, the receiver moves the observation window one frame forward with respect to the frame table, whereafter the frame decoding described above starts over again. The method according to the invention is very suitable for parallel processing, as the reception of new frames, their storing in the frame table, detection and concealment of errors in the current frame, the inverse filtering of the corrected frame and writing to the output data flow can be separate parts functioning in parallel.
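The movement of the observation window over the frame table can be sketched with a bounded queue. The sketch assumes a fixed window with a configurable number of frames before and after the current frame, and it omits start-up and end-of-stream handling, so the first and last frames of a stream never become the current frame. The function and parameter names are illustrative.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def sliding_windows(frames: Iterable[Frame], before: int = 1, after: int = 1
                    ) -> Iterator[Tuple[Frame, List[Frame], List[Frame]]]:
    """Yield (current, earlier frames, later frames) for each window position."""
    table = deque(maxlen=before + 1 + after)  # the frame table
    for frame in frames:
        table.append(frame)  # the latest frame enters the window
        if len(table) == table.maxlen:
            snapshot = list(table)
            yield snapshot[before], snapshot[:before], snapshot[before + 1:]
```

With `after` greater than zero the output lags the input by that many frames, which is the cost of being able to use frames received after the current frame for concealment.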
In the method according to the invention, detection of errors is based both on the use of checksums and on the use of so-called fundamental sets of allowed values.
The latter means that if the receiver detects in a certain part of a received frame a bit combination which is not a combination allowed for that part of the frame, as specified by the standards, it assumes that there is a transmission error in that particular part. For both the scale factors and samples, the receiver tries to replace the values assumed erroneous with correct values found in the nearest possible frame. Only in a situation where correct replacement values cannot be found in the whole observation window area is the total or partial muting of the reproduced signal used as a means to conceal the erroneous part.
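For the scale factors, the idea can be sketched as follows. A six-bit codeword is allowed if it lies in “000000” through “111110”, and a disallowed value is replaced with the value at the same position in the nearest frame of the window that holds an allowed one. Representing a frame as a flat list of codewords, preferring the earlier frame when two frames are equally near, and returning `None` to mark a position to be muted are assumptions of this sketch.

```python
from typing import List, Optional, Sequence

SCF_ALLOWED = range(0, 63)  # codewords "000000" through "111110"


def nearest_first(window_length: int, current_index: int) -> List[int]:
    """Indices of the other frames in the window, nearest to the current first."""
    others = [i for i in range(window_length) if i != current_index]
    return sorted(others, key=lambda i: abs(i - current_index))


def conceal_scale_factors(window: Sequence[Sequence[int]], current_index: int
                          ) -> List[Optional[int]]:
    """Replace disallowed codewords in the current frame from nearby frames.

    A position with no allowed replacement anywhere in the window is returned
    as None, meaning that this part of the signal would be muted.
    """
    order = nearest_first(len(window), current_index)
    result: List[Optional[int]] = []
    for position, value in enumerate(window[current_index]):
        if value in SCF_ALLOWED:
            result.append(value)
            continue
        replacement = None
        for index in order:
            candidate = window[index][position]
            if candidate in SCF_ALLOWED:
                replacement = candidate
                break
        result.append(replacement)
    return result
```

A fuller implementation would also treat values flagged by a failed checksum as erroneous. Here only the allowed-value test is shown.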
The size of the observation window may in one preferred embodiment of the invention be a dynamically variable parameter so that the method is adapted to different conditions causing transmission errors. One way of estimating error conditions on a longer term than one frame is to maintain a continually updated error parameter that represents the bit error ratio (BER) of the received signal. The receiver may also use the error parameter value to make other decisions concerning decoding and error concealment. If the average error level is high, it may be more advantageous to process an uncorrectable error by muting a whole frame, whereas with a low average error level, muting one or a few subbands is a better solution.
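One possible form of such an error parameter is an exponentially smoothed estimate updated once per frame. Everything specific in the sketch below is an assumption made for illustration: the smoothing constant, the threshold separating a high error level from a low one, the per-frame count of bits found erroneous, and the rule that widens the window as the error level rises.

```python
class ErrorMonitor:
    """Running estimate of the error level of the received signal (a sketch)."""

    def __init__(self, smoothing: float = 0.1, high_level: float = 0.01):
        self.smoothing = smoothing    # weight given to the newest frame
        self.high_level = high_level  # hypothetical threshold for a high level
        self.level = 0.0

    def update(self, erroneous_bits: int, checked_bits: int) -> float:
        """Fold the error ratio observed in one frame into the running estimate."""
        if checked_bits <= 0:
            raise ValueError("checked_bits must be positive")
        observed = erroneous_bits / checked_bits
        self.level += self.smoothing * (observed - self.level)
        return self.level

    def uncorrectable_action(self) -> str:
        """Mute a whole frame at a high error level, otherwise only subbands."""
        return "mute_frame" if self.level >= self.high_level else "mute_subbands"

    def window_size(self, smallest: int = 1, largest: int = 5) -> int:
        """Assumed rule: widen the window linearly as the error level rises."""
        fraction = min(self.level / self.high_level, 1.0)
        return smallest + round(fraction * (largest - smallest))
```

The direction of the window adaptation is a design choice of this sketch. The description only states that the window size may vary with the error conditions.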