The present invention relates to a method for encoding digital video picture signals in order to reduce the amount of data that may have to be transmitted. More particularly, the invention concerns how to encode more than one picture at the same time, or more precisely, how to encode a number of blocks from different pictures simultaneously.
Several standard methods exist for compressing video signals. In a number of these methods (ITU recommendation H.261 and ISO/IEC 11172) the following techniques are substantially utilized in order to achieve compression of the data to be transmitted:
1. The pictures, which consist of a number of pixels, are divided into a number of blocks consisting of, for example, 8×8 or 16×16 pixels. The size utilized depends on what is to be performed.
2. Since pictures following each other in direct temporal succession often are relatively similar, a picture block can often be approximately described by using parts of previously encoded pictures. If one presupposes that a block of 8×8 pixels is to be encoded, wherein this block is referred to as ORIG(i,j) wherein i,j=1 . . . 8, one will search for a suitable block of 8×8 pixels in a previously encoded picture. This block in the previously encoded picture will typically have a shifted position in the picture as compared to ORIG, which fact reflects movements in the picture content. The magnitude of the shift is indicated by means of a so-called vector stating the number of pixels along which the block has been shifted in a horizontal and a vertical direction. The block which has been found to be rather similar to the ORIG block is referred to as a prediction for the ORIG block, and it is defined as PRED(i,j), wherein i,j=1 . . . 8. Algorithms exist for finding a best possible PRED block in a previously encoded picture, however such algorithms are not herein discussed in detail, since detailed formulations of such algorithms do not constitute any part of the present invention.
3. Since a receiver itself is capable of generating PRED (the receiver has stored the necessary previously reconstructed encoded/decoded pictures, and the receiver itself is also able to compute PRED in accordance with the same algorithm as the transmitter), there is no need to transmit the full ORIG. It is sufficient to transmit a so-called difference signal defined as follows:
DIFF(i,j) = ORIG(i,j) - PRED(i,j), i,j=1 . . . 8.
4. In order to transmit DIFF in a form as highly compressed as possible, one usually converts the DIFF block (which in the present example is 8×8 pixels) into a matrix of 8×8 transform coefficients which represent DIFF. The so-called two-dimensional cosine transform, which has favorable characteristics, is widely used for this purpose. The transform coefficients obtained by transforming DIFF are referred to as TRANS, and are defined as follows:
TRANS(i,j), i,j=1 . . . 8.
5. Subsequent to transforming DIFF into TRANS, the transform coefficients are quantized. The transformed and quantized coefficients, which through quantizing now constitute an approximate representation of DIFF, can then be transmitted to a receiver in accordance with a certain matrix reading strategy (for example, see the reading strategy and formatting into transmittable data described in Norwegian patent no. 175,080 belonging to the same applicant as the present invention). In the receiver, inverse quantizing and inverse transforming are performed in order to reconstruct DIFF. However, because of the quantization, the reconstructed values DIFF' deviate somewhat from DIFF. It is to be noted that the two last mentioned operations must be performed identically at both the transmitting and the receiving sides. The obtained DIFF' values are utilized for reconstructing a representation ORIG' of the picture block which was to be encoded:
ORIG'(i,j) = PRED(i,j) + DIFF'(i,j), i,j=1 . . . 8.
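Steps 3 to 5 above can be illustrated by a minimal Python sketch. It is only an illustration under stated assumptions and not the method of any particular standard: the function names, the orthonormal scaling of the cosine transform, and the single uniform quantizer step of 16 are all hypothetical choices made for the example. The matrix reading strategy and the bit stream formatting are omitted.

```python
import math

N = 8  # block size taken from the 8x8 example in the text

def _c(k):
    # Normalisation factors of an orthonormal cosine transform (assumed scaling)
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _cos(i, u):
    return math.cos((2 * i + 1) * u * math.pi / (2 * N))

def dct_2d(block):
    """Two-dimensional cosine transform of an N x N block (naive form)."""
    return [[_c(u) * _c(v) * sum(block[i][j] * _cos(i, u) * _cos(j, v)
                                 for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse of dct_2d."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _cos(i, u) * _cos(j, v)
                 for u in range(N) for v in range(N))
             for j in range(N)] for i in range(N)]

def encode_block(orig, pred, step=16):
    """DIFF = ORIG - PRED, TRANS = transform(DIFF), then uniform quantization.
    The uniform step size is an assumption made for this sketch."""
    diff = [[orig[i][j] - pred[i][j] for j in range(N)] for i in range(N)]
    trans = dct_2d(diff)
    return [[int(round(t / step)) for t in row] for row in trans]

def decode_block(levels, pred, step=16):
    """Inverse quantization and inverse transform give DIFF';
    ORIG' = PRED + DIFF'."""
    trans_q = [[level * step for level in row] for row in levels]
    diff_r = idct_2d(trans_q)
    return [[pred[i][j] + int(round(diff_r[i][j])) for j in range(N)]
            for i in range(N)]
```

In this sketch only the quantized levels returned by `encode_block` would be transmitted. The receiver calls `decode_block` with its own copy of PRED, and the result ORIG' differs from ORIG only by the error that the quantization introduces.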
When picture compression is implemented, it is necessary both to compress in order to represent the pictures by as few bits as possible, and to de-compress in order to be able to display the pictures again. An expression often used is that compression is undertaken at the transmitter side, and de-compression is undertaken at the receiver side. The calculation operations to be performed are therefore also divided quite naturally in two parts, such that one set of operations is for the transmitter and one set of operations is for the receiver. The calculation operations at the transmitter side, which roughly comprise finding a prediction, constructing a difference signal, making a transformation and finally preparing a bit stream, are usually referred to as "encoding". Thus, one speaks of encoding, for example, a block or a complete picture. In the same manner the calculation operations at the receiver side are referred to as "decoding", which roughly comprises finding the prediction from a received vector, making an inverse transformation, and assembling the picture for display. Encoding and decoding are in reality closely connected, since an encoding process followed by a decoding process shall lead to a reconstructed picture. It is important to note that the transmitter must also undertake large parts of the decoding process, because the transmitter and the receiver must have exactly the same reconstructed pictures as a basis for the predictions.
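The last point can be illustrated by a deliberately simplified Python sketch in which each "picture" is reduced to a single sample value and the prediction is simply the previously reconstructed sample. The function names and the uniform quantizer are assumptions made for the example only. The transmitter runs the same reconstruction as the receiver, so that both sides predict from identical values.

```python
def encode_sequence(samples, step):
    """Transmitter: predicts each sample from the previous RECONSTRUCTED
    sample, not from the previous original sample."""
    levels = []
    reconstructed = []
    prev = 0  # both sides are assumed to start from the same known value
    for sample in samples:
        level = int(round((sample - prev) / step))  # quantized difference
        levels.append(level)
        prev = prev + level * step  # same reconstruction as in the receiver
        reconstructed.append(prev)
    return levels, reconstructed

def decode_sequence(levels, step):
    """Receiver: rebuilds the samples from the received levels only."""
    reconstructed = []
    prev = 0
    for level in levels:
        prev = prev + level * step
        reconstructed.append(prev)
    return reconstructed
```

Because the transmitter predicts from its own reconstructed values, the reconstruction error in this sketch never exceeds half a quantizer step and does not accumulate from one sample to the next.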
From what has been stated above, many of the calculation operations used in compression will be reflected both at the encoding side and the decoding side (this relates, for example, to "run patterns" for coefficients, wherein the same run pattern is used both at the transmitter side and the receiver side). The present invention deals with such operations which must be made both at the transmitter and the receiver sides. Often when talking about "encoding", one will thus actually include both encoding and decoding. This also holds for parts of this invention as herein described and as defined by the appended claims.
In the ISO/IEC 11172 and ISO/IEC 13818 standards for picture compression, different picture types are utilized. These types are largely characterized by the manner of preparing predictions of blocks. In the cases where a picture is encoded without using any prediction, the designation "I picture" is used. In the case where a prediction from one previously decoded picture is used, the picture being encoded is designated as a "P picture" (P meaning predicted). A third type of picture is a "B picture" (B meaning two-way or "bidirectional" prediction). In order to encode a B picture, information from two previously decoded pictures, one of which precedes and one of which succeeds the picture to be encoded, is used as a prediction.
FIGS. 1 and 2 illustrate how to prepare a prediction of blocks in a situation with a mixture of P and B pictures. It is presumed that picture 1 has been transmitted to the receiver. This cannot have been a B picture. Next, the whole of picture 3 is encoded. Predictions for the blocks in picture 3 can be found in picture 1, and an example is shown by the block ORIG2 in picture 3 finding its prediction as block PRED1 in picture 1. The movement vector (shift vector) describing the position of PRED1 in relation to ORIG2 is V1. In a corresponding manner, predictions in picture 1 are found for all blocks in picture 3, i.e. a set of movement vectors is listed corresponding to all blocks in picture 3.
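The search for a prediction block and its movement vector can be sketched as follows. The text does not prescribe any particular search algorithm, so this Python example assumes the simplest possible one: an exhaustive search over a small window that minimizes the sum of absolute differences. The function names, the window size, and the (vertical, horizontal) ordering of the vector components are hypothetical choices for the illustration.

```python
def get_block(picture, top, left, size=8):
    """Cut a size x size block out of a picture stored as a list of rows."""
    return [row[left:left + size] for row in picture[top:top + size]]

def sad(block_a, block_b):
    """Sum of absolute differences, used here as the similarity measure."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def find_prediction(reference, orig, top, left, search=3, size=8):
    """Exhaustive search in `reference` around position (top, left).
    Returns the vector (dy, dx) and the best matching block PRED."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the picture
            candidate = get_block(reference, y, x, size)
            cost = sad(orig, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    return best[1], best[2]
```

Applied to every block of picture 3 with picture 1 as `reference`, such a search would yield one movement vector per block, corresponding to the set of vectors mentioned above.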
Next, all of picture 2 shall be encoded. As apparent from FIG. 2, block ORIG1 in picture 2 can then be predicted as a mean value between blocks PRED2 and PRED3 in the two encoded pictures 1 and 3. PRED2 and PRED3 are found separately by using an ordinary algorithm, and two movement vectors V2 and V3 indicate where the two blocks used for the prediction are situated. V2 and V3 may be separately transmitted vectors in some embodiments, or alternatively they may be down-scaled versions of movement vector V1, for example, see the article by A. Puri et al. in Signal Processing: Image Communication, Vol. 2, August 1990, NL, pages 127-144; "Video Coding with Motion-Compensated Interpolation for CD-ROM Applications". Thus in this latter situation, the prediction itself is generated as a calculated combination of the PRED2 and PRED3 blocks, i.e. in the simplest manner as a mean value, (PRED2+PRED3)/2.
Such an encoding method often turns out to be efficient, because the prediction of picture 2 comes out well, and for this reason the difference signal ("quantized TRANS") may require only a few bits for the transmission.
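The two-way prediction described above can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the text: picture 2 is assumed to lie midway in time between pictures 1 and 3 so that the down-scaling amounts to halving V1, the movement is assumed to be linear so that V3 is the negative of V2, and the mean value is rounded half up. All function names are hypothetical.

```python
def get_block(picture, top, left, size=8):
    """Cut a size x size block out of a picture stored as a list of rows."""
    return [row[left:left + size] for row in picture[top:top + size]]

def scale_vectors(v1):
    """Derive V2 and V3 from V1, given as (dy, dx).
    Assumption: halving with floor division, and V3 = -V2."""
    v2 = (v1[0] // 2, v1[1] // 2)
    v3 = (-v2[0], -v2[1])
    return v2, v3

def bidirectional_prediction(picture1, picture3, top, left, v2, v3, size=8):
    """Prediction for a block of picture 2 as the mean of PRED2 and PRED3.
    Note that data is read from two stored pictures."""
    pred2 = get_block(picture1, top + v2[0], left + v2[1], size)
    pred3 = get_block(picture3, top + v3[0], left + v3[1], size)
    return [[(a + b + 1) // 2 for a, b in zip(row2, row3)]  # rounding assumed
            for row2, row3 in zip(pred2, pred3)]
```

When V2 and V3 are instead transmitted separately, `scale_vectors` is not needed and the two vectors are passed directly to `bidirectional_prediction`.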
Down-scaling of movement vectors is also known from an article by A. Nagata et al. in pages 109-116 of the same publication as the previous citation, titled: "Moving Picture Coding System for Digital Storage Media using Hybrid Coding." Both of these citations disclose the use of a group of pictures in which the last picture is reconstructed and then used in conjunction with a picture previous to the group of pictures and scaled-down movement vectors to arrive at the data of the other picture(s) in the group.
The greatest disadvantage of the afore-mentioned method is that for prediction of B pictures, one has to read data from two picture memories (PRED2 and PRED3) into a processing unit, and every such data reading is costly in terms of implementation resources.