The present invention relates to a picture coder, picture decoder, and picture transmission system that combine good data compression performance with a high tolerance of frame dropouts.
Recently there has been a proliferation of systems that transmit moving pictures through communication networks: examples include videophone, videoconferencing, and video-on-demand (VOD) systems. To reduce the volume of transmitted data, the pictures are digitized and compressively coded at the transmitting device, and decoded at the receiving device.
Two basic types of moving-picture coding can be distinguished: intra-frame coding and inter-frame coding. Intra-frame coding codes each frame separately, or divides a frame into blocks and codes each block separately. Methods of intra-frame coding have been standardized by, for example, the Joint Photographic Experts Group (JPEG). Intra-frame coding compresses the data by reducing spatial redundancy in each frame.
Inter-frame coding reduces both spatial and temporal redundancy, by coding only the differences between one frame and a preceding frame. Inter-frame coding may include motion compensation, which yields high data compression ratios.
Methods that employ both intra- and inter-frame coding have been standardized by, for example, the Moving Picture Experts Group (MPEG) and the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T). FIG. 21 illustrates the coding system adopted in ITU-T Recommendation H.261. Intra-frame coding, indicated by hatching, is performed at regular intervals; inter-frame coding is carried out at other times. In inter-frame coding, each frame is coded with reference to the immediately preceding frame, as indicated by the arrows. Frames coded by inter-frame coding are in a sense predicted from the preceding frames, and are referred to as P-frames, while frames coded by intra-frame coding are referred to as I-frames. In FIG. 21, frames a and i are I-frames, while frames b to h, j, and k are P-frames.
FIG. 22 shows an example of a problem that occurs in the H.261 scheme and similar schemes. If the receiving device is unable to decode frame e for some reason, then frames f, g, and h will also be undecodable. The receiving device will have to wait until it receives the next I-frame (frame i) before decoding can resume. Dropout of a single frame can thus lead to lengthy and highly undesirable gaps in the received moving picture.
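The dependency chain behind FIG. 22 can be illustrated with a minimal sketch. The frame labels follow FIG. 21; the function name `decodable` and the data layout are assumptions made purely for illustration and are not part of Recommendation H.261.

```python
def decodable(frames, refs, dropped):
    """Return the labels of the frames the receiving device can decode.

    frames:  frame labels in transmission order
    refs:    maps each P-frame label to the label of its reference frame;
             I-frames do not appear in this mapping
    dropped: set of labels lost or skipped at the receiving device
    """
    ok = []
    for f in frames:
        if f in dropped:
            continue
        ref = refs.get(f)
        # An I-frame (no reference) is always decodable; a P-frame is
        # decodable only if its reference frame was itself decoded.
        if ref is None or ref in ok:
            ok.append(f)
    return ok


frames = list("abcdefghijk")
i_frames = {"a", "i"}
# Chain scheme of FIG. 21: each P-frame references the immediately preceding frame.
chain_refs = {f: frames[n - 1] for n, f in enumerate(frames) if f not in i_frames}

print(decodable(frames, chain_refs, dropped={"e"}))
# ['a', 'b', 'c', 'd', 'i', 'j', 'k']
```

Losing frame e alone removes frames f, g, and h as well, because each of them depends, directly or indirectly, on frame e.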
Frame dropout can occur for a variety of reasons. Frames may be dropped intentionally at the receiving end because, for example, the decoder has a slower processing speed than the coder and is unable to keep up. In networks that transmit data in packets or cells, packets or cells may be dropped en route when the network becomes overloaded. This can occur in local-area networks employing the well-known Ethernet system, for example, or in wide-area networks employing the well-known asynchronous transfer mode (ATM). Particularly in wide-area networks, packets or cells may also arrive out of sequence, for example because they were transmitted over different network routes; this again causes problems in the decoding of P-frames.
To deal with these latter problems, some networks employ a protocol in which the transmitting device sends packets with attached serial numbers, and the receiving device rearranges the packets in the correct order, confirms their arrival, and sends requests for the retransmission of non-arriving packets back to the transmitting device. A well-known example of this type of protocol is the Transmission Control Protocol (TCP).
When network operation is unstable, however, and packets are dropped frequently, retransmission under this type of protocol can cause large cumulative delays to build up, which is unsuitable for the real-time transmission of moving pictures. When moving-picture data are transmitted, it is generally preferable to display new data, even if that means skipping a frame, rather than wait for the retransmission of old data.
These problems are compounded in multi-point transmission schemes such as broadcasting and multicasting schemes, which send the same data to multiple receiving sites. If the transmitting device heeds a retransmission request from one receiving site, it will often be forced to transmit to other sites a packet that those other sites have already received successfully, and the network load will be greatly increased. Broadcasting and multicasting are therefore usually carried out under a protocol that does not perform retransmission, such as the User Datagram Protocol (UDP); but as a result, the probability of frame dropout increases.
In wireless networks, frame dropout is a serious problem even when transmission takes place over a dedicated channel, instead of by packet or cell switching. Wireless transmission is highly prone to error, and when the errors exceed the error-correcting capability of the receiving device, the usual practice is to discard a certain section of the data in order to re-establish valid data processing. Data dropouts therefore tend to be larger than in wireline networks.
These factors limit the usefulness of the coding scheme illustrated in FIG. 21 to the transmission of moving pictures through telephone lines, integrated services digital networks (ISDNs), and other facilities that offer a reliable link, equivalent to a physical circuit, between the transmitting and receiving devices. For transmission through other types of networks, in which frame dropout or skipping is to some extent unavoidable, the scheme illustrated in FIG. 23 is often employed: all frames are coded as I-frames, using JPEG coding, for example.
When all frames are coded as I-frames, if a dropout occurs, as at frame e in FIG. 24, it has little effect on the perceived quality of the moving picture. In FIG. 24 only frame e is lost; the succeeding frames f, g, and h can be decoded successfully, because their decoding does not depend on preceding frames.
The problem with the all-I-frame scheme is that the data compression ratio is not very high, because temporal redundancy is not removed. Much network bandwidth is therefore consumed.
Japanese Patent Kokai Publication No. 95571/1995 discloses an alternative scheme, illustrated in FIG. 25, in which P-frames b to h are all coded with reference to the preceding I-frame (a). Under this scheme, the loss of a P-frame does not affect the decoding of other P-frames. A disadvantage of this scheme is that the data compression ratio tends to decline with each succeeding P-frame, due to increasing temporal distance between the P-frame and the I-frame to which the P-frame is referenced.
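The trade-off in the fixed-reference scheme of FIG. 25 can be sketched as follows. The function `decodable` and the notion of a "reference distance" counted in frames are illustrative assumptions, not terms taken from the cited publication.

```python
def decodable(frames, refs, dropped):
    """Return the labels of the frames the receiving device can decode."""
    ok = []
    for f in frames:
        if f in dropped:
            continue
        ref = refs.get(f)  # None for an I-frame
        if ref is None or ref in ok:
            ok.append(f)
    return ok


frames = list("abcdefgh")
# Scheme of FIG. 25: every P-frame references the preceding I-frame (a).
fixed_refs = {f: "a" for f in frames[1:]}

received = decodable(frames, fixed_refs, dropped={"e"})
print(received)  # ['a', 'b', 'c', 'd', 'f', 'g', 'h']

# Temporal distance, in frames, between each P-frame and its reference.
distance = {f: frames.index(f) - frames.index(r) for f, r in fixed_refs.items()}
print(distance["b"], distance["h"])  # 1 7
```

Only the dropped frame is lost, but the distance to the reference grows with every P-frame, which is the reason the compression ratio tends to decline.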
One object of the present invention is, accordingly, to enable P-frames, including P-frames coded with reference to a preceding P-frame, to be decoded after a frame dropout, without waiting for the next I-frame.
Another object of the invention is to enable picture quality to adapt to transmission channel conditions.
A further object is to enable data compression ratios to adapt to transmission channel conditions.
Still another object is to provide a human user with control over the quality of transmitted moving pictures, in ways suitable for different transmission channel conditions.
The invented transmission system transmits a series of frames from a transmitting device to a receiving device. Intra-frame coding or inter-frame coding is selected for each frame, and the corresponding coding process is carried out at the transmitting device. When inter-frame coding is selected, the frame is coded with reference to a reference frame, the reference frame being a frame that was coded previously. The coded data resulting from intra-frame coding and inter-frame coding are transmitted to the receiving device. The receiving device decodes the coded data, and sends acknowledgment signals back to the transmitting device. The transmitting device selects the reference frame on the basis of these acknowledgment signals.
In a first preferred mode of operation, the receiving device transmits positive acknowledgment signals, and the transmitting device selects positively acknowledged frames as reference frames.
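A minimal sketch of the first mode at the transmitting device is given below. The class name `PositiveAckSelector`, the use of integer frame numbers, the choice of the most recent positively acknowledged frame, and the fallback to intra-frame coding when no frame has yet been acknowledged are all assumptions made for illustration.

```python
class PositiveAckSelector:
    """Hypothetical reference selector for the positive-acknowledgment mode."""

    def __init__(self):
        # Number of the most recent positively acknowledged frame, if any.
        self.last_acked = None

    def on_ack(self, frame_no):
        """Record a positive acknowledgment received from the receiving device."""
        if self.last_acked is None or frame_no > self.last_acked:
            self.last_acked = frame_no

    def reference(self):
        """Return the reference frame number, or None to use intra-frame coding."""
        return self.last_acked


sel = PositiveAckSelector()
refs = []
refs.append(sel.reference())  # frame 0: nothing acknowledged yet, so intra-coded
sel.on_ack(0)
refs.append(sel.reference())  # frame 1: references frame 0
refs.append(sel.reference())  # frame 2: acknowledgment of frame 1 not yet received
sel.on_ack(1)
refs.append(sel.reference())  # frame 3: references frame 1
refs.append(sel.reference())  # frame 4: frame 2 was never acknowledged
print(refs)  # [None, 0, 0, 1, 1]
```

Because a frame is used as a reference only after its acknowledgment arrives, a frame that the receiving device failed to decode is never selected.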
In a second preferred mode of operation, the receiving device transmits negative acknowledgment signals, and the transmitting device selects the most recently coded frame as the reference frame, except when a negative acknowledgment signal is received. When a negative acknowledgment signal is received, the reference frame is set back to a frame preceding the negatively acknowledged frame. A negative acknowledgment signal may be accompanied by a desired reference frame number, to enable the transmitting device to select a reference frame that the receiving device has successfully decoded.
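The second mode can be sketched in the same style. The class name `NegativeAckSelector`, the integer frame numbers, and the choice of the frame immediately preceding the negatively acknowledged frame when no desired reference frame number is supplied are illustrative assumptions.

```python
class NegativeAckSelector:
    """Hypothetical reference selector for the negative-acknowledgment mode."""

    def __init__(self):
        # None means the next frame is intra-coded.
        self.reference = None

    def on_frame_coded(self, frame_no):
        """By default, the most recently coded frame becomes the reference."""
        self.reference = frame_no

    def on_nack(self, nacked_frame_no, desired_ref=None):
        """Set the reference back when a negative acknowledgment arrives."""
        if desired_ref is not None:
            # The receiving device named a frame it has successfully decoded.
            self.reference = desired_ref
        else:
            previous = nacked_frame_no - 1
            self.reference = previous if previous >= 0 else None


sel = NegativeAckSelector()
refs = []
for n in range(4):               # frames 0 to 3 are coded without incident
    refs.append(sel.reference)
    sel.on_frame_coded(n)
sel.on_nack(2, desired_ref=1)    # frame 2 could not be decoded
refs.append(sel.reference)       # frame 4 references frame 1
sel.on_frame_coded(4)
refs.append(sel.reference)       # frame 5 references frame 4 again
print(refs)  # [None, 0, 1, 2, 1, 4]
```

In the absence of negative acknowledgments the reference is always the immediately preceding frame, so the temporal distance to the reference stays short.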
In a third preferred mode of operation, both negative and positive acknowledgment signals are sent, the transmission channel quality is assessed according to these acknowledgment signals, and the method of reference frame selection is varied according to the assessment. For example, the reference frame can be selected as in the above first preferred mode under bad channel conditions, and as in the above second preferred mode under good channel conditions.
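One possible way of assessing channel quality from the acknowledgment signals is sketched below. The sliding window, the threshold on the proportion of negative acknowledgments, and the names `ChannelAssessor`, `window`, and `nack_threshold` are assumptions chosen for illustration; other assessment criteria could equally be used.

```python
from collections import deque


class ChannelAssessor:
    """Hypothetical channel-quality assessment based on recent acknowledgments."""

    def __init__(self, window=20, nack_threshold=0.1):
        self.history = deque(maxlen=window)   # True = positive, False = negative
        self.nack_threshold = nack_threshold

    def record(self, positive):
        """Record one acknowledgment signal from the receiving device."""
        self.history.append(bool(positive))

    def nack_ratio(self):
        """Proportion of negative acknowledgments within the window."""
        if not self.history:
            return 0.0
        return self.history.count(False) / len(self.history)

    def mode(self):
        """Select the reference frame selection method for current conditions."""
        if self.nack_ratio() > self.nack_threshold:
            return "positive-ack"   # bad channel: first preferred mode
        return "negative-ack"       # good channel: second preferred mode


assessor = ChannelAssessor(window=10, nack_threshold=0.2)
for _ in range(10):
    assessor.record(True)
good_mode = assessor.mode()
for _ in range(3):
    assessor.record(False)
bad_mode = assessor.mode()
print(good_mode, bad_mode)  # negative-ack positive-ack
```

Parameters such as `window` and `nack_threshold` are examples of assessment criteria that could be adjusted at run time.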
The channel quality assessment criteria and reference frame selection method can also be varied in response to input from a human user.