The present invention relates to an image transmission method and apparatus, and more particularly, to a motion video transmission method and apparatus for transmitting a compressed motion video over a network.
In remote image monitoring systems and image delivery systems, there is a rapidly growing need for a motion video transmission apparatus that transmits motion video over an IP (Internet Protocol) network, as represented by public lines and the Internet. For example, conventional MPEG-4 based delivery of image stream data (composed of compressed data) involves encoding the image data to be transmitted in accordance with MPEG-4 in an image transmission unit, and first storing the encoded image data as stream data in a storage of the image transmission unit. The image data may represent a still image, motion video, computer graphics (CG), animation, and the like, and may also include voice, audio, synthesized music, and the like. Such image data is delivered from the storage in response to a request from the network.
For delivering such image data, particularly motion video, the data must be digitized before transmission. However, since the digitization of image data results in an immense amount of information, motion video compression techniques are required for reducing the amount of information to be transmitted. For this purpose, a well-known global compression standard such as MPEG-2 or MPEG-4 is used for compressing the motion video.
The MPEG-based image compression technique will now be described. Image data compressed in accordance with MPEG-2 or MPEG-4, i.e., stream data, is composed of an intra picture (hereinafter called the “I-picture”), a predictive picture (hereinafter called the “P-picture”), and a bidirectionally predictive picture (hereinafter called the “B-picture”). That is, the stream data is compressed in one of three different encoding modes on a picture-by-picture basis. The I-picture is image data for one full frame of analog video, encoded using only information within that frame. Therefore, upon receipt of an I-picture, an image receiver can reproduce an image from the I-picture alone. The P-picture is encoded data of only the difference resulting from a unidirectional interframe prediction from the preceding image data (I-picture or P-picture). Therefore, the image receiver cannot reproduce an image from a received P-picture alone, and requires the I-picture on which the P-picture is based. Further, if an intermediate P-picture is missing, the resulting image will be corrupted, for example, by block distortion and the like. The B-picture is an encoded version of difference data resulting from a bidirectional interframe prediction from the preceding image data and the next image data. The B-picture is similar to the P-picture in that the image receiver cannot reproduce an original image from the B-picture alone. In short, the P-picture and B-picture contribute to a reduction in the amount of compressed data by removing redundancy in the time base direction with respect to the preceding and subsequent pictures, but the image receiver cannot reproduce an original image from the P- and B-pictures alone. A typical combination of MPEG-2 pictures is shown below by way of example:
(I) (B) (B) (P) (B) (B) (P) (B) (B) (P) (B) (B) (P) (B) (B) (I) (B) (B) (P) . . . .
Typically, an I-picture appears every 15 pictures, and this sequence is repeated, as can be seen above.
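The repeating picture sequence above can be reproduced with a short sketch. The function name gop_pattern and the parameters i_period and p_spacing are hypothetical names introduced only for this illustration; the values 15 and 3 are read off the example sequence, not mandated by MPEG-2.

```python
def gop_pattern(n_pictures, i_period=15, p_spacing=3):
    """Return the picture type ("I", "P" or "B") for each of n_pictures.

    i_period:  an I-picture starts every i_period pictures (assumed 15).
    p_spacing: within a period, every p_spacing-th picture is a P-picture
               (assumed 3); all remaining pictures are B-pictures.
    """
    types = []
    for k in range(n_pictures):
        pos = k % i_period          # position inside the current period
        if pos == 0:
            types.append("I")
        elif pos % p_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return types


# Print the first 19 pictures in the same notation as the text.
print(" ".join(f"({t})" for t in gop_pattern(19)))
```

Every picture between two I-pictures depends, directly or indirectly, on the I-picture that opens its period, which is why a receiver that joins mid-sequence has to wait for the next I-picture.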
Next, description will be made on a system for delivering a compressed motion video as mentioned above over a network. FIG. 8 illustrates a networked motion video delivery system described in JP-A-2003-309847 which has been previously proposed by the present inventors.
Referring to FIG. 8, a monitored image captured by a camera 120 is encoded by an image transmitter 111 such as an encoder, and delivered through a network 122 to respective image receivers 112-1, 112-2, 112-3 such as decoders which decode the encoded monitored image that is then displayed on image monitors 124-1, 124-2, 124-3, respectively.
The image transmitter 111 responsible for the compression of a motion video comprises a compression unit, disposed therein, which compresses the motion video at a predetermined bit rate (compression rate). The resulting compressed image data (stream) is transmitted to the image receivers 112-1, 112-2, 112-3, each of which decompresses the stream to restore the original image data which is outputted to an associated monitor. It should be noted that in FIG. 8, the output stream from the image transmitter 111 is directly transmitted to networks 122-1, 122-2, 122-3. This transmission scheme is called “unicast.”
In operation of the foregoing system, the image receiver 112-1, for example, requests stream data from the image transmitter 111 through the network 122-1. The image transmitter 111 delivers the requested stream data to the requesting image receiver 112-1.
The image receiver 112-1 receives the stream data, decompresses the compressed stream data, displays the original data on the monitor 124-1, and records the original data in a recording unit (not shown) as required. The image receiver 112-1 then requests the next stream data from the image transmitter 111 through the network 122-1.
The image transmitter 111 transmits the requested next stream data to the image receiver 112-1. The image receiver 112-1 receives the next stream data, decompresses the compressed stream data in a manner similar to the foregoing, displays the original data on the monitor 124-1, and records the original data in the recording unit as required.
The subsequent process is similar to the foregoing, and the other image receivers 112-2, 112-3 also request transmission of stream data, and receive and decompress the received stream data in sequence.
Next, the image transmitter 111 will be described in greater detail with reference to FIG. 9. In FIG. 9, a video signal from the camera 120 is applied to the image transmitter 111 through an input terminal 130. The image transmitter 111 comprises a coding processing unit 131 and a protocol control unit 132. While this example is described in connection with the coding processing unit 131 conforming to MPEG-4, the coding processing unit 131 is not limited to MPEG-4, but may be designed to comply with another coding scheme such as MPEG-2. The protocol control unit 132 comprises an I-VOP (Video Object Plane) period buffer 133; RTP (real time transport protocol) packet processing units 134-1, 134-2, 134-3; and a TCP (transmission control protocol)_UDP (user datagram protocol) processing unit 135. The output of the TCP_UDP processing unit 135 is sent to each network 122 from an output terminal 136. Three RTP packet processing units are shown in FIG. 9 because the illustrated example includes three transmission paths 122 with different transmission rates. The number of RTP packet processing units is therefore not limited to three.
The protocol control unit 132 thus configured implements transmission rate adaptive packet transmission. The following description will be centered on this configuration. The I-VOP period buffer 133 is a buffer which has a sufficient capacity to store encoded data at least from one I-VOP (corresponding to the I-picture previously described) to immediately before the next I-VOP.
The RTP packet processing unit 134 generates packets suitable for transmitting data, such as MPEG-4 encoded data, over a network. Specifically, the RTP packet processing unit 134 divides the encoded data of each VOP into one to several packets for delivery to the next TCP_UDP processing unit 135 in accordance with the basic specifications of RTP.
The TCP_UDP processing unit 135 transmits the RTP packets to the network 122 in accordance with the connection-type TCP protocol or the connectionless-type UDP protocol. This selection can be remotely set by the user through a personal computer or the like.
The protocol control unit 132 is implemented by software mainly processed by a processor. The RTP packet processing unit 134 performs processing associated with three types of transmission paths 122 connected to the image receiver 112 for simultaneous delivery on a unicast basis.
The MPEG-4 based coding processing unit 131 receives a video signal, and generates MPEG-4 encoded data which is written into the I-VOP period buffer 133. The RTP packet processing unit 134 reads encoded data from the I-VOP period buffer 133 in response to a ready signal (indicated by a dotted line in FIG. 9) in accordance with a transmission rate from the TCP_UDP processing unit 135. Specifically, each of the RTP packet processing units 134-1, 134-2, 134-3 reads an amount of data in accordance with the transmission path rate (transmission rate) to the image receiver 112-1, 112-2, 112-3 associated therewith. In this event, image data is necessarily discarded in the I-VOP period buffer 133 for lower rate transmission paths. In this way, the image data is automatically transmitted at the transmission rate of an associated transmission path.
The TCP_UDP processing unit 135 generates the ready signal in accordance with the transmission rate in a different way according to the selected protocol. When the TCP protocol is selected, the ready signal in accordance with the transmission rate can be automatically generated by a response to packets transmitted from the coding processing unit 131 because the TCP protocol is a connection type.
With the UDP protocol, on the other hand, the ready signal cannot be automatically generated because the UDP protocol is a connectionless type. Therefore, the TCP_UDP processing unit 135 collects packet discard ratio information periodically transmitted from the image receiver 112. The TCP_UDP processing unit 135 controls the packet transmission rate based on the periodic information such that the packet discard ratio is reduced to zero, and generates the ready signal in accordance with the transmission rate control. In this way, the TCP_UDP processing unit 135 can generate the ready signal in accordance with a particular transmission rate.
Here, the packet discard ratio can be calculated from an expected number of received RTP packets and the number of actually received packets. The expected number of received packets refers to the number of packets delivered from a transmitter, including delayed packets and duplicated packets. The period extends from the reception of the preceding RTCP packet to the reception of the current RTCP packet. The number of packets is calculated from a maximum sequence number and a minimum sequence number of received packets. The sequence number represents the order of a packet included in an RTP header (see RFC1889 for details).
(Expected Number of Received Packets) = (Maximum Sequence Number) − (Minimum Sequence Number) + 1
The packet discard ratio is calculated in the following manner.
(Number of Discarded Packets) = (Expected Number of Received Packets) − (Number of Actually Received Packets)
(Packet Discard Ratio) = ((Number of Discarded Packets) / (Expected Number of Received Packets)) × 255
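The calculation above can be written out as a minimal sketch. The function name packet_discard_ratio and its parameter names are hypothetical, and the sketch assumes that the sequence numbers do not wrap around within one reporting period, a case the formulas above do not address.

```python
def packet_discard_ratio(max_seq, min_seq, received):
    """Packet discard ratio on the 0..255 scale used in the text.

    max_seq, min_seq: highest and lowest RTP sequence numbers seen in the
                      reporting period (assumed not to wrap around).
    received:         number of packets actually received in that period.
    """
    expected = max_seq - min_seq + 1      # expected number of received packets
    discarded = expected - received       # number of discarded packets
    # Multiply before dividing so the result follows the formula exactly.
    return discarded * 255 / expected


# Example: sequence numbers 1000..1099 were expected, 90 packets arrived.
print(packet_discard_ratio(1099, 1000, 90))
```

In the example, 100 packets were expected and 10 were lost, so the ratio is 10/100 of the full scale of 255. Because duplicated packets count as received, the result can be negative when duplicates outnumber losses; the sketch leaves that case unclamped.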
While the interval at which the discard ratio is transmitted is described in detail in Section A.7 (calculation of the RTCP transmission interval) of RFC1889, the discard ratio is information included in the header of an RR packet, and is transmitted at intervals of approximately five seconds.
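The text states only that the packet transmission rate is controlled so that the reported discard ratio falls to zero; it does not give the control rule. The following is therefore one possible sketch under stated assumptions, not the method of the described system: the rate is cut in proportion to the reported loss, and raised by a small probing factor when no loss is reported. The names adjust_rate, probe_step, min_rate_bps and max_rate_bps are hypothetical.

```python
def adjust_rate(rate_bps, discard_ratio, max_rate_bps,
                min_rate_bps=16_000, probe_step=1.05):
    """Return a new sending rate after one receiver report.

    discard_ratio is on the 0..255 scale. The control rule below
    (proportional back-off, multiplicative probing) is an assumption
    made for illustration only.
    """
    if discard_ratio > 0:
        lost_fraction = discard_ratio / 255
        new_rate = rate_bps * (1 - lost_fraction)   # back off by the loss seen
    else:
        new_rate = rate_bps * probe_step            # no loss: probe upward
    # Keep the rate inside the configured bounds.
    return max(min_rate_bps, min(max_rate_bps, new_rate))


# Example: sending at 1 Mbps, the receiver reports half the packets lost.
print(adjust_rate(1_000_000, 127.5, max_rate_bps=1_000_000))
```

With reports arriving roughly every five seconds, a rule of this kind converges slowly, so bursts shorter than the reporting interval can still be discarded before the rate is corrected.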
Next, detailed description will be made on how image data is discarded when it is transmitted from a higher bit-rate transmission path to a lower bit-rate transmission path as mentioned above.
FIG. 3 illustrates an exemplary scenario in which image data has to be discarded, which will be described below. Specifically, in FIG. 3, image data is transmitted from the image transmitter 111 to an image receiver 112. The image data used herein refers to encoded image data. When image data is transmitted from the image transmitter 111 to the image receiver 112, the image data is sent through a higher bit-rate transmission path (including a network; the same applies hereinafter) 125 and a lower bit-rate transmission path 126. In this event, part of the image data has to be discarded because the bit rate is lower on the lower bit-rate transmission path 126 than on the higher bit-rate transmission path 125. Describing in greater detail, the aforementioned image data transmission method manages encoded image data in units of pictures as mentioned above, so that bursty encoded data indicated by P1, P2 is transmitted on the higher bit-rate transmission path 125 (for example, at a transmission rate of 1 Mbps) as illustrated in FIG. 4. On the lower bit-rate transmission path 126 (for example, at a transmission rate of 384 Kbps), on the other hand, the encoded image data is transmitted at a lower bit rate, as indicated by P3, so that part of the encoded image data is discarded. It should be noted that data could also be discarded in cases not involving a high bit-rate transmission path as mentioned above, for example, when it is transmitted from a network, the transmission rate of which is 320 Kbps, to a general public line, the transmission rate of which is 38.8 Kbps.
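As a rough worked example of the rate mismatch, the sketch below estimates the share of data that cannot pass when a burst arrives at the higher rate and must leave at the lower rate. It assumes the data arrives continuously at the higher rate for the duration of the burst, that no buffering absorbs the excess, and that 1 Kbps means 1000 bit/s. The function name excess_fraction is hypothetical.

```python
def excess_fraction(in_bps, out_bps):
    """Fraction of incoming data that exceeds the outgoing path's capacity.

    Assumes sustained input at in_bps and no buffering between the paths.
    """
    if in_bps <= out_bps:
        return 0.0                      # the slower side is the input: no excess
    return (in_bps - out_bps) / in_bps


# The two examples from the text: 1 Mbps -> 384 Kbps and 320 Kbps -> 38.8 Kbps.
for in_bps, out_bps in [(1_000_000, 384_000), (320_000, 38_800)]:
    print(f"{in_bps} -> {out_bps}: {excess_fraction(in_bps, out_bps):.1%}")
```

Under these assumptions well over half of a burst exceeds the capacity of the lower bit-rate path in both examples, which is why bursty picture-by-picture transmission is prone to discards.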
Since encoded image data is retransmitted when the TCP protocol is used, no problem will arise even if part of encoded image data is discarded. However, since there is no guarantee to send encoded image data to a recipient when the UDP protocol is used, bursty encoded image data can be frequently discarded with the UDP protocol. Consequently, the encoded image data cannot be correctly transmitted as the case may be. To address this problem, it is necessary to control the generation of the ready signal used to request picture data so that averaged encoded image data is transmitted.
JP-A-2002-77260 (pages 4-5 and FIGS. 1, 2) shows a system and method for image transmission which avoids discarding encoded image data, wherein a transmission timing is delayed in order to transmit packet data at intervals of predetermined time or more.