The present invention relates to a data transmission system and method, which process real-time data such as video data, audio data, and the like as packet stream data.
In recent years, along with the development of computer network techniques and digital communication techniques, data transfer systems for transferring a time-serial data sequence (to be referred to as stream data hereinafter) have been developed to realize, e.g., video on demand.
In such a data transfer system, a video or audio signal of, e.g., a requested movie is converted into digital information to form stream data, which is transmitted from a transmitting system to a receiving system via a network.
In general, in order to distribute data such as video data including audio data via a network, moving image coding such as MPEG (Moving Picture Experts Group), Motion-JPEG (Joint Photographic Experts Group), H.261, or the like is used, and such data is transmitted as an encoded packet stream.
When multimedia data such as audio data, video data, and the like is transferred in real time on a network shared by a plurality of users, the required network bandwidth cannot always be assured. The bit rate of an encoded video stream is therefore reduced in correspondence with the available network bandwidth when the stream is transferred.
In one conventional bit rate reduction method, coding processing and stream transfer are performed in parallel, and when bit rate control is required, an encoder is feedback-controlled to adjust the coding parameters (image quality, the number of frames, and the like). However, since the encoder and the data transmission unit depend on each other in this system arrangement, the method cannot be applied to an on-demand video communication system that transmits pre-stored streams as encoded packet streams.
In the following description, a system in which an encode unit and a data transfer unit are independent of each other, and in which the data transfer unit reduces the bit rate of an encoded stream, will be exemplified. Note that reducing the bit rate of an encoded stream in correspondence with the available network bandwidth will be referred to as stream shaping processing hereinafter.
In general, the following three stream shaping methods (1) to (3) for video streams are available:
(1) To reduce the number of display frames (time resolution).
(2) To reduce the image size (width, height) (spatial resolution).
(3) To reduce the number of bits per pixel of an original image (the number of gradation levels or color resolution).
input means for inputting an encoded stream which is packetized in units of abandonable data, and in which a header including a packet identifier is added to each of the packets;
transmission means for transmitting the encoded stream input from the input means onto a network;
designation means for designating a bit rate; and
control means for controlling a bit rate upon transmission of the encoded stream by the transmission means by abandoning a specific packet using packet priority determined on the basis of the packet identifier included in the header of each packet in accordance with the bit rate designated by the designation means.
For example, when the time resolution of a stream (e.g., Motion-JPEG) which has been intraframe-encoded at 30 frames/sec is halved by the method of reducing the number of display frames so as to transmit the stream at 15 frames/sec, the boundaries of the encoded frame data are detected by stream analysis during the data transmission processing, and the frames are alternately transmitted and abandoned, i.e., every other frame is abandoned.
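The frame-dropping example above can be sketched as follows. This is only an illustration under simplifying assumptions, not the method of any particular system: the stream is assumed to be a plain concatenation of JPEG images, frame boundaries are found by scanning for the JPEG start-of-image and end-of-image markers (a real analyzer would have to parse marker segments, since these byte pairs can also occur inside embedded thumbnails), and the names `split_frames` and `decimate` are hypothetical.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_frames(stream):
    """Split a concatenation of JPEG images into frames.

    Simplified boundary detection: each frame is taken to run from an
    SOI marker to the next EOI marker. An incomplete trailing frame is
    ignored.
    """
    frames = []
    pos = 0
    while True:
        start = stream.find(SOI, pos)
        if start < 0:
            break
        end = stream.find(EOI, start + len(SOI))
        if end < 0:
            break
        frames.append(stream[start:end + len(EOI)])
        pos = end + len(EOI)
    return frames


def decimate(frames, keep_every=2):
    """Keep one frame out of every `keep_every` and abandon the rest."""
    return frames[::keep_every]


def demo_stream(frame_count):
    """Build a toy stream whose frame bodies contain no marker bytes."""
    return b"".join(SOI + bytes([i]) * 4 + EOI for i in range(frame_count))
```

With `keep_every=2`, a 30 frames/sec sequence is reduced to 15 frames/sec. This works without re-encoding only because each intraframe-encoded frame can be decoded independently of its neighbors.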
On the other hand, when the method of reducing the image size (width, height), i.e., the method of changing the spatial resolution, is used, the individual frame data in a stream must be encoded while being separated into low-resolution data and high-resolution expanded data for compensating the low-resolution data (i.e., a hierarchically encoded stream), and the bit rate is reduced by abandoning the high-resolution expanded data upon data transfer. In this case, stream shaping that combines the spatial resolution and the time resolution can be attained.
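A minimal sketch of shaping a hierarchically encoded stream is given below. The `Packet` layout, the layer names `"base"` and `"enh"`, and the function names are assumptions made for illustration only; they are not taken from any coding standard or from the system described here.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    frame: int      # index of the frame this packet belongs to
    layer: str      # "base" = low-resolution data, "enh" = high-resolution expanded data
    payload: bytes


def shape(packets, keep_enhancement, frame_step=1):
    """Return the packets that would be transmitted.

    keep_enhancement=False abandons the high-resolution expanded data
    (spatial resolution reduction). frame_step=n transmits only every
    n-th frame (time resolution reduction). Both can be combined.
    """
    out = []
    for p in packets:
        if p.frame % frame_step != 0:
            continue  # whole frame abandoned
        if p.layer == "enh" and not keep_enhancement:
            continue  # expanded data abandoned, base data kept
        out.append(p)
    return out


def total_bytes(packets):
    """Sum of payload sizes, used as a stand-in for the bit rate."""
    return sum(len(p.payload) for p in packets)


def demo_packets(frame_count, base_size=10, enh_size=30):
    """Toy stream: one base packet and one expanded packet per frame."""
    packets = []
    for i in range(frame_count):
        packets.append(Packet(i, "base", bytes(base_size)))
        packets.append(Packet(i, "enh", bytes(enh_size)))
    return packets
```

In this toy stream of four frames, abandoning the expanded data reduces 160 bytes to 40, halving the frame rate reduces it to 80, and applying both reduces it to 20.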
The term "scalability" in moving image coding means that two or more images having different spatial resolutions and time resolutions can be decoded from a single bitstream. The above-mentioned bit rate control utilizes the scalability of an encoded stream.
MPEG, an international standard for moving image coding, describes an encoded stream having a scalability function (to be referred to as a scalably encoded stream hereinafter), but such a stream realizes the scalability in a decoder and does not take scalability upon stream transmission into consideration.
For example, MPEG defines the multiplexing method of video and audio data. However, it is not easy to separate a data portion corresponding to a specific picture (e.g., B picture) in a video stream from a multiplexed stream.
As a method of handling an MPEG video stream, a method of setting a plurality of levels in correspondence with the picture types is known. This method designates one of four levels: an all-data distribution level, a B-picture abandon level, a B- and P-picture abandon level, and an all-video abandon level (audio data alone is distributed). It therefore allows only discrete rate control.
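The level-based method can be illustrated with the sketch below, which also shows why the resulting rates are discrete. The level names, the `(kind, size)` packet representation, and the sizes used in the usage note are hypothetical values chosen for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    ALL = 0         # distribute all data
    DROP_B = 1      # abandon B pictures
    DROP_B_P = 2    # abandon B and P pictures
    AUDIO_ONLY = 3  # abandon all video, distribute audio alone


# Packet kinds abandoned at each level ("A" = audio is always kept).
_DROPPED = {
    Level.ALL: set(),
    Level.DROP_B: {"B"},
    Level.DROP_B_P: {"B", "P"},
    Level.AUDIO_ONLY: {"I", "P", "B"},
}


def filter_by_level(packets, level):
    """Keep the (kind, size) packets that survive at the given level."""
    dropped = _DROPPED[level]
    return [(kind, size) for kind, size in packets if kind not in dropped]


def bytes_per_level(packets):
    """Transmitted size at each level; only these values are reachable."""
    return {
        level: sum(size for _, size in filter_by_level(packets, level))
        for level in Level
    }
```

For a packet sequence such as `[("I", 100), ("B", 20), ("B", 20), ("P", 50), ("B", 20), ("B", 20), ("A", 10), ("A", 10)]`, the four levels yield 250, 170, 120, and 20 bytes respectively. No target between these values can be met, which is the limitation noted above.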
For a multiplexed stream of video and audio data, a rate control method is known that preferentially transmits the audio data, whose quality may deteriorate upon sub-sampling or decimation, and begins reducing the bit rate with the video stream. However, the relative importance of video and audio data differs depending on the video contents and the user's requirements. Also, since a stream that multiplexes a plurality of video data must often be processed, it is a very serious restriction for the user if he or she cannot set the policy of the bit rate control.
If bit rate control in a relay node or in multicast video distribution is taken into consideration, the bit rate control method is preferably independent of the coding scheme. For this purpose, the stream structure used in the bit rate control method must be easily extended from typical moving image coding schemes such as MPEG, Motion-JPEG, H.261, and the like, and must be suitable for bit rate control after coding. However, since the conventional method defines a special stream structure format depending on the moving image coding scheme used, and performs the bit rate control using the defined stream, the expandability of the stream structure is limited.