1. Field of the Invention
The present invention relates to encoders for encoding video data such as video streams of the MPEG type. In particular, the present invention relates to encoders that are adapted for delivering the same encoded video stream to a plurality of clients.
2. Description of the Related Art
FIG. 1 is a schematic diagram of a conventional system that allows the delivery of a video stream from a server device 100 to a receiving device 110 referred to as a client. The video stream, obtained from an external source such as a camcorder or from a storage memory, is composed of a plurality of (uncoded) frames. A frame typically represents one image, but it can also represent only part of an image or a plurality of images.
The server 100 includes an encoder 101 that encodes video frames, a packetizer 102 that packetizes the encoded video frames (bitstream) into data packets adapted to the transport protocol in use in the communication network 120, a transmission buffer 103 for serving those data packets, and a packet scheduler 104 in charge of scheduling the sending of the data packets over the network in view of the available bandwidth. For example, the data packets may represent the payload data of “Real-time Transport Protocol” (RTP) packets, in which case the headers of the RTP packets are added just before sending on the network 120.
The server 100 furthermore includes a network monitor module 106 that estimates the network characteristics based on end-to-end measurements and provides feedback to the packet scheduler module 104 and to a frame layer rate controller 105 for properly adjusting the packet sending rate. The network monitor 106 can measure the round-trip time (RTT) or the relative one-way trip time (RoTT), which represents the time needed for a packet to reach the client after being sent by the server. Also, the network monitor 106 can implement, for example, the known TCP Friendly Rate Control (TFRC) algorithm, which makes it possible to estimate the available bandwidth B(t) between the server and the client from the measured round-trip time (RTT) and loss rate. Another example of an algorithm that can be implemented is the Additive Increase/Multiplicative Decrease (AIMD) congestion control algorithm, which makes it possible to estimate the available bandwidth B(t) from the measured round-trip time (RTT) and the size of a congestion window.
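By way of illustration only, the two bandwidth estimates mentioned above can be sketched as follows. The first function uses the TCP throughput equation of the TFRC specification (RFC 5348) with its usual simplifications (one packet acknowledged per ACK, retransmission timeout set to four times the RTT). The second uses the common approximation that a window-based sender delivers one congestion window per round trip. The function and parameter names are hypothetical and are not taken from the system of FIG. 1.

```python
import math


def tfrc_throughput(packet_size_bytes, rtt_s, loss_rate, b=1):
    """Estimate B(t) in bytes per second with the TFRC throughput equation.

    Assumptions: b packets are acknowledged per ACK (default 1) and the
    retransmission timeout is approximated as 4 * RTT.
    """
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be > 0 and loss_rate in (0, 1]")
    t_rto = 4 * rtt_s
    denominator = (
        rtt_s * math.sqrt(2 * b * loss_rate / 3)
        + t_rto * 3 * math.sqrt(3 * b * loss_rate / 8)
        * loss_rate * (1 + 32 * loss_rate ** 2)
    )
    return packet_size_bytes / denominator


def aimd_bandwidth(cwnd_packets, packet_size_bytes, rtt_s):
    """Approximate B(t) in bytes per second as one congestion window per RTT."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be > 0")
    return cwnd_packets * packet_size_bytes / rtt_s
```

With 1000-byte packets, an RTT of 100 ms and a loss rate of 1%, the first function returns roughly 112 kilobytes per second; the estimate decreases when either the RTT or the loss rate increases.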
The frame layer rate controller 105 controls the rate of the bitstream output by the video encoder 101, the objective being to meet the constraints imposed by the connection established between the server and the client. This control is performed on a frame-by-frame basis. For each new frame to encode, the frame layer rate controller 105 sets a target size S for the resulting encoded frame. The video encoder 101 adjusts its compression parameters accordingly so that the actual size of the encoded frame is as close as possible to the target size S.
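The frame-by-frame control loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular encoder: the target size is taken as the bandwidth available during one frame interval, and the compression parameter is modelled as a single quantizer step that is scaled in proportion to the overshoot of the previous frame. The function names and the step bounds of 1.0 and 64.0 are hypothetical.

```python
def target_size_bits(bandwidth_bps, frame_rate_fps):
    """Target size S: the bits the connection can carry in one frame interval."""
    if frame_rate_fps <= 0:
        raise ValueError("frame_rate_fps must be > 0")
    return bandwidth_bps / frame_rate_fps


def next_quantizer_step(current_step, actual_bits, target_bits,
                        min_step=1.0, max_step=64.0):
    """Scale the quantizer step by the ratio of actual to target frame size.

    Assumption: a coarser step yields a proportionally smaller frame, which
    is only a rough model of real encoder behaviour.
    """
    if target_bits <= 0:
        raise ValueError("target_bits must be > 0")
    scaled = current_step * actual_bits / target_bits
    return max(min_step, min(max_step, scaled))
```

For example, at 1 Mbit/s and 25 frames per second the target size is 40000 bits; if the previous frame came out at 60000 bits with a step of 10, the sketch raises the step to 15.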
A first constraint of the connection that is to be met is keeping the sending rate out of the server below the network capacity B(t) as estimated by the network monitor 106. The bandwidth B(t) fluctuates over time and needs to be estimated at regular intervals.
Another constraint that is usually considered is to keep the video quality constant over a predetermined number of frames. When the encoder 101 applies predictive compression using motion compensation according to one of the formats MPEG-2, MPEG-4 Part 2 or H.264, the encoded frames are of two types: predicted frames (called P-frames when predicted from one reference frame, or B-frames when predicted from two reference frames) and non-predicted frames (called Intra frames or I-frames). For these types of encoding, it may be sufficient to have a constant quality over only one group of pictures (GOP). A GOP is defined as a set of frames that contains only one I-frame, which is the first encoded frame of the set, and otherwise only P- or B-frames that refer directly or indirectly to that I-frame. A GOP size can be, for example, 12 or 20 frames.
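The GOP definition above can be illustrated by a short sketch that splits a sequence of frame types, given in coding order, into groups that each start with an I-frame. The function name and the single-letter frame-type encoding are assumptions made for the example.

```python
def split_into_gops(frame_types):
    """Split frame types ('I', 'P', 'B') in coding order into GOPs.

    Each GOP starts at an I-frame and runs until the next I-frame.
    """
    gops = []
    for frame_type in frame_types:
        if frame_type not in ("I", "P", "B"):
            raise ValueError(f"unknown frame type: {frame_type!r}")
        if frame_type == "I":
            gops.append([])          # an I-frame opens a new GOP
        elif not gops:
            raise ValueError("sequence must start with an I-frame")
        gops[-1].append(frame_type)
    return gops
```

Applied to the sequence "IPPBIPB", the sketch yields two GOPs of four and three frames respectively.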
Yet a further constraint that applies to real-time video streaming is to have a bounded transmission delay D between the server and the client to guarantee timely display of video frames at the client. This constraint is particularly important for low-latency applications such as video conferencing, which typically require a maximum delay of 250 ms between image capture or compression at the server side and display at the client side.
It is known, for example from “Delay-constrained TCP-compatible rate control for real-time video transmission over the Internet”, published in “Annales des télécommunications” in January 2002, to use a rate control algorithm that chooses the encoding size of each new frame by taking into account the transmission delay and the available bandwidth.
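How the bandwidth and delay constraints jointly bound the size of a new frame can be sketched as follows. This is a simplified illustration and not the algorithm of the cited article: it assumes that the bandwidth B(t) stays constant over the delay bound D and that the new frame is sent only after the data already waiting in the transmission buffer. The function and parameter names are hypothetical.

```python
def delay_bounded_target_bits(bandwidth_bps, max_delay_s, backlog_bits):
    """Largest frame size that can still be delivered within the delay bound.

    The frame queues behind backlog_bits, so backlog plus the new frame
    must drain within max_delay_s at rate bandwidth_bps.
    """
    if bandwidth_bps < 0 or max_delay_s < 0 or backlog_bits < 0:
        raise ValueError("arguments must be non-negative")
    budget = bandwidth_bps * max_delay_s - backlog_bits
    return max(0.0, budget)
```

For example, with B(t) = 1 Mbit/s, D = 250 ms and 100000 bits already buffered, the new frame may not exceed 150000 bits; if the backlog alone exceeds the budget, the sketch returns zero, meaning that no size satisfies the bound.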
When the same encoded video bitstream is to be delivered to a plurality of clients instead of one given client, the setting of the target size by the frame layer rate controller needs to take into account the constraints imposed by all the connections established between the server and the plurality of clients.
This is typically the case in a multi-unicast system, where the server includes one encoder for encoding the video frames and one packetizer for packetizing the encoded video frames, but as many transmission buffers, network monitors and packet schedulers as there are clients to be served. In such a system, the way in which the constraints imposed by the different connections are taken into account needs to be addressed.
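The multi-unicast arrangement described above can be sketched as a data structure in which the packets produced by the single encoder and packetizer are copied into one transmission buffer per client, each buffer being drained independently. The class and method names are hypothetical, and the per-client network monitor is reduced here to a byte budget passed in by the caller.

```python
from collections import deque


class MultiUnicastServer:
    """One shared packet stream, one transmission buffer per client."""

    def __init__(self, client_ids):
        self.buffers = {client_id: deque() for client_id in client_ids}

    def publish(self, packets):
        """Queue the same encoded packets for every client."""
        for buffer in self.buffers.values():
            buffer.extend(packets)

    def send_next(self, client_id, budget_bytes):
        """Dequeue, in order, the packets that fit the client's byte budget."""
        buffer = self.buffers[client_id]
        sent = []
        while buffer and len(buffer[0]) <= budget_bytes:
            packet = buffer.popleft()
            budget_bytes -= len(packet)
            sent.append(packet)
        return sent
```

Because every buffer receives the same packets, a client on a slow connection accumulates a backlog while a client on a fast connection drains its buffer, which is the situation the frame layer rate controller has to reconcile.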
In the article entitled “Extending Equation-based Congestion Control to Multicast Applications”, published in the Proceedings of SIGCOMM'2001, San Diego, Calif., in August 2001, a sender-driven congestion control protocol, referred to as TFMCC, is described that extends the TFRC congestion control mechanism to multicast. In order to prevent the server from being flooded with feedback from all clients, one client is designated as the current limiting receiver (CLR). This client can send feedback (measured RTT, loss rate and estimated bandwidth) without any limitation, whereas the other clients can send feedback only after a random delay and only if their estimated bandwidth is lower. The server echoes received feedback to the multicast group, and the CLR changes if another client sends feedback showing a lower rate.
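The server-side selection of the current limiting receiver can be sketched as follows. This is a minimal illustration of the rule stated above (the CLR changes when another client reports a lower rate) and deliberately omits the random feedback delay, the echoing of feedback to the group and any handling of a CLR that stops reporting. The class and method names are hypothetical.

```python
class ClrTracker:
    """Track which client is the current limiting receiver (CLR)."""

    def __init__(self):
        self.clr_id = None
        self.clr_rate_bps = None

    def on_feedback(self, client_id, rate_bps):
        """Process one feedback report and return the CLR identifier."""
        is_first_report = self.clr_id is None
        is_from_clr = client_id == self.clr_id
        if is_first_report or is_from_clr or rate_bps < self.clr_rate_bps:
            # The CLR's own reports refresh its rate; a lower rate from
            # another client makes that client the new CLR.
            self.clr_id = client_id
            self.clr_rate_bps = rate_bps
        return self.clr_id
```

In this sketch the sending rate for the whole group would follow `clr_rate_bps`, that is, the rate of the most constrained client.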
This approach does not, however, provide satisfactory results, because the congestion control algorithm is driven by the most limiting client in terms of bandwidth. Clients with high-capacity connections are still served according to the capacity of the connection having the lowest rate.
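The resulting under-utilization can be quantified with a short sketch that computes, for each client, the fraction of its own available bandwidth that is used when the whole group is served at the lowest rate. The function name and the example figures are assumptions made for the illustration.

```python
def utilization_at_lowest_rate(client_rates_bps):
    """Fraction of each client's bandwidth used when all are served at the minimum."""
    if not client_rates_bps:
        return {}
    if min(client_rates_bps.values()) <= 0:
        raise ValueError("rates must be > 0")
    lowest = min(client_rates_bps.values())
    return {cid: lowest / rate for cid, rate in client_rates_bps.items()}
```

For two clients with 4 Mbit/s and 1 Mbit/s available, the first client uses only a quarter of its capacity while the second uses all of it.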
Another approach known in the prior art is the use of a scalable video encoder (cf. for example the article “Scalable Video Conferencing using Sub-band transform Coding and Layered Multicast Transmission”, Proceedings of ICSPAT'99, Orlando, Fla., October 1999). The use of a scalable bitstream allows the sending rate to be adapted to the capacity of the different connections. Scalable video encoding and the associated decoding are, however, more sophisticated and more complex to implement. They also require more processing power.
Therefore, there is a need for more efficient bandwidth utilization when serving a bitstream generated by one video encoder to a plurality of clients over connections with different parameters, without having to implement complex encoding schemes such as scalable encoding.