Traditionally a video on demand service, such as the applicant's BT Vision service, is supported by encoding video at a constant bit rate and delivering it over a network at the same constant bit rate. This generally requires bandwidth reservation on the network, which can be expensive to provide.
Video encoded using compression techniques naturally has variable bit rate, as the number of bits produced when encoding a picture depends on the picture content: how similar it is to previously encoded pictures and how much detail it contains. Some video scenes can be coded to a given quality with a small number of bits, whereas other scenes may require significantly more bits to achieve the same quality. When constant bit rate encoding is used, video has to be coded at time varying quality to meet the bit rate constraint. This has been shown to be sub-optimal to the user, who would prefer to see constant quality. Also, by fixing the bit rate independent of the genre of the video content, some genres of content can be encoded well, such as news and drama, whereas others, such as fast moving sport and music videos and concerts, can only be coded quite poorly. Adaptive video delivery using variable bit rate encoding can be used to overcome these problems.
With an adaptive delivery system, the need for bandwidth reservation is removed, with the video delivery system adapting the bit rate of video delivered according to the available network throughput. Content can be encoded at a number of bit rates corresponding to a number of quality levels, and delivered over the network without bandwidth reservation. Generally, the video data would be delivered as fast as possible, while the quality level (encoded bit rate) is adapted according to the network throughput achieved, so as to maximize the quality of the video delivered while ensuring that all video data is delivered over the network in time for it to be decoded and displayed without interruption.
International patent application WO 2009/112801 describes a variable bit rate encoding method that maintains a constant perceptual quality. Use is made of a perceptual quality metric (one that achieves a good correlation with actual viewer perception by taking into account masking effects) in a video encoder to encode with constant perceptual quality. A coding parameter, specifically the quantization parameter, is set separately for each frame, taking into account masking effects based on relative contrast levels in each frame. The resulting encoded bitstream has a variable bit rate.
International patent application WO 2005/093995 describes a network with a video server connected to a number of client devices over a shared backhaul. Video content is encoded at a number of constant quality levels and the encoded bitstreams stored on a network based server. In response to requests from the clients, the encoded bitstreams are selected by the server and delivered over the shared network to the clients. Switching between the different bitstreams, and hence qualities, can be done depending on the actual network throughput, with an aim to maximize the quality of the stream.
However, when delivering video content that has been encoded at two or more quality levels, it is necessary to determine the minimum bit rate required to deliver the remainder of the video content at each of the available quality levels, so that a decision can be made as to whether to switch to a different quality bitstream depending on the actual network delivery rate.
One way to determine the minimum delivery bit rate for a given video stream is to analyze the statistics of the encoded video streams prior to commencing delivery. Thus, for a plurality of positions within each video stream, pairs of data can be pre-calculated, each pair containing a delivery bit rate and the minimum start-up delay that would be required if that delivery rate were to be used for timely delivery of the remainder of the given video stream. This data is then used during the subsequent streaming process to determine whether a switch can be made to a different quality stream, based on the amount of data already buffered at the receiver and the actual network delivery rate. Preferably, the quality of the stream selected is as high as the network delivery rate can support.
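The pre-calculation described above can be illustrated with a minimal sketch. It assumes that each stream is described by a list of per-group-of-pictures bit counts with a fixed playout duration per group, and that a group must be fully delivered before its playout begins; the function names `min_startup_delay` and `precompute_delay_table` and the timing model are illustrative assumptions, not taken from the cited methods.

```python
from typing import List, Sequence, Tuple


def min_startup_delay(gop_bits: Sequence[int], gop_duration: float,
                      rate: float, start: int = 0) -> float:
    """Smallest start-up delay (seconds) that allows timely delivery of the
    stream from position `start` at a constant delivery rate (bits/second).

    Assumed model: the n-th group after `start` begins playout at
    delay + n * gop_duration and must be fully delivered by then.
    """
    delay = 0.0
    delivered = 0
    for n, bits in enumerate(gop_bits[start:]):
        delivered += bits
        # Time at which this group has arrived, minus its playout offset.
        delay = max(delay, delivered / rate - n * gop_duration)
    return delay


def precompute_delay_table(gop_bits: Sequence[int], gop_duration: float,
                           rates: Sequence[float]
                           ) -> List[List[Tuple[float, float]]]:
    """For every position, list (delivery rate, minimum start-up delay) pairs."""
    return [
        [(rate, min_startup_delay(gop_bits, gop_duration, rate, start))
         for rate in rates]
        for start in range(len(gop_bits))
    ]
```

Under these assumptions, a receiver could compare the start-up delay looked up for the measured delivery rate against the playout duration of the data it has already buffered.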
It is the aim of embodiments of the present invention to provide an improved method of streaming a video sequence over a network. In particular, where the video sequence is encoded at a number of different quality levels, embodiments of the present invention aim to provide an improved method of determining when a switch can be made to a particular quality level of the video sequence, while ensuring timely delivery of the video sequence.
According to one aspect of the present invention, there is provided a method of transmitting a media sequence from a server to a receiver over a transmission link in a network, comprising:
    (a) encoding a media sequence at a first and a second quality level to generate a respective first encoded sequence and second encoded sequence, wherein the first quality level is lower than the second quality level;
    (b) delivering the first encoded sequence to the receiver;
    (c) determining a temporal position in the first encoded sequence at which to switch from transmission of the first encoded sequence to the second encoded sequence while ensuring timely delivery of the media sequence;
    wherein said temporal position is dependent on the position when the difference in the cumulative bit counts between the second encoded sequence and the first encoded sequence is greater than the predicted throughput over the transmission link multiplied by the difference in a first preload and a second preload, wherein the first preload is the playout duration of the data buffered at the receiver needed to deliver the second encoded sequence at the predicted throughput from the current temporal position, and the second preload is the playout duration of the data presently buffered at the receiver.
Preferably, if there is a plurality of positions, then the temporal position is determined as the first of such positions.
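By way of illustration only, the switching condition can be sketched as follows. The sketch assumes per-group-of-pictures bit counts for both encoded sequences, that the cumulative bit counts are accumulated from the current temporal position, and that the first preload has already been determined for the predicted throughput; the function name `find_switch_position` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def find_switch_position(first_bits: Sequence[int],
                         second_bits: Sequence[int],
                         current: int,
                         throughput: float,
                         required_preload: float,
                         buffered_preload: float) -> Optional[int]:
    """Return the first position at which the switching condition holds,
    or None if it never holds before the end of the sequence.

    throughput        predicted throughput in bits/second
    required_preload  first preload (seconds) needed to deliver the second
                      encoded sequence from the current position
    buffered_preload  second preload (seconds) presently buffered
    """
    # Bits that must be "saved" by staying on the lower-quality sequence.
    shortfall_bits = throughput * (required_preload - buffered_preload)
    difference = 0  # cumulative (second - first) bits since `current`
    for pos in range(current, len(first_bits)):
        if difference > shortfall_bits:
            return pos
        difference += second_bits[pos] - first_bits[pos]
    return None
```

For example, with every group costing 1 Mbit in the first sequence and 3 Mbit in the second, a predicted throughput of 2 Mbit/s, a first preload of 3 s and a second preload of 1 s, the shortfall is 4 Mbit, and the condition is first met three groups after the current position.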
The method may comprise a further step of:
    (d) switching from the first encoded sequence to the second encoded sequence when transmission of the first encoded sequence reaches the determined temporal position.
The media sequence is preferably a video sequence. The quality levels may be fixed for each encoded sequence. In preferred embodiments, the quality levels are perceptual quality levels.
Preferably, the receiver performs the determining step.
The temporal position is preferably a position of a group of pictures.
Lower and upper predicted throughput values may be used to determine respective later and earlier temporal positions, and the decision to switch from the first to the second encoded sequence may be based on the earlier and later temporal positions.
The predicted throughput may be based on past throughput over the transmission link.
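One possible way of deriving a predicted throughput from past throughput is sketched below. The sliding-window estimator, the use of the window minimum and maximum as lower and upper values, and the class name `ThroughputPredictor` are all assumptions made for illustration; the method itself does not prescribe a particular estimator.

```python
from collections import deque
from statistics import mean
from typing import Tuple


class ThroughputPredictor:
    """Sliding-window throughput estimate (illustrative assumption)."""

    def __init__(self, window: int = 5) -> None:
        # Only the most recent `window` measurements are kept.
        self._samples = deque(maxlen=window)

    def add_sample(self, bits: int, seconds: float) -> None:
        """Record that `bits` were received over `seconds`."""
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        self._samples.append(bits / seconds)

    def predict(self) -> Tuple[float, float, float]:
        """Return (lower, expected, upper) throughput in bits/second."""
        if not self._samples:
            raise ValueError("no throughput samples recorded yet")
        return min(self._samples), mean(self._samples), max(self._samples)
```

The lower and upper values returned here could be fed into the switching calculation to obtain the later and earlier temporal positions respectively.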
According to a second aspect of the present invention, there is provided a method of streaming a media sequence from a server to a receiver over a transmission link in a network, comprising:
    (a) receiving a first encoded sequence, said first encoded sequence representing a media sequence encoded at a first quality level;
    (b) determining a temporal position in the first encoded sequence at which to switch from transmission of the first encoded sequence to a second encoded sequence while ensuring timely delivery of the sequence, wherein the second encoded sequence represents the media sequence encoded at a second quality level, said first quality level being lower than said second quality level; and
    wherein said temporal position is dependent on the position when the difference in the cumulative bit counts between the second encoded sequence and the first encoded sequence is greater than the predicted throughput over the transmission link multiplied by the difference in a first preload and a second preload, wherein the first preload is the playout duration of the data buffered at the receiver needed to deliver the second encoded sequence at the predicted throughput from the current temporal position, and the second preload is the playout duration of the data presently buffered at the receiver.