1. Field of the Invention
The present invention relates to technology for computer-based streaming of media data which can be used in a live-rendering destination.
2. Description of Related Art
One use of networks like the Internet involves delivering media data, such as audio and video, from a server to a client, where the client can render the data for playback, including live rendering as the data is streamed. In some settings, the client and server are configured to send an encoded media stream from the server to the client with transport controls (play, pause, position, etc.) so the user can play any part of the stream with minimal delay.
The stream may be pre-existing (e.g. a file on disk), generated in real-time (e.g. video from a live event), or generated as needed (e.g. the stream contents are generated based on user interaction, and parts of the stream may never be generated if the client does not request them). Normally the data will be encoded (e.g. MP3 for audio, H.264 for video) to reduce the total amount of data that needs to be transferred, with a corresponding decoding required for playback.
One possible approach would be to send data for the whole stream in advance to the client. Playback is then handled completely by the client and so can be very responsive, but there is a long initial delay (latency) while the whole stream is sent, and another long delay if the stream contents change and need to be sent again.
Another possible approach is to send data only when needed for playback. If the transport position changes, the server can send stream data starting at that position to the client, sending more as needed as the playback position advances. This is how video streaming on the web commonly works. Storage requirements on the client side are minimized, but there is a short buffering delay before playback can start at a new position, and potentially the same data is sent multiple times if the user wants to play part of the stream repeatedly.
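The repeated-transfer cost of this on-demand approach can be illustrated with a minimal sketch. The chunk size, the play requests, and the function name below are hypothetical values chosen for illustration only, and the sketch assumes the client retains nothing between requests.

```python
def chunks_sent_on_demand(play_requests, chunk_size):
    """List the chunk indices a server would send, in order, when the
    client keeps no stream data between play requests (an assumption).

    play_requests: iterable of (start, end) positions, end exclusive.
    """
    sent = []
    for start, end in play_requests:
        first = start // chunk_size
        last = (end - 1) // chunk_size
        sent.extend(range(first, last + 1))
    return sent


# Hypothetical session: play positions 0..3000, then replay 1000..3000.
requests = [(0, 3000), (1000, 3000)]
sent = chunks_sent_on_demand(requests, chunk_size=1000)
resent = len(sent) - len(set(sent))
print(sent)    # chunk indices in the order they were sent
print(resent)  # number of chunk transfers that repeated earlier data
```

In this hypothetical session, two of the five chunk transfers repeat data the client has already received once.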
A web browser may have limited facilities for handling and decoding compressed stream data; for example, a decoder may only be able to decode a complete media stream rather than the parts of a stream that have arrived at the client so far. The decoder may also corrupt the start or end of the stream, add or remove part of the length, and apply a time offset and/or time scaling to the decoded data.
In interactive environments, situations occur where parts of the stream are likely to be repeated, and where the contents of the stream may change. Treating the limited facilities for decoding compressed stream data on the client side as a “black box,” which can be a practical requirement for Internet-based streaming systems, may introduce a variety of unwanted effects.
Taking the example of MP3 encoded audio, nominally 1152 uncompressed audio signal samples are compressed to 1 encoded frame of MP3 data. Encoded frames of MP3 data are sent to the client and decoded back to audio data. However, individual frames cannot be decoded successfully without the context of the surrounding frames.
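The relationship between sample positions and encoded frames can be sketched as follows. The nominal figure of 1152 samples per frame is taken from the text above; the function names and the choice of one context frame on each side are assumptions made only for illustration, since the amount of context actually required depends on the decoder.

```python
import math

SAMPLES_PER_FRAME = 1152  # nominal MP3 frame size, as stated in the text


def frames_for_samples(num_samples):
    """Number of encoded frames needed to hold num_samples samples."""
    return math.ceil(num_samples / SAMPLES_PER_FRAME)


def frame_range_with_context(first_sample, end_sample, context_frames=1):
    """Inclusive frame range covering samples [first_sample, end_sample),
    widened by context_frames on each side.

    context_frames=1 is a hypothetical value, not a property of any
    particular decoder.
    """
    first = max(0, first_sample // SAMPLES_PER_FRAME - context_frames)
    last = (end_sample - 1) // SAMPLES_PER_FRAME + context_frames
    return first, last


print(frames_for_samples(11520))             # 10
print(frame_range_with_context(2304, 3456))  # (1, 3)
```

Samples 2304 to 3455 fall entirely within frame 2, yet under the stated assumption frames 1 to 3 would be sent so that the decoder has the surrounding context.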
Consider an audio stream, where the first 11520 audio samples are compressed to 10 MP3 frames which are transferred to the client and passed to the black-box MP3 decoder. The decoder outputs 11600 samples of audio data instead of the expected 11520. What has happened? Typically the start of the audio data will be silent (as there was no previous input context for the decoder, and some internal buffering needs to take place before output can be produced), and there may be a short fade-in at the start of the data and a short fade-out at the end, possibly followed by some more silence. Perhaps, of the wanted 11520 samples of audio, only samples 1000 to 10000 have been decoded correctly and are available from frame 1152 onwards in the decoded data. The exact lengths of fades and silences will depend on the implementation of the decoder, and may also differ for the same audio data compressed by different encoders. While the resulting decoded data may have an offset in time relative to the original (usually negligible for the purposes of playback positioning), for any reasonable decoder this offset will be constant for a given input stream.
So a live streaming configuration may not be able to correctly account for variations in the decoding performance of the wide variety of decoders used in the network when it treats the decoders as a “black box.”
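The arithmetic of this example can be written out as a short sketch. It reads position 1152 in the decoded data as a sample index and treats the correctly decoded range 1000 to 10000 as half-open; both readings are assumptions made for illustration, as is the helper name source_to_decoded.

```python
SAMPLES_PER_FRAME = 1152  # nominal MP3 frame size

expected_len = 10 * SAMPLES_PER_FRAME  # samples that were encoded
decoded_len = 11600                    # samples the decoder returned
extra = decoded_len - expected_len     # samples added by the decoder

# Assumed reading of the example: source samples [1000, 10000) are valid
# and begin at index 1152 of the decoded data.
VALID_SRC_START, VALID_SRC_END = 1000, 10000
DECODED_START = 1152
offset = DECODED_START - VALID_SRC_START  # constant for a given stream


def source_to_decoded(src_index):
    """Map a source sample index to its index in the decoded data,
    or return None if that sample was not decoded correctly."""
    if not (VALID_SRC_START <= src_index < VALID_SRC_END):
        return None
    return src_index + offset


print(expected_len, extra, offset)  # 11520 80 152
print(source_to_decoded(1000))      # 1152
print(source_to_decoded(500))       # None
```

Because the offset is constant for a given input stream, a single measured value is enough to locate any correctly decoded sample in the decoder output.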
It is desirable to provide an efficient and flexible scheme for streaming and buffering media data in an interactive environment.