The present disclosure relates to real-time media streaming, where a server streams media content such as audio and/or video over a network to a client and the client plays out the received media stream in a continuous manner as the client receives the media. With improvements in network technology, the popularity of real-time media streaming has grown dramatically over recent years. It is now commonplace for users of personal computers and handheld wireless devices to watch movies or video clips streamed over the Internet in real-time and to listen to music streamed over the Internet in real-time. Watching or listening to media streamed in real-time over the Internet avoids the need to download a complete media file in advance, thereby conserving device memory and allowing for more of an on-demand media experience.
The well-known Real Time Streaming Protocol (RTSP), defined by Request For Comments (RFC) 2326, published by The Internet Society in April 1998, defines industry standard procedures for handling media streaming over Internet Protocol (IP), and the present document will refer to certain procedures defined by RTSP. However, those of ordinary skill in the art will readily appreciate that other baseline streaming procedures, including variations of the RTSP procedures or other procedures altogether (whether or not using terminology similar to RTSP), could be used instead. Therefore, reference to RTSP (and/or its messages) in this document should not be viewed as limiting but should rather be viewed as representative.
In typical practice, to initiate a streaming media session, a user of a client device will select a media object or “channel,” such as by clicking on a link representing the channel (in a channel listing or on a web page for instance). In response, the client will then send to a host server a DESCRIBE request that designates the desired media object, typically by a Uniform Resource Locator (URL). Upon receipt of the DESCRIBE request, the server will then send to the client a reply message (such as a “200 OK” message) that includes a presentation description for the media session, typically in a Session Description Protocol (SDP) format. Among other information, the presentation description may list one or more media streams (such as an audio stream and a video stream) that would be provided in the requested session, as well as the applicable codecs and other initialization parameters for the session.
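By way of illustration only, the DESCRIBE exchange described above can be sketched as follows. The function names, the example URL, and the sample SDP body are hypothetical and are not taken from any particular implementation; the sketch assumes that each media stream in the presentation description carries a single `a=rtpmap` attribute.

```python
def build_describe(url: str, cseq: int) -> str:
    """Return the text of an RTSP DESCRIBE request for the given URL."""
    lines = [
        f"DESCRIBE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        "Accept: application/sdp",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def list_media_streams(sdp: str) -> list:
    """Return (media type, encoding name) pairs found in an SDP body."""
    streams = []
    current_type = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            current_type = line[2:].split()[0]
        elif line.startswith("a=rtpmap:") and current_type is not None:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            encoding = line.split(" ", 1)[1].split("/")[0]
            streams.append((current_type, encoding))
    return streams


# Hypothetical presentation description with one video and one audio stream.
SAMPLE_SDP = "\n".join([
    "v=0",
    "m=video 0 RTP/AVP 96",
    "a=rtpmap:96 H264/90000",
    "m=audio 0 RTP/AVP 97",
    "a=rtpmap:97 MPEG4-GENERIC/44100/2",
])

request = build_describe("rtsp://example.com/channel1", 1)
streams = list_media_streams(SAMPLE_SDP)
```

In this sketch the client would learn from the reply that the session offers a video stream and an audio stream, together with the encoding name of each.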
Once the client has received the presentation description, the client will then engage in a setup process for each designated media stream, to establish or agree with the host server on a transport mechanism through which each media stream will be transmitted from the server to the client. With respect to each media stream designated in the DESCRIBE reply, for instance, the client will typically send to the server a SETUP request that indicates a preferred transport protocol, such as Real-time Transport Protocol (RTP) (defined by RFC 3550, published by The Internet Society in July 2003), and that indicates the client ports (e.g., Transmission Control Protocol (TCP) port(s)) to be used for the media stream. In response to each SETUP request, the server may then send to the client a reply (such as a 200 OK message) that provides a session identifier and that designates the server port(s) to be used for the stream.
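The SETUP transaction for a single stream might be sketched as shown below. This is a minimal illustration under stated assumptions: the function names, port numbers, session identifier, and the particular transport specification string are hypothetical, and a real client would also handle error replies and additional transport parameters.

```python
def build_setup(stream_url: str, cseq: int, client_ports: tuple) -> str:
    """Return an RTSP SETUP request proposing RTP and a client port pair."""
    first_port, second_port = client_ports
    return (
        f"SETUP {stream_url} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        f"Transport: RTP/AVP;unicast;client_port={first_port}-{second_port}\r\n"
        "\r\n"
    )


def parse_setup_reply(reply: str) -> tuple:
    """Return (session identifier, server port range) from a SETUP reply."""
    session_id = None
    server_ports = None
    # Skip the status line and examine each header line in turn.
    for line in reply.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if name == "session":
            # Drop optional parameters such as ";timeout=60".
            session_id = value.split(";")[0]
        elif name == "transport":
            for param in value.split(";"):
                if param.startswith("server_port="):
                    server_ports = param[len("server_port="):]
    return session_id, server_ports


# Hypothetical reply from the server to the SETUP request.
SAMPLE_REPLY = (
    "RTSP/1.0 200 OK\r\n"
    "CSeq: 2\r\n"
    "Session: 12345678;timeout=60\r\n"
    "Transport: RTP/AVP;unicast;client_port=5000-5001;server_port=6000-6001\r\n"
    "\r\n"
)

setup_request = build_setup("rtsp://example.com/channel1/video", 2, (5000, 5001))
session_id, server_ports = parse_setup_reply(SAMPLE_REPLY)
```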
After agreeing on the transport mechanism, the client will then send one or more PLAY requests to the server, designating the session identifier and directing the server to start streaming the media via the agreed transport mechanism. And in response, the server will begin to stream the designated media to the client, for real-time playout by the client. In practice, if a session includes more than one media stream (e.g., a video stream and an audio stream), the client may send a single PLAY request that lists and requests playout of multiple streams of a session (e.g., both audio and video streams), or the client may send separate PLAY requests for the various streams. Further, a PLAY request may define a time range of the media stream to be transmitted, by specifying start and stop time points for instance, in which case the server would respond by streaming the designated portion of the media.
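A PLAY request of the kind described above, optionally limited to a time range, could be constructed roughly as follows. The function name and example values are hypothetical, and the sketch assumes that the time range is expressed in seconds using the normal play time (`npt`) form of the RTSP `Range` header.

```python
def build_play(url: str, cseq: int, session_id: str,
               start=None, stop=None) -> str:
    """Return an RTSP PLAY request, optionally limited to a time range."""
    lines = [
        f"PLAY {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
    ]
    if start is not None:
        # An open-ended range (no stop point) plays to the end of the media.
        stop_text = "" if stop is None else str(stop)
        lines.append(f"Range: npt={start}-{stop_text}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Request the portion of the session between 10 and 25.5 seconds.
ranged_play = build_play("rtsp://example.com/channel1", 4, "12345678",
                         start=10, stop=25.5)
# Request the whole session from wherever the server would normally begin.
plain_play = build_play("rtsp://example.com/channel1", 5, "12345678")
```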
As with broadcast television channels and radio stations, it would be advantageous if a user receiving streaming media from a server could change channels, or switch between streaming media sessions, on demand. Further, it would be beneficial for a client device to be able to automatically switch from one media session to another in accordance with a playlist or the like.
Unfortunately, however, given a typical session initiation process such as that described above, a user may experience a substantial delay when switching from one streaming media session to another. For instance, if the newly requested media session includes both audio and video streams, the back-and-forth initialization signaling that would occur between the client and server to set up the new session including each of those streams could take several seconds to complete.
Recognizing this problem at least in the context of mobile wireless devices, the Third Generation Partnership Project (3GPP) has developed a signaling process that can be used by a client to facilitate more seamless switching from one media session to another. In accordance with the signaling process, the client would forego sending any SETUP requests for the new media session but would rather simply send a PLAY request (or multiple PLAY requests) for the new session, including in the PLAY request one or more “fast-content-switching” tags (e.g., a “3gpp-switch” option tag) that directs the server to simply switch to streaming content of the new session in place of streaming content of the old session. In particular, for each stream in the new session that is of the same type as a stream in the old session (e.g., given an audio stream in the new session and an audio stream in the old session, or given a video stream in the new session and a video stream in the old session, etc.), the client would simply include in the PLAY request a fast-content-switching tag that designates the old stream's URL and the new stream's URL.
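For illustration, a PLAY request carrying fast-content-switching information might be assembled as sketched below. The header name `Switch-Stream`, its `old="...";new="..."` syntax, and the use of a `Require` header to carry the "3gpp-switch" option tag are assumptions made for this example and should be checked against the applicable 3GPP specification; the function name and URLs are likewise hypothetical.

```python
def build_switch_play(new_url: str, cseq: int, session_id: str,
                      stream_pairs: list) -> str:
    """Return a PLAY request that asks the server to switch streams.

    stream_pairs holds (old stream URL, new stream URL) tuples, one for
    each stream type that is present in both the old and new sessions.
    """
    # Assumed syntax: one old/new pair per stream, pairs separated by commas.
    switch_specs = ",".join(
        f'old="{old}";new="{new}"' for old, new in stream_pairs
    )
    lines = [
        f"PLAY {new_url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
        "Require: 3gpp-switch",            # assumed placement of the option tag
        f"Switch-Stream: {switch_specs}",  # assumed header name and syntax
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


switch_request = build_switch_play(
    "rtsp://example.com/channel2",
    7,
    "12345678",
    [
        ("rtsp://example.com/channel1/audio", "rtsp://example.com/channel2/audio"),
        ("rtsp://example.com/channel1/video", "rtsp://example.com/channel2/video"),
    ],
)
```

Note that no SETUP request precedes this PLAY request; the existing session identifier and transport are reused for the new content.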
In response to a fast-content-switching request with respect to a given old stream and a given new stream, the server will effectively splice the new stream onto the old stream. Thus, after the client has played all of the buffered media for the old session (or has flushed its streaming media buffer to remove the data for the old session), the client could seamlessly begin playing media of the new session. Furthermore, in the server's reply to a PLAY request that contained one or more fast-content-switching tags, the server may include a parameter (such as a new Synchronization Source (SSRC) code) that, when detected by the client, will cause the client to update its descriptive data (such as channel name, etc.) for the stream or session and to more seamlessly switch from each old stream to each new stream.
Fast-content-switching assumes that the client can continue applying largely the same transport mechanism and, particularly, the same decoder(s) when switching from playout of streaming media in one session to playout of streaming media in a new session. For example, if the client is currently engaged in a streaming media session in which the client is receiving, decoding, and playing out an H.264 encoded video stream and an AAC encoded audio stream, then fast-content-switching would be supported for switching over to a new streaming media session that also provides an H.264 encoded video stream and an AAC encoded audio stream. In practice, the server would continue by splicing the new H.264 video stream onto the old H.264 video stream, and the server would splice the new AAC encoded audio stream onto the old AAC encoded audio stream. As another example, if the client is currently engaged in a streaming media session in which the client is receiving a single stream defining interleaved H.263 video and MP3 audio, then fast-content-switching would be supported for switching over to a new streaming media session that also provides a stream defining interleaved H.263 video and MP3 audio. In that case, the server would continue by simply splicing the new H.263/MP3 stream onto the old H.263/MP3 stream.
With possibly some exceptions, fast-content-switching may not be supported if a stream of the newly requested session is of the same type as a stream of the old session but is not the same format as the old stream of that type. In particular, if the client would need to change its decoder state in order to begin playing the new stream in place of the old stream of the same type, then fast-content-switching may not be supported. Stated another way, given a new stream of the same type as a stream in the existing session, if the presentation description or SDP of the new stream is not compatible with the presentation description or SDP of the old stream, then fast-content-switching would not be supported.
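The compatibility condition described above can be sketched as a simple comparison. This is an illustrative simplification only: it assumes that two streams of the same type are compatible exactly when their format strings (here, hypothetical `encoding/clock-rate` values) are identical, whereas an actual determination may depend on further presentation description parameters. The function name and sample data are hypothetical.

```python
def incompatible_stream_types(old_streams: dict, new_streams: dict) -> list:
    """Return the stream types whose format differs between two sessions.

    Each argument maps a stream type (e.g. "audio") to a format string.
    Only types present in both sessions are compared.
    """
    shared_types = old_streams.keys() & new_streams.keys()
    return sorted(
        stream_type
        for stream_type in shared_types
        if old_streams[stream_type] != new_streams[stream_type]
    )


old_session = {"video": "H264/90000", "audio": "MPEG4-GENERIC/44100"}
same_format_session = {"video": "H264/90000", "audio": "MPEG4-GENERIC/44100"}
different_video_session = {"video": "H263-2000/90000", "audio": "MPEG4-GENERIC/44100"}

# An empty result suggests fast-content-switching could be attempted.
no_conflicts = incompatible_stream_types(old_session, same_format_session)
# A non-empty result names the stream types that would need a decoder change.
video_conflict = incompatible_stream_types(old_session, different_video_session)
```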
(Note that if the newly requested media session includes a stream of a type that does not match the type of any stream in the existing session, fast-content-switching could still be supported for one or more other streams of the new session that each match a stream type in the existing session. For the new stream of the new type, the client may simply engage in a SETUP transaction with the server before requesting transmission of the new stream, thereby adding that new stream type into the ongoing session. Likewise, if the existing session includes a stream of a given type and the newly requested session does not include a stream of that type, fast-content-switching could still be supported for one or more other streams of the new session that each match a stream type in the existing session. For the no-longer-included stream type, the client may simply send to the server an RTP Control Protocol (RTCP) “BYE” message to terminate the stream or may include a fast-content-switching tag that designates the old stream and designates no new stream, which would result in terminating the old stream.)
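The per-stream handling described in the preceding note can be summarized in a short sketch. The function name and the action labels "switch", "setup", and "terminate" are hypothetical labels chosen for this example, and the sketch assumes that stream types present in both sessions have compatible formats.

```python
def plan_stream_actions(old_types: list, new_types: list) -> dict:
    """Map each stream type to the action the client would take for it."""
    old_set, new_set = set(old_types), set(new_types)
    plan = {}
    for stream_type in sorted(old_set | new_set):
        if stream_type in old_set and stream_type in new_set:
            # Same type in both sessions: use a fast-content-switching tag.
            plan[stream_type] = "switch"
        elif stream_type in new_set:
            # Type appears only in the new session: SETUP is needed first.
            plan[stream_type] = "setup"
        else:
            # Type appears only in the old session: end the old stream.
            plan[stream_type] = "terminate"
    return plan


# Old session has audio and video; new session has audio and timed text.
actions = plan_stream_actions(["audio", "video"], ["audio", "text"])
```

In this example the audio stream would be switched, the text stream would be added through a SETUP transaction, and the video stream would be terminated.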
In accordance with the 3GPP recommendation, if a client sends a PLAY request that includes a fast-content-switching tag but the server then determines that the requested fast-content-switching is not supported (e.g., the server detects an incompatibility between the presentation description of the new session and a corresponding presentation description of the old session), the server will send to the client an error response (such as a “551 Option not supported” response). At that point, the client will then need to engage in a different process to set up the new session. Unfortunately, however, the client's failed attempt to use fast-content-switching will already have wasted valuable time, thereby increasing rather than decreasing the total time to switch from an existing streaming media session to a new streaming media session.
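The cost of a failed fast-content-switching attempt can be illustrated with a rough count of request and reply exchanges. The model below is an assumption made for illustration: it treats each request and its reply as one round trip, ignores server processing and buffering time, and assumes that the fallback process is a full DESCRIBE, SETUP, and PLAY sequence with a single PLAY request. The function name is hypothetical.

```python
def round_trips_to_switch(num_streams: int, attempt_fast: bool,
                          fast_supported: bool) -> int:
    """Estimate the signaling round trips needed to start a new session."""
    # DESCRIBE, one SETUP per stream, then a single PLAY.
    full_setup = 1 + num_streams + 1
    if not attempt_fast:
        return full_setup
    if fast_supported:
        # A single PLAY carrying the fast-content-switching tag(s).
        return 1
    # The rejected PLAY is wasted, and the full process must still follow.
    return 1 + full_setup


# New session with an audio stream and a video stream.
without_fast = round_trips_to_switch(2, attempt_fast=False, fast_supported=False)
fast_success = round_trips_to_switch(2, attempt_fast=True, fast_supported=True)
fast_failure = round_trips_to_switch(2, attempt_fast=True, fast_supported=False)
```

Under these assumptions, a successful fast switch takes one round trip instead of four, while a failed attempt takes five, which is more than the client would have needed had it not attempted fast-content-switching at all.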