The 3GPP (3rd Generation Partnership Project) adopts protocols standardized by the IETF (Internet Engineering Task Force), such as RTP, UDP and IP, for transport, and uses codecs such as AMR (Adaptive Multi-Rate) and H.264 (MPEG-4 Part 10) to encode media for packet-switched services. The 3GPP Packet Switched Streaming Services (see “Universal Mobile Telecommunications System (UMTS); Transparent end-to-end streaming service; Protocols and codecs”, 3GPP TS 26.234 version 5.6.0 Release 5, September 2003, available at http://www.3gpp.org) use the RTP/UDP protocol stack to stream audio, video and text media.
RTP is the Real-time Transport Protocol (see Schulzrinne et al., “RTP: A Transport Protocol for Real-Time Applications”, RFC 3550, July 2003, all RFCs available at http://www.ietf.org), which is mainly used for real-time or near real-time communication, i.e. communication with relaxed delay constraints. It provides information on the timing of the media it carries and also allows re-ordering and re-assembling at the receiver.
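The timing and ordering information mentioned above is carried in the 12-byte fixed RTP header defined in RFC 3550. The following is a minimal Python sketch, not a complete RTP implementation: the helper names (`RtpHeader`, `build_rtp_packet`, `parse_rtp_header`, `reorder_by_sequence`) and the payload type value 96 are assumptions chosen for illustration, and the re-ordering deliberately ignores sequence number wrap-around, header extensions and CSRC lists.

```python
import struct
from typing import List, NamedTuple


class RtpHeader(NamedTuple):
    """Subset of the fixed RTP header fields (RFC 3550, section 5.1)."""
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def build_rtp_packet(seq: int, timestamp: int, payload: bytes,
                     payload_type: int = 96, ssrc: int = 0x1234) -> bytes:
    """Hypothetical helper: build a packet with version 2 and no padding,
    extension, CSRC entries or marker bit."""
    first_byte = 2 << 6  # version = 2 in the two most significant bits
    second_byte = payload_type & 0x7F  # marker bit cleared
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(version=b0 >> 6,
                     marker=bool(b1 & 0x80),
                     payload_type=b1 & 0x7F,
                     sequence_number=seq,
                     timestamp=ts,
                     ssrc=ssrc)


def reorder_by_sequence(packets: List[bytes]) -> List[bytes]:
    """Restore sending order using the sequence number.
    Simplification: does not handle 16-bit sequence number wrap-around."""
    return sorted(packets, key=lambda p: parse_rtp_header(p).sequence_number)
```

A receiver that gets packets 3, 1, 2 can pass them to `reorder_by_sequence` to recover the sending order and then use each header's `timestamp` to schedule presentation of the media.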
An integral part of the protocol is RTCP (RTP Control Protocol), which provides minimal reception information and loose group membership. RTP is generally used together with the RTP/AVP profile (see Schulzrinne et al., “RTP Profile for Audio and Video Conferences with Minimal Control”, RFC 3551, July 2003), which defines the use of the RTP header fields and mapping tables for payload types, as well as simple RTCP feedback timing rules.
UDP (Postel, “User Datagram Protocol”, RFC 768, August 1980) is the User Datagram Protocol used to transport RTP packets. UDP is commonly used when unreliable communication is appropriate for the given media, as is the case for streaming applications. The RTP/UDP protocol stack is used because the timing constraints of the media do not usually allow reliable communication, e.g. by using TCP (Transmission Control Protocol).
In RTP, packetization schemes (payload formats) for existing media formats (codecs) are specified in the Internet Engineering Task Force Audio/Video Transport Working Group (IETF AVT WG). There is, for example, a payload format for AMR encoded speech data, and another one for H.264 video.
The payload format defined in Hellstrom, “RTP Payload for Text Conversation”, RFC 2793, May 2000, may be used to transmit conversational text, but the format does not allow carrying any additional information on the decoration of the text characters. The decoration is, for example, the font used, the background color, or the scroll or karaoke movement. Nor does it allow spatial synchronization with other media, as is needed e.g. for subtitling of video sequences. In summary, the 3GPP timed text (see 3GPP TS 26.234, in particular Appendix D.8a) offers a much wider range of functionality that is not supported by other standardized codecs.
Rey et al., “RTP Payload Format for 3GPP Timed Text”, draft-ray-avt-3gpp-tt-01.txt, IETF AVT WG, September 2003, available at http://www.ietf.org, suggests a payload format for the transmission of timed text using RTP. However, the payload format provides only a means for out-of-band transmission of sample description information and does not address the problem of in-band transmission of sample description information in detail. The in-band transmission of sample descriptions suggested by Rey et al. requires transmitting each sample description together with its associated text sample and therefore cannot solve the problems outlined below.
In the present invention, in-band may be understood in the context of a signaling channel. In general, the sample descriptions represent pure signaling information or metadata. The text may be considered the actual data. Thus in-band means that the signaling, i.e. the sample descriptions, is transmitted in the same session as the data, i.e. the text samples. Note that the text samples do not contain SPLDESC, THDR or FHDR headers; only text strings and modifier boxes (see 3GPP TS 26.234) are transmitted. Out-of-band signaling may therefore be understood as sending the sample descriptions using a session or protocol other than the one used for transmitting the data, e.g. SDP.
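The distinction between in-band and out-of-band signaling can be illustrated with a small Python model of a receiver. This is only a sketch under stated assumptions: the `Receiver` class, its method names, the tuple layout of the stream units and the use of a numeric index to link a text sample to its sample description are all hypothetical and do not reproduce the 3GPP TS 26.234 formats or any payload format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Receiver:
    """Hypothetical receiver that keeps a table of sample descriptions."""
    descriptions: Dict[int, dict] = field(default_factory=dict)
    rendered: List[Tuple[str, Optional[dict]]] = field(default_factory=list)

    def on_session_setup(self, descriptions: Dict[int, dict]) -> None:
        # Out-of-band: descriptions arrive via another protocol or session
        # (for example in a session description) before the media flows.
        self.descriptions.update(descriptions)

    def on_stream_unit(self, unit: tuple) -> None:
        # In-band: descriptions and text arrive in the same session.
        kind = unit[0]
        if kind == "desc":
            _, index, description = unit
            self.descriptions[index] = description
        elif kind == "text":
            _, index, text = unit
            # The text sample carries only a reference to its description;
            # None is recorded if that description has not been received.
            self.rendered.append((text, self.descriptions.get(index)))
        else:
            raise ValueError(f"unknown unit kind: {kind!r}")


receiver = Receiver()
receiver.on_session_setup({1: {"font": "Serif"}})        # out-of-band
receiver.on_stream_unit(("desc", 2, {"font": "Sans"}))   # in-band
receiver.on_stream_unit(("text", 1, "Hello"))
receiver.on_stream_unit(("text", 2, "World"))
```

In this model a text sample can be rendered correctly only if the description it refers to has already reached the receiver by one of the two paths.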
When streaming 3GPP Timed Text, it is typically the case that the streamed text samples refer to the same few sample description entries. After a given amount of time, all the possible sample descriptions have already been transmitted at least once. The text samples repeatedly refer to these sample descriptions, and so the sample descriptions must be transferred over and over again from sender to receiver, since the sender does not know which packets were received by the receiver. Further, 3GPP TS 26.234 and the method proposed in Rey et al., “RTP Payload Format for 3GPP Timed Text” both require that each text sample is always transmitted along with its associated sample description. Hence, in conventional systems the transmission overhead due to transmitting all sample descriptions of a 3GPP file is large. This overhead is especially undesirable when streaming to a mobile client over a radio link with scarce resources.
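The size of this overhead can be estimated with simple arithmetic. The Python sketch below compares the bytes sent when every text sample carries its sample description with an idealized lower bound in which each distinct description is sent exactly once. The function name and all byte counts are assumptions chosen for illustration; they are not taken from 3GPP TS 26.234, and packet header overhead and retransmissions are ignored.

```python
def description_overhead(num_samples: int, text_bytes: int,
                         desc_bytes: int, distinct_descriptions: int):
    """Return (bytes when each sample carries its description,
               bytes when each distinct description is sent once).

    Assumes every text sample and every description has the same size.
    """
    per_sample_total = num_samples * (text_bytes + desc_bytes)
    sent_once_total = (num_samples * text_bytes
                       + distinct_descriptions * desc_bytes)
    return per_sample_total, sent_once_total


# Hypothetical stream: 100 text samples of 40 bytes each, all referring
# to one of 2 distinct sample descriptions of 60 bytes each.
with_every_sample, sent_once = description_overhead(100, 40, 60, 2)
description_share = (100 * 60) / with_every_sample
```

With these assumed figures, `with_every_sample` is 10000 bytes and `sent_once` is 4120 bytes, so 60% of the bytes in the conventional case are repeated sample descriptions.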