The OpenMAX Integration Layer (IL) API (application programming interface) is an open standard developed by the Khronos Group for providing a low-level interface for audio, video, imaging and timed-text media components used in embedded and/or mobile devices. The principal goal of the OpenMAX IL is to give media components a degree of system abstraction for the purpose of portability across an array of different hardware and software platforms. The interface abstracts the hardware and software architecture in the system. Each media component and relevant transform is encapsulated in a component interface. The OpenMAX IL API allows the user to load, control, connect, and unload the individual components, enabling easy implementation of almost any media use scenario and meshing with existing graph-based media frameworks.
The OpenMAX IL API defines media components such as audio/video/image decoders/encoders, audio/video/image readers/writers, audio renderers, video schedulers, container demuxers/muxers, clocks, audio/video/image processors and the like. The OpenMAX IL API allows a client such as an application or media framework to create a media processing chain by connecting together various components. Content data is typically fed into the chain at one end and sequentially processed by each component in the chain. The data is transported between components using ports and buffers.
The OpenMAX IL API also defines an interface for accessing data from a local file or from a remote location. This concept is referred to as a content pipe and is described in chapter 9 of the OpenMAX IL API specification. A ‘content pipe’ is an abstraction for any mechanism of accessing content data (i.e. pulling content data in or pushing content data out). This abstraction is not tied to any particular implementation. Instead, a content pipe may be implemented, for example, as a local file, a remote file, a broadcast, multicast, or unicast stream, memory buffers, intermediate data derived from persistent data, etc. Moreover, a content pipe need not be limited to a single method of providing access. For example, a single pipe may provide content via both local files and remote files, or through multiple transport protocols. A system may include one or many content pipes.
There are various methods for operating a content pipe such as creating a content pipe based on a URI (uniform resource identifier), reading/writing a number of bytes from/to the content pipe and setting/getting byte position inside the content. In addition, asynchronous methods can be used for remote access such as by checking available bytes, getting a large buffer from the content pipe that the content pipe user can read from and providing a large buffer to write to the content. In each case, the OpenMAX IL API essentially models content pipe access like traditional file access.
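The file-like model described above can be sketched as follows. This is a minimal illustration only: the class `FileContentPipe` and its method names are hypothetical and are not the C function names defined in the OpenMAX IL specification. It assumes the content is a seekable byte stream.

```python
import io


class FileContentPipe:
    """Hypothetical content pipe backed by a seekable byte stream.

    The method names are illustrative; they are not taken from the
    OpenMAX IL specification.
    """

    def __init__(self, stream):
        self._stream = stream  # any seekable binary file-like object

    def read(self, nbytes):
        """Read up to nbytes starting at the current byte position."""
        return self._stream.read(nbytes)

    def write(self, data):
        """Write data at the current byte position."""
        return self._stream.write(data)

    def set_position(self, offset):
        """Move to an absolute byte offset inside the content."""
        self._stream.seek(offset, io.SEEK_SET)

    def get_position(self):
        """Return the current absolute byte offset."""
        return self._stream.tell()

    def check_available_bytes(self, nbytes):
        """Report whether nbytes can be read before the end of content."""
        here = self._stream.tell()
        end = self._stream.seek(0, io.SEEK_END)
        self._stream.seek(here, io.SEEK_SET)  # restore the position
        return end - here >= nbytes


pipe = FileContentPipe(io.BytesIO(b"0123456789"))
pipe.set_position(4)
chunk = pipe.read(3)
```

Every operation in the sketch is indexed by bytes, which is the property that matters for the streaming discussion: nothing in such an interface expresses time or server control.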
One mechanism for remotely controlling the delivery of media content is the real-time streaming protocol (RTSP) defined by the IETF (Internet Engineering Task Force) in RFC 2326. RTSP is a client-server text-based protocol that enables a client to remotely control a streaming server. The client transmits RTSP method requests and the server replies with RTSP method responses. Typical RTSP commands include DESCRIBE, SETUP and PLAY. The packet-switched streaming service (PSS) is defined by 3GPP and is based on RTSP, but defines a complete service for streaming. To establish a streaming session, the streaming client needs a session description. A streaming session is described using the session description protocol (SDP). In practice, the session description may be obtained either from an .sdp file downloaded from, e.g., a WAP (wireless application protocol) page, or from the response of a streaming server to a DESCRIBE command that the client issues for an RTSP URI (e.g. rtsp://server.com/clip). The SDP information includes configuration parameters for the streaming session and for the corresponding media streams and decoding thereof.
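To make the text-based request format concrete, the following sketch builds a DESCRIBE request for the example URI. The helper `build_rtsp_request` is a hypothetical name introduced here for illustration; the sketch only formats the request text and does not open a connection or parse the response.

```python
def build_rtsp_request(method, uri, cseq, headers=None):
    """Format an RTSP/1.0 request as text (hypothetical helper).

    Each request carries a request line, a CSeq sequence number and
    optional extra headers, and is terminated by an empty line.
    """
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Ask the server for the session description of the clip.
describe = build_rtsp_request(
    "DESCRIBE", "rtsp://server.com/clip", 1, {"Accept": "application/sdp"}
)
```

The same helper could format SETUP and PLAY requests, which additionally need headers such as Transport and Session.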
A media stream (e.g., audio, video, images and/or timed-text) is established when a client requests the server to set up an RTP (real-time transport protocol) connection with the client, the media format being described in the SDP. Thus, RTSP is used to establish the streaming session and to control the server while RTP is used to carry the actual media content once the streaming session is established. A typical streaming client has one TCP (transmission control protocol) connection for RTSP signaling. In addition, for each media type that the session includes, the streaming client will have two UDP (user datagram protocol) connections. The first UDP connection is used for reception of RTP traffic, and the second UDP connection is used for exchange of RTCP (RTP control protocol) packets (both RTP and RTCP are carried over UDP). RTCP packets are sent by both the server and the client, enabling both devices to give feedback about the RTP transmission progress.
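As an illustration of how a client might learn the media formats from the SDP before setting up RTP reception, the sketch below extracts the media lines and their payload mappings. The session description in `SAMPLE_SDP` is invented for this example, and `parse_media` is a hypothetical helper that handles only `m=` and `a=rtpmap` lines.

```python
# Invented example session description: one audio and one video stream.
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=Example clip
t=0 0
m=audio 0 RTP/AVP 97
a=rtpmap:97 AMR/8000
m=video 0 RTP/AVP 98
a=rtpmap:98 H264/90000
"""


def parse_media(sdp_text):
    """Return one dict per media stream found in the SDP (hypothetical helper)."""
    media = []
    for line in sdp_text.splitlines():
        if line.startswith("m="):
            # m=<type> <port> <proto> <payload type> ...
            kind, port, proto, *formats = line[2:].split()
            media.append({
                "type": kind,
                "port": int(port),
                "proto": proto,
                "payload_types": [int(f) for f in formats],
                "rtpmap": {},
            })
        elif line.startswith("a=rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/...]
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            name, clock_rate = encoding.split("/")[:2]
            media[-1]["rtpmap"][int(pt)] = (name, int(clock_rate))
    return media


streams = parse_media(SAMPLE_SDP)
```

With one entry per media type, a client following the scheme described above would open one RTP and one RTCP connection for each element of `streams`.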
RTP packets include payload data, typically encoded media frame data provided in a format favorable for streaming. Typically, the payload data may need some processing (e.g., “de-packetization”) before the coded frame can be sent to the media decoder. De-packetization involves extracting the encoded media frame data by removing the packet header information and other packet encapsulation information. RTP packets also include a timestamp which indicates when the content of the frame was sampled relative to other frames in the same stream. The timestamp information, together with inter-media synchronization information transmitted by the server (which is received in either an RTSP message or an RTCP message), can be used to establish the local time of the client at which each frame should be rendered and presented. This way, the client can maintain synchronization between different media streams. The streaming client also typically deploys jitter buffers that hold some RTP data before decoding and rendering. Buffering the RTP data enables the client to account for variations in transmission delay between the server and the client. Buffering is also used to reorder packets that arrive out of sequence during a streaming media session.
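The header removal and reordering steps described above can be sketched as follows. The sketch assumes the fixed 12-byte RTP header layout of RFC 3550 and, for brevity, assumes packets without header extension or padding. The function names and the test packets are hypothetical, and the reordering ignores sequence-number wraparound, which a real jitter buffer would have to handle.

```python
import struct


def make_rtp(payload_type, seq, timestamp, ssrc, payload):
    """Build a minimal RTP packet (version 2, no CSRC list) for testing."""
    return struct.pack("!BBHII", 0x80, payload_type, seq, timestamp, ssrc) + payload


def parse_rtp(packet):
    """Split an RTP packet into header fields and payload.

    Assumes the packet carries no header extension and no padding.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    header_len = 12 + 4 * (b0 & 0x0F)  # skip any CSRC identifiers
    return {
        "payload_type": b1 & 0x7F,
        "marker": bool(b1 & 0x80),
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }


def reorder(parsed_packets):
    """Return buffered packets in sequence order (no wraparound handling)."""
    return sorted(parsed_packets, key=lambda p: p["seq"])


# Three packets arriving out of order, as a jitter buffer might hold them.
arrived = [
    make_rtp(98, 11, 3000, 0x1234, b"B"),
    make_rtp(98, 10, 0, 0x1234, b"A"),
    make_rtp(98, 12, 6000, 0x1234, b"C"),
]
ordered = reorder([parse_rtp(p) for p in arrived])
payloads = b"".join(p["payload"] for p in ordered)
```

Codec-specific de-packetization, such as reassembling a frame that was fragmented over several packets, would follow this step and depends on the payload format.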
Real Media streaming is a type of media streaming that differs from 3GPP streaming. Real Media streaming uses only one UDP connection to carry multiple streams, unlike 3GPP streaming which uses multiple UDP connections. Also, media packets are distinguished by a stream identifier. Real Media streaming uses a proprietary transport format called Real Data Transport (RDT). With Real Media streaming, it is possible to use a proprietary mechanism for feeding back information to the streaming server, but the feedback mechanism does not require a separate UDP connection. Thus, Real Media streaming only requires one UDP port in total.
Windows Media streaming is yet another way of transporting streaming media content. Windows Media streaming uses RTP to transport an Advanced Systems Format (ASF) file to the client. The ASF file is a container format which holds frames for all media types and streams. Windows Media streaming thus also uses only one UDP connection to carry all media content. As such, both Real Media and Windows Media streaming need some form of de-multiplexing before the media content can be de-packetized and decoded.
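The de-multiplexing step can be illustrated in the abstract as follows. The sketch does not model the RDT or ASF wire formats; it assumes, purely for illustration, that each received packet has already been reduced to a hypothetical `(stream_id, payload)` pair.

```python
from collections import defaultdict


def demultiplex(packets):
    """Group payloads by stream identifier, preserving arrival order.

    `packets` is an iterable of hypothetical (stream_id, payload) pairs
    received over a single connection.
    """
    streams = defaultdict(list)
    for stream_id, payload in packets:
        streams[stream_id].append(payload)
    return dict(streams)


# Audio (stream 0) and video (stream 1) interleaved on one connection.
received = [(0, b"a0"), (1, b"v0"), (0, b"a1"), (1, b"v1")]
by_stream = demultiplex(received)
```

Each resulting list can then be passed to the de-packetizer and decoder for that media type.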
MBMS (multimedia broadcast and multicast service) is a mechanism for remotely delivering media content to a client in a cellular environment. MBMS defines a bearer service and a user service. The bearer service allows efficient use of broadcast or multicast bearers in the cellular environment. Traditionally, bearers over cellular networks are bidirectional point-to-point bearers. MBMS allows for the setup of unidirectional downlink bearers to multiple receivers. The MBMS User Service allows streaming and downloading of multimedia content over unicast, multicast, or broadcast bearers. Mobile TV services can be realized over MBMS User Service using the streaming protocols defined in the 3GPP TS 26.346 specification. MBMS streaming uses RTP for transporting multimedia data and mobile TV sessions are described using SDP. MBMS protocols and codecs are aligned with PSS. However, RTSP is not used when only unidirectional bearers are employed.
DVB-H (Digital Video Broadcasting-Handheld) is another way to remotely deliver media content to a client in a wireless environment. As its name indicates, DVB-H is the handheld version of a broadcast standard which includes the well-known satellite (DVB-S), terrestrial (DVB-T) and cable (DVB-C) versions. DVB-H was specified by the DVB project and subsequently endorsed by regional standardization bodies (e.g., ETSI EN 302 304). DVB-H is an adaptation of DVB-T that takes into account the specific requirement of handheld devices with respect to power consumption, processing capabilities and multimedia rendering capabilities. Mobile TV services over DVB-H use the DVB-IPDC service layer where IPDC stands for IP datacasting. The DVB-IPDC service layer describes the Electronic Service Guide (ESG), the content delivery protocols (CDP), and service purchase and protection (SPP). An alternative service layer to DVB-IPDC is OMA BCAST. The transport protocol for DVB-IPDC is RTP and mobile TV sessions are described using SDP. RTSP is not used with DVB-H because of its unidirectional nature.
Still another media content distribution technology is MTSI (Multimedia Telephony Service over IMS), where IMS stands for IP Multimedia Subsystem. MTSI is specified by 3GPP. MTSI is an evolution of traditional telephony and Voice over IP (VoIP), whereby traditional speech telephony calls are enriched with multimedia content such as video and text and during which users can share multimedia files (e.g. images and video clips). MTSI protocols are based on IMS protocols for session description and control (Session Initiation Protocol (SIP) and SDP). MTSI uses RTP to transport multimedia content between parties.
OpenMAX only defines playback of media content from a local or remote location (i.e., a file), but it does not address media carried over RTP. The OpenMAX IL API does not support playback from, or recording to, RTP transport. Thus, OpenMAX is generally incapable of directly supporting streaming, including streaming implemented using RTSP and RTP. Further, as shown below, attempts to implement streaming using OpenMAX IL content pipes are unlikely to work correctly.
There are at least two general ways to support PSS with OpenMAX IL. One way is to bridge OpenMAX IL to the network stack with the IL client and/or application. Another way is to use “content pipes” as suggested in the OpenMAX IL specification. OpenMAX IL can handle PSS using the client application as a bridge. The application has control of the server via an RTSP control interface. The client also has control of the media decoders via the IL client (e.g. OpenMAX AL or any other multimedia framework) and the OpenMAX IL API. RTP/RTSP (e.g., control, RTP buffers, RTCP, etc.) function outside the OMX IL implementation. Media decoders, synchronization and rendering are performed within the OpenMAX IL implementation.
Once a PSS session is established, the IL client can set up the decoder and renderer components. Data in the form of audio/video stream buffers are transferred from the RTP stack to the multimedia decoders via the application or the IL client. Timestamps must be added to the audio and video flows for proper synchronization in OpenMAX IL. However, functions such as seeking, play and pause are not possible from within the OMX IL interface with such an approach. In addition, the IL client must introduce timestamps in payloads from the RTP stack to feed into the OMX IL filter graph. Moreover, RTP and RTSP are not integrated into OpenMAX, even though they are a part of the player chain.
Streaming may also be handled in OpenMAX IL using content pipes. The IL client can create its own custom content pipes. In this case, each custom content pipe can be used to transfer audio and video streams from RTP to OMX IL decoders as described above. However, it is unclear what type of component can leverage the content pipe because decoders cannot be directly connected to a content pipe according to OpenMAX. The IL client can also provide a streaming URI to a reader/demuxer component. The reader/demuxer would then create its own content pipe using the URI provided. The implementation of the content pipe in theory can then provide all streaming functionalities. However, with only one content pipe opened for the streaming session, all audio and video streams would go through that pipe. OpenMAX does not define how such multiplexed media data would be subsequently formatted and de-multiplexed. Also, the content pipe offers no means of controlling the streaming session. Moreover, synchronization information coming from RTP must be translated into timestamps, but this is not currently supported in OpenMAX. OpenMAX also does not define how to retrieve the information about content format (i.e. the SDP) via the content pipe to set up the correct decoders. Thus, there is no benefit in using a content pipe for processing PSS media data in view of the current OpenMAX standard.
OpenMAX primarily focuses on the media plane. For RTSP streaming and other datacom protocols, a control plane towards the server is also needed. This is not addressed by OpenMAX. Further, the content pipe concept is modeled after file access (i.e., read (a number of) bytes, and all media data is indexed by bytes). A streaming server cannot be controlled according to how many bytes the client reads from the incoming RTP packets. The streaming server must be controlled via RTSP. In addition, RTSP typically uses time, measured in seconds (normal play time, NPT), as the index, not bytes.
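The difference between byte indexing and time indexing can be seen in how a seek is expressed towards the server. The following sketch formats an RTSP PLAY request with a Range header given in NPT seconds. The helper names and the session identifier are hypothetical, and the sketch only produces the request text.

```python
def npt_range(start_seconds, end_seconds=None):
    """Format an NPT range value such as 'npt=12.500-' (open-ended)."""
    start = f"{start_seconds:.3f}"
    end = "" if end_seconds is None else f"{end_seconds:.3f}"
    return f"npt={start}-{end}"


def build_seek_request(uri, cseq, session_id, start_seconds):
    """Format a PLAY request that asks the server to resume at a given time."""
    return (
        f"PLAY {uri} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        f"Session: {session_id}\r\n"
        f"Range: {npt_range(start_seconds)}\r\n"
        "\r\n"
    )


# Seek to 12.5 seconds into the clip; "12345678" is a made-up session id.
seek = build_seek_request("rtsp://server.com/clip", 5, "12345678", 12.5)
```

A byte-offset call on a content pipe has no direct equivalent of this request, since the client cannot generally know which byte position corresponds to a given play time in a live RTP flow.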
Moreover, it is also unclear how seeking should be performed in OpenMAX with an RTP source. If the (potential) demuxer is set to a new position, OpenMAX does not define how the streaming server should be informed when the RTSP implementation is in a content pipe. As noted above, content pipes only index bytes, not NPT as in RTSP. Further, synchronization in OpenMAX is not defined for RTP-based protocols. With streaming, the synchronization information may come from RTCP or from RTSP. This information must be at hand for the component handling the time-stamping.
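As an illustration of the time-stamping step, the sketch below converts an RTP timestamp into a presentation time on the NPT timeline. It assumes the component has been given a reference pair from RTSP signaling, namely the RTP timestamp that corresponds to the start of the played range (as carried in the RTP-Info header of a PLAY response) together with the NPT start value, plus the clock rate from the SDP. The function and parameter names are hypothetical.

```python
def rtp_to_npt(rtp_timestamp, base_rtptime, npt_start, clock_rate):
    """Map a 32-bit RTP timestamp to normal play time in seconds.

    base_rtptime is the RTP timestamp that corresponds to npt_start.
    The subtraction is done modulo 2**32 so that a timestamp wrapping
    past the 32-bit limit still yields a forward offset. Packets older
    than base_rtptime are not handled by this sketch.
    """
    elapsed_ticks = (rtp_timestamp - base_rtptime) & 0xFFFFFFFF
    return npt_start + elapsed_ticks / clock_rate


# Video with a 90 kHz clock: 45000 ticks after the reference point.
video_time = rtp_to_npt(135000, 90000, 10.0, 90000)
```

Applying the same mapping to each stream with its own reference pair and clock rate places audio and video on a common timeline, which is the information a time-stamping component would need in order to synchronize them.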
Yet another issue is how multiple streams (as in 3GPP streaming) and multiplexed streams (as in Real Media and Windows Media streaming) are handled by the same client. For example, it is not clear how many content pipes should be used in such a scenario. If only one content pipe is used, 3GPP streams would have to be multiplexed before the content pipe. If more than one content pipe is used, Real Media and Windows Media streams would have to be de-multiplexed before the content pipes.
Also, there is no payload handling required for playback from a file, as the reader or demuxer is already able to locate and extract the coded media frames in the file. The decoder could possibly be responsible for RTP payload handling in the streaming case, but this would fall outside the typical responsibility of a decoder (and might contradict the input format for OpenMAX IL decoders). The demuxer/reader could also possibly be responsible for RTP payload handling in the streaming case, but this would be less flexible. It would also be desirable to route RTP packets between OpenMAX IL components, e.g., for recording.
Finally, MTSI (which provides speech-related services) requires very short delays in the buffers and must be able to handle variations of the amount of data in the jitter buffers. Again, an RTP element is required to perform rate adaptation, time-stamping, and de-packetization. A jitter buffer is inserted and the time-scaler unit placed after the speech decoder requires a fast two-way control interface with the jitter buffer to allow for minimization of speech path delay.