Streaming media (e.g., video and/or audio) to mobile devices such as cellular telephones, personal digital assistants (PDAs), etc., is an important emerging market. For example, in order to lessen a user's perceived duration of being “on-hold,” a customized movie trailer or other interesting video content may be streamed to the mobile device. Also, being able to select and watch professionally produced material such as sports or movies while away from a home or office environment greatly expands the market for video on demand (VoD). Additionally, adding a video back channel (e.g., one conveying expressions, gestures, postures, etc.) lessens the perceived gap between remote and local interactions.
As seen with the revenue generated by the ring-back tones and the push-to-talk markets, new telco-services markets can be quite large and can occur in unexpected areas. The promise of fresh markets is one of the driving forces behind the use of third generation (3G) wireless standards in Asia and the move to these standards in Europe. Even in the United States, where 2.5G/3G adoption lags other parts of the world, there have already been some streaming video products for consumer handsets (e.g., cellular telephones).
Unfortunately, the promise of these markets is largely unrealized to date, due to two major barriers. One barrier is that many of the mobile devices which access streaming media have limited capabilities (e.g., limited processing and/or memory capacities) and offer restricted interfaces to the end-user. Thus, these mobile devices lack the resources to store large amounts of data or to effectively operate complex media presentation applications of the kind that can be run, for example, on a user's home computer. The restricted interfaces typically found in these devices limit how the user navigates and selects among a plurality of options (e.g., voice mail options, or a list of movies which may be accessed). As a result, the user typically must wait while the options are presented serially over an audio interface until the desired option is reached. Another drawback to these interfaces is that the user may be required to enter a complex keying sequence to indicate a selected action.
Another barrier to these markets is the difficulty in creating new interactive applications that conform to telecommunication network requirements, for example, those of 3rd Generation Partnership Project (3GPP) compliant networks. This problem has been side-stepped using Internet-enabled handsets that can provide Internet-style streaming video within a telecommunications infrastructure: i-mode, 3GPP, Wireless Application Protocol (WAP), and Packet-switched Streaming Service (PSS). However, using Internet-based protocols for 3G video can create unnatural and unnecessary dividing lines in the consumer's interaction with video content, be it person-to-person connections or connections with services such as Video-on-Demand or Video Mail.
Typically, when a cellular telephone establishes communications with the network, the Session Initiation Protocol (SIP) description of the session channels and their aggregate bandwidth constitutes a contract. Thus, the telecommunications network will not place the call unless there is a reasonable expectation that the contracted bandwidth will be available between the two end points for the duration of the call. The contract defines the maximum bandwidth and maximum frame rate that can be used to send, for example, video data packets. Additionally, the telecommunication channels are highly optimized to minimize latency, thereby enabling not only conversational speech, but also fast-response interactive applications. In contrast, an Internet-oriented approach only provides a “best effort” delivery of the communication channels and does not guarantee a specific bandwidth or latency.
While a telephony-oriented approach may be more desirable, there are constraints which make interactive streaming content difficult to implement. For example, when using controls on the client side to speed up, or “sample,” a video display, the increased frame rate needed to speed up the video can exceed the maximum frame rate and/or the maximum bandwidth defined in the SIP contract. In another example, the original source material (e.g., a television broadcast) may exceed the maximum frame rate and/or maximum bandwidth defined in the SIP contract. As a result, it may be necessary to drop video frames and/or reduce the bandwidth in order to comply with the SIP contract. If the video codec relies upon uniform frame-spacing, one approach to maintaining the pre-defined frame rate is to drop a certain percentage of the video frames. However, if the video codec allows the use of non-uniform spacing of the video frames, relying on this method typically results in less than satisfactory video quality.
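The uniform frame-dropping approach described above can be illustrated with a minimal sketch. The function name uniform_drop and its parameters are hypothetical, and the sketch assumes uniformly spaced source frames and integer frame rates; it is not taken from any particular codec or network implementation.

```python
def uniform_drop(num_frames, source_fps, max_fps):
    """Return the indices of the frames to keep so that the kept frame
    rate does not exceed max_fps.

    Illustrative assumption: source frames are uniformly spaced and both
    rates are positive integers (frames per second).
    """
    if source_fps <= 0 or max_fps <= 0:
        raise ValueError("frame rates must be positive")
    if max_fps >= source_fps:
        # The source already complies with the limit; keep every frame.
        return list(range(num_frames))
    kept = []
    for i in range(num_frames):
        # Keep frame i only when the scaled frame counter enters a new
        # integer slot; this spreads the kept frames evenly in time.
        if (i * max_fps) // source_fps != ((i - 1) * max_fps) // source_fps:
            kept.append(i)
    return kept
```

As a hypothetical example, a 30 frames-per-second source played at 2.0× real-time presents an effective rate of 60 frames per second; with an assumed contract limit of 30 frames per second, uniform_drop(60, 60, 30) keeps every second frame. Because the selection ignores frame content and frame type, it reflects only the simplest case discussed above.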
Predictive coding of video frames also complicates the choice of which frames to drop. In media that complies with the Moving Picture Experts Group (MPEG) standards, some frames only comprise data that has changed from some preceding frame. Intra-frames (I-frames) are single frames of content that are independent of the frames that precede and follow them, and each stores all of the data needed to display that particular frame. Predictive frames (P-frames) and bi-directional frames (B-frames) only comprise the data that has changed from the preceding frame (e.g., in the case of a P-frame) or data that is different from the preceding frame and the following frame (e.g., in the case of B-frames). Thus, dropping an I-frame has a significantly greater effect on video quality than dropping a P-frame or a B-frame, because the frames that are predicted from the dropped I-frame can no longer be correctly reconstructed.
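One way to picture the consequence of these dependencies is a drop-priority ordering, sketched below. The function names drop_order and select_drops are hypothetical. The sketch assumes a simplified model in which frames are listed in display order, each I-frame starts a new group of pictures, no frame is predicted from a B-frame, and each P-frame depends on the earlier P-frames of its group; real codecs may permit other reference structures.

```python
def drop_order(frame_types):
    """Return frame indices ordered from least to most costly to drop.

    frame_types is a sequence of 'I', 'P', and 'B' markers in display
    order. Simplifying assumptions (illustrative only): nothing depends
    on a B-frame, and a P-frame is needed by every later P-frame in the
    same group of pictures.
    """
    b_frames, p_frames, i_frames = [], [], []
    dependents = {}  # P-frame index -> number of later P-frames needing it
    open_p = []      # P-frames seen so far in the current group
    for idx, kind in enumerate(frame_types):
        if kind == "I":
            i_frames.append(idx)
            open_p = []  # a new group of pictures starts here
        elif kind == "P":
            for earlier in open_p:
                dependents[earlier] += 1
            dependents[idx] = 0
            open_p.append(idx)
            p_frames.append(idx)
        elif kind == "B":
            b_frames.append(idx)
        else:
            raise ValueError(f"unknown frame type: {kind!r}")
    # Among P-frames, drop those with the fewest dependents first.
    p_frames.sort(key=lambda i: dependents[i])
    return b_frames + p_frames + i_frames


def select_drops(frame_types, count):
    """Return the set of `count` frame indices that are cheapest to drop."""
    return set(drop_order(frame_types)[:count])
```

For the hypothetical sequence "IBBPBBPBB", select_drops("IBBPBBPBB", 4) returns the B-frames at indices 1, 2, 4, and 5, and the I-frame at index 0 is selected only after every other frame has been dropped.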
Further complicating the task of frame-rate and/or bandwidth reduction is the fact that interactive user controls allow the user to change the display rate. For example, a user may speed up the display from real-time to 1.5× real-time. The user may then speed up the display further to 2.0× real-time to find a desired portion of the video. The user may then slow the video display back down to real-time to view the desired portion. Thus, the processing that determines which frames to drop must be capable of dynamically changing the frame-rate of the video without significantly degrading the video quality.
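The need to adapt to changing display rates can be illustrated with a small stateful sketch. The class name PlaybackFrameSelector and its methods are hypothetical, and the sketch deliberately ignores frame types and bandwidth. It assumes a uniformly spaced source and a contract expressed as a maximum frame rate, and it sends a frame only when at least one minimum frame interval of playback time has elapsed since the previously sent frame.

```python
from fractions import Fraction


class PlaybackFrameSelector:
    """Decide, frame by frame, which source frames to send while the
    playback speed changes, so the sent rate never exceeds max_fps.

    Illustrative assumptions: uniformly spaced source frames and a
    contract that specifies only a maximum frame rate.
    """

    def __init__(self, source_fps, max_fps):
        self.frame_interval = Fraction(1, source_fps)  # source time per frame
        self.min_gap = Fraction(1, max_fps)            # playback time between sent frames
        self.speed = Fraction(1)
        self.out_time = Fraction(0)      # playback time of the next offered frame
        self.next_allowed = Fraction(0)  # earliest playback time for the next send

    def set_speed(self, speed):
        """Change the display rate, e.g. 1.5 for 1.5x real-time."""
        self.speed = Fraction(speed)

    def offer(self):
        """Offer the next source frame; return True if it should be sent."""
        keep = self.out_time >= self.next_allowed
        if keep:
            self.next_allowed = self.out_time + self.min_gap
        # At higher speeds each source frame occupies less playback time.
        self.out_time += self.frame_interval / self.speed
        return keep
```

With a hypothetical 30 frames-per-second source and a 30 frames-per-second limit, every frame is sent at real-time, and calling set_speed(2) causes every second frame to be sent. The rule is conservative: at 1.5× real-time it also sends only every second frame, which keeps the sent rate under the limit but does not use all of the allowed frame rate.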