Embodiments of the current invention are related to media streaming and particularly to a system and method to optimize media streaming over one or more IP networks.
In the specification and claims which follow, the expression “media streaming” or “streaming” is intended to mean the transfer of video information (and any associated audio information, if applicable), as known in the art, typically from one or more servers to a plurality of devices (typically called “receivers”) located at a distance from the respective servers. As such, terms such as “video content”, “content”, and “media stream” (or abbreviated “stream”) are used interchangeably in the specification and claims which follow hereinbelow to mean video content which is streamed. Typically, a stream comprises a plurality of “packets”, as known in the art and described further hereinbelow.
Other terms used in the specification hereinbelow, which are known in the art, include:
“Moving Picture Experts Group (MPEG)” is intended to mean a working group of experts, formed by ISO and IEC to set standards for audio and video compression and transmission;
“MPEG transport stream (TS)” is intended to mean a standard format for transmission and storage of audio, video, and program and system information protocol (PSIP) data. Transport Stream is specified in MPEG-2 Part 1, Systems (formally known as ISO/IEC standard 13818-1 or ITU-T Rec. H.222.0);
“TS Packet” is intended to mean the basic unit of data in a transport stream;
“Program Clock Reference (PCR)” is intended to mean a value transmitted, at least once each 100 ms, in the adaptation field of an MPEG-2 transport stream packet. PCR, when properly used, is used to generate a system_timing_clock in a decoder so that the decoder can present synchronized content, such as audio tracks matching the associated video;
“Presentation timestamp (PTS)” is intended to mean a timestamp metadata field in an MPEG transport stream or MPEG program stream that is used to achieve synchronization of a program's separate elementary streams (e.g., video, audio, subtitles). Reference: https://en.wikipedia.org/wiki/Presentation_timestamp#cite_note-teknotes-1
“Group of Pictures (GOP)” is intended to mean a group of successive pictures within a coded video stream. The GOP structure in video coding (ref https://en.wikipedia.org/wiki/Data_compression#Video) specifies the order in which intra- and inter-frames are arranged. Each coded video stream consists of successive GOPs. Visible frames are generated from the pictures contained in a GOP;
“Packetized Elementary Stream (PES)” is intended to mean a specification in MPEG-2 Part 1 (Systems) (ISO/IEC 13818-1) and ITU-T H.222.0 that defines carrying elementary streams (usually the output of an audio or video encoder) in packets within an MPEG program stream and an MPEG TS. The elementary stream is packetized by encapsulating sequential data bytes from the elementary stream inside PES packet headers;
“Real-time Transport Protocol (RTP)” is intended to mean a standardized packet format for delivering audio and video over IP networks. RTP is used extensively in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications, television services and web-based push-to-talk features. RTP is used in conjunction with the RTP Control Protocol (RTCP). While RTP carries media streams, RTCP is used to monitor transmission statistics and quality of service (QoS) and aids synchronization of multiple streams. RTP is originated and received on even port numbers and the associated RTCP communication uses the next higher odd port number. RTP was developed by the Audio-Video Transport Working Group of the Internet Engineering Task Force (IETF) and first published in 1996 as RFC 1889, superseded by RFC 3550 in 2003;
“User Datagram Protocol (UDP)” is intended to mean one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an IP network without requiring prior communications to set up special transmission channels or data paths. UDP uses a simple transmission model without implicit handshaking dialogues for providing reliability, ordering, or data integrity. Thus, UDP provides an unreliable service and datagrams may arrive out of order, appear duplicated, or go missing without notice. UDP assumes that error checking and correction is either not necessary or performed in the application, avoiding the overhead of such processing at the network interface level;
“Forward Error Correction (FEC)” is intended to mean a technique to recover partial or full packet information based on a calculation made on the information. FEC may be effected by means of XOR between packets or another mathematical computation;
“Pro-MPEG” is intended to mean the Professional-MPEG Forum, an association of broadcasters, program makers, equipment manufacturers, and component suppliers with interests in realizing the interoperability of professional television equipment, according to the implementation requirements of broadcasters and other end-users;
“SMPTE 2022” is intended to mean an FEC standard for video transport, initially developed by the Pro-MPEG Forum and added to by the Video Services Forum, which describes both an FEC scheme and a way to transport constant bit rate video over IP networks.
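By way of illustration only, the XOR-based FEC mentioned in the definitions above can be sketched as follows. This is a minimal sketch, assuming equal-length packets and a single parity packet per group; the function names (xor_bytes, make_parity, recover_missing) are hypothetical and the sketch is not the row/column scheme of SMPTE 2022.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet as the XOR of every packet in the group."""
    return reduce(xor_bytes, packets)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity packet can recover only one loss")
    restored = list(received)
    if missing:
        survivors = [p for p in received if p is not None]
        # XOR of the parity with all surviving packets yields the lost packet.
        restored[missing[0]] = reduce(xor_bytes, survivors, parity)
    return restored


# Hypothetical four-packet group in which the second packet is lost.
group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity_packet = make_parity(group)
damaged = [group[0], None, group[2], group[3]]
repaired = recover_missing(damaged, parity_packet)
```

In this sketch one parity packet per four media packets adds 25% overhead and tolerates one loss per group; a second loss in the same group cannot be recovered, which illustrates why such schemes may not tolerate high packet loss.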
Media streaming over switching IP networks such as fiber, leased line, CDN, public IP, wireless data networks, VSAT, and cellular networks is a challenging technical problem. A media stream may be impacted by a number of network aberrations (e.g., packet loss, jitter, disorder, and capacity changes, inter alia) that make it difficult to sustain a constant stream from sender to receiver.
Reference is currently made to FIG. 1, which is a prior art block diagram of a media encoder 15 (also referred to as an “encoder”, “media sending device” or a “sender” hereinbelow and in the claims which follow) connected with a media receiver 20 (e.g., mobile devices, smart TVs, inter alia) over an IP network 25 (e.g., public IP, unmanaged networks, fiber networks, MPLS, inter alia). The network and/or media receiver may experience different network impairments and network capacities. For example, a cellular network may be more prone to capacity problems while a wireless network is more prone to packet loss.
There are two main approaches known in the art which address the problem of media streaming over switching IP networks, as described hereinbelow.
    1. Well managed networks use UDP/RTP with redundant protection information in the form of forward error correction (FEC), which is sent with the media stream and consumes 30-50% extra bandwidth in one direction. This solution has a low time delay; however, it may not tolerate high packet loss or a drop in network capacity.
    2. For small scale operation, streaming with retransmission protection, also called Automatic Repeat-reQuest (ARQ), may be used. However, ARQ is not useful for large-scale operations. ARQ has modest time delays and may tolerate high packet loss, but it cannot tolerate a drop in network capacity.
The two main approaches listed above are addressed hereinbelow:
UDP/RTP
Media streaming with UDP/RTP is not suited for mobile or mass distribution applications, as these larger-scale networks are not considered “managed”.
ARQ
Another solution, ARQ, is currently offered by several vendors to address 100% recovery of lost packets. ARQ has been found to offer superior performance at lower overhead compared with existing packet loss recovery solutions.
Prior art ARQ systems work with a sender sending/transmitting UDP/RTP packets in a stream over an unmanaged IP-based packet network to several receivers. Packet loss detected by a receiver is reported to the sender using special RTCP messages. Each message may contain one or more different requests. ARQ packet processing is effective when network capacity is larger than that of the initial media stream bandwidth. As noted previously, the ARQ process allows for packet recovery with retransmission of lost packets. However if the network capacity (i.e. maximum bandwidth available for the network) drops below that of the media stream bandwidth, the ARQ method (i.e. of providing a recovery by retransmitting lost packets) cannot effectively recover lost packets.
Reference is currently made to FIG. 2, which is a prior art flow and block diagram showing an exemplary video stream 50 from a sender 52 to an ARQ receiver 65 and a loss of several packets 55 (indicated as D2, D6, D8, and D9) and subsequent respective request packets 60 (indicated as R2, R6, R8-9). In general, a receiver requests resending packets several times during a time window in which a packet is in a receiver buffer (not shown in the figure). In the figure, sender 52 processes the receiver's request packets (R2, R6, R8-9) and sends respective recovery packets 62 (D3, D5, D10) back to the receiver on the main content stream (indicated by the arrows connecting the sender with the receiver).
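By way of illustration only, the receiver-side step of turning detected losses into request packets can be sketched as follows. This is a minimal sketch, assuming monotonically increasing sequence numbers with no wraparound (real RTP sequence numbers are 16 bits and wrap); the function names (find_missing, group_into_requests) are hypothetical and do not represent any vendor's ARQ implementation.

```python
from typing import Iterable, List, Tuple


def find_missing(received_seqs: Iterable[int], first: int, last: int) -> List[int]:
    """Return the sequence numbers in [first, last] that never arrived."""
    seen = set(received_seqs)
    return [seq for seq in range(first, last + 1) if seq not in seen]


def group_into_requests(missing: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge consecutive lost sequence numbers into (start, end) requests."""
    requests: List[Tuple[int, int]] = []
    for seq in sorted(missing):
        if requests and seq == requests[-1][1] + 1:
            # Extend the current range instead of opening a new request.
            requests[-1] = (requests[-1][0], seq)
        else:
            requests.append((seq, seq))
    return requests


# Packets 1..10 were sent; 2, 6, 8 and 9 were lost, as in the example of FIG. 2.
arrived = [1, 3, 4, 5, 7, 10]
lost = find_missing(arrived, first=1, last=10)
requests = group_into_requests(lost)
```

Under these assumptions the four lost packets collapse into three requests, (2, 2), (6, 6) and (8, 9), corresponding to the request packets R2, R6 and R8-9 described above.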
A major shortcoming of such an ARQ system is that sometimes the IP link (i.e. the bandwidth between the sender and the receiver) may reach its capacity limit due either to a physical connection (e.g., ADSL/VDSL) or to a capacity limit imposed by the service provider (e.g., a mobile network provider). As shown in FIG. 2, ARQ systems can send a burst of recovery packets in response to a burst of packet loss requests. The burst of recovery packets may block or interfere with the stream's packet flow, causing additional lost packets.
Some ARQ systems limit the link by employing traffic shaping, as known in the art. However, traffic shaping limits the bandwidth of the stream and of the recovery packets alike, and therefore does not address situations where recovery packets may block the media stream.
Workflow—Prior Art
An input video feed is encoded by an encoder, which encodes a video stream (also known as a media stream). Encoder output is converted to an IP stream for transport over an IP network. The IP stream may be protected with either a FEC scheme or an ARQ solution. Both FEC and ARQ serve to recover lost packets, with the assumption that network conditions allow sufficient bandwidth for both the IP stream and the recovery data. If available network bandwidth is smaller than the IP stream, the recovery data will not be able to pass to recover the lost media packets, and the IP stream won't be received properly at the receiving side (i.e. media receiver).
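By way of illustration only, the bandwidth assumption described above can be expressed as a simple budget check. This is a rough sketch under stated assumptions: each lost packet is retransmitted exactly once, retransmissions are themselves not lost, and rates are averages in kbit/s. The function names (fec_fits, arq_fits) and the numeric values are hypothetical.

```python
def fec_fits(capacity_kbps: float, stream_kbps: float, fec_overhead: float) -> bool:
    """True if the stream plus its FEC overhead fits within the link capacity."""
    return stream_kbps * (1.0 + fec_overhead) <= capacity_kbps


def arq_fits(capacity_kbps: float, stream_kbps: float, loss_rate: float) -> bool:
    """True if the stream plus one retransmission per lost packet fits the link."""
    retransmit_kbps = stream_kbps * loss_rate
    return stream_kbps + retransmit_kbps <= capacity_kbps


# Hypothetical 8 Mbit/s stream over a 10 Mbit/s link.
fec_ok = fec_fits(10_000, 8_000, fec_overhead=0.30)   # needs about 10.4 Mbit/s
arq_ok = arq_fits(10_000, 8_000, loss_rate=0.10)      # needs about 8.8 Mbit/s

# If capacity drops below the stream rate, neither scheme has room to recover.
fec_after_drop = fec_fits(7_000, 8_000, fec_overhead=0.30)
arq_after_drop = arq_fits(7_000, 8_000, loss_rate=0.0)
```

In this sketch the ARQ case fits on the 10 Mbit/s link while the 30% FEC case does not, and once the link drops to 7 Mbit/s both checks fail regardless of the protection scheme, which is the situation described in the paragraph above.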
There is therefore a need to have a media streaming system that can operate over challenging network impairments, and which can provide the highest media bandwidth and shortest time delay to the receiver.