Field of Art
The present invention relates generally to an overlay network architecture for delivering digital content among nodes of an underlying network, and more particularly to a virtual broadcast system that optimizes the routing of digital content among nodes along overlay networks that are dynamically reconfigured based upon forecasts of frequently-changing levels of congestion at component interconnections within the underlying network.
Description of Related Art
Network Congestion
As wired and wireless network traffic continues to expand exponentially, the finite capacity of the shared links or interconnections among components within a network is becoming an increasingly relevant and troubling problem. Moreover, because the level of congestion at these shared links is dynamic and subject to a great deal of volatility as network traffic ebbs and flows, such congestion is difficult to measure at any given time, and particularly difficult to predict even on a near-term basis.
This problem is somewhat analogous to that of traffic congestion at the intersecting junctions of shared roads and freeways in increasingly populated areas. While existing GPS navigation and traffic control systems measure current congestion at these junctions, and calculate optimal paths to reroute individual drivers around such congestion, their ability to predict an optimal route in advance for any particular driver is hampered by the volatile nature of traffic congestion.
When a single company such as Netflix accounts for over one-third of peak Internet traffic, companies that deliver digital information over the Internet concurrently (particularly large amounts of linear data) must somehow address the increasingly volatile nature of Internet congestion. Similarly, as mobile voice and data usage soars, the limited availability of regulated RF spectrum is of particular concern to companies developing high-bandwidth mobile applications.
While a specific application of the present invention is described herein in the context of delivering streaming video over the Internet to large numbers of concurrent users, the principles of the present invention apply equally in numerous other contexts where limited capacity of shared links among network components constrains the routing of any type of information that can be converted into a digital format (e.g., audio, images, 3D models, etc.). Other potential applications of the present invention include, for example, VoIP, corporate videoconferencing, virtual reality, multi-player gaming, and a variety of other bandwidth-intensive applications (relative to the level of congestion of shared links within an underlying network at any given point in time).
As will be discussed in greater detail below, the present invention does not “cure” the problem of limited capacity or “network congestion” at component links within an underlying network such as the Internet, but instead makes efficient use of that limited capacity by monitoring and analyzing network traffic across those links to optimize the routing of digital content among nodes of overlay networks that are dynamically reconfigured based on forecasts of congestion levels at those links.
Video Streaming Events
Since the advent of the Internet and IP-based routing, many approaches to streaming video over the Internet have emerged. Before discussing their relative advantages and disadvantages, it is helpful to step back and consider the problem being addressed. To distribute video content over the Internet, it must first be captured and digitized. We can characterize the video content as an “event” that is captured “live” (or generated digitally) and distributed over the Internet. References herein to video events include the capture or generation of both video and audio, as well as any associated metadata.
Video events can either be scheduled or unscheduled. For example, the “Super Bowl” is a scheduled event in that the time of its occurrence is known in advance, whereas other events (e.g., natural disasters, a toddler's first steps, or even video on demand—“VOD”) are unscheduled in that they may occur with little or no advance warning.
Video content may be captured in its entirety to generate a digitized video file before it is distributed over the Internet in the same manner as any other type of file (e.g., via the “file transfer protocol” or FTP). However, such a “file transfer” approach imposes a delay on the recipient's viewing (playing) of the video content; i.e., the recipient must wait until the entire file has been transferred before viewing the video content. Given the relatively large file sizes of digitized video, this delay can be significant.
Video content is therefore often “streamed” to users so they can continuously receive and view the content while it is still being sent. In essence, the video content is divided into an ordered linear stream of small files or “video segments” (e.g., 1 to 10 seconds in length) that are delivered to users who can start viewing them as they are received. To view a continuous stream of video content without delay or jitter, the video segments must be played back-to-back at a constant frame rate (e.g., 30 frames per second, or fps). Note, however, that video segments need not be received at regular intervals, provided that each segment is received before the playback of the prior segment has concluded.
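The timing constraint described above (each segment must arrive before playback of the prior segment concludes) can be illustrated with a minimal sketch. The function name, the arrival times and the simple "pause until the segment arrives" player model are assumptions made for illustration only; they are not part of any system described herein.

```python
from typing import List


def find_stalls(arrival_times: List[float],
                segment_duration: float,
                startup_delay: float = 0.0) -> List[int]:
    """Return the indices of segments that arrive too late for smooth playback.

    arrival_times[i] is the (hypothetical) time in seconds at which segment i
    has been fully received. All segments are assumed to have equal duration.
    """
    if not arrival_times:
        return []
    # Playback of segment 0 begins once it has arrived, plus any startup delay.
    playback_start = arrival_times[0] + startup_delay
    stalls = []
    for i, arrived in enumerate(arrival_times):
        # Segment i is needed at the moment segment i-1 finishes playing.
        deadline = playback_start + i * segment_duration
        if arrived > deadline:
            stalls.append(i)
            # Assumed player model: pause until the late segment arrives,
            # which pushes back the deadlines of all later segments.
            playback_start += arrived - deadline
    return stalls


# Irregular arrivals are fine as long as every deadline is met.
on_time = find_stalls([0.0, 1.0, 3.5, 5.9], segment_duration=2.0)
# The last segment here arrives one second after it is needed.
late = find_stalls([0.0, 1.0, 3.5, 7.0], segment_duration=2.0)
```

In the first call the segments arrive at uneven intervals but none is late; in the second, only the final segment misses its deadline and would cause a visible pause.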
Whether an event is scheduled or unscheduled, it can be streamed “live” (i.e., as the event occurs) or “pre-recorded” for streaming any time after the occurrence of the event. For example, the Super Bowl could be captured and streamed live as the event occurs, or pre-recorded for streaming at a later time.
Finally, whether an event is scheduled or unscheduled, and whether it is pre-recorded or streamed live as it occurs, it can be streamed in “real time” (i.e., with a largely imperceptible delay from sender to receiver) or “delayed” in transit for seconds or even minutes. For example, viewers of a television program (e.g., a baseball game) that is streamed over the Internet, but not in real time, might experience the streamed event at different times from one another, or at different times from viewers watching the same program broadcast via cable or satellite. Such delays (particularly if more than a few seconds) may diminish a user's “quality of experience” (QoE)—i.e., a user-centric or application-level view of quality, as contrasted with a “quality of service” (QoS), which is a measure of performance based on network-centric metrics (e.g., packet delay, packet loss, or jitter caused by routers or other network resources).
For example, social interaction among viewers may be constrained (wholly apart from jitter or other video artifacts) due to the fact that viewers experience the same event at different times. This is particularly problematic today when so many events (scheduled or unscheduled) are communicated in real time in so many different ways—from broadcast radio or television to social media and other Internet services, accessible via mobile phones and desktop and laptop computers, as well as via a constantly evolving domain of consumer electronic devices.
It is therefore desirable for a video streaming system to handle unscheduled as well as scheduled events, to stream live as well as pre-recorded events, and to stream those events in real time with minimal delay in order to provide viewers with a consistent QoE. Moreover, as the number of concurrent viewers of a streaming video event increases, maintaining a consistent QoE becomes a formidable problem. For that reason, scalability is a key design goal of any such system.
Despite recent advancements in video streaming technology, the historical “ad hoc” evolution of the infrastructure of the Internet still presents significant obstacles to Internet-based video streaming, not the least of which is an inconsistent QoS, which leads to network congestion at times and locations across the Internet that are difficult to predict. While a key objective of the present invention is to maintain a consistent QoE for viewers of streaming video events, this objective is constrained by network congestion across the Internet which ultimately cannot be eliminated.
Underlying Internet Architecture
Beginning with ARPANET (the earliest packet-switching network to implement the Internet protocol suite, or TCP/IP), and later NSFNET, the Internet “backbone” was designed to be a redundant “network of networks” (i.e., the Internet) that afforded reliability or “resiliency” by decentralizing control and providing alternative communication (routing) paths for information to reach its desired destination. Yet, with packets following different paths among routers and other shared network resources, maintaining a consistent QoS or QoE over the Internet remains an extremely difficult problem.
As the Internet backbone evolved and was privatized, redundancy and overlap developed between traditional backbone networks and those owned by long-distance telephone carriers. For the purposes of this specification, we distinguish large “public” networks that provide data to customers directly, or via other smaller “internet service provider” (ISP) networks, from large “private” backbone networks that carry data only among themselves, or serve as a conduit between large ISPs, but do not directly provide data to customers. In either case, these large public and private networks are typically implemented as “fiber rings” interconnected via fiber-optic trunk lines—i.e., multiple fiber-optic cables bundled together to increase network capacity.
For routing purposes, the largest network providers that carry the heaviest network traffic (e.g., large ISPs and private backbone networks) are assigned blocks of IP routing prefixes known as “autonomous systems” (AS), each of which is assigned an “autonomous system number” (ASN). We refer to each of the large fiber rings owned by these companies as an ASN. The number of ASNs has grown dramatically in recent years, from approximately 5000 ASNs fifteen years ago to over 50,000 ASNs across the world today. As alluded to above, many large network providers also own backbone fiber-ring networks (i.e., private ASNs) that do not service customers, but may be connected to their own “public ASNs” or those owned by others.
Because different companies own ASNs, they enter into agreements with one another to facilitate the routing of Internet traffic across ASNs and throughout the global Internet. Each ASN utilizes a bank of routers often referred to as a “peering point” to control access to another ASN, employing a routing protocol known as the “border gateway protocol” or BGP. Any given ASN may employ multiple peering points to connect to multiple different ASNs. Interconnected ASNs may be geographically adjacent, or may be far apart, connected via long fiber trunks spanning great distances (e.g., across countries or even oceans). Public ASNs may also be interconnected via “private ASNs” or backbone networks.
Monitoring QoS within and across ASNs is extremely difficult. Large network providers maintain much of the routing and performance information within their ASNs (including dynamic congestion metrics) as proprietary. While the “Open Message Format” (of the current BGP Version 4) provides for a “data dump” of certain information when a TCP/IP connection to a BGP router is established, this mechanism is not terribly useful as a practical matter. Many BGP routers do not support the Open Message Format, while others simply turn it off. Moreover, the information is typically 5 minutes out of date, which is a relatively long time given how frequently congestion levels change across the Internet.
Because such a large amount of Internet traffic flows across the relatively high-bandwidth peering points interconnecting ASNs, these peering points are often key “bottlenecks” or sources of much of the congestion across the Internet at any given time, apart from the “last mile” problem within an ASN (i.e., congestion across the relatively lower-bandwidth wired and wireless connections between end users and their “gateway” ISPs).
For example, as the traffic load across an ASN peering point increases, the routers in the ASNs on each side of the peering point become congested. In other words, these routers experience high utilization rates of RAM, CPU and other limited-capacity shared resources. Increased demand on these resources reduces performance (e.g., bit rates) across these peering points, and eventually may lead to lost data packets. Because network traffic across the Internet is not centrally controlled, it is difficult to predict the frequently changing levels of “peering point congestion” across the Internet at any given time.
If one cannot guarantee a consistent QoS within and across ASNs, it becomes very difficult to maintain a consistent QoE for viewers of streaming video events. Any system that streams video over the Internet is subject to unreliability and constantly changing levels of congestion of shared routers, particularly at ASN peering points through which so much Internet traffic flows. This problem is exacerbated when streaming video to large numbers of concurrent viewers across the Internet, and in particular across these ASN peering points.
Existing Video Streaming Approaches
Various approaches to streaming video over the Internet have evolved over the past few decades, with a vast array of terminology employed to characterize and distinguish different techniques for generating overlay network topologies (on top of the Internet) and delivering video content among network nodes along these overlay networks. In comparing different approaches, it is helpful to return briefly to the GPS navigation analogy, and consider the factors which affect the time required to travel between any two points or nodes—i.e., distance, speed and congestion (typically addressed by rerouting along a different path).
In the context of routing packets on the Internet, distance (or geographic proximity) is not of direct relevance because packets travel near the speed of light. Speed, however, is affected by the number of stops or roadblocks encountered along a route, or in this context the number of “hops” encountered at intermediate routers between two nodes. Thus, two nodes can be said to be “nearby” each other (in “network proximity”) if they are only a relatively few hops apart, regardless of their geographic proximity. Congestion at intermediate nodes along the path between two nodes affects the overall travel time, and can be addressed by dynamically rerouting traffic—i.e., dynamically reconfiguring the overlay networks that determine the path between two nodes. As will be discussed below, these factors serve to illustrate key distinctions among different approaches to streaming video over the Internet.
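The distinction between geographic proximity and network proximity can be made concrete with a small sketch. The `Candidate` type, the node names, and the distance and hop figures below are hypothetical values chosen only to show that the two criteria can select different nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    distance_km: float  # geographic distance to the requesting node (assumed)
    hops: int           # router hops to the requesting node (assumed)


def nearest_by_geography(candidates: List[Candidate]) -> Candidate:
    # Picks the physically closest node, ignoring the network path.
    return min(candidates, key=lambda c: c.distance_km)


def nearest_by_network(candidates: List[Candidate]) -> Candidate:
    # Picks the node with the fewest intermediate router hops.
    return min(candidates, key=lambda c: c.hops)


candidates = [
    Candidate("same_city_node", distance_km=5.0, hops=14),
    Candidate("distant_city_node", distance_km=800.0, hops=4),
]

geo_choice = nearest_by_geography(candidates)
net_choice = nearest_by_network(candidates)
```

With these illustrative figures, the geographically closest node is ten more hops away than the distant one, so the two selection rules disagree. Hop count alone is still only a proxy, since it says nothing about congestion at the intermediate routers.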
The most common method of delivering video outside of the Internet is to “broadcast” a video stream (e.g., a television program) from a “point of origin” to all destination viewers simultaneously—e.g., via dedicated cable or satellite infrastructure. While network hubs can be employed in a LAN to broadcast information to all network nodes, broadcasting packets of video segments across switches and routers over the Internet would be wildly impractical and inefficient. Most network users would not be interested in viewing any given “channel” of video content, and significant congestion would occur near the point of origin as routers broadcasting video segments to other routers would be quickly overwhelmed. A broadcast solution is simply not feasible for delivering a channel of video content over the Internet from a single point of origin to a large number of concurrent viewers who can join the channel at any time.
An alternative “multicast” approach involves simultaneously streaming each video segment from a point of origin to predefined groups of nodes across the Internet. This approach is similarly impractical for large-scale video distribution across the Internet. Moreover, specialized infrastructure is required, such as dedicated routers with multicasting functionality, which is also impractical and prohibitively expensive for large-scale commercial use.
By contrast to broadcast and multicast techniques, a “unicast” approach to video streaming involves sending video segments from a point of origin to a single destination node (e.g., by establishing a TCP/IP connection with a defined destination node IP address). But delivering a large number of unicast packets simultaneously to each viewing node would also quickly overwhelm routers at or near the point of origin, and would fail to achieve a consistent QoS for many of the reasons noted above, not to mention the enormous cost of providing sufficient bandwidth to handle such a large number of simultaneous transmissions.
Some VOD companies (such as Netflix and YouTube) have employed variations of this unicast approach that generally rely on expensive “edge-server” infrastructure. This approach (sometimes referred to as a “content delivery network” or CDN) involves deploying many physical servers across the Internet, and distributing copies of each channel of video content to each server. As a result, viewing nodes can receive desired video content from a nearby server (in network proximity—only a relatively few hops away from a viewing node).
Each edge server typically has significant bandwidth and computational capabilities, and essentially constitutes a separate video content source from which nearby viewing nodes can obtain any channel of video content at any point in time (“on demand”). This approach of adding physical infrastructure is somewhat akin to building additional freeways and off-ramps to enable a greater number of people to reach popular destinations more quickly (with fewer turns and less time spent on slower roads).
While different users typically want to watch different video channels at any given time, VOD systems occasionally face “peak” demand periods during which a particular video event must be streamed to a large number of concurrent viewers (e.g., a final episode of a popular television series), which can overwhelm even the largest streaming video company's infrastructure—or at least result in an inefficient “worst-case” deployment of expensive infrastructure that is frequently underused (i.e., during more common periods of non-peak demand). Alternative VOD solutions have attempted to avoid the need for expensive edge-server infrastructure by replicating and distributing video content among network nodes themselves (as discussed, for example, in U.S. Pat. Pub. No. 2008/0059631).
With or without expensive edge-server infrastructure, none of these VOD solutions addresses the QoS problem for unscheduled video events, as they all rely on “pre-seeding” edge servers or viewing nodes throughout the Internet with content known in advance—to ensure a nearby source of video content. Streaming a live unscheduled event would require real-time concurrent delivery of video content to all of these edge servers or viewing nodes, a problem not addressed by any of these VOD systems.
More recently, certain unicast-based video streaming standards (e.g., “WebRTC”) have evolved to facilitate “point-to-point” streaming of video among desktop and mobile web browsers without the need for any plugins. Many existing smartphones, as well as desktop and laptop computers, include WebRTC libraries that support browser-to-browser video streaming, as well as “adaptive streaming” libraries that enable a viewing node to detect its bandwidth and CPU capacity in real time, and automatically request a lower or higher “bit rate” to adapt to changes in those metrics.
Adaptive streaming implementations include Apple's “HTTP Live Streaming” (HLS), Microsoft's “Smooth Streaming” and the “MPEG-DASH” ISO standard, among others. In a typical point-to-point video streaming scenario, a receiving node periodically requests from an HTTP server “manifest files,” which include the locations of each available bit-rate version of upcoming (e.g., the next eight) video segments. For example, each video segment might be available in 1080p, 720p and 480p versions, reflecting different “video resolutions” that require different streaming bit rates (bandwidth) to ensure each video segment is delivered in essentially the same amount of time, regardless of its resolution.
Standard HTML5 video players (in web browsers that support WebRTC) typically buffer three video segments before they start playing video content. They use the current manifest file to send an HTTP request to an HTTP server for each video segment. The sending node then “pushes” each video segment (in small “chunks”) to the receiving node in accordance with WebRTC standards for playback in the receiving node's web browser. If the receiving node supports adaptive streaming implementations, and determines that the time required to receive recent video segments is increasing or decreasing significantly, it automatically begins requesting lower or higher bit-rate video segments from among the choices in the manifest file. In other words, it “adapts” to its actual bandwidth over time by varying the resolution of the video segments it requests.
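The adaptation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of HLS, MPEG-DASH or any particular player: the function names, the per-resolution bit rates and the 0.8 safety margin are all hypothetical.

```python
from typing import List, Tuple


def estimate_throughput_bps(segment_bits: int, download_seconds: float) -> float:
    """Estimate available bandwidth from the time taken to receive one segment."""
    return segment_bits / download_seconds


def choose_variant(variants: List[Tuple[str, int]],
                   throughput_bps: float,
                   safety: float = 0.8) -> str:
    """Pick the highest bit-rate variant that fits within the measured throughput.

    variants holds (label, bit rate in bps) pairs as they might be listed in a
    manifest file. A safety margin leaves headroom for throughput fluctuations.
    """
    budget = throughput_bps * safety
    ordered = sorted(variants, key=lambda v: v[1])
    chosen = ordered[0][0]  # fall back to the lowest bit rate if nothing fits
    for label, bps in ordered:
        if bps <= budget:
            chosen = label
    return chosen


# Hypothetical bit rates for the three resolutions mentioned in the text.
manifest_variants = [("1080p", 5_000_000), ("720p", 3_000_000), ("480p", 1_500_000)]

# A 6,000,000-bit segment received in 1.5 seconds suggests about 4 Mbps.
measured = estimate_throughput_bps(6_000_000, 1.5)
selected = choose_variant(manifest_variants, measured)
```

A real player would typically smooth the throughput estimate over several segments and also take its buffer level into account, so that a single slow segment does not trigger an immediate drop in resolution.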
The “resolution” of a frame of video is a measure of its width×height (e.g., 1920×1080 or 1080p) or number of pixels in a frame, while its “bit rate” refers to the number of “bits per second” (bps) that are transmitted from a sender to a receiver. For example, if 30 frames of 1080p-resolution video are delivered every second (30 “frames per second” or fps), and each color pixel contains 24 bits (24 “bits per pixel” or 24 bpp), then the uncompressed bit rate would be equal to almost 1.5 Gbps (1,492,992,000 bps; i.e., 1,492,992,000=(1920×1080 “pixels per frame” or ppf)×(24 bpp)×(30 fps)).
Standard video codecs employ compression (e.g., MPEG2 compression) and other video encoding techniques to yield lower effective bit rates (e.g., 3 Mbps). In view of the above, “bit rate” and “resolution” are highly correlated in that one can increase or decrease the effective bit rate by providing higher or lower resolution frames of video. We therefore use these terms somewhat interchangeably herein in this regard.
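The arithmetic behind these figures can be checked with a short calculation. The function and variable names are illustrative; the inputs are the example values given in the text (1920×1080 pixels, 24 bpp, 30 fps, and a 3 Mbps encoded stream).

```python
def raw_bit_rate_bps(width: int, height: int,
                     bits_per_pixel: int, frames_per_second: int) -> int:
    """Uncompressed bit rate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * frames_per_second


raw_bps = raw_bit_rate_bps(1920, 1080, 24, 30)

# Effective bit rate after encoding, using the 3 Mbps example from the text.
encoded_bps = 3_000_000
compression_ratio = raw_bps / encoded_bps

print(f"raw: {raw_bps:,} bps ({raw_bps / 1e9:.2f} Gbps)")
print(f"implied compression ratio: about {compression_ratio:.0f}:1")
```

The raw figure is on the order of gigabits per second, so delivering the same video at 3 Mbps implies a reduction of roughly 500:1 for this particular example.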
WebRTC and adaptive streaming standards permit virtually any smartphone user to capture and stream live video events, and also enable such users to join a streaming video channel originating from another point of origin across the Internet—ranging from other individual smartphone users to large companies hosting an array of video content. These standards are designed, however, for point-to-point video streaming, and do not address the “video delivery” problem of streaming a video channel to large numbers of concurrent viewers across the Internet.
To address this problem, some video streaming companies (e.g., StreamRoot) have adopted an approach that typically involves a “peer-to-peer” (P2P) or mesh network topology in which video content is relayed from one viewing node to another (sometimes referred to as “peercasting”). In a video streaming context, these terms can be used interchangeably to refer to overlay networks configured on top of the Internet that enable viewing nodes to relay streaming video content to one another in a distributed fashion. Delivering streaming video to large numbers of concurrent viewers should be distinguished, however, from non-streaming uses of a P2P or mesh network topology, e.g., for file transfer or file sharing applications.
P2P video streaming systems deliver a channel of video content from a single point of origin to large numbers of concurrent users over the Internet. Such systems tend to be both resilient and scalable, in that their distributed nature facilitates recovery from individual points of failure (e.g., by rerouting content via other nodes), and their reliability and performance actually improve as more nodes are added to the network (i.e., as more and better routing “relay” options become available).
When new nodes join a video channel or existing nodes leave (stop viewing) the channel, P2P video streaming systems must, to some extent, dynamically reconfigure the topology of the overlay network; that is, they must modify at least some of the routing paths among network nodes to accommodate the change in membership. For example, when a new node is added, its geographic location may be considered in an effort to select nearby existing nodes from which it will receive (and to which it will relay) video content.
But, if “peer” nodes are selected based merely on their geographic proximity, they still may be relatively “distant” from one another (and not in network proximity)—e.g., if they reside in different ASNs. As a result, traffic among them may cross one or more potentially congested peering points. For example, the actual latency between two nodes in close geographic proximity may exceed the sum of the latencies between each of those nodes and a geographically distant node. This phenomenon is sometimes referred to as a “triangle inequality violation” (TIV), which illustrates the disadvantages of relying on BGP routing for delivering digital content among nodes of an overlay network across ASN peering points.
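A triangle inequality violation of the kind described above can be detected from pairwise latency measurements. The sketch below assumes a small table of symmetric latencies in milliseconds. The node names and values are hypothetical, and the helper functions are for illustration only.

```python
from typing import Dict, Iterable, Optional, Tuple

LatencyTable = Dict[Tuple[str, str], float]


def latency(table: LatencyTable, a: str, b: str) -> float:
    """Look up the latency between two nodes, treating the table as symmetric."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def is_tiv(table: LatencyTable, a: str, b: str, via: str) -> bool:
    """True if the direct path a-b is slower than the detour a-via-b."""
    return latency(table, a, b) > latency(table, a, via) + latency(table, via, b)


def best_relay(table: LatencyTable, src: str, dst: str,
               candidates: Iterable[str]) -> Optional[str]:
    """Return the relay that most improves on the direct path, or None."""
    best, best_total = None, latency(table, src, dst)
    for relay in candidates:
        total = latency(table, src, relay) + latency(table, relay, dst)
        if total < best_total:
            best, best_total = relay, total
    return best


# A and B are geographically close but sit in different ASNs (90 ms direct);
# C is geographically distant yet well connected to both (30 ms and 40 ms).
measured_ms: LatencyTable = {("A", "B"): 90.0, ("A", "C"): 30.0, ("C", "B"): 40.0}

violation = is_tiv(measured_ms, "A", "B", via="C")
relay = best_relay(measured_ms, "A", "B", candidates=["C"])
```

With these illustrative values the detour through C takes 70 ms against 90 ms for the direct path, so relaying via C would be the faster option despite the greater geographic distance.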
One reason for this problem with existing P2P video streaming systems is that they are not constructed to be “compatible” with the underlying architecture of the Internet. Any overlay network topology built on top of the Internet is still subject to many points of disruption or failure (apart from new or disappearing nodes), such as the myriad of QoS problems noted above. By not addressing the Internet's underlying QoS volatility, particularly at ASN peering points, such systems face significant obstacles in providing their users with a consistent QoE.
Thus, existing P2P video streaming systems (like GPS navigation systems) rely on geographic proximity (rather than network proximity) to select peer relay nodes, and reroute traffic only “after the fact” once congestion is encountered. Moreover, real-time streaming of linear data to concurrent users imposes an additional constraint not found in GPS navigation systems—the content must arrive “simultaneously” at each node. Edge-server and other physical infrastructure approaches (akin to building freeways and off-ramps to provide higher-speed routes) are expensive and also fail to adequately address the problems of unscheduled events and high-concurrent usage of any particular event.
There is therefore a need for a digital content delivery system that addresses the deficiencies discussed above, and takes into account the underlying architecture of the Internet (particularly at ASN peering points through which so much Internet traffic flows) in generating and dynamically reconfiguring overlay networks so as to provide client nodes with a consistent QoE.