1. Technical Field
The present invention relates generally to data packet transport and routing over the Internet.
2. Brief Description of the Related Art
The public Internet is increasingly being used by enterprises for a variety of mission-critical applications such as transactions for e-commerce, inter-office connectivity over virtual private networks (VPNs), and most recently, for web services as a new paradigm for developing distributed applications. The current Border Gateway Protocol (BGP) based Internet routing infrastructure, however, is inadequate to support the reliability and performance needs of these applications. In particular, Internet routing, which is largely determined by BGP, has several weaknesses. First, BGP uses a metric known as shortest AS (Autonomous System) path length to determine a next hop for a packet. FIG. 1A shows that BGP will route data from Network A destined for Network D directly, because the AS path length is one. This is not always desirable, because BGP has been shown to be slow to converge. Thus, if a link between two networks becomes unavailable, it can take from seconds to several minutes before all relevant routers become aware of the failure and can route around the problem. During this time, packets will be lost. Furthermore, as illustrated in FIG. 1B, peering policies may dictate that a network should not accept packets from another network; BGP cannot efficiently route around this problem. Another problem is that different Internet applications require different characteristics (e.g., minimal loss, latency, or variability in latency) of an end-to-end connection for optimal performance. BGP makes no effort to route for quality of service and has no notion of any of these metrics. As illustrated in FIG. 2, BGP will choose a route (between Networks A and D) with larger latency than alternative routes.
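By way of illustration only, the mismatch between AS path length and latency described above can be sketched as follows. The sketch is a deliberate simplification: the actual BGP decision process considers additional attributes and local policy, and the `Route` type, the network names, and the latency figures are hypothetical values chosen for the example rather than measurements.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """A candidate route from Network A to Network D (hypothetical model)."""
    as_path: Tuple[str, ...]  # networks traversed after leaving the source
    latency_ms: float         # assumed end-to-end latency, for illustration


def pick_by_as_path_length(routes: Sequence[Route]) -> Route:
    # Simplified stand-in for the selection described above:
    # prefer the route that crosses the fewest autonomous systems.
    return min(routes, key=lambda r: len(r.as_path))


def pick_by_latency(routes: Sequence[Route]) -> Route:
    # A latency-aware selection, which AS path length alone cannot express.
    return min(routes, key=lambda r: r.latency_ms)


# Hypothetical numbers: the direct route has AS path length one but is slower.
routes = [
    Route(as_path=("D",), latency_ms=120.0),
    Route(as_path=("B", "D"), latency_ms=45.0),
]

shortest_path_choice = pick_by_as_path_length(routes)
latency_choice = pick_by_latency(routes)

print("AS-path-length choice:", shortest_path_choice.as_path)
print("Latency-aware choice: ", latency_choice.as_path)
```

Under these assumed values, the shortest-AS-path selection keeps the direct route to Network D, while the latency-aware selection prefers the route through the intermediate network.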
There is a need in the art for intelligent routing as businesses increasingly rely on the Internet for such applications as Web transactions, virtual private networks (VPNs) and Web Services. The notion of intelligent routing based on measurements of real time network conditions is known in the art, e.g., the product offerings by RouteScience and other companies. These products, however, are able to control only the first hop of the outbound route, namely, by injecting appropriate directives into the router. Attempts to control the inbound route, e.g., by affecting BGP advertisements, are limited by the low frequency with which these advertisements can be changed, the coarse granularity of the advertisements, the requirement of cooperation from multiple routers on the Internet, and the ubiquity of policy overrides by several ISPs.
Distributed computer systems also are well-known in the prior art. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider may provide the service on its own behalf, or on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” means the storage, caching, or transmission of content, streaming media and applications on behalf of content providers, including ancillary technologies used therewith including, without limitation, request routing, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence. The term “outsourced site infrastructure” means the distributed systems and associated technologies that enable an entity to operate and/or manage a third party's Web site infrastructure, in whole or in part, on the third party's behalf.
A known distributed computer system is assumed to have a set of machines distributed around the Internet. Typically, most of the machines are servers located near the edge of the Internet, i.e., at or adjacent to end user access networks. A Network Operations Command Center (NOCC) may be used to administer and manage operations of the various machines in the system. Third party sites, such as Web sites, offload delivery of content (e.g., HTML, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system and, in particular, to “edge” servers. End users that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. Although not shown in detail, the distributed computer system may also include other infrastructure, such as a distributed data collection system that collects usage and other data from the edge servers, aggregates that data across a region or set of regions, and passes that data to other back-end systems to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents monitor the network as well as the server loads and provide network, traffic and load data to a DNS query handling mechanism, which is authoritative for content domains being managed by the CDN. A distributed data transport mechanism may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the edge servers. As illustrated in FIG. 3, a given machine 300 comprises commodity hardware (e.g., an Intel Pentium processor) 302 running an operating system kernel (such as Linux or variant) 304 that supports one or more applications 306a-n.
To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP Web proxy 307, a name server 308, a local monitoring process 310, a distributed data collection process 312, and the like.
Content delivery networks such as described above also may include ancillary networks or mechanisms to facilitate transport of certain data or to improve data throughput. Thus, an Internet CDN may provide transport of streaming media using information dispersal techniques whereby a given stream is sent on multiple redundant paths. One such technique is described in U.S. Pat. No. 6,751,673, titled “Streaming media subscription mechanism for a content delivery network,” assigned to Akamai Technologies, Inc. The CDN may also provide transport mechanisms to facilitate communications between a pair of hosts, e.g., two CDN servers, or a CDN edge server and a customer origin server, based on performance data that has been collected over time. A representative HTTP-based technique is described in U.S. Published Patent Application 2002/0163882, titled “Optimal route selection in a content delivery network,” also assigned to Akamai Technologies, Inc.