This invention relates to the operation of content distribution networks.
The distribution of video and other streamed content to users on demand using the internet as a distribution medium has become a major industry. Efficient distribution systems are becoming necessary in order to manage the sheer volume of data to be carried over the network.
One measure that content providers can take is to make the content available from a number of separate content caches instead of a single central server. The individual caches are periodically updated from a master database or server, and may or may not store the same data (e.g. the content can be tailored to local conditions, such as the different linguistic preferences of the populations of the areas served by each cache). An individual user requiring a data download is directed to one of the caches from which the data can be downloaded.
However, such a system is only efficient if each user request can be directed to the most appropriate cache, normally the closest available one. In existing systems this would require a translation to be made between the network address of the end user (in the initial request message) and the address of the local cache.
In a typical system, the user would request the URL (uniform resource locator) of the content provider, e.g. www.contentprovider.co.qq (where qq is a country code). This URL is processed by the domain name system (DNS) server co-operating with the user's browser system to identify a network-compatible internet address, typically a 32 or 128 bit number (e.g. an IP address), which identifies the network location of the target computer. The domain name system is a set of hierarchical servers. Each DNS server stores a subset of all the correspondences between URLs and IP addresses. If a DNS server does not have a record for a particular URL, the required information is sought from the authoritative DNS server for the domain associated with that URL, which either returns the required IP address or itself refers the request to another DNS server in its hierarchy, and so on. In most arrangements, the required correspondence, once retrieved from a higher level server, is then recorded in all the lower level DNS servers requesting it, thereby allowing more efficient retrieval on subsequent requests.
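The look-up and caching behaviour described above can be illustrated by a minimal sketch. It assumes a simple chain of servers in which each server defers to a single parent; the class name DnsServer, the server names, and the address 203.0.113.10 are hypothetical and chosen for illustration only. Real DNS resolution involves referrals, record expiry, and other details not modelled here.

```python
class DnsServer:
    """Toy DNS server: holds some name-to-address records and may defer to a parent."""

    def __init__(self, name, records=None, parent=None):
        self.name = name
        self.records = dict(records or {})
        self.parent = parent
        self.lookups = 0  # number of requests this server has received

    def resolve(self, hostname):
        self.lookups += 1
        if hostname in self.records:
            return self.records[hostname]
        if self.parent is None:
            raise KeyError(hostname)
        # Not known locally: ask the next server up the hierarchy.
        address = self.parent.resolve(hostname)
        # Record the answer so that later requests are served locally.
        self.records[hostname] = address
        return address


# Hypothetical three-level hierarchy (names and address are assumptions).
authoritative = DnsServer("authoritative", {"www.contentprovider.co.qq": "203.0.113.10"})
regional = DnsServer("regional", parent=authoritative)
local = DnsServer("local", parent=regional)

first = local.resolve("www.contentprovider.co.qq")
second = local.resolve("www.contentprovider.co.qq")  # answered from the local cache
```

In this sketch the second request never leaves the local server, which is the efficiency gain the caching arrangement is intended to provide.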
The URLs for download streaming services and other sites that may be cached are widely advertised and therefore have to be valid for a wide area, generally worldwide. The structure of the internet, and the way in which DNS servers operate, generally require a common IP address to be generated in response to a given URL. If a number of separate content caches are to be provided and accessed efficiently, a way needs to be found to allow the user to be given access to the appropriate one. This could be done by allowing the content provider to redirect users to the appropriate cache, but this would require the content provider to be able to identify where in the network topology the user is located. This is not readily apparent from the user identity (IP address), and the situation may also change dynamically depending on mobility of the user or changes in the configuration of the network. In any case, efficient routing of traffic takes place in the network, and primarily benefits the network operator. It is not necessarily within the content provider's area of expertise. For these reasons a network-based solution would be desirable.
One existing method of directing internet traffic is to implement a routing protocol such as the border gateway protocol (BGP) to maintain a routing policy which maintains a table of IP network addresses, whereby data is routed to the “nearest” or “best” of these destinations as viewed by the routing topology. A system known as “anycast” (by analogy with unicast, broadcast, and multicast) is also sometimes used to enable geographically distributed nodes to share a single IP address. Like broadcast (one-to-all) and multicast (one-to-many), in “anycast” each destination address identifies a set of some or (in broadcast) all receivers in the network as endpoints, but unlike either of these other systems only one of the set of endpoints (the “nearest” or “best”) is selected to receive a transmission at any one time. This arrangement requires the routing protocol to maintain the routing list to determine which of the set of receivers is currently the “nearest” or “best” for each user or access point. Changes in network topology, mobility of network users, or other factors require frequent changes to such a routing list.
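By way of illustration only, the anycast selection described above may be sketched as choosing, from the set of endpoints sharing one address, the endpoint with the lowest route cost as seen from a given access point. The function name select_anycast_endpoint, the cache names, and the cost values are assumptions made for this sketch; an actual routing protocol derives such costs from its own metrics and policies.

```python
def select_anycast_endpoint(route_costs):
    """Return the endpoint with the lowest route cost among those sharing one address."""
    if not route_costs:
        raise ValueError("no endpoints advertise this address")
    return min(route_costs, key=route_costs.get)


# Hypothetical costs to the same shared address, as seen from two access points.
costs_seen_from_london = {"cache-london": 1, "cache-paris": 3, "cache-tokyo": 9}
costs_seen_from_osaka = {"cache-london": 9, "cache-paris": 8, "cache-tokyo": 1}

london_choice = select_anycast_endpoint(costs_seen_from_london)
osaka_choice = select_anycast_endpoint(costs_seen_from_osaka)
```

Each access point needs its own up-to-date cost table, which is why changes in topology or user mobility force the routing list to be revised.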
The architecture of a modern telecommunications network consists of a number of relatively self-contained subdomains, usually serving different geographical areas although these need not be defined rigidly. A typical example discussed in European patent application EP1331788 provides each subdomain with its own unique IP address. It is known for each such subdomain to have more than one content distribution server associated with it, each such server in a given subdomain having identical content. In such arrangements a load-balancing function is used to determine which of the servers is to service each request for content. Thus the load balancer can provide access, using the same network address, to any of the servers associated with a given subdomain. Examples of such load balancers are discussed in United States Patent application US2006/0133371 and International Patent application WO06/072114.
The subdomains are interconnected both to allow communication between them and to provide robustness, so that in the event of technical difficulties in the equipment serving one subdomain, it can access processing power from and/or obtain connectivity through a neighbouring one. The present invention makes use of this architecture to provide improved access to a content distribution system. Each subdomain can be associated with a specified content cache. Note that this need not be a one-to-one correspondence, as one cache may serve more than one subdomain and one subdomain may be served by more than one cache. Because of the architecture of the network, the cache associated with the same subdomain as the requesting user will be the topologically closest, and therefore the most efficient for it to access.
According to a first aspect of the invention, there is provided a content distribution system comprising a plurality of access servers arranged to route traffic to and from end users on a network, each of said access servers having means for transmitting data packets to an associated content server or group of content servers, wherein all of the content servers in said content distribution system are accessible using the same internet protocol address, and each access server is configured to route any data packets it receives, if addressed to the said internet protocol address, to its respective content server or group of content servers.
According to a second aspect, the invention provides a method of operating a content distribution system arranged to route traffic between end users and a plurality of content servers all accessible using the same internet protocol (IP) address, by way of a plurality of access servers, wherein each access server is configured to identify the internet protocol address common to the content servers as being associated with a respective content server or group of content servers, and wherein on receipt of a data packet from one of said end users addressed to the said internet protocol address, the access server routes that request to its associated content server or group of content servers.
This architecture differs from the prior art systems in that a common IP address is used not only for all the content servers within a given subdomain, as is known for load balancing, but also across all the separate subdomains, which share the same network address. This allows the same IP address to be used for access to the service, regardless of which subdomain the user makes contact with. This has advantages both in publicising the network address (for example through a hyperlink in a related webpage) and for consistency of access for mobile users, who can use the same network address to access the local content server wherever they may connect to the network.
In a preferred embodiment, the content server is a provider of content for downloading, or streaming, to the end user in response to a data packet or packets incorporating a request for that content. Where a group of content servers is associated with one access server, access is preferably by means of a load balancer or other means for distributing data requests among the group.
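One possible way of distributing requests among a group of content servers is a simple round-robin scheme, sketched below. The description does not specify any particular load-balancing algorithm, so round-robin is an assumption made here, as are the class name RoundRobinBalancer and the server names.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers from a fixed group in rotation (one assumed policy of many)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("the group must contain at least one server")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Return the next server in the rotation.
        return next(self._cycle)


# Hypothetical group of content servers behind one access server.
balancer = RoundRobinBalancer(["content-a", "content-b", "content-c"])
picks = [balancer.pick() for _ in range(4)]
```

Other policies, such as choosing the least loaded server, would serve equally well as the means for distributing data requests among the group.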
Because each access server is associated with a particular content server or group of such servers, each access server can be configured in the same way to route data packets to its associated content server even though there are multiple content servers with the same IP address. It will recognise the common IP address as relating to its associated content server and route content requests accordingly. All the content servers have the same IP address, but this is not a problem, as each access server needs only to be able to reach one content server. An access server may be able to reach more than one content server, in which case it will determine which one to route content requests to based on the routing list that it maintains. The user will get the same content whichever access server it makes contact with, because all the content servers have access to the same content (either cached locally or via another content server).
The duplication of IP addresses is possible because each access server can determine, based on the routing list maintained by the access server, to which content server requests should be routed.
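The routing behaviour described above can be sketched as follows. The sketch assumes that each access server holds a routing list in which the shared address maps to its own associated content server, and that all other traffic is passed on to the wider network. The address 192.0.2.1, the class name AccessServer, the server names, and the "core-network" label are hypothetical.

```python
SHARED_CONTENT_IP = "192.0.2.1"  # assumed example of the common IP address


class AccessServer:
    """Toy access server whose routing list maps the shared address to a local server."""

    def __init__(self, name, local_content_server):
        self.name = name
        # The same address appears in every access server's list,
        # but each list resolves it to a different content server.
        self.routing_list = {SHARED_CONTENT_IP: local_content_server}

    def route(self, packet):
        # Traffic for any other destination is handed to the wider network.
        return self.routing_list.get(packet["dst"], "core-network")


north = AccessServer("access-north", "content-north")
south = AccessServer("access-south", "content-south")

request = {"src": "198.51.100.7", "dst": SHARED_CONTENT_IP}
via_north = north.route(request)
via_south = south.route(request)
other = north.route({"src": "198.51.100.7", "dst": "203.0.113.99"})
```

The same request thus reaches a different content server depending only on which access server receives it, without any change to the destination address used by the end user.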
In a preferred embodiment, the network is divided into a plurality of network subdomains, each of which is associated with a corresponding access server and its associated content server. Delivery of the content required by the user is therefore confined to one subdomain. Note however that one content server may serve more than one network division, which may itself be accessed by more than one access server. Each access server will only ever identify the IP address with the content server with which it is associated. Preferably each access server directs data packets to the content server having the nearest IP-routeable point to the source of the request for content, which is topologically the closest to it and its clients.
To provide robustness, the system is preferably configured such that, in the event of a content delivery criterion failure condition being detected, all traffic passing through one access server can be redirected to another content server by way of another access server in said content distribution system. This is preferably effected by the provision of routing apparatus arranged to implement said redirection when said failure condition is detected.
Preferably this is achieved by rerouting (e.g. by means of a “tunnel”) the request from the initial access server to a second access server elsewhere, the second access server being associated with a different content server. As the content servers both have the same IP address, the second access server will recognise the IP address as relating to its own associated content server and direct the request thereto. However, as the requesting user has a unique IP address, the content server will nevertheless route the downloaded material correctly to the requesting user.
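The rerouting arrangement described above may be sketched as follows, purely by way of example. The sketch assumes that the failure condition is represented by a simple flag on the access server and that the "tunnel" is modelled as a direct hand-over of the unchanged packet to a second access server; the class name FailoverAccessServer, the addresses, and the server names are hypothetical.

```python
SHARED_CONTENT_IP = "192.0.2.1"  # assumed example of the common IP address


class FailoverAccessServer:
    """Toy access server that tunnels requests to a peer when its own content server fails."""

    def __init__(self, name, content_server, fallback=None):
        self.name = name
        self.content_server = content_server
        self.fallback = fallback        # second access server, if any
        self.content_available = True   # assumed stand-in for the failure condition

    def handle(self, packet, tunnelled=False):
        if packet["dst"] != SHARED_CONTENT_IP:
            raise ValueError("not a content request")
        if self.content_available:
            # The reply goes to the requester's own unique address.
            return {"served_by": self.content_server, "reply_to": packet["src"]}
        if self.fallback is not None and not tunnelled:
            # Forward the packet unchanged; the peer resolves the shared
            # address to its own associated content server.
            return self.fallback.handle(packet, tunnelled=True)
        raise RuntimeError("no content server reachable")


south = FailoverAccessServer("access-south", "content-south")
north = FailoverAccessServer("access-north", "content-north", fallback=south)

request = {"src": "198.51.100.7", "dst": SHARED_CONTENT_IP}
normal = north.handle(request)
north.content_available = False  # simulate the failure condition
rerouted = north.handle(request)
```

Because the packet's destination address is never rewritten, the second access server needs no special configuration to serve the rerouted request, and the content is still returned to the original requester's address.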