A communication network may for example be a packet-based network and/or an internet. A network typically includes different types of network nodes, such as user devices, routers, network address translators (NATs), proxy servers, media relay servers etc., which perform different functions within the network. For instance, routers route packets between individual networks of an internet. NATs also perform such routing, as well as performing network address translation i.e. to mask the network address of the sender. Communication between two communicating nodes, such as user devices, may be via other nodes of the network, i.e. intermediate nodes such as routers, NATs and media relay servers. Every active network interface (e.g. of a user device, server etc.) connected to the network is assigned a network address, e.g. IP (Internet Protocol) address, so that is data can be routed thereto via the network. This may for example be assigned by an ISP (Internet Service Provider) in the case of a public network, or other network administrator.
A media session may be established between two endpoints, such as user devices, connected via a communication network so that real-time media can be transmitted and received between those endpoints via the network. The endpoints run client software to enable the media session to be established. The media session may be a Voice or Video over IP (VoIP) session, in which audio and/or video data of a call is transmitted and received between the endpoints in the VoIP session as media streams. Endpoints and other types of network node may be identified by a network address, such as a transport address. A transport address is formed of an IP address and a port number identifying a port associated with the IP address. A media session being may be established between transport addresses associated with the endpoints.
An example of a media session is a SIP (“Session Initiation Protocol”) media session. SIP signalling, e.g. to establish or terminate a call or other communication event, may be via one or more SIP (proxy) server(s). To this end, the SIP proxy forwards SIP requests (e.g. “INVITE”, “ACK”, “BYE”) and SIP responses (e.g. “100 TRYING”, “180 RINGING”, “200 OK”) between endpoints. In contrast to a media relay server, the media (e.g. audio/video) data itself does not flow via a basic SIP proxy i.e. the proxy handles only signalling, though it may in some cases be possible to combine proxy and media relay functionality in some cases. To establish the media session, one of the endpoints may transmit a media session request to the other endpoint. Herein, an endpoint that initiates a request for a media session (e.g. audio/video communications) is called an “initiating endpoint” or equivalently a “caller endpoint”. An endpoint that receives and processes the communication request from the caller is called a “responding endpoint” or “callee endpoint”. Each endpoint may have multiple associated transport addresses e.g. a local transport address, a transport address on the public side of a NAT, a transport address allocated on a relay server etc. During media session establishment, for each endpoint, a respective address may be selected for that endpoint to use to transmit and receive data in the media session. For example, the addresses may be selected in accordance with the ICE (“Interactive Connectivity Establishment”) protocol. Once the media session is established, media can flow between those selected addresses of the different endpoints.
A known type of media relay server is a TURN (Traversal Using Relays around NAT) server, e.g. a TURN/STUN (Session Traversal Utilities for NAT) incorporating both TURN and STUN functionality. The network may have a layered architecture, whereby different logical layers provide different types of node-to-node communication services. Each layer is served by the layer immediately below that layer (other than the lowest layer) and provides services to the layer immediately above that layer (other than the highest layer). A media relay server is distinguished from lower-layer components such as routers and NATS in that it operates at the highest layer (application layer) of the network layers. The application layer provides process-to-process connectivity. For example, the TURN protocol may be implemented at the application layer to handle (e.g. generate, receive and/or process) TURN messages, each formed of a TURN header and a TURN payload containing e.g. media data for outputting to a user. The TURN messages are passed down to a transport layer below the network layer. At the transport layer, one or more transport layer protocols such as UDP (User Datagram Protocol), TCP (Transmission Control Protocol) are implemented to packetize a set of received TURN message(s) into one or more transport layer packets, each having a separate transport layer (e.g. TCP/UDP) header that is attached at the transport layer. The transport layer provides host-to-host (end-to-end) connectivity. Transport layer packets are, in turn are passed to an internet layer (network layer) below the transport layer. At the internet layer, an internet layer protocol such as IP is implemented to further packetize a set of received transport layer packet(s) into one or more internet layer (e.g. IP) packets, each having a separate network layer (e.g. IP) header that is attached at the internet layer. The internet layer provides packet routing between adjacent networks. Internet layer packets are, in turn, passed down to the lowest layer (link layer) for framing and transmission via the network. In the reverse direction, data received from the network is passed up to the IP layer, at which network layer (e.g. IP) headers are removed and the remaining network layer payload data, which constitutes one or more transport layer packets including transport layer header(s), is passed up to the transport layer. At the transport layer, transport layer (e.g. UDP/TCP) headers are removed, and the remaining payload data, which constitutes one or more TURN messages in this example, is passed up to the application layer for final processing, e.g. to output any media data contained in them to a user, or for the purposes of relaying the TURN message(s) onwards. This type of message flow is implemented at both endpoints and TURN servers i.e. endpoints and TURN servers operates at the application layer in this manner.
An IP address uniquely identifies a network interface of a network node within a network, e.g. within a public network such as the Internet or within a private network. There may be multiple application layer processes running in that node, and a transport address (IP address+port number) uniquely identifies an application layer process running on that node. That is, each process is assigned its own unique port. The port (or equivalently “socket”) is a software entity to which messages for that process can be written so that they become available to that process. An IP address is used for routing at the internet layer by internet layer protocols (e.g. IP) and constitutes an internet layer network address that is included in the headers of internet layer packets, whereas the port number is used at the transport layer by transport layer protocols e.g. TCP/UDP to ensure that received data is passed to the correct application layer process. A transport layer packet includes a port number in the header, which identifies the process for which that packet is destined.
In contrast to media relay servers, routers typically only operate at the internet layer, routing IP packets based on IP addresses in IP packet headers. Notionally, NATs also only operate at the network layer and are distinguished from basic routers in that NATs modify IP headers during routing to mask the IP address of the source. However, increasingly NATs perform modifications at the transport layer, i.e. to transport layer packet headers, so at to also mask the source port number e.g. to provide one-to-many network address translation.
In the context of ICE, transport addresses available to an endpoint—e.g. its host address, a public address mapped to the host address at a NAT, and a transport address of TURN server that can receive media data from the other endpoint on behalf of that endpoint and relay it to that endpoint—are referred to as that endpoints candidates. They are determined by that endpoint and communicated to the other endpoint in a candidate gathering phase. Each endpoint then determines a set of “candidate pairs”, i.e. a set of possible pairings of the endpoint own addresses with the other endpoint's addresses. Connectivity checks are then performed for each candidate pair to determine whether or not that candidate pair is valid, i.e. to determine whether probe data sent from an endpoint's own address in that pair to the other address in that pair is successfully received by the other endpoint. A media session is then established between the endpoints using a selected candidate pair that was determined to be valid in the connectivity checks. Media data of the media session is transmitted from each of the endpoints to the network address of the other endpoint in the selected candidate pair. The progress of the connectivity checks and status of the candidate pairs is tracked by respective ICE state machines implemented at the endpoints.
That is, each endpoint may have multiple associated transport addresses e.g. a local transport address, a transport address on the public side of a NAT, a transport address allocated on a relay server etc. During media session establishment, for each endpoint, a respective address is selected for that endpoint to use to transmit and receive data in the media session. For example, the addresses may be selected in accordance with the ICE (“Interactive Connectivity Establishment”) protocol. Once the media session is established, media can flow between those selected addresses of the different endpoints. To select a path, a list of candidate pairs is generated, each of which comprises a network address available to a first of the endpoint—“local” candidates from the perspective of the first endpoint, though note that “local” in this context is not restricted to host addresses on its local interface, and can also include reflexive addresses on the public side of the NAT, or a relay network address of a media relay server that can relay media data to the first endpoint (relayed network address)—and a network address available to the second endpoint (“remote” candidates from the perspective of the first endpoint). Every possible pairing of local and remote candidates may be checked to determine whether or not it is valid, by sending one or more probe messages from the local address to the remote address during the connectivity checks.