Conventional communication systems allow the user of a device, such as a personal computer or mobile device, to conduct voice or video calls between two or more users over a packet-based computer network such as the Internet. Such communication systems include voice or video over internet protocol (VoIP) systems. These systems are beneficial to the user as they are often significantly lower in cost than conventional fixed-line or mobile cellular networks, particularly for long-distance communication. To use a VoIP system, the user installs and executes client software on their device. The client software sets up the VoIP connections and also provides other functions such as registration and user authentication. In addition to voice communication, the client may also set up connections for other communication media such as instant messaging ("IM"), SMS messaging, file transfer, screen sharing, whiteboard sessions and voicemail.
A network may have a layered architecture, a notable example being the Internet. The transport layer provides host-to-host (i.e. end-to-end) connectivity between network nodes as a service to processes operating at the application layer. Various protocols may be implemented at the transport layer. A transport layer protocol may be connection-oriented, e.g. TCP (Transmission Control Protocol), or connectionless, e.g. UDP (User Datagram Protocol). Connection-oriented protocols establish formal end-to-end connections between hosts through an exchange of connection establishment messages, such as the well-known TCP three-way handshake (SYN, SYN+ACK, ACK). TCP is a reliable protocol: successful receipt of TCP packets is acknowledged to the sender, and retransmission is attempted automatically in the event of failure, at the cost of increased latency. UDP provides no automatic acknowledgment or retransmission mechanisms, making it unreliable but less prone to latency.
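The three-way handshake can be illustrated with a toy Python model that tracks only the flags and the sequence and acknowledgment numbers of each message. This is a sketch under simplifying assumptions, not a TCP implementation: the `Segment` class and `three_way_handshake` function are hypothetical names, the initial sequence numbers are passed in rather than chosen randomly as a real stack would choose them, and no payload, windowing or retransmission is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """A toy TCP segment holding only the fields the handshake uses (hypothetical)."""
    sender: str
    flags: Tuple[str, ...]
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the SYN, SYN+ACK and ACK segments that open a connection."""
    # 1. The client proposes a connection with its initial sequence number.
    syn = Segment("client", ("SYN",), seq=client_isn)
    # 2. The server acknowledges the SYN (which consumes one sequence number)
    #    and announces its own initial sequence number.
    syn_ack = Segment("server", ("SYN", "ACK"), seq=server_isn, ack=syn.seq + 1)
    # 3. The client acknowledges the server's SYN; both sides now consider
    #    the connection established.
    ack = Segment("client", ("ACK",), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

Because each SYN consumes one sequence number, each side acknowledges the other's initial sequence number plus one; a UDP sender, by contrast, would simply transmit datagrams with no such exchange.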
A VoIP call has two distinct stages: signalling and media flow. During the initial signalling stage, a calling endpoint sends a call invite (such as a SIP (Session Initiation Protocol) INVITE) to one or more callee endpoints. Among other things, this causes the callee device to enter a ringing state, in which information, such as an audible ring, is output to a user of the callee device (the callee) to inform them of the incoming call. Assuming the callee is willing to accept the call, call acceptance is signalled from the callee device to the caller device. Media parameters are negotiated during the signalling stage to enable media such as call audio and/or video to flow between the devices in the media flow stage. The signalling is generally controlled at the application layer, for example by SIP software stacks running on the devices. Different transport layer protocols can be used for signalling and media flow: for example, SIP may operate over TCP in the signalling stage, whereas media may be transmitted over UDP in the media flow stage, as latency is often more of a concern than datagram loss at that stage.
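The signalling stage, seen from the callee device, can be sketched as a small state machine. This is a simplified model, not a SIP stack: `Callee` and `CallState` are hypothetical names, SIP responses are reduced to their status lines, and media negotiation is reduced to keeping the offered codecs that the callee also supports (a simplification of the offer/answer exchange that SIP normally carries in its message bodies). In this model, media may flow once the caller has confirmed the acceptance with an ACK.

```python
from enum import Enum, auto
from typing import List, Tuple


class CallState(Enum):
    IDLE = auto()
    RINGING = auto()
    ACCEPTED = auto()
    MEDIA_FLOW = auto()


class Callee:
    """Toy model of a callee device during VoIP call signalling (hypothetical)."""

    def __init__(self, supported_codecs: List[str]) -> None:
        self.supported_codecs = supported_codecs
        self.state = CallState.IDLE
        self.offered_codecs: List[str] = []

    def _require(self, expected: CallState) -> None:
        # Reject events that arrive in the wrong signalling state.
        if self.state is not expected:
            raise RuntimeError(f"unexpected event in state {self.state.name}")

    def on_invite(self, offered_codecs: List[str]) -> str:
        # The call invite carries the caller's media offer and starts ringing.
        self._require(CallState.IDLE)
        self.offered_codecs = list(offered_codecs)
        self.state = CallState.RINGING
        return "180 Ringing"

    def accept(self) -> Tuple[str, List[str]]:
        # The user accepts; the acceptance carries the negotiated media parameters.
        self._require(CallState.RINGING)
        answer = [c for c in self.offered_codecs if c in self.supported_codecs]
        if not answer:
            # A real stack would send an error response rather than raise.
            raise ValueError("no codec in common with the caller")
        self.state = CallState.ACCEPTED
        return "200 OK", answer

    def on_ack(self) -> None:
        # The caller confirms the acceptance; media can now flow (e.g. over UDP).
        self._require(CallState.ACCEPTED)
        self.state = CallState.MEDIA_FLOW


if __name__ == "__main__":
    callee = Callee(["opus", "PCMU"])
    print(callee.on_invite(["G722", "opus", "PCMU"]))
    print(callee.accept())
    callee.on_ack()
    print(callee.state)
```

Carrying the media parameters in the same exchange as the invite and the acceptance means that, by the time the call is accepted, both devices already agree on how audio or video will be sent in the media flow stage.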
In the context of a VoIP call, end-to-end does not necessarily refer to the path between the caller and the final destination. Where signalling is performed via a route involving one or more intermediate network nodes that operate as transport layer entities, such as proxy servers, peer-to-peer (P2P) nodes, bridges (e.g. PSTN bridges) or some NATs (Network Address Translators), a separate end-to-end connection is established between each pair of adjacent nodes along the route, including the endpoints, to provide an overall path to the final callee destination.
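Hop-by-hop connection setup can be sketched in a few lines: for a route listed from caller to callee, one transport-layer connection exists per pair of adjacent nodes. The node names and the `hop_connections` and `relay` helpers below are hypothetical and purely illustrative; real forwarding through proxies, bridges or NATs involves much more than this.

```python
from typing import List, Tuple


def hop_connections(route: List[str]) -> List[Tuple[str, str]]:
    """Return one transport-layer connection per pair of adjacent nodes."""
    if len(route) < 2:
        raise ValueError("a route needs at least two endpoints")
    # Pair each node with its successor: (n0, n1), (n1, n2), ...
    return list(zip(route, route[1:]))


def relay(route: List[str], message: str) -> List[str]:
    """Trace a signalling message forwarded over each hop's own connection."""
    return [f"{src} -> {dst}: {message}" for src, dst in hop_connections(route)]


if __name__ == "__main__":
    route = ["caller", "proxy", "pstn-bridge", "callee"]
    for line in relay(route, "INVITE"):
        print(line)
```

For a route of n nodes this yields n - 1 connections, and the caller holds a connection only to the first intermediate node, never directly to the callee.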