Multimedia over Internet protocol (MoIP) refers to voice, video, and other types of media transmitted over computer networks using the Internet protocol (IP). Media, such as voice signals, is first digitized to transform the analog input into a stream of digital data. Such data may be compressed depending on the requirements and capabilities of the transmitting and receiving systems. The data, whether compressed or uncompressed, is transmitted over an IP network by first formatting the data stream into multiple discrete packets.
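The packetization step described above can be illustrated with a minimal sketch. The `Packet` type, the `packetize` and `reassemble` helpers, and the 160-byte payload size (20 ms of 8 kHz, 8-bit audio) are illustrative assumptions, not part of any particular MoIP implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Packet:
    seq: int        # sequence number, lets the receiver reorder packets
    payload: bytes  # one slice of the digitized media stream


def packetize(stream: bytes, payload_size: int = 160) -> List[Packet]:
    """Split a digitized media stream into discrete, numbered packets.

    payload_size is a hypothetical default: 160 bytes corresponds to
    20 ms of audio sampled at 8 kHz with one byte per sample.
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        Packet(seq=i, payload=stream[offset:offset + payload_size])
        for i, offset in enumerate(range(0, len(stream), payload_size))
    ]


def reassemble(packets: Iterable[Packet]) -> bytes:
    """Rebuild the stream from packets that may arrive out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))
```

The sequence number is included because an IP network may deliver packets out of order; the receiver sorts on it before playing the media back.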
Another characteristic of multimedia communications is that a signaling channel is often used to convey control requirements between endpoints, and between endpoints and call control servers. The signaling channel is eventually used to establish one or more media channels for transferring media between two or more endpoints. Media includes, but is not limited to, voice, video, white board data, and instant messaging. The signaling channel can be based on user datagram protocol (UDP) or transmission control protocol (TCP), depending on the multimedia signaling protocol that is used. For example, the H.323 protocol often uses TCP, while session initiation protocol (SIP) and media gateway control protocol (MGCP) often use UDP. Most media traffic is real-time by nature and therefore uses UDP, either directly or through the real-time transport protocol (RTP), which typically runs over UDP. In most cases, parameters for media communication are negotiated using the signaling protocol, and are not known in advance.
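As a rough illustration of media carried in RTP over UDP, the sketch below builds and parses the 12-byte fixed RTP header (version 2, no padding, no extension, no contributing sources). The function names are hypothetical, and the sketch deliberately omits header extensions and CSRC lists.

```python
import struct

RTP_VERSION = 2
RTP_HEADER_LEN = 12  # fixed header length when there are no CSRC entries


def build_rtp_packet(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, payload: bytes, marker: bool = False) -> bytes:
    """Prepend a minimal fixed RTP header to a media payload."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header fields from a received datagram."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:RTP_HEADER_LEN])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The resulting bytes would be handed to a UDP socket as a single datagram; nothing in the header tells an intermediate device when the stream ends.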
The use of multimedia over IP is rapidly spreading throughout the world, and often there is a need to set up or deploy multimedia IP endpoints behind a firewall or a network address translator (“FW/NAT devices” herein). Common scenarios include deploying IP phones in an enterprise network behind the enterprise FW/NAT device, and setting up IP endpoints behind a residential broadband gateway that serves residential broadband users. In the latter scenario, the residential gateway serves as a FW/NAT.
The problems that FW/NAT devices create for MoIP communications may be understood with reference to the function of a pinhole in a FW/NAT device. The term pinhole refers to a configuration of a FW/NAT device that logically and conceptually creates a temporary opening in the FW/NAT device through which data can be transmitted to and from an endpoint placed behind the FW/NAT device. Configuring a pinhole thus allows packets belonging to the same stream to return to the endpoint. Practically all FW/NAT devices permit (or can be configured to permit) the opening of symmetric pinholes for limited periods of time, provided that the initiator of the pinhole is an entity logically inside the FW/NAT device. Most data protocols, such as e-mail, web, and file transfer protocol (FTP), operate in that manner.
Specifically, a client application opens a session, usually over the transmission control protocol (TCP), and establishes a temporary symmetric pinhole in the FW/NAT device. The client application then requests certain information (e.g., Web page content), receives the information in a response, and eventually closes the connection. The FW/NAT device allows the response information to enter the area protected by the FW/NAT device. The pinhole is termed symmetric because the FW/NAT device permits entry of traffic only from the responding (external) entity that received the initial request. The FW/NAT device closes the pinhole when the client or server closes the connection. However, most FW/NAT devices will also close the pinhole upon reaching a certain timeout threshold, regardless of whether a response or a request to terminate the connection has been seen. This is particularly important for protocols based on UDP, which is not connection-oriented. These devices enforce such timeouts based on a policy that keeping pinholes open for too long represents a security risk and wastes memory and processing resources.
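The symmetric pinhole behavior and its timeout can be modeled with a small sketch. The `SymmetricPinholeTable` class, its method names, and the 60-second default are assumptions chosen for illustration. The model also assumes that only outbound traffic refreshes the timer and ignores address translation; real FW/NAT devices differ on both points.

```python
from typing import Dict, Tuple

Addr = Tuple[str, int]  # (IP address, port)


class SymmetricPinholeTable:
    """Toy model of the pinhole state kept by a symmetric FW/NAT device."""

    def __init__(self, idle_timeout: float = 60.0):
        # Hypothetical default; actual timeouts are device-specific.
        self.idle_timeout = idle_timeout
        self._last_outbound: Dict[Tuple[Addr, Addr], float] = {}

    def outbound(self, internal: Addr, external: Addr, now: float) -> None:
        """An internal entity sends traffic: open or refresh the pinhole."""
        self._last_outbound[(internal, external)] = now

    def inbound_allowed(self, external: Addr, internal: Addr,
                        now: float) -> bool:
        """Admit inbound traffic only through a live, matching pinhole."""
        key = (internal, external)
        opened = self._last_outbound.get(key)
        if opened is None:
            return False  # nobody inside ever contacted this external entity
        if now - opened > self.idle_timeout:
            del self._last_outbound[key]  # pinhole timed out
            return False
        return True
```

In this model, a packet from an external entity that was never contacted is rejected, and so is a reply from the contacted entity once the idle timeout has elapsed.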
UDP also is commonly used for transferring data across the Internet. Network traffic using UDP is also usually permitted through the FW/NAT device as long as the network traffic is initiated by an entity protected by the FW/NAT device. UDP is commonly used by applications such as instant messaging, file sharing, and multimedia. Most FW/NAT devices permit a process behind the FW/NAT device to open a UDP session, and can be configured to do so for specific port numbers without introducing any real security threat.
The most restrictive and secure implementation is that of a symmetric FW/NAT device, where only the external entity that was contacted by an internal entity can reply to the internal entity through the FW/NAT device. Some of the more important MoIP signaling protocols are based on UDP, or can at least optionally run over UDP. However, a challenge with UDP is that, unlike TCP, which follows a very explicit state machine, UDP provides no clear indication to intermediate devices, such as FW/NAT devices, about when a session is actually closed. Therefore, to enforce a reasonable security policy, FW/NAT devices terminate existing UDP pinholes based on rather short timeout periods. In most implementations, such timeout periods are a few minutes or less, which is significantly shorter than a typical VoIP communication period.
Thus, there is a need for a way to communicate multimedia through a FW/NAT device with a protocol based on UDP that can conveniently maintain a pinhole open for the entire duration of a multimedia session, such as a VoIP call.
Accordingly, a challenge in creating a long-term connection for MoIP is in keeping the communication flowing between endpoints or users who are separated by FW/NAT devices. FW/NAT devices generally allow entities behind them to communicate with external entities by opening port pinholes for each specific communication stream, allowing temporary access to and from the external resource. However, such pinholes apply only to specific streams initiated from the inside, and have a very limited lifetime. The pinholes are terminated after a predefined period of time or as soon as the communication is over.
Another challenge is that external endpoints outside a FW/NAT device may need to reach multimedia endpoints, which requires the external endpoints to pass traffic through a FW/NAT device. For example, receiving incoming calls is an essential part of any VoIP service. However, an incoming call does not have the capability of creating a pinhole through the FW/NAT device to enable such a connection, as a pinhole is not yet established for such a call.
Yet another challenge is that media streams use dynamically allocated port numbers, so the pinhole required for each stream is not known in advance. Therefore, it is impossible to establish the required connection using the allocated port without a solution for handling endpoint-to-endpoint communication.
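Dynamic port allocation can be seen in a short sketch: binding a UDP socket to port 0 asks the operating system to choose a free port, so the number is known only at run time. The helper name `allocate_media_port` and the loopback default address are assumptions for illustration.

```python
import socket
from typing import Tuple


def allocate_media_port(host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a UDP socket to an OS-chosen port and report which one it got.

    Port 0 means "any free port", so the value cannot be configured
    into a FW/NAT device ahead of time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))
    port = sock.getsockname()[1]
    return sock, port
```

An endpoint would typically report the returned port to its peer through the signaling channel, which is why a FW/NAT device cannot know the media port before the session is negotiated.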
Therefore, it would be advantageous to have a solution that establishes and maintains a permanent communication channel to multimedia endpoints residing behind FW/NAT devices. Such a channel would allow the protected endpoints to not only initiate but also receive multimedia calls. It would be further advantageous if such a solution would be compatible with commonly deployed network components and would enable establishing the multimedia channels without needing to modify any existing equipment.