Voice over Internet Protocol (VoIP) phone calls are susceptible to man-in-the-middle attacks, in which a third party assumes the identity of one of the parties to the call. This impersonation may allow the third party to gather information from one or both of the calling parties. VoIP calls generally involve two users having a conversation through a data network rather than through the traditional public switched telephone network (PSTN). The term VoIP as used here covers calls over any packet-switched network, whether or not that network operates in accordance with the Internet Protocol.
The PSTN operates as a circuit-switched network in which voice signals travel through a circuit or path formed by switches at various points in the circuit. A person attempting a man-in-the-middle (MIM) attack would have to breach the circuit, such as by tapping one end or the other. Packet-switched networks, by contrast, encode voice signals into digital data, packetize that data, and route the packets into the network. No dedicated circuit exists, so the packets may pass through intermediate nodes that neither party controls.
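The packetizing step described above can be illustrated with a minimal sketch. It is not a real VoIP protocol: the `VoicePacket` type, its `seq` field, and the 160-byte payload size are hypothetical choices made only for this example. The sketch splits already-encoded voice data into numbered packets, lets them arrive out of order as they might on a network with no dedicated circuit, and restores the original data at the receiver.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoicePacket:
    """Hypothetical packet layout, for illustration only."""
    seq: int        # sequence number, lets the receiver restore order
    payload: bytes  # one slice of the encoded voice data


def packetize(encoded_voice: bytes, payload_size: int = 160) -> List[VoicePacket]:
    """Split encoded voice data into fixed-size, numbered packets."""
    return [
        VoicePacket(seq=i, payload=encoded_voice[offset:offset + payload_size])
        for i, offset in enumerate(range(0, len(encoded_voice), payload_size))
    ]


def reassemble(packets: List[VoicePacket]) -> bytes:
    """Restore the original byte stream from packets in any arrival order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)


# Stand-in for encoded voice data (1024 bytes).
voice = bytes(range(256)) * 4
packets = packetize(voice)

# Packets may take different routes, so simulate out-of-order arrival.
arrived = packets[:]
random.Random(0).shuffle(arrived)

recovered = reassemble(arrived)
```

The same reassembly is available to anyone who can observe the packets in transit, which is why the absence of a dedicated circuit matters for the attacks discussed here.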
For a typical phone user, a MIM attack could capture and record the data packets, allowing the attacker to reconstruct audio files of the conversation and extract information about the user from them. A MIM attack could also allow an attacker to assume the identity of the other party by intercepting the packets.
Generally, end-to-end security provides the strongest defense against these attacks. However, unless two users both reside in the same VoIP provider's network, end-to-end security will typically not exist. A system confined to a single VoIP provider's network constitutes a ‘closed’ system. Most users will not operate in a closed system and will need a way to provide end-to-end security in an open system.
For open systems, most security methods involve encryption. Users encrypt data frames containing multimedia conversations to prevent intermediate nodes from gaining any useful information about the content of the communication. However, for end-to-end encryption to exist, the two parties participating in a phone call must agree on cryptographic keys to encrypt their data frames. Absent Public Key Infrastructures or pre-shared keys, such a key exchange must occur in the ‘clear,’ allowing a MIM attacker to intercept the key material from each endpoint and perform a separate pair-wise secure setup with each endpoint. The attacker would then pass along the media information after inspecting and recording its contents.
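To make the key-exchange weakness concrete, the following is a minimal sketch that assumes a Diffie-Hellman style exchange; the text does not name a specific key agreement method, so this is only one possible instance. The modulus `P`, the generator `G`, and the party names are illustrative assumptions, and the parameters are toy values that are not suitable for real use. The sketch shows an attacker substituting its own public value in each direction, ending up with one key shared with each endpoint.

```python
import secrets

# Toy parameters for illustration only; NOT secure for real use.
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # assumed generator for this sketch


def keypair():
    """Return a (private, public) pair for the toy exchange."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Derive the shared value from our private and the peer's public value."""
    return pow(peer_public, own_private, P)


# The two callers and the attacker each generate key material.
alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()
attacker_priv, attacker_pub = keypair()

# The public values travel in the clear. The attacker intercepts each one
# and forwards its own public value instead.
alice_key = shared_secret(alice_priv, attacker_pub)  # Alice believes this is Bob's
bob_key = shared_secret(bob_priv, attacker_pub)      # Bob believes this is Alice's

# The attacker completes a pair-wise setup with each endpoint.
attacker_key_with_alice = shared_secret(attacker_priv, alice_pub)
attacker_key_with_bob = shared_secret(attacker_priv, bob_pub)

# Each endpoint now shares a key with the attacker rather than with the other
# endpoint, so the attacker can decrypt, inspect, and re-encrypt the media.
print(alice_key == attacker_key_with_alice)
print(bob_key == attacker_key_with_bob)
```

Neither endpoint can detect the substitution from the exchange alone, because nothing in it binds a public value to the identity of the party that sent it.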