Internet Protocol (IP)-based networks are of great importance in today's information society. A network is comprised of two fundamental parts, the nodes and the links. A node is a type of network device, such as a computer. Nodes are able to communicate with other nodes through links, like cables or radio wave transmission. There are basically two different network techniques for establishing communication between nodes on a network: the circuit-switched network and the packet-switched network techniques. The former is the traditional telephone system, while the latter is used in IP-based networks.
A circuit-switched network creates a closed circuit between two nodes in the network to establish a connection. The established connection is thus dedicated to the communication between the two nodes. One of the immediate problems with dedicated circuits is wasted capacity, since almost no transmission uses the circuit one hundred percent of the time. Also, if a circuit fails in the middle of a transmission, the entire connection must be dropped and a new one established.
IP-based networks on the other hand utilize a packet-switched network technology, which uses available capacity much more efficiently and minimizes the risk of possible problems, such as a disconnection. Messages sent over a packet-switched network are first divided into packets containing the destination address. Then, each packet is sent over the network with every intermediate node and router in the network determining where the packet goes next. A packet does not need to be routed over the same links as previous related packets. Thus, packets sent between two network devices can be transmitted over different routes in the event of a link breakdown or node malfunction.
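The packet-switched behavior described above can be illustrated with a minimal sketch. It is a simplified model and not an implementation of IP routing: the topology, the node names (`A`, `R1`, `R2`, `R3`, `B`), and the helper functions `split_into_packets`, `find_route`, and `remove_link` are all hypothetical, and a whole path is computed per packet here, whereas real routers make an independent forwarding decision at each hop.

```python
from collections import deque


def split_into_packets(message, dest, size):
    """Divide a message into packets that each carry the destination address."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [{"dest": dest, "seq": n, "payload": c} for n, c in enumerate(chunks)]


def find_route(links, src, dest):
    """Breadth-first search over the currently working links.

    Returns the list of nodes on a shortest path, or None if unreachable.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dest:
            return path
        for nxt in sorted(links.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def remove_link(links, a, b):
    """Simulate a link breakdown between nodes a and b."""
    links[a].discard(b)
    links[b].discard(a)


# Hypothetical topology: two independent routes from A to B.
links = {
    "A": {"R1", "R2"},
    "R1": {"A", "B"},
    "R2": {"A", "R3"},
    "R3": {"R2", "B"},
    "B": {"R1", "R3"},
}

packets = split_into_packets("HELLO WORLD", dest="B", size=4)
routes = []
for packet in packets:
    if packet["seq"] == 1:
        # The link R1-B fails after the first packet has been sent.
        remove_link(links, "A" if False else "R1", "B")
    routes.append(find_route(links, "A", packet["dest"]))

# The receiver reorders packets by sequence number to rebuild the message.
reassembled = "".join(
    p["payload"] for p in sorted(packets, key=lambda p: p["seq"])
)

for packet, route in zip(packets, routes):
    print(packet["seq"], repr(packet["payload"]), "via", route)
print(reassembled)
```

In this sketch the first packet travels over `R1`, while the later packets are routed over `R2` and `R3` once the link fails, and the message is still delivered in full. No connection has to be torn down and re-established, which is the contrast with the circuit-switched case.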
Examples of IP-based communications include instant messaging (IM) and Voice-over-Internet Protocol (VoIP). These forms of communication make it easier for businesses and the general public to quickly and conveniently reach out and contact each other. In traditional analogue voice communication, the voice is carried between parties over a circuit-switched connection within the closed networks of the telecommunications service provider. The identity of a calling entity is determined by recognition of the voice; in some instances by Calling Line Identification, which is a function of the underlying telecommunications network; and in some cases by the caller entering a user identification and password, such as when accessing phone banking services provided by interactive voice response (IVR).
Newer forms of IP-based communication support the establishment of multiple modes of communication simultaneously. Such systems are typically referred to as multimodal systems, where two parties communicate with each other over two or more modes of communication such as text, video, and voice. For example, a user may establish a voice call from a cell phone with a merchant to order a product or service. In such an interaction, the merchant may elect to simultaneously respond with both a text menu of available products and a voice response. The user may continue to interact with the merchant via voice or text selection, or both, and further parties may join the communications session. Such systems are particularly prone to new forms of attack where authentication of both parties is required as additional modes of communication are added to or removed from the interactions between multiple parties.
Since traditional voice-based communications sessions relied upon closed networks and analogue techniques, the threat of call interception and speech manipulation was very low. However, with newer digitally based communication schemes such as VoIP, the voice is carried in packets over an open shared network, so call interception and speech manipulation are easier, and there is no guarantee of the authenticity of the calling party. This same problem arises with video telephony systems using newer digitally based schemes. Furthermore, the packet data network equivalent of Calling Line Identification cannot be assured, given the possibility of the data packets being intercepted and manipulated. Increasingly, communication is occurring over open networks, such as the Internet, which are more susceptible to security attacks and digital tampering. Furthermore, communications rely on different forms of identity and authentication across different communications subsystems and technologies.
For example, many security challenges arise when the party initiating the communication, e.g., Party A, is not well known to the party receiving the communication, e.g., Party B. That is, if a credit card company wants to speak with a customer, the customer receiving the call may not know whether it is really their credit card company calling or a party impersonating their credit card company. Party B either trusts Party A and accepts the risk by continuing the call, or disconnects the call and, if motivated to do so, contacts the original Party A by phoning them (phoning the credit card company in this example).
Another example of a security threat is when Party A is well known to Party B. While Party A and Party B may recognize each other's voices, with modern IP-based communications and speech synthesis technology, it is increasingly possible for an external party to intercept a phone call and impersonate or masquerade as the intended recipient (Party B). This can be performed by intercepting and manipulating the packets to digitally alter the voice signal. In this case, Party A does not really know if he is speaking with Party B. In yet another example, an external party may impersonate Party A and initiate contact with others who accept the call and may fail to recognize the fraud.
Security challenges also arise when customers contact organizations such as bank call centers. Call centers typically ask callers to verbally provide personal information such as address and date of birth as part of the process of authenticating the caller as a customer. Clearly, divulging personal data routinely to multiple people in multiple cities and countries over the course of time dilutes the value of the personal data as a secure credential for authentication. Date of birth cannot be a secure form of identification when it is given out to multiple people and organizations on hundreds of occasions over a period of time.
The speech-based threats described above also apply to video-based communication systems, where an attacker may alter the signal to impersonate another entity. Additionally, threats arise when the communication is not session-based. In an example of asynchronous communication over email, phishing is a security issue where Party B receives an email purporting to be from, for example, a bank of which they are a customer, and the email encourages the recipient to take an action that causes the recipient to unwittingly divulge personal information to an unknown external party.
A further threat occurs due to the multiplicity of digital communications channels available (e.g., messaging, video, voice), wherein the calling party may commence dialogue in one communications channel and transfer to another. Furthermore, one or more of the parties may roam from one wireless or fixed access network to another, or one or more of the parties can switch from one device to another. The additional security problem in these situations is not knowing whether an authentic transfer has been achieved and whether session “hi-jacking” has been performed by an attacker.
Another example of a threat occurs after the communication channel has been established and parties have been successfully authenticated. If an attacker intercepts and redirects the packet data forming the communications channel, the attacker can “hi-jack” the call, removing one party from the communications session and impersonating the removed party. Without ongoing authentication, the other parties in the call may not realize they are interacting with a person different to the one with which they started the voice call or Instant Messaging chat session.
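The role of ongoing authentication in the scenario above can be sketched with per-message authentication tags. This is an illustration only and is not presented as the method of the invention. It assumes that the parties already share a secret session key from the initial authentication, and the function names `tag_message` and `verify_message` are hypothetical. It uses HMAC-SHA256 from the Python standard library.

```python
import hashlib
import hmac
import secrets


def tag_message(session_key: bytes, seq: int, payload: bytes) -> bytes:
    """Compute an authentication tag over the sequence number and payload."""
    data = seq.to_bytes(8, "big") + payload
    return hmac.new(session_key, data, hashlib.sha256).digest()


def verify_message(session_key: bytes, seq: int, payload: bytes, tag: bytes) -> bool:
    """Check a received tag in constant time against the expected tag."""
    expected = tag_message(session_key, seq, payload)
    return hmac.compare_digest(expected, tag)


# Assumption: this key was agreed when the parties first authenticated.
session_key = secrets.token_bytes(32)
# An attacker who redirects the packets does not hold the session key.
attacker_key = secrets.token_bytes(32)

genuine_tag = tag_message(session_key, 1, b"transfer approved")
forged_tag = tag_message(attacker_key, 2, b"send funds elsewhere")

# A message from the original party verifies.
genuine_ok = verify_message(session_key, 1, b"transfer approved", genuine_tag)
# A message injected by the attacker after the hi-jack does not verify.
forged_ok = verify_message(session_key, 2, b"send funds elsewhere", forged_tag)
# Replaying an old message under a new sequence number does not verify either.
replay_ok = verify_message(session_key, 2, b"transfer approved", genuine_tag)

print(genuine_ok, forged_ok, replay_ok)
```

Because every message is checked, a party that takes over the channel after the initial authentication cannot produce valid tags without the session key. The sketch does not address how the key is established or protected, nor how authentication carries over when a party changes channel or device.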
A further example is where communications occur between two parties over multiple channels simultaneously, such as in multimodal dialogue systems. For instance, a voice call may be established between two parties as one mode of communication, with subsequent modes of communication, such as text or video, established simultaneously between the two parties. These multiple modes of communication may be established over a single communications channel, over multiple communication channels and access networks, or with multiple devices. The problem is to correctly authenticate the identity of the party for the initial communication channel and for all subsequent sessions established over the communications channels. Furthermore, it is possible for an attacker to attempt to masquerade as one of the parties in an attempt to establish an additional mode of communication. This may be an attempt to eavesdrop or carry out some other attack. Another example is a caller using self-care services on an IVR or automated chat server and subsequently invoking assisted care to have a human agent guide the caller through the self-care interface. Still a further example is one party arriving home and electing to switch the session from the mobile device to the home fixed internet device.
It is a primary object of an embodiment of the invention to provide a method and system that provides a way for parties communicating with one another over multiple channels that are established either simultaneously or sequentially, to verify that each party is actually communicating with the correct person. It is another object of an embodiment of the invention to provide a method and system that provides security of communications between employees, customers, suppliers and related entities over established communications channels. It is a further object of an embodiment of the invention to provide a method and system that provides secure communications sessions for service providers such as telephone companies, internet service providers and general users of these systems. It is yet a further object of an embodiment of the invention to provide a method and system that provides secure communications sessions for firms specializing in securing electronic transactions. It is still another object of an embodiment of the invention to provide a method and system that provides secure communications for network equipment providers who offer collaboration products to enterprise and telephone company service providers. It is still another object of an embodiment of the invention to provide a method and system that provides secure communications for users roaming across networks or switching devices. It is still another object of an embodiment of the invention to provide a method and system that provides secure communications between systems provided by different service providers, such as between IM sessions spanning multiple IM providers such as Facebook, Second Life, Google Talk and corporate IM. A further object of an embodiment of the invention is to provide a method and system that provides authentication and secure communications for multimodal dialogue systems. 
Another object of an embodiment of the invention is to provide a method and system that provides authentication and secure communications between two or more parties over multiple communications channels, which may vary in multiplicity between each and every communicating party.