When two or more persons in a phone conference want to share electronic data or media content that may contain confidential or private information, such as a document, a picture, or a presentation, they need to set up an additional communication channel. This is typically done over a computer data network.
Setting up such a secure communication channel over an insecure computer data network is cumbersome and vulnerable to attacks. Two problems have to be solved: (a) how do the endpoints find each other on the network? This is a connectivity issue. And (b) how can each party know that the other endpoint is genuine and not an impersonator? This is an authentication problem.
The connectivity issue can be solved by using conventional peer-to-peer or client-server connection setup techniques. The authentication issue is more difficult to solve.
Encryption protocols include an authenticated key exchange phase, and it is this first step that is both the most vulnerable and the most burdensome for the user.
Authenticated key exchange depends either on a shared secret or on a trusted third party such as a public certification authority. But using a trusted third party still requires an authentication phase: each party has to prove its identity, typically via a login and password, so this does not remove the problem.
Relying on a pre-shared secret is also unsuitable, because it requires a separate exchange step prior to the phone conversation, which is impractical for ad-hoc communications between people who may never have met.
Any system requiring passwords or pre-shared secrets is vulnerable to attacks. Secrets and passwords can be stolen or lost, and certificates can be forged.
A common secret could be generated ad hoc and communicated over the phone, but this process is again not user friendly: it is annoying and intrusive. Alternatively, the secret could be exchanged via another out-of-band channel such as email or fax. However, this method has the same problems as above and also requires the participants to take additional steps, such as opening email clients and communicating email addresses. In addition, these methods are susceptible to eavesdropping, since telephone connections and email messages are insecure means of communication that are easily compromised.
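The ad hoc secret approach described above can be illustrated with a minimal sketch. It assumes a six-digit secret read aloud over the phone, PBKDF2 for key derivation, and an HMAC challenge-response to prove knowledge of the secret. All function names and parameters are hypothetical and chosen for illustration only.

```python
import hashlib
import hmac
import secrets


def generate_spoken_secret(digits: int = 6) -> str:
    """Generate a short numeric secret that can be read aloud (assumed format)."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def derive_key(spoken_secret: str, salt: bytes) -> bytes:
    """Stretch the spoken secret into a 32-byte key (PBKDF2 is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", spoken_secret.encode("utf-8"), salt, 100_000)


def prove(key: bytes, challenge: bytes) -> bytes:
    """Answer a challenge to show knowledge of the key without revealing it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Check a challenge response using a constant-time comparison."""
    return hmac.compare_digest(prove(key, challenge), response)


if __name__ == "__main__":
    secret = generate_spoken_secret()  # read aloud by one participant
    salt = secrets.token_bytes(16)     # may be sent in the clear
    challenge = secrets.token_bytes(16)
    key_a = derive_key(secret, salt)   # computed by the first endpoint
    key_b = derive_key(secret, salt)   # computed by the second endpoint
    print(verify(key_a, challenge, prove(key_b, challenge)))
```

The sketch also makes the drawbacks visible: each participant has to hear and type the digits correctly, and anyone who overhears the call can derive the same key.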
Several methods have been devised to generate an ad hoc secret and communicate it over the phone to the participants. Some methods use watermarking techniques to hide the secret in an audible signal. However, phone connections are not well suited to transmitting metadata robustly without being intrusive and hindering the normal conversation. The secret also needs to be repeated for latecomers. Moreover, a login procedure prior to or during the call is still required to protect against eavesdropping, with the same problems regarding ease of use and security.
The fundamental problem of all automatic authentication methods is that they use keys or passwords that are in no way related to the way human beings identify and authenticate each other. Setting up an additional electronic communication channel therefore requires another secret or password, at some stage or another, for authentication. The problem thus is to find a way to extend this robust manual authentication method to the second, electronic communication channel in an ad-hoc, transparent and robust way.
WO 2013/138651 describes a method to allow authorization of computing device association using human-perceptible signals. The method includes forming an association between a first computing device and a second computing device, computing with the first computing device a fingerprint of a common key derived during forming of the association, and emitting with the first computing device a first audio stimulus based upon the computed fingerprint. The first audio stimulus is at least one octave number apart from a second audio stimulus emitted by the second computing device based upon the common key. Accordingly, each computing device is configured to emit human-perceptible sounds derived from its own copy of the key. If a simultaneous playback of both sequences is harmonious, the common key was exchanged; if it is discordant, the devices failed to exchange a common key.
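The principle summarized above can be illustrated with a minimal sketch. It assumes SHA-256 as the fingerprint function, a chromatic mapping of fingerprint bytes to MIDI note numbers, and a shift of exactly one octave for the second device. None of these choices is taken from the cited document, and all names are hypothetical.

```python
import hashlib
from typing import List


def key_fingerprint(common_key: bytes, length: int = 8) -> bytes:
    """Short fingerprint of the common key (SHA-256 is an assumption)."""
    return hashlib.sha256(common_key).digest()[:length]


def fingerprint_to_notes(fingerprint: bytes, base_note: int = 60,
                         octave_shift: int = 0) -> List[int]:
    """Map each fingerprint byte to a MIDI note within one octave.

    base_note 60 is middle C; octave_shift moves the whole sequence
    up by 12 semitones per octave.
    """
    return [base_note + 12 * octave_shift + (b % 12) for b in fingerprint]


def note_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


if __name__ == "__main__":
    key_on_device_1 = b"example common key"
    key_on_device_2 = b"example common key"
    seq1 = fingerprint_to_notes(key_fingerprint(key_on_device_1), octave_shift=0)
    seq2 = fingerprint_to_notes(key_fingerprint(key_on_device_2), octave_shift=1)
    # With identical keys, every pair of simultaneous notes is exactly one octave apart.
    print(all(b - a == 12 for a, b in zip(seq1, seq2)))
```

With identical keys, the two sequences move in parallel octaves and sound consonant. With different keys, the fingerprints differ, so the intervals between simultaneous notes are no longer guaranteed to be octaves.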
US 2003/0135740 describes a network-based mechanism for real-time verification and authentication of data and user identities. Biometric elements, such as voice prints, are utilized to enhance the Public Key Infrastructure as a means to decrypt data and verify data authenticity, such that the user's private key is authenticated remotely on a one-time basis. An authentication server has various software modules that enable authentication of user identity, secure user access to data, digital signatures, secure messaging and secure online transactions.