Most access to computer resources from remote locations today takes place over electronic data communication links. Such access takes many forms, including, for example, modems and digital subscriber lines. Remote users communicate with, and access the resources of, a local system via a personal computer or a computer appliance, such as a palm-sized, scaled-down version of a personal computer.
Applications typically support connected computers having graphical user interfaces. However, similar interface functionality is not supported for end-devices having voice user interfaces. As a result, a user's access to the functionality of a particular application or resource is dictated by the manner in which the user accesses the computer system upon which the application or resource resides.
Businesses typically have two systems accessed remotely on a regular basis by their users. A local area network handles data communications, and a private branch exchange (PBX) system handles voice communications. The local area network gives users access to file servers and application servers, thereby enabling a user to run computer applications from a remote location. The PBX system enables users to retrieve and respond to voice messages left for them on the PBX voice mail system. The PBX also enables a remote user to call multiple persons served by the PBX with a single call.
Businesses also typically have two separate and distinct sets of physical communications lines running to their places of business. A first set of lines provides communication links between a public switched telephone network (PSTN) and a private branch exchange (PBX) system including phones and other telephony equipment. These PSTN lines terminate at the business site at a PBX connected to the business's internal phone lines. A second set of lines provides links between external data networks and the internal local area networks (LANs) of the business. Examples of such lines are T1, E1, ISDN, PRI, and BRI.
In recognition of the potential efficiencies arising from converging two physically and operationally distinct networks into a single network, the network technology industry has sought to define and implement a single, converged network meeting the demands of all types of communications, including voice, facsimile, and data. As a result, a new telephony/data transmission paradigm is emerging. The new paradigm is based upon a packet-based, switched, multi-media network. Data and voice, while treated differently at the endpoints by distinct applications, share a common transport mechanism.
Convergence presents the opportunity for the creation of applications including communication interfaces that not only support computer-generated commands, but also voice commands from a remote user. It also presents the opportunity to enhance the variety and flexibility of uses for PBX systems.
One aspect of computer systems accessed remotely via voice commands is the implementation of security measures. Voice interfaces present the opportunity for users to connect to a network from virtually any location. Presently, security mechanisms for restricted access systems accessed via telephone typically rely upon users to enter a number on a touch-tone phone to limit access. However, this method is highly susceptible to eavesdropping. Also, the users are often required to enter a long sequence of numbers that can easily be forgotten. A voice-controlled computer system will require speech recognition functionality. Speech recognition programs and associated “training” databases (used to train the software to recognize voice commands from a user) do not guarantee that another user's speech will not invoke protected operations on the computer system. Thus, if the computer system is to be secure, then additional speaker recognition/authentication procedures must be included in the system.
The use of speaker recognition/authentication processes to protect resources in a computer system is known. Such systems have weaknesses that enable imposters to gain access to the computer system. The simplest voice authentication scheme requires a user to speak a password, and the authentication system verifies the user by comparing the spoken password to an existing copy of the password. An obvious weakness of this authentication procedure is that the security system cannot distinguish a password vocalized by the user from an electronically recorded copy of the user's voice.
One response to the well-known weakness of such a scheme to electronically recorded passwords is to request that the user utter the password multiple times. The utterances are compared to the digitally stored vocal password at the computer system site and also to one another. The second comparison confirms that the utterances differ sufficiently from one another, so that a single recording of the password is not being replayed multiple times by an imposter seeking to gain remote access to protected computer resources. Of course, the imposter can circumvent this safeguard by making multiple recordings of the password spoken multiple times by an authorized user. Furthermore, copies of a single original spoken password can be altered and then stored to create variations from the original.
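The two comparisons described above can be sketched as follows. This is only an illustrative sketch, not any particular system's method: it assumes each utterance has already been reduced to a fixed-length feature vector, and the function name, the Euclidean distance measure, and both threshold values are hypothetical.

```python
from itertools import combinations
from math import dist


def verify_utterances(utterances, template,
                      match_threshold=1.0, replay_threshold=0.05):
    """Accept only if every utterance matches the stored password
    and no two utterances are near-identical copies of each other.

    utterances: list of feature vectors (hypothetical representation).
    template:   feature vector of the stored vocal password.
    Both thresholds are illustrative assumptions.
    """
    if len(utterances) < 2:
        return False  # the scheme needs multiple utterances to compare

    # Comparison 1: each utterance must be close to the stored password.
    if any(dist(u, template) > match_threshold for u in utterances):
        return False

    # Comparison 2: utterances must differ from one another; a pair that
    # is almost identical suggests one recording replayed several times.
    if any(dist(a, b) < replay_threshold
           for a, b in combinations(utterances, 2)):
        return False

    return True
```

As the text notes, this check is not sufficient on its own: a set of distinct recordings of the authorized user, or altered copies of one recording, would pass both comparisons just as live speech does.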
What is needed is a speaker authentication scheme wherein imposters cannot use a recording of the user's voice to render a valid password and thereby gain access to protected computer resources. There exist a number of systems that attempt to overcome the shortcomings of voice-based authentication schemes. Such authentication mechanisms include smart cards, secure IDs, and retina scanners. However, these mechanisms require special hardware at the site from which a user calls.
In accordance with another aspect of a converged wide-area network interface to a computer system, there is interest in exploiting a system wherein telephony and digital data systems share programs and data. Voice-based computer access, described above, is one such effort to exploit converged technology. Once authenticated, a user may access computer resources via voice commands rather than issuing commands by means of a remote computer (e.g., a laptop computer). The user may access a number of applications integrated into the converged local network including databases, file servers, Interactive Voice Response (IVR) servers, call centers, voice mail, PBX hubs/endnodes, and conference bridges.
With regard to the last of the listed potential applications, it is noted that conference bridges are generally implemented today in two ways. One way is to purchase a conference bridge with a certain capacity. It is then used as a fixed resource, like a physical conference room. If a conference bridge has 24 ports, it can support one 24-user conference call. It could also support three eight-port conference calls.
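The fixed-capacity behavior of a purchased bridge can be illustrated with a small sketch. The class and method names below are hypothetical and do not correspond to any real product; the sketch only models ports as a fixed pool shared among conferences.

```python
class ConferenceBridge:
    """Hypothetical model of a bridge with a fixed number of ports."""

    def __init__(self, total_ports):
        self.total_ports = total_ports
        self.conferences = {}  # conference name -> ports reserved

    @property
    def free_ports(self):
        return self.total_ports - sum(self.conferences.values())

    def reserve(self, name, ports):
        """Reserve ports for a conference; return False if the bridge is full."""
        if ports <= 0:
            raise ValueError("ports must be positive")
        if name in self.conferences:
            raise ValueError("conference already exists")
        if ports > self.free_ports:
            return False  # fixed resource: capacity cannot grow on demand
        self.conferences[name] = ports
        return True
```

With a 24-port bridge, three eight-port reservations succeed and a fourth is refused, just as a single 24-port reservation leaves no room for any other conference.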
Extending the size of a conference via external conference bridging is a challenge to coordinators of a conference. A second conference phone number has to be forwarded to each of the participants who is to be bridged into the conference via the external bridge. Then the external conference bridge calls in to the internal conference bridge. Alternatively, callers could call a number that is received by the PBX handling the conference which in turn forwards the call to an external conference bridge. However, each forwarded call uses two trunks in the PBX system.
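The trunk cost of the forwarding approach can be made concrete with a short calculation. The function names are hypothetical, and the sketch relies only on the statement above that each forwarded call occupies two PBX trunks.

```python
TRUNKS_PER_FORWARDED_CALL = 2  # from the text: each forwarded call uses two trunks


def trunks_used(forwarded_calls):
    """PBX trunks occupied by calls forwarded to an external bridge."""
    if forwarded_calls < 0:
        raise ValueError("forwarded_calls must be non-negative")
    return forwarded_calls * TRUNKS_PER_FORWARDED_CALL


def max_forwarded_calls(available_trunks):
    """Largest number of forwarded calls the given trunks can carry."""
    if available_trunks < 0:
        raise ValueError("available_trunks must be non-negative")
    return available_trunks // TRUNKS_PER_FORWARDED_CALL
```

For example, forwarding ten callers would occupy twenty trunks, so a PBX with 23 free trunks could forward at most eleven callers.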
Another option is to subscribe to a conference bureau. A bureau is a service that supplies an external conference bridge (and a number to call into the bridge). The bureau typically charges a customer based upon the number of users and the duration of the use of the bridge (e.g., per user-minute). External bridges allow for more dynamic meetings; however, the cost of utilizing external bridges on a regular basis is substantial.
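Per user-minute billing can be illustrated with a one-line calculation. The function name is hypothetical, and any rate passed to it is purely illustrative, since the text gives no actual bureau prices.

```python
def bureau_cost_cents(users, minutes, rate_cents_per_user_minute):
    """Cost of one bureau-hosted conference under per user-minute billing.

    All arguments are caller-supplied; amounts are kept in integer cents
    to avoid floating-point rounding.
    """
    if users < 0 or minutes < 0 or rate_cents_per_user_minute < 0:
        raise ValueError("arguments must be non-negative")
    return users * minutes * rate_cents_per_user_minute
```

Under an assumed rate of 25 cents per user-minute, a one-hour call with ten users would cost 15000 cents, which shows how quickly regular use of an external bridge adds up.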