1. Field of the Invention
The present invention relates to voice processing systems and more particularly the manner in which such systems interface to data packet networks.
2. Description of the Related Art
Voice processing systems are well known in the art (see http://www.computertelephony.org for a comprehensive directory). They provide a computer-based method of non-simultaneous telephone communications and carry out a variety of tasks. These include VoiceMail, whereby callers who cannot reach their intended target can instead record a message for subsequent retrieval, and interactive voice response (IVR) systems, which enable callers to interact with the voice processing system, typically via dual tone multi-frequency (DTMF) keys, to retrieve and update certain information. Conventional IVR applications include, amongst others, telephone banking, flight booking enquiries and the placing of home shopping orders.
An example of a voice processing system is the DirectTalk for AIX voice processing system available from IBM Corporation, and described in the manual “DirectTalk for AIX, General Information and Planning”, reference number GC33-1840-00, plus the other manuals referenced therein. Like many modern voice processing systems, the DirectTalk system is based on a general-purpose computer (in this case an RS/6000 workstation) with additional hardware and software for the telephony and voice processing functions. (DirectTalk, AIX, and RS/6000 are trademarks of IBM Corporation).
Traditional voice processing systems have been designed to connect to the Public Switched Telephone Network (PSTN) either via an analogue telephone connection, or via digital trunk lines.
One of the major advantages associated with such systems is that they are capable of handling large volumes of telephone traffic continuously, 24 hours a day, 7 days a week. Individual signals are time multiplexed together for transmission over the digital trunks. In North America, the standard form of trunk line is known as T1, and provides 24 simultaneous lines with a data connection speed of 1.544 Mbps. These trunk lines can be used not only to carry the actual audio telephone signal, but also to provide a limited degree of signaling information, for example, to reserve a channel, to make a call on a channel, to transfer a call, and so on. In Europe, the standard form of trunk line is known as E1, and provides 32 simultaneous lines (of which 30 are for telephony channels, one for framing, and one for signaling). E1 lines support a data speed of 2.048 Mbps.
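The trunk speeds quoted above follow from the 64 kbps rate of each channel. The short Python sketch below works through the arithmetic. The 8 kbps of T1 framing overhead is not stated in the text; it is the standard figure (one framing bit per 193-bit frame at 8000 frames per second) and is supplied here as an assumption. The function names are illustrative only.

```python
# Rate of one digitized voice/data channel on an E1 or T1 trunk.
CHANNEL_BPS = 64_000


def t1_line_rate_bps() -> int:
    """T1: 24 channels plus framing overhead."""
    channels = 24
    # Assumption (not in the text): standard T1 framing adds 8 kbps,
    # i.e. one framing bit per 193-bit frame, 8000 frames per second.
    framing_bps = 8_000
    return channels * CHANNEL_BPS + framing_bps


def e1_line_rate_bps() -> int:
    """E1: 30 telephony timeslots, one framing and one signaling timeslot."""
    timeslots = 30 + 1 + 1
    return timeslots * CHANNEL_BPS


print(t1_line_rate_bps())  # 1544000, i.e. 1.544 Mbps
print(e1_line_rate_bps())  # 2048000, i.e. 2.048 Mbps
```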
The telephone channels can be processed for IVR functions like DTMF detection, call progress detection, DTMF generation, voice recognition, text to speech and the playing of compressed and uncompressed voice segments or messages.
Typically, a voice processing system incorporates an internal time division multiplex (TDM) bus to coordinate its telephone channels. Each channel is allocated a timeslot on the bus. The main TDM bus types commercially used in voice processing systems are the PCM Expansion bus (PEB), supporting up to 128 64 kbps TDM timeslots, and the SCbus, supporting up to 2048 64 kbps timeslots, both from Dialogic Corporation. Other well-known TDM buses include the Enterprise Computer Telephony Forum (ECTF) H.100 CTBus, and also the MVIP bus, defined by the GO-MVIP organization, and available from Natural Microsystems Corporation, which supports up to 512 64 kbps timeslots. These can be used either as part of a chassis backplane or via a ribbon cable between TDM interface cards.
Both the TDM bus interfaces and the E1 and T1 trunks carry dedicated 64 k bit per second digital voice/data channels, so interfacing them is mainly a matter of synchronizing each channel from the appropriate telephony interface into the appropriate timeslot on the TDM bus. One timeslot can be used for the transmit direction of a 64 k channel and another for the receive direction.
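The allocation of one transmit and one receive timeslot per channel can be pictured with the following Python sketch. It is a simplified bookkeeping model only: the class and method names (TdmBus, connect, disconnect) are hypothetical and do not correspond to the API of any of the buses named above, and real bus synchronization is performed in hardware.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChannelSlots:
    """The pair of 64 kbps timeslots serving one full-duplex channel."""
    transmit: int
    receive: int


class TdmBus:
    """Hypothetical model of a TDM bus with a fixed number of timeslots."""

    def __init__(self, total_timeslots: int) -> None:
        self._free: List[int] = list(range(total_timeslots))
        self._by_channel: Dict[str, ChannelSlots] = {}

    def connect(self, channel_id: str) -> ChannelSlots:
        """Reserve one timeslot per direction for a trunk channel."""
        if channel_id in self._by_channel:
            return self._by_channel[channel_id]
        if len(self._free) < 2:
            raise RuntimeError("not enough free timeslots on the bus")
        slots = ChannelSlots(transmit=self._free.pop(0),
                             receive=self._free.pop(0))
        self._by_channel[channel_id] = slots
        return slots

    def disconnect(self, channel_id: str) -> None:
        """Return both timeslots of a channel to the free pool."""
        slots = self._by_channel.pop(channel_id)
        self._free.extend([slots.transmit, slots.receive])
        self._free.sort()


bus = TdmBus(total_timeslots=128)   # e.g. the timeslot count quoted for a PEB
print(bus.connect("T1-trunk0-ch1"))
```

On this model a 128-timeslot bus can serve at most 64 full-duplex channels, since each channel consumes two timeslots.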
It is also possible to interface voice processing systems to other lines (rather than just digital trunks), for example analogue voice channels, which can be sampled by Analogue to Digital (A to D) convertors into a 64 k bit voice/data channel, or converted in the opposite direction using D to A convertors. Other transmission media whose transport mechanisms deliver dedicated 64 k bit per second channels, such as Asynchronous Transfer Mode (ATM), could also be interfaced to voice processing systems. All these dedicated media are known as isochronous, since they provide a guaranteed data rate on the channel.
There has been much interest in the transmission of voice channels on data transmission networks which use protocols like TCP/IP packet transmission instead of dedicated isochronous channels. There are advantages of voice over data (packet) networks in that interface costs to the network are lower and many computer users already have data connections. It also means that local work groups which use Local Area Networks (LANs) for data communication can re-use these for voice communication and possibly combined voice/data multimedia communication in real time, but this is very much dependent on how busy the network is.
Some voice processing systems can already use one single interface for both voice and data, but this tends to be via a dedicated isochronous telephony channel like an analogue telephone channel or basic rate Integrated Services Digital Network (ISDN).
The disadvantage of voice over data packet networks is that there is no guarantee of exactly when any specific data packet may reach its destination. If there are multiple routes between the source and destination, the packets may also be received out of order. For data this is easily resolved by queuing the packets up in the correct order and at some later time storing them or retransmitting them to another party. Real-time consistent data flow, in which underrun or overrun conditions matter, is not a concern for data, but most certainly is for voice. Voice data received and played to someone out of order would make speech unintelligible. Delays between playing out the first and second parts of a sentence are also unacceptable, since this would be inconvenient to the user and may change the context of the sentence for the listener.
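The queuing of out-of-order packets described above for ordinary data can be sketched in Python as follows. The sketch assumes that each packet carries a consecutive sequence number, which the text does not specify; the function name reorder is likewise illustrative. Note that the loop simply waits for a missing packet however long it takes, which is acceptable for stored data but is exactly the delay that voice playback cannot tolerate.

```python
from typing import Dict, Iterable, Iterator, Tuple


def reorder(packets: Iterable[Tuple[int, bytes]],
            first_seq: int = 0) -> Iterator[bytes]:
    """Yield payloads in sequence order, holding back early arrivals.

    Assumption: packets are (sequence_number, payload) pairs numbered
    consecutively from first_seq, with no losses or duplicates.
    """
    pending: Dict[int, bytes] = {}
    next_seq = first_seq
    for seq, payload in packets:
        pending[seq] = payload
        # Release every packet that is now contiguous with what was sent on.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1


# Packets that took different routes and arrived out of order.
arrived = [(1, b"B"), (0, b"A"), (3, b"D"), (2, b"C")]
print(b"".join(reorder(arrived)))  # b'ABCD'
```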
A lot of work has gone on to resolve these issues, and standards have been defined, such as International Telecommunication Union (ITU-T) Recommendation H.323 on visual telephone systems and equipment for LANs that do not guarantee quality of service. This recommendation references other specifications and recommendations, such as H.225 for data packet synchronization, H.245 for signaling control, the H.261 and H.263 video codecs, and the G.711, G.722, G.723, G.728 and G.729 audio codecs.
Using these recommendations, products are being developed to interface voice devices to data-packet networks (e.g. the Internet). One such device is an IP phone, which provides users with real-time, full-duplex communication over the Internet. The device is typically a home personal computer (PC) running application software which simulates the telephony environment. For example, the FreeTel Corporation markets such software and it is possible to download a trial version from their web-site, http://www.freetel.inter.net. A graphical numeric keypad enables the user to dial another IP phone, connecting to the data-packet network via a dial-up Point-to-Point Protocol (PPP) or Serial Line Internet Protocol (SLIP) Internet connection. Voice data is initially input through a microphone attached to the PC, and is transmitted, via the machine's Internet connection, across the network to the other phone. The voice data can be heard through a headset which is also attached to the PC.
It is also important that telephone calls can be made from IP phones to PSTN phones. Gateway devices are being developed, which take IP data and stream it into 64 k bit per second isochronous format. Such devices also allow the data to be interfaced to E1 and T1 lines, and to analogue channels, via D to A convertors. The streaming functions do two things:
    i) Take voice from a 64 k bit per second channel via an audio codec, packetize it and send it to the IP phone;
    ii) Take packetized voice, enqueue it until it has a certain length, thus guaranteeing a certain period of constant voice, and then stream this to an audio codec to play into a 64 k bit per second channel.
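The two streaming functions can be illustrated with the minimal Python sketch below. It is not taken from any gateway product: the names packetize and PlayoutBuffer, the 20 ms packet length and the 60 ms pre-fill threshold are all assumptions chosen for illustration, and the audio codec stage is omitted, so the payload is treated as raw 64 k bit per second data (8 bytes per millisecond).

```python
from collections import deque
from typing import Deque, List, Optional

BYTES_PER_MS = 8  # 64 kbit/s = 8000 bytes/s = 8 bytes per millisecond


def packetize(stream: bytes, packet_ms: int = 20) -> List[bytes]:
    """Function i): cut a 64 kbit/s stream into fixed-duration packets."""
    size = packet_ms * BYTES_PER_MS
    return [stream[i:i + size] for i in range(0, len(stream), size)]


class PlayoutBuffer:
    """Function ii): queue packets until enough voice is held, then play."""

    def __init__(self, prefill_ms: int = 60) -> None:
        self._queue: Deque[bytes] = deque()
        self._queued_bytes = 0
        self._prefill_bytes = prefill_ms * BYTES_PER_MS
        self._playing = False

    def enqueue(self, packet: bytes) -> None:
        """Store a received packet; start playout once the queue is long enough."""
        self._queue.append(packet)
        self._queued_bytes += len(packet)
        if self._queued_bytes >= self._prefill_bytes:
            self._playing = True

    def next_packet(self) -> Optional[bytes]:
        """Return the next packet to stream out, or None if still filling."""
        if not self._playing:
            return None
        if not self._queue:
            self._playing = False  # underrun: wait for the pre-fill again
            return None
        packet = self._queue.popleft()
        self._queued_bytes -= len(packet)
        return packet


buffer = PlayoutBuffer(prefill_ms=60)
for packet in packetize(bytes(480)):      # 60 ms of silence as test data
    buffer.enqueue(packet)
print(len(buffer.next_packet()))          # 160 bytes, i.e. 20 ms of voice
```

A larger pre-fill threshold tolerates more variation in packet arrival times at the cost of added delay before the caller hears anything, which is the trade-off the enqueuing step has to balance.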
This allows a telephone call to be made from an IP phone to an ISDN or analogue telephone connected to the PSTN. The gateway can also allow incoming calls from a normal telephone to an IP phone. The routing of such calls is done by a process called a Gatekeeper as specified in the H.323 standard. Additionally, a gateway should also be able to synchronize to TDM buses of the types previously discussed.
The advantages of transmitting voice channels on data-packet networks such as the Internet have already been discussed. For these same reasons, it is extremely beneficial for voice processing systems to interface directly to data-packet networks rather than have to accept IP phone calls which have first had to be routed via the PSTN. Although such a setup is known, existing IP solutions appear to require a new IP interface card which has all the voice processing resource on board to handle all incoming calls. Some cards may also have both the IP interface and the PSTN interface encapsulated within them. The whole process is therefore contained within the one card.
Therefore the approach thus far has been to design a new adapter with a data connection interface which handles all of the H.323 recommendations, including voice transport and signaling, does all the voice processing on board, and interfaces to the IVR applications. Specific voice processing algorithms like voice compression/decompression, silence detection, and DTMF generation/detection are also typically included. Two examples of this are:
    i) The Dialogic Corporation (see http://www.dialogic.com) has an IP only card, DM3 IPLink Release 2, with Ethernet and Quad E1/T1 interface on board. However, it is really designed as an IP gateway solution to act as a bridge between the PSTN and an IP network. The software APIs to drive the card differ from those used to process normal PSTN only calls.
    ii) The Linkon Corporation (see http://www.linkon.com) supplies an IP card solution, LinkNet. This acts as an IP switch and also provides an enhanced services platform. The services supported include, amongst others, fax, interactive voice response, and speech recognition. LinkNet also facilitates the transmission of both voice and fax over IP, using an IVR front-end, and also supports a wide variety of signaling and call control functions. All this is achieved via a single platform. This solution permits dynamic switching between the available call routes, namely the Internet, an intranet, or the PSTN.
However, such solutions tend to incorporate new software Application Programming Interfaces (APIs) to drive the software solution. Thus previously purchased hardware cards become redundant should the customer wish to move from a voice processing system connected directly to the PSTN to an IP connection instead. It also means that applications written for a PSTN-connected voice processing system will not work when IP connected unless they are modified.