The present invention relates to techniques for communicating digital information, and more particularly to techniques for communicating digital information over a coded voice channel.
There is an increasing demand from customers for advanced telephony services, such as automated services that can be accessed and commanded by control sequences transmitted from a remote location. As a consequence, techniques have been developed for providing access to services from a communications network. In wireless communication, ongoing work includes the development of the Wireless Application Protocol (WAP), a layered communication protocol that includes network layers (e.g., transport and session layers) as well as an application environment comprising a microbrowser, scripting, telephony value-added services and content formats. One part of WAP is Telephony Value Added Services (TeleVAS), which provides secure access to local functions such as Call Control, Phonebook and Messaging through a device-independent interface to the underlying vendor-specific operating system and telephony subsystem.
In fixed networks, techniques for providing access to services from a communications network have included the use of Intelligent Networks, in which Service Access Points are nodes in the network that customers can access to obtain advanced services. It has also become common to access services at nodes that are independent of any traditional network operator. These nodes are implemented as service computers that can be connected in independent computer networks (e.g., the Internet) and accessed from at least one communications network (e.g., a telephony network or a mobile network such as one based on the European Global System for Mobile Communication (GSM) standard). The communications network (e.g., a public telephony network or a mobile network) is then used only to establish access to these independent computer networks. To keep the services provided by the network of service nodes independent of the traditional telecommunication networks, a connection to a service node through such a telecommunication network can carry both data (e.g., speech) and control signaling on the same channel (i.e., in-band signaling can be applied).
In a cellular communications system, it is common for operators to offer a Short Message Service (SMS) for sending short messages to cellular terminals. The messages are routed via a Short Message Service Center (SMS-C) server that stores and forwards them. The SMS service has several disadvantages with respect to the problem of exchanging control signals between a user terminal and a service node. For example, SMS gives the sender no control over delays, and it provides no information about the status of a message. Furthermore, the pricing of SMS differs substantially from one operator to the next, with some operators keeping the price at a level that makes the service too expensive for many users. Another disadvantage is that different cellular network operators offer servers outside the cellular network different interfaces, other than the SMS-C interface, which makes it cumbersome to send SMS messages to terminals belonging to different networks.
It is further known how to establish separate voice and data paths between two terminals through a plurality of telecommunication networks, one of which is a mobile network. However, switching between the two modes is awkward and time-consuming, which inconveniences the user.
Whereas systems such as Internet Protocol (IP) communication can easily cope with mixed speech and data, mixing them presents problems if the communication path includes a mobile network, such as a GSM network. More particularly, in this latter case the communication path includes a voice coder that is optimized for human speech, so in-band modem signaling by means of, for example, tone frequencies (e.g., Dual Tone Multi-Frequency, or “DTMF”) yields a low data rate and risks an increased error rate. One reason for this is that a modem signal is by nature less predictable than a voice signal. Known methods for managing these difficulties are either impractical from a user's point of view or lead to technical solutions that are specific to each type of network involved. Further, future voice coders may be even less able to pass DTMF signals. Therefore, in-band signaling in communication paths comprising a plurality of networks, at least one of which includes voice coding, is a problem in need of an advantageous solution.
PCT Publication No. WO96/09708 by Hamalainen et al. (“Simultaneous Transmission of Speech and Data on a Mobile Communications System”) describes how to use a voice channel over an air interface in a mobile system to transmit voice and data simultaneously, and in particular discloses a method and system whereby silent periods, when no voice is present, are detected, thereby allowing data to be inserted into the transmitted frames. This publication further describes how the frames are supplemented with information bits that permit voice and data frames to be separated at the network side. A characteristic of the described solution is that it depends on the air interface protocol and that the means for separating voice and data are integrated with the network. This solution is therefore not useful for solving the problem of simultaneous voice and data between a mobile user terminal and a service node that is external to and independent of the telecommunication networks in the speech path between the two.
It is also becoming common to adopt speech recognition methods for speech control of user services. A disadvantage of known methods is the need to “train” the speech recognition system to understand a specific vocabulary, language characteristics and even the voice characteristics of the person speaking.
It is therefore an object of the present invention to provide techniques for adapting non-speech data for transmission via a coded voice channel in an air interface of a mobile telecommunications system (e.g., a GSM system), so that the air interface will accommodate the in-band signaling described above with respect to land-based communications systems.
It is a further object of the present invention to provide a common “language” for interfacing with user service nodes that use speech recognition techniques as a control interface.
In accordance with one aspect of the present invention, the foregoing and other objects are achieved in techniques and apparatus for transmitting a digital input symbol to a receiver. This is accomplished by determining one or more formant frequencies that correspond to the digital input symbol, and generating a signal having the one or more formant frequencies. The signal may then be supplied for transmission over a voice channel. The signal is particularly suited to this purpose because it comprises formant frequencies, which are precisely the kind of signal content the voice channel is adapted to carry. For example, the signal may be supplied to a voice coder that generates an encoded signal for transmission over the voice channel.
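A minimal sketch of this first aspect follows. It assumes a hypothetical table mapping 2-bit symbols to pairs of formant frequencies (F1, F2) and approximates the generated signal by summing equal-amplitude sinusoids at those frequencies, sampled at 8 kHz; a practical formant synthesizer would instead drive resonant filters with an excitation signal. The table values, function names and segment duration are illustrative assumptions, not taken from the description.

```python
import math

SAMPLE_RATE_HZ = 8000  # narrowband telephony sampling rate

# Hypothetical mapping from 2-bit input symbols to (F1, F2) formant pairs in Hz.
# The values loosely resemble vowel formants and are illustrative only.
SYMBOL_TO_FORMANTS = {
    0b00: (700, 1200),
    0b01: (400, 2000),
    0b10: (300, 900),
    0b11: (550, 1700),
}


def formants_for_symbol(symbol):
    """Determine the formant frequencies that correspond to a digital input symbol."""
    try:
        return SYMBOL_TO_FORMANTS[symbol]
    except KeyError:
        raise ValueError(f"unsupported symbol: {symbol!r}") from None


def generate_signal(formants, duration_s=0.04, sample_rate=SAMPLE_RATE_HZ):
    """Return samples of a sum of equal-amplitude sinusoids at the given frequencies."""
    n_samples = int(round(duration_s * sample_rate))
    amplitude = 1.0 / len(formants)  # keeps the summed signal within [-1, 1]
    return [
        sum(amplitude * math.sin(2 * math.pi * f * n / sample_rate) for f in formants)
        for n in range(n_samples)
    ]


# The samples would then be handed to the voice coder for transmission.
samples = generate_signal(formants_for_symbol(0b01))
```

The resulting samples stand in for the signal that would be supplied to the voice coder.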
In another aspect of the invention, a preprogrammed addressable memory is utilized to perform the mapping between the set of input symbols and the set of corresponding formant frequencies. In particular, the step of determining one or more formant frequencies that correspond to the digital input symbol comprises the steps of supplying the digital input symbol to an address input port of an addressable memory means, wherein the addressable memory means has formant frequency codes stored therein at addresses such that when the digital input symbol is supplied to the address input port of the addressable memory means, a corresponding formant frequency code appears at an output port of the addressable memory means. The corresponding formant frequency code appearing at the output port of the addressable memory is then used as an indicator of the determined one or more formant frequencies.
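The addressable memory can be emulated in software as a fixed array indexed by the input symbol. In the sketch below, which is purely illustrative, the 16-word `ROM`, the packing of each formant frequency code into two 4-bit fields, and the frequency grids `F1_GRID_HZ` and `F2_GRID_HZ` are all hypothetical choices.

```python
# Hypothetical frequency grids: the high nibble of a formant frequency code selects
# F1 and the low nibble selects F2.
F1_GRID_HZ = [250 + 50 * i for i in range(16)]   # 250 .. 1000 Hz
F2_GRID_HZ = [800 + 120 * i for i in range(16)]  # 800 .. 2600 Hz

# Hypothetical preprogrammed read-only memory with 16 words, addressed by a 4-bit
# input symbol. The word stored at each address is that symbol's formant frequency code.
ROM = (
    0x0F, 0x1D, 0x2B, 0x39, 0x47, 0x55, 0x63, 0x71,
    0x8E, 0x9C, 0xAA, 0xB8, 0xC6, 0xD4, 0xE2, 0xF8,
)


def read_rom(address):
    """Present the symbol at the address input port; return the word at the output port."""
    if not 0 <= address < len(ROM):
        raise ValueError(f"address out of range: {address}")
    return ROM[address]


def decode_formant_code(code):
    """Interpret a formant frequency code as an (F1, F2) pair in Hz."""
    return F1_GRID_HZ[code >> 4], F2_GRID_HZ[code & 0x0F]


def determine_formants(symbol):
    """Map a digital input symbol to its formant frequencies via the memory lookup."""
    return decode_formant_code(read_rom(symbol))
```

Storing a compact code rather than raw frequencies keeps each memory word small; the decoding step is where the code is turned into the frequencies actually generated.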
In still another aspect of the invention, the corresponding formant frequency code indicates a sequence of formant frequencies. Then, the step of generating the signal having the one or more formant frequencies comprises the step of generating the sequence of formant frequencies indicated by the corresponding formant frequency code.
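Where a formant frequency code indicates a sequence rather than a single set of formants, the generator steps through the sequence, emitting one segment per entry. The sketch below assumes a hypothetical `CODE_TO_SEQUENCE` table and a 30 ms segment length, and again uses summed sinusoids as a simplified stand-in for formant synthesis.

```python
import math

SAMPLE_RATE_HZ = 8000

# Hypothetical table: each formant frequency code names a sequence of (F1, F2)
# pairs that are generated one after another, somewhat like a diphthong.
CODE_TO_SEQUENCE = {
    0xA1: [(700, 1200), (400, 2000)],
    0xA2: [(300, 900), (550, 1700), (700, 1200)],
}


def generate_sequence(code, segment_s=0.03, sample_rate=SAMPLE_RATE_HZ):
    """Generate the formant sequence indicated by a formant frequency code."""
    sequence = CODE_TO_SEQUENCE[code]
    seg_len = int(round(segment_s * sample_rate))
    samples = []
    for i, formants in enumerate(sequence):
        amplitude = 1.0 / len(formants)
        # A running sample index keeps the time axis continuous across segments.
        for n in range(i * seg_len, (i + 1) * seg_len):
            samples.append(
                sum(amplitude * math.sin(2 * math.pi * f * n / sample_rate) for f in formants)
            )
    return samples
```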
In yet another aspect of the invention, a Forward Error Correction (FEC) code is also transmitted with the formant frequencies over the voice channel. In particular, an FEC code is determined for the digital input symbol, and the one or more formant frequencies are modified as a function of the FEC code. A signal having the one or more modified formant frequencies is then generated for transmission over the voice channel. The modification may, for example, affect a volume attribute or a pitch attribute of the one or more formant frequencies.
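The description does not name a particular FEC code or a specific mapping onto volume and pitch. As a hedged illustration, the sketch below substitutes a Hamming(7,4) code for a 4-bit symbol: the first parity bit selects one of two volume levels and the remaining two select one of four pitch values. The base pitch, pitch step and volume levels are hypothetical.

```python
from dataclasses import dataclass

BASE_PITCH_HZ = 120  # hypothetical nominal pitch
PITCH_STEP_HZ = 20   # hypothetical pitch increment per parity value


@dataclass(frozen=True)
class ModifiedFormants:
    formants_hz: tuple
    volume: float
    pitch_hz: int


def hamming74_parity(symbol):
    """Return the Hamming(7,4) parity bits (p1, p2, p3) for a 4-bit symbol d1d2d3d4."""
    if not 0 <= symbol <= 0xF:
        raise ValueError(f"symbol must fit in 4 bits: {symbol}")
    d1, d2, d3, d4 = ((symbol >> shift) & 1 for shift in (3, 2, 1, 0))
    return d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4


def modify_formants(formants_hz, parity):
    """Carry p1 in the volume attribute and (p2, p3) in the pitch attribute."""
    p1, p2, p3 = parity
    volume = 1.0 if p1 else 0.5
    pitch_hz = BASE_PITCH_HZ + PITCH_STEP_HZ * ((p2 << 1) | p3)
    return ModifiedFormants(tuple(formants_hz), volume, pitch_hz)
```

A receiver would estimate volume and pitch to recover the parity bits and, together with the symbol decoded from the formant frequencies, could then correct any single-bit error in the 7-bit codeword.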
In yet another aspect of the invention, both speech and digital input symbols may be transmitted to a receiver. This includes transmitting the speech to the receiver by means of a voice channel. When it is desired to transmit data, a change to a data transmission mode is made by automatically generating a predetermined sequence of formant frequencies and transmitting the automatically generated formant frequencies to the receiver by means of the voice channel. This signals the change in mode to the receiver. Then, the digital input symbols are mapped onto a corresponding formant sequence. A signal representing the corresponding formant sequence is then transmitted to the receiver by means of the voice channel.
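The transmitter side of the mode change can be modeled as a small state machine. In this sketch the voice channel is represented by a Python list of tagged items, and both the predetermined sequence `DATA_MODE_PREAMBLE` and the symbol table are hypothetical; the only property relied on is that the preamble uses formant pairs that never occur as data symbols.

```python
# Hypothetical predetermined sequence announcing the switch to data mode. None of
# its formant pairs is used for a data symbol, so the receiver cannot confuse them.
DATA_MODE_PREAMBLE = [(800, 1300), (300, 2300), (800, 1300)]

# Hypothetical mapping from data symbols to (F1, F2) formant pairs in Hz.
SYMBOL_TO_FORMANTS = {0: (700, 1200), 1: (400, 2000), 2: (300, 900), 3: (550, 1700)}


class Transmitter:
    def __init__(self):
        self.mode = "speech"
        # Stand-in for the voice channel: a list of ("speech", frame) or
        # ("formants", (F1, F2)) items in transmission order.
        self.channel = []

    def send_speech(self, frame):
        if self.mode != "speech":
            raise RuntimeError("transmitter is in data mode")
        self.channel.append(("speech", frame))

    def send_data(self, symbols):
        if self.mode == "speech":
            # Automatically emit the predetermined sequence to signal the mode change.
            self.channel.extend(("formants", pair) for pair in DATA_MODE_PREAMBLE)
            self.mode = "data"
        # Map each digital input symbol onto its formant pair.
        self.channel.extend(("formants", SYMBOL_TO_FORMANTS[s]) for s in symbols)
```

The preamble is sent only on the transition out of speech mode, so consecutive calls to `send_data` simply append further symbols.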
In still another aspect of the invention, a return to a speech transmission mode may be made by automatically generating a second predetermined sequence of formant frequencies and transmitting it to the receiver by means of the voice channel. The second predetermined sequence of formant frequencies serves to signal the change in mode to the receiver.
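On the receiving side, both predetermined sequences must be recognized in the incoming stream. The following sketch assumes a channel represented as a list of tagged items (`("speech", frame)` or `("formants", (F1, F2))`), a hypothetical three-pair `DATA_MODE_PREAMBLE`, and a hypothetical `SPEECH_MODE_POSTAMBLE` as the second predetermined sequence. It keeps a sliding window of recent formant pairs that are not data symbols and toggles the mode when the window matches the expected sequence.

```python
from collections import deque

# Hypothetical predetermined sequences; neither shares a formant pair with the data
# alphabet, so they can be spotted among data symbols.
DATA_MODE_PREAMBLE = [(800, 1300), (300, 2300), (800, 1300)]
SPEECH_MODE_POSTAMBLE = [(300, 2300), (800, 1300), (300, 2300)]
FORMANTS_TO_SYMBOL = {(700, 1200): 0, (400, 2000): 1, (300, 900): 2, (550, 1700): 3}


def receive(channel):
    """Split a channel stream into speech frames and data symbols, tracking the mode."""
    mode = "speech"
    speech, symbols = [], []
    # Both predetermined sequences are three pairs long, so a 3-item window suffices.
    window = deque(maxlen=3)
    for kind, payload in channel:
        if kind == "speech":
            speech.append(payload)
            window.clear()
        elif mode == "data" and payload in FORMANTS_TO_SYMBOL:
            symbols.append(FORMANTS_TO_SYMBOL[payload])
            window.clear()
        else:
            window.append(payload)
            expected = DATA_MODE_PREAMBLE if mode == "speech" else SPEECH_MODE_POSTAMBLE
            if list(window) == expected:
                mode = "data" if mode == "speech" else "speech"
                window.clear()
    return mode, speech, symbols
```

Because a data symbol clears the window, each predetermined sequence is only recognized when its pairs arrive contiguously.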
In yet another aspect of the invention, control signals for controlling a speech-controlled automated server may be generated by converting a spoken command into a first command signal and supplying the first command signal to speech recognition means. The speech recognition means is used to determine one or more formant frequencies that correspond to the first command signal, wherein the one or more formant frequencies constitute a command that is recognizable by the automated server. A second command signal having the one or more formant frequencies is then generated. This feature permits almost any user to interface with an automated server because the user's spoken commands are, in effect, “translated” into another set of formant frequencies on which the automated server has been trained.
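Actual speech recognition is outside the scope of a short example, so the sketch below hedges heavily: a transcript string stands in for the first command signal, and `recognize` simply looks it up in a hypothetical `SERVER_VOCABULARY` of reference formant sequences on which the server is assumed to have been trained. The matched sequence is then synthesized, with summed sinusoids as a simplified stand-in for formant synthesis, to form the second command signal.

```python
import math

SAMPLE_RATE_HZ = 8000

# Hypothetical vocabulary the automated server was trained on: each command is a
# reference formant sequence of (F1, F2) pairs.
SERVER_VOCABULARY = {
    "balance": [(700, 1200), (400, 2000)],
    "transfer": [(300, 900), (550, 1700), (700, 1200)],
    "cancel": [(550, 1700), (300, 900)],
}


def recognize(first_command):
    """Stand-in for the speech recognition means.

    A real recognizer would take audio; here a transcript string plays the role of
    the first command signal and is matched against the server vocabulary.
    """
    word = first_command.strip().lower()
    if word not in SERVER_VOCABULARY:
        raise ValueError(f"command not recognized: {first_command!r}")
    return SERVER_VOCABULARY[word]


def synthesize(sequence, segment_s=0.05, sample_rate=SAMPLE_RATE_HZ):
    """Generate the second command signal from a formant sequence."""
    seg_len = int(round(segment_s * sample_rate))
    samples = []
    for i, formants in enumerate(sequence):
        amplitude = 1.0 / len(formants)
        for n in range(i * seg_len, (i + 1) * seg_len):
            samples.append(
                sum(amplitude * math.sin(2 * math.pi * f * n / sample_rate) for f in formants)
            )
    return samples


second_command = synthesize(recognize("  Transfer "))
```

Because the server only ever hears the reference formant sequences, it need not be trained on each user's voice.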