1. Field of the Invention
This invention relates to a speech compression technique that reduces the amount of information required for reasonably accurate reproduction of the original speech, so that the speech can be transmitted at low bandwidth in pager or low power mobile telephone systems.
2. Brief Description of the Prior Art
Pager systems use narrow bandwidth to transmit a phone number to a portable unit. When the portable unit receives the number, the user must go to a telephone either to receive a recorded message or to call the initiating number. Alternatively, the initiating caller can send a keyed-in message. It would be desirable for the caller to provide a spoken message that would be transmitted to the pager, but that would require a wider bandwidth than is generally used by pager systems. It would further be desirable to be able to use a pager system to send a spoken reply message, but that again would require more bandwidth and would also require considerably more power in the portable unit.
There is a desire to have mobile phones operate for longer time periods on smaller batteries. One solution is to have a high density of relay stations so that low power transmission can be used. However, this will not always be possible and a high power transmission may sometimes be required to reach the nearest relay station. Thus, it would be desirable to have a system that reduces required transmission power by reducing the amount of data that is needed for conveying a message.
Speech recognition systems have now reached the degree of sophistication whereby they are capable of recognizing and identifying many spoken words and phrases or phonemes and providing digital codes for representation, storage, transmission and reproduction of such words or phrases and/or phonemes. Such digital code greatly compresses the original audio information, but retains sufficient information to allow recreation of the original speech in an understandable form. Word or phrase recognition provides the greatest compression while phoneme recognition is more flexible.
Transmission of analog information, particularly sound, generally requires a relatively large amount of power and a relatively large bandwidth. It is readily apparent that, given the bandwidth restrictions placed upon transmission channels, conveying the information needed to provide the same or similar sound information at the receiving end within the limited channel bandwidth will require additional transmission time. While digital techniques require less bandwidth than do analog techniques, the amount of digital information required for accurate duplication of the analog signal is considerably more than that required for the identification of words or phonemes. For example, a few bits can represent a word or phoneme, while many bits may be needed to represent even a fraction of a second of analog information, where the number of bits and the duration of time represented depend upon the accuracy of reproduction to be achieved. Since the power utilized in transmission of information or data is far greater than the power generally required to manipulate the information or data to be transmitted, any procedure that minimizes transmission power requirements, or that eases bandwidth restrictions, will be extremely useful. This is especially so in devices such as pagers and mobile telephones, which are battery operated and have limited available power. The amount of information that can be received and/or transmitted by the pager or mobile telephone is directly related to the amount of available power and bandwidth, and this amount can be increased if the information is transmitted more efficiently, that is, if the ratio of message information transmitted to power consumed and bandwidth required is increased.
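The difference in data volume described above can be illustrated with a rough calculation. The following is a minimal sketch only; the phoneme inventory size, speaking rate, sampling rate and sample width are all assumed figures chosen for illustration and do not come from this specification.

```python
# Rough comparison of a phoneme-code stream with sampled audio.
# Every constant below is an assumption made for illustration.
PHONEME_INVENTORY = 64       # hypothetical number of distinct phoneme codes
PHONEMES_PER_SECOND = 12     # hypothetical average speaking rate
SAMPLE_RATE_HZ = 8000        # assumed telephone-quality sampling rate
BITS_PER_SAMPLE = 8          # assumed sample width


def bits_per_symbol(inventory_size: int) -> int:
    """Width of the smallest fixed-length code that labels every entry."""
    return max(1, (inventory_size - 1).bit_length())


def phoneme_bit_rate(inventory_size: int, symbols_per_second: int) -> int:
    """Bits per second needed to send phoneme codes only."""
    return bits_per_symbol(inventory_size) * symbols_per_second


def sampled_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second needed to send uncompressed samples."""
    return sample_rate_hz * bits_per_sample


phoneme_bps = phoneme_bit_rate(PHONEME_INVENTORY, PHONEMES_PER_SECOND)
sampled_bps = sampled_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"phoneme codes: {phoneme_bps} bit/s, sampled audio: {sampled_bps} bit/s")
```

Under these assumed figures the phoneme stream needs 72 bits per second against 64,000 for the sampled signal. The actual ratio depends entirely on the code book size and the reproduction accuracy chosen.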
While the discussion herein is primarily directed to spoken input, it can also be applied to typed input. An abbreviated code can be assigned to words and phrases and this abbreviated code can be substituted for the input string of characters for storage and transmission.
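The substitution of abbreviated codes for typed words and phrases might be sketched as follows. The code book contents, the function names and the choice to pass unrecognized words through as literal text are assumptions made for this example, not details taken from the specification.

```python
# Hypothetical code book shared by sender and receiver.
CODE_BOOK = {"call me back": 0, "at": 1, "running late": 2}


def encode(text, code_book, max_phrase_words=3):
    """Replace known words and phrases with integer codes.

    The longest matching phrase is tried first. Words that are not in the
    code book are kept as literal strings (an assumption of this sketch).
    """
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        for n in range(min(max_phrase_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in code_book:
                tokens.append(code_book[phrase])
                i += n
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def decode(tokens, code_book):
    """Expand integer codes back into text using the same code book."""
    reverse = {code: phrase for phrase, code in code_book.items()}
    return " ".join(reverse[t] if isinstance(t, int) else t for t in tokens)


tokens = encode("Call me back at noon", CODE_BOOK)
restored = decode(tokens, CODE_BOOK)
```

Here a five-word message is reduced to two codes and one literal word, and the receiver restores the lower-cased text from the same code book.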
In accordance with the present invention, information, primarily but not limited to speech, is compressed to reduce the amount of transmitted information required to enable intelligible reproduction of the original information at a remote location.
In the preferred embodiment, the information is compressed at a pager base station prior to transmission to the remote pager. The remote pager stores the transmitted compressed information and recreates the information from the stored code on command.
Briefly, the above compression is accomplished by recognizing certain information, such as phonemes and/or words or phrases and/or any other types of information capable of being identified by a relatively small amount of digital data. The recognizable information is coded at the transmitting station by comparison with a data base, which provides the digital coding for transmission. In addition, the type of voice is recognized, such as male, female, child, foreign accent or a particular individual, and coding indicating such voice type is sent along with other recognizable information, such as volume or pitch. The amount of information detected and transmitted relative to voice type will determine the accuracy with which the voice is reproduced at the remote station. Though an increased amount of such information will provide more accurate voice reproduction, it will also require that more information be transmitted, thereby utilizing more data transmission time, and additional processing will be required both for compression and for recreation. It follows that a tradeoff must be made. Nevertheless, by transmitting voice type information as well as information relative to phonemes, words, phrases and the like, a more accurate reproduction of the original voice can be provided with a greatly reduced amount of transmitted information. For example, if the voice characteristics of certain individuals are also coded in the data base at each terminal, a single code identifying the individual talking can be sent, thereby providing an accurate voice reproduction at the receiving end while only one voice identifier and the phoneme data are transmitted.
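One way to picture a message that carries a single voice identifier followed by phoneme data is the bit-packing sketch below. The field widths, the table contents and the name `encode_message` are hypothetical; the specification does not define a message layout.

```python
# Hypothetical data base entries and field widths, for illustration only.
VOICE_TYPES = {"male": 0, "female": 1, "child": 2, "accented": 3}
PHONEMES = {"HH": 0, "AH": 1, "L": 2, "OW": 3}
VOICE_BITS = 4      # assumed width of the voice identifier field
PHONEME_BITS = 6    # assumed width of each phoneme code


def encode_message(voice_type, phonemes):
    """Pack one voice identifier and a sequence of phoneme codes into bytes.

    A practical format would also need a length field or end marker so the
    receiver can tell padding from data; that is omitted in this sketch.
    """
    value = VOICE_TYPES[voice_type]
    nbits = VOICE_BITS
    for name in phonemes:
        value = (value << PHONEME_BITS) | PHONEMES[name]
        nbits += PHONEME_BITS
    padding = (-nbits) % 8          # pad with zero bits to a whole byte
    value <<= padding
    return value.to_bytes((nbits + padding) // 8, "big")


packet = encode_message("female", ["HH", "AH", "L", "OW"])
print(len(packet), packet.hex())
```

With these assumed widths, four phonemes and the voice identifier occupy 28 bits, which pad to 4 bytes. Adding further voice detail such as volume or pitch would lengthen the header, which is the tradeoff noted above.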
Optionally, users of the system can have voice characteristics stored in the system data base and input an assigned code (for example, saying their name) to select their voice characteristics for the speech recreation.
The coding is transmitted and the transmitted code is matched with a corresponding code at the receiving end (e.g., at the central station or the remote pager). When the particular information capable of being compressed in the manner described herein is recognized at the transmission end by comparison with a data base, and the type of voice or the individual speaking is also recognized by comparison with a data base, the particular binary codes in the data base for that information are transmitted to the remote unit (e.g., the pager or mobile telephone).
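The matching of transmitted codes against a corresponding data base at the receiving end could look like the following sketch. The table contents and function names are hypothetical, and the frame is kept as a plain tuple rather than a packed bit stream to keep the lookup step visible.

```python
# Hypothetical tables, assumed to be stored identically at both ends.
PHONEME_CODES = {"HH": 0, "EH": 1, "L": 2, "OW": 3}
VOICE_CODES = {"male": 0, "female": 1, "child": 2}


def encode_frame(voice, phonemes):
    """Transmitting end: look up the code for the voice type and each phoneme."""
    return (VOICE_CODES[voice], [PHONEME_CODES[p] for p in phonemes])


def decode_frame(frame):
    """Receiving end: map each received code back through the same tables."""
    voice_code, phoneme_codes = frame
    voices = {code: name for name, code in VOICE_CODES.items()}
    phonemes = {code: name for name, code in PHONEME_CODES.items()}
    return voices[voice_code], [phonemes[c] for c in phoneme_codes]


frame = encode_frame("female", ["HH", "EH", "L", "OW"])
voice, phonemes = decode_frame(frame)
```

The round trip succeeds only because both ends hold the same tables; a code that is missing from the receiver's copy would raise a `KeyError` in this sketch.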
The coded information to describe the type of information sent is preferably binary, thereby requiring very limited bandwidth for transmission. Also, the binary number representative of the information capable of being compressed can be transmitted to the remote location using much less power since the amount of data to be transmitted is substantially reduced. This coded information received at the remote location, such as a pager or mobile telephone, is used to reconstruct the information capable of being compressed as well as the voice type from the data base therein which has the same coded information as in the data base at the transmitting end. As stated above, in the case of a spoken message, the transmitted data can provide additional information such as, for example, whether the voice was male, female, a youth, accented, a particular individual, etc. in order to reproduce the original sound more closely. The remote station can also have some or all of the operational capabilities of the transmitting station, thereby permitting at least some type of communication in both directions. For mobile telephone applications, the preferred form is to have the mobile unit receive information in the standard format and to transmit information in the encoded format to save transmission power. The base station would optionally recreate audio information (either analog or digital) from the encoded information for further transmission in the telephone network. Optionally, the mobile unit could switch from standard transmission to encoded transmission, depending upon battery status and distance from the receiving station.
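The optional switching of the mobile unit between standard and encoded transmission could be expressed as a simple decision rule. The sketch below is illustrative only: the threshold values, the parameter names and the rule itself are assumptions, since the specification states only that the choice depends upon battery status and distance from the receiving station.

```python
from enum import Enum


class TxMode(Enum):
    STANDARD = "standard"
    ENCODED = "encoded"


def choose_tx_mode(battery_fraction, distance_km,
                   low_battery=0.3, far_km=5.0):
    """Pick a transmission mode for the mobile unit.

    battery_fraction is the remaining charge from 0.0 to 1.0. The default
    thresholds are hypothetical values, not figures from the specification.
    """
    if battery_fraction < low_battery or distance_km > far_km:
        return TxMode.ENCODED    # fewer bits to send, so less transmit power
    return TxMode.STANDARD


print(choose_tx_mode(0.8, 1.0))
print(choose_tx_mode(0.2, 1.0))
```

A unit with ample charge near a relay station keeps the standard format, while a low battery or a distant station selects the encoded format.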
The representation of the phoneme or other information can be altered to the point that it is still recognizable and close to the sound of the originally spoken information, and yet can be transmitted using much less information than that required to transmit the phoneme as originally expressed.